@XiangpengHao
Contributor

@XiangpengHao XiangpengHao commented Dec 1, 2025

Rationale for this change

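Basically a helper to simplify building shredding schemas: instead of spelling out the nested DataType::Struct by hand, you list each leaf path and its type: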

```rust
use arrow_schema::{DataType, Field, Fields};

// New: describe each shredded leaf by its dotted path and type.
let shredding_type = ShredTypeBuilder::default()
    .with_path("a", &DataType::Int64)
    .with_path("b.c", &DataType::Utf8)
    .with_path("b.d", &DataType::Float64)
    .build();

// Equivalent to building the nested struct by hand (every field nullable):
assert_eq!(
    shredding_type,
    DataType::Struct(Fields::from(vec![
        Field::new("a", DataType::Int64, true),
        Field::new(
            "b",
            DataType::Struct(Fields::from(vec![
                Field::new("c", DataType::Utf8, true),
                Field::new("d", DataType::Float64, true),
            ])),
            true
        ),
    ]))
);
```
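To make the path semantics concrete, here is a minimal, dependency-free sketch of how a dotted-path builder can fold leaf paths into a nested struct type. It is not the PR's implementation: PathSchemaBuilder, the Ty enum (a stand-in for arrow's DataType, with every field implicitly nullable) and the last-write-wins handling of repeated paths are assumptions for illustration. Empty paths are not handled, and the PR's actual conflict handling (see test_variant_schema_builder_conflicting_path) may differ.

```rust
/// Stand-in for arrow's `DataType` so the sketch runs without dependencies (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int64,
    Utf8,
    Float64,
    /// Ordered (name, type) children; nullability is implied, not modeled.
    Struct(Vec<(String, Ty)>),
}

/// Intermediate tree: a leaf carries a type, an object carries named children.
enum Node {
    Leaf(Ty),
    Object(Vec<(String, Node)>),
}

/// Hypothetical path-based builder mirroring the `with_path(..).build()` shape.
#[derive(Default)]
struct PathSchemaBuilder {
    root: Vec<(String, Node)>,
}

impl PathSchemaBuilder {
    fn with_path(mut self, path: &str, ty: &Ty) -> Self {
        let segments: Vec<&str> = path.split('.').collect();
        insert(&mut self.root, &segments, ty.clone());
        self
    }

    fn build(self) -> Ty {
        Ty::Struct(self.root.into_iter().map(|(n, node)| (n, to_type(node))).collect())
    }
}

/// Walk (or create) one child per path segment, keeping first-insertion order.
fn insert(children: &mut Vec<(String, Node)>, segments: &[&str], ty: Ty) {
    let (head, rest) = (segments[0], &segments[1..]);
    let idx = match children.iter().position(|(name, _)| name == head) {
        Some(i) => i,
        None => {
            children.push((head.to_string(), Node::Object(Vec::new())));
            children.len() - 1
        }
    };
    if rest.is_empty() {
        // Last segment: store the leaf type (a repeated path overwrites the earlier one).
        children[idx].1 = Node::Leaf(ty);
    } else {
        // Intermediate segment: turn a leaf into an object if needed, then recurse.
        if matches!(children[idx].1, Node::Leaf(_)) {
            children[idx].1 = Node::Object(Vec::new());
        }
        if let Node::Object(grandchildren) = &mut children[idx].1 {
            insert(grandchildren, rest, ty);
        }
    }
}

/// Convert the intermediate tree into the final nested type.
fn to_type(node: Node) -> Ty {
    match node {
        Node::Leaf(ty) => ty,
        Node::Object(children) => {
            Ty::Struct(children.into_iter().map(|(n, c)| (n, to_type(c))).collect())
        }
    }
}

fn main() {
    let shredding_type = PathSchemaBuilder::default()
        .with_path("a", &Ty::Int64)
        .with_path("b.c", &Ty::Utf8)
        .with_path("b.d", &Ty::Float64)
        .build();

    assert_eq!(
        shredding_type,
        Ty::Struct(vec![
            ("a".to_string(), Ty::Int64),
            (
                "b".to_string(),
                Ty::Struct(vec![
                    ("c".to_string(), Ty::Utf8),
                    ("d".to_string(), Ty::Float64),
                ])
            ),
        ])
    );
}
```

Keeping an intermediate tree until build() is what lets "b.c" and "b.d" land in the same "b" struct regardless of how many other paths are added in between.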

What changes are included in this PR?

  1. Added ShredTypeBuilder
  2. Updated existing test cases to use the new builder

Are these changes tested?

Yes

Are there any user-facing changes?

Yes: this adds a new public interface (ShredTypeBuilder).

@github-actions github-actions bot added the parquet-variant parquet-variant* crates label Dec 1, 2025
Member

@klion26 klion26 left a comment

Thanks for the contribution, LGTM. I left some nit comments for you to consider.

```rust
if child_fields.is_empty() {
    None
} else {
    Some(DataType::Struct(Fields::from(child_fields)))
```
Member

It seems we can only construct DataType::Struct and primitive types (using "" as the path), but not List types. I'm not sure whether we need to convey this struct-only restriction in the builder name or somewhere else.

Contributor

I am not sure what you mean by this

Member

ShredTypeBuilder is primarily used for building Struct data types. Since Variants can be shredded as either Structs or Arrays, the previous comment raises the question of whether we should inform users that this cannot be used to build Array schemas.

```rust
}

#[test]
fn test_variant_schema_builder_conflicting_path() {
```
Member

Nice tests!

@alamb
Contributor

alamb commented Dec 3, 2025

FYI @mhilton -- this PR may be interesting to you

Contributor

@alamb alamb left a comment

Thanks @klion26 and @XiangpengHao -- this is great

The only thing I think we should consider is supporting FieldRefs directly, rather than assuming a bare DataType and always-nullable fields.
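One way to read this suggestion, as a minimal sketch: the builder stores whole fields, and the DataType-only entry point becomes a thin wrapper that fills in nullable = true, so callers who need non-nullable leaves can pass their own field. FieldAwareBuilder, with_type, with_field and the std-only Field/FieldRef/Ty stand-ins are hypothetical (in arrow_schema, FieldRef is an alias for Arc&lt;Field&gt;), and dotted paths are left out for brevity.

```rust
use std::sync::Arc;

/// Std-only stand-ins for arrow's `DataType`, `Field` and `FieldRef` (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int32,
    Int64,
    Struct(Vec<FieldRef>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: Ty,
    nullable: bool,
}

type FieldRef = Arc<Field>;

/// Hypothetical builder that keeps caller-supplied fields intact.
#[derive(Default)]
struct FieldAwareBuilder {
    fields: Vec<FieldRef>,
}

impl FieldAwareBuilder {
    /// Convenience entry point: a bare type becomes a nullable field.
    fn with_type(self, name: &str, ty: Ty) -> Self {
        self.with_field(Arc::new(Field {
            name: name.to_string(),
            data_type: ty,
            nullable: true,
        }))
    }

    /// General entry point: the caller decides nullability (and, in arrow, metadata).
    fn with_field(mut self, field: FieldRef) -> Self {
        self.fields.push(field);
        self
    }

    fn build(self) -> Ty {
        Ty::Struct(self.fields)
    }
}

fn main() {
    let id = Arc::new(Field {
        name: "id".to_string(),
        data_type: Ty::Int32,
        nullable: false,
    });
    let ty = FieldAwareBuilder::default()
        .with_field(id)
        .with_type("age", Ty::Int64)
        .build();

    if let Ty::Struct(fields) = &ty {
        assert!(!fields[0].nullable); // preserved from the caller's field
        assert!(fields[1].nullable); // defaulted by `with_type`
    }
}
```

Routing the DataType convenience method through the field-based one keeps a single code path, so the "assume nullable" default lives in exactly one place.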

I left some other comments that might be useful but are not necessary in my opinion

```rust
        Field::new("age", DataType::Int64, true),
    ]));
    let schema2 = ShredTypeBuilder::default()
        .with_path("id", &DataType::Int32)
```
Contributor

This certainly is much nicer now.

@alamb alamb changed the title Add variant ShredTypeBuilder Add builder to help create Schemas for shredding (ShredTypeBuilder) Dec 3, 2025
@alamb
Contributor

alamb commented Dec 3, 2025

FYI @scovich in case you have seen similar APIs or have ideas

Contributor

@mhilton mhilton left a comment

I think it would be nice if this builder had a structure similar to VariantBuilder, with new_object and new_list. A list can only have one element type, so it wouldn't work exactly the same way. However, such an API and the proposed path-based API aren't mutually exclusive.
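A rough sketch of what a VariantBuilder-style API could look like for schemas, assuming a child builder that borrows its parent's field list and appends itself on finish(). NestedSchemaBuilder, ObjectSchemaBuilder, add_field, add_list and the Ty stand-in are all hypothetical; add_list takes a single element type, reflecting the point that a list can only have one type.

```rust
/// Std-only stand-in for arrow's `DataType` (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int64,
    Utf8,
    Float64,
    /// A list has exactly one element type.
    List(Box<Ty>),
    Struct(Vec<(String, Ty)>),
}

/// Hypothetical root builder; the top-level shredding type is a struct.
#[derive(Default)]
struct NestedSchemaBuilder {
    fields: Vec<(String, Ty)>,
}

impl NestedSchemaBuilder {
    fn add_field(&mut self, name: &str, ty: Ty) -> &mut Self {
        self.fields.push((name.to_string(), ty));
        self
    }

    fn new_object(&mut self, name: &str) -> ObjectSchemaBuilder<'_> {
        ObjectSchemaBuilder::new(&mut self.fields, name)
    }

    fn build(self) -> Ty {
        Ty::Struct(self.fields)
    }
}

/// Child builder: borrows the parent's field list and appends itself on `finish`.
struct ObjectSchemaBuilder<'a> {
    parent: &'a mut Vec<(String, Ty)>,
    name: String,
    fields: Vec<(String, Ty)>,
}

impl<'a> ObjectSchemaBuilder<'a> {
    fn new(parent: &'a mut Vec<(String, Ty)>, name: &str) -> Self {
        ObjectSchemaBuilder { parent, name: name.to_string(), fields: Vec::new() }
    }

    fn add_field(&mut self, name: &str, ty: Ty) -> &mut Self {
        self.fields.push((name.to_string(), ty));
        self
    }

    /// Lists take one element type, unlike objects with many named fields.
    fn add_list(&mut self, name: &str, element: Ty) -> &mut Self {
        self.add_field(name, Ty::List(Box::new(element)))
    }

    fn new_object(&mut self, name: &str) -> ObjectSchemaBuilder<'_> {
        ObjectSchemaBuilder::new(&mut self.fields, name)
    }

    fn finish(self) {
        let ObjectSchemaBuilder { parent, name, fields } = self;
        parent.push((name, Ty::Struct(fields)));
    }
}

fn main() {
    let mut builder = NestedSchemaBuilder::default();
    builder.add_field("a", Ty::Int64);

    let mut b = builder.new_object("b");
    b.add_field("c", Ty::Utf8).add_field("d", Ty::Float64);
    b.finish();

    assert_eq!(
        builder.build(),
        Ty::Struct(vec![
            ("a".to_string(), Ty::Int64),
            (
                "b".to_string(),
                Ty::Struct(vec![
                    ("c".to_string(), Ty::Utf8),
                    ("d".to_string(), Ty::Float64),
                ])
            ),
        ])
    );
}
```

Because the child builder mutably borrows its parent, the borrow checker prevents touching the parent until the child is finished, which mirrors how nested builders are typically used.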


```rust
/// Internal tree node structure for building variant schemas.
#[derive(Clone)]
enum VariantSchemaNode {
```
Contributor

It'd be nice if this type also supported an Array type, although the path syntax would need to be extended a little. Possibly you could use .0 to indicate an array?
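As an illustration of the suggested syntax only, the sketch below treats a literal 0 segment as "the element of a list" and turns a single path into a nested type. Segment, parse_path, type_for and the Ty stand-in are hypothetical, and merging several paths into one schema is omitted.

```rust
/// Std-only stand-in for arrow's `DataType` (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int64,
    List(Box<Ty>),
    Struct(Vec<(String, Ty)>),
}

#[derive(Debug, PartialEq)]
enum Segment<'a> {
    /// A named object field.
    Field(&'a str),
    /// `0` stands for "the element of a list" (hypothetical syntax from the review).
    Element,
}

fn parse_path(path: &str) -> Vec<Segment<'_>> {
    path.split('.')
        .map(|s| if s == "0" { Segment::Element } else { Segment::Field(s) })
        .collect()
}

/// Build the type for a single path, from the outermost segment inward.
fn type_for(segments: &[Segment<'_>], leaf: Ty) -> Ty {
    match segments.split_first() {
        None => leaf,
        Some((Segment::Field(name), rest)) => {
            Ty::Struct(vec![(name.to_string(), type_for(rest, leaf))])
        }
        Some((Segment::Element, rest)) => Ty::List(Box::new(type_for(rest, leaf))),
    }
}

fn main() {
    // "a.0.b": field `a` is a list whose elements are structs with field `b`.
    let ty = type_for(&parse_path("a.0.b"), Ty::Int64);
    assert_eq!(
        ty,
        Ty::Struct(vec![(
            "a".to_string(),
            Ty::List(Box::new(Ty::Struct(vec![("b".to_string(), Ty::Int64)]))),
        )])
    );
}
```

One trade-off of this syntax: an object field whose name is literally 0 would become ambiguous, so an escape or a distinct marker might be needed.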

Contributor Author

Arrow List is not supported by shred_variant yet; maybe we can wait until array shredding is implemented.

@scovich
Contributor

scovich commented Dec 4, 2025

Traveling this week but would love to take a look when I get a chance, hopefully next week

@XiangpengHao XiangpengHao changed the title Add builder to help create Schemas for shredding (ShredTypeBuilder) Add builder to help create Schemas for shredding (ShreddedSchemaBuilder) Dec 5, 2025
Labels

parquet-variant parquet-variant* crates

Development

Successfully merging this pull request may close these issues.

[Variant] easier way to construct a shredded schema

5 participants