-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Add builder to help create Schemas for shredding (ShreddedSchemaBuilder)
#8940
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
klion26
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution, LGTM, left some nit comments to be considered.
| if child_fields.is_empty() { | ||
| None | ||
| } else { | ||
| Some(DataType::Struct(Fields::from(child_fields))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems we can construct DataType::Struct and primitive types (use "" as the path), and no List types, not sure if we need to add the information struct information in the builder name or anywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what you mean by this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ShredTypeBuilder is primarily used for building Struct data types. Since Variants can be shredded as either Structs or Arrays, the previous comment raises the question of whether we should inform users that this cannot be used to build Array schemas.
| } | ||
|
|
||
| #[test] | ||
| fn test_variant_schema_builder_conflicting_path() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice tests!
|
FYI @mhilton -- this PR may be interesting to you |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @klion26 and @XiangpengHao -- this is great
The only thing I think we should consider is supporting FieldRefs directly rather than assuming DataType / nullable fields.
I left some other comments that might be useful but are not necessary in my opinion
| if child_fields.is_empty() { | ||
| None | ||
| } else { | ||
| Some(DataType::Struct(Fields::from(child_fields))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what you mean by this
| Field::new("age", DataType::Int64, true), | ||
| ])); | ||
| let schema2 = ShredTypeBuilder::default() | ||
| .with_path("id", &DataType::Int32) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this certainly is much nicer now
ShredTypeBuilderShredTypeBuilder)
|
FYI @scovich in case you have seen similar APIs or have ideas |
mhilton
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be nice if this builder had a similar structure to VariantBuilder with new_object and new_list. A list can only have one type so it wouldn't work exactly the same though. However such an API and the proposed API aren't mutually exclusive.
|
|
||
| /// Internal tree node structure for building variant schemas. | ||
| #[derive(Clone)] | ||
| enum VariantSchemaNode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be nice if this type also supported an Array type too. Although the path syntax would need to be extended a little. Possibly you could use .0 to indicate an array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arrow List is not supported by shred_variant yet, maybe we can wait until array shredding is implemented
|
Traveling this week but would love to take a look when I get a chance, hopefully next week |
ShredTypeBuilder)ShreddedSchemaBuilder)
Rationale for this change
Basically a helper to simplify this:
What changes are included in this PR?
ShredTypeBuilderAre these changes tested?
Yes
Are there any user-facing changes?
Add a new public interface