@jeroko jeroko commented Oct 27, 2025

Rationale for this change

Closes #2131

This PR relaxes the constraint that prevented adding any file that contains field IDs. It replaces it with a constraint that rejects only files whose field IDs are inconsistent with the field IDs of the table. Files with compatible field IDs can be added safely; files with incompatible field IDs are rejected.

Are these changes tested?

Yes

Are there any user-facing changes?

Yes

@jeroko jeroko force-pushed the remove-field_id-constraint-on-add_files branch 2 times, most recently from 0b599c6 to d580102 Compare October 27, 2025 14:00
@jeroko jeroko force-pushed the remove-field_id-constraint-on-add_files branch from d580102 to 1addf60 Compare October 27, 2025 14:31
@jeroko jeroko marked this pull request as ready for review October 27, 2025 14:57
Comment on lines 2637 to 2638
requested_id_to_name = requested_schema._lazy_id_to_name
provided_id_to_name = provided_schema._lazy_id_to_name
Contributor

Hey @jeroko Thanks for working on this, and adding this check.

However, I don't think we really care about the names; it is not a problem when they differ. What can brick the table is adding a file with a different schema, because the types may not match. Should we instead check that the file contains the expected type for each of the IDs?

Author

@Fokko Right, we should not care about the names if the IDs are provided, and the mapping between the IDs and the types is already checked by the call to _check_schema_compatible at the end of this function. So I didn't really need to add any extra check.
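
To illustrate the point discussed in this thread, here is a minimal sketch of an ID-based compatibility check that compares types and ignores names. It is not the PyIceberg implementation: `Field` and `incompatible_field_ids` are hypothetical names invented for this example, types are reduced to plain strings, and treating an unknown field ID as an error is an assumption. A real check would also have to consider details such as type promotion and nullability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema field; not a PyIceberg class."""
    field_id: int
    name: str
    type_name: str


def incompatible_field_ids(table_fields, file_fields):
    """Return one message per file field whose ID is unknown to the table
    or maps to a different type. Names are deliberately not compared."""
    table_types = {f.field_id: f.type_name for f in table_fields}
    problems = []
    for f in file_fields:
        expected = table_types.get(f.field_id)
        if expected is None:
            # Assumption: an ID the table does not know about is rejected.
            problems.append(f"field ID {f.field_id} ({f.name}) is not in the table schema")
        elif expected != f.type_name:
            problems.append(f"field ID {f.field_id}: expected {expected}, found {f.type_name}")
    return problems


table = [Field(1, "id", "long"), Field(2, "name", "string")]
# Same IDs and types as the table, different names: should be accepted.
renamed = [Field(1, "identifier", "long"), Field(2, "label", "string")]
# Same IDs and names, but ID 1 carries a different type: should be rejected.
retyped = [Field(1, "id", "string"), Field(2, "name", "string")]

renamed_problems = incompatible_field_ids(table, renamed)
retyped_problems = incompatible_field_ids(table, retyped)
print(renamed_problems)
print(retyped_problems)
```

In this sketch the renamed file produces no problems, while the retyped file is flagged on field ID 1, which mirrors the distinction made above between harmless name differences and type mismatches.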

Development

Successfully merging this pull request may close these issues.

Add files support for parquet field_ids
