@cravani cravani commented Nov 19, 2025

Rationale for this change

When reading Parquet files with large metadata (e.g., files with thousands of columns), the default Thrift message size limit can be insufficient, and reads fail with TTransportException: Message size exceeds limit. Currently, the Thrift protocol configuration always uses the default maximum message size (100 MB), so users cannot read files with exceptionally large metadata footers.

What changes are included in this PR?

Add a new configuration key: parquet.thrift.string.size.limit
Default value: 100 MB (104857600 bytes)
Allow users to override this via Configuration
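
A minimal sketch of the lookup described above, assuming the limit is read as an integer byte count and falls back to the default when the key is unset. The class `ThriftLimitSketch`, the helper `resolveLimit`, and the plain `Map` standing in for the real configuration object are hypothetical and only illustrate the behavior; they are not the code in this PR.

```java
import java.util.HashMap;
import java.util.Map;

public class ThriftLimitSketch {
    // Key and default value as stated in the PR description.
    static final String KEY = "parquet.thrift.string.size.limit";
    static final int DEFAULT_LIMIT = 100 * 1024 * 1024; // 104857600 bytes

    // Hypothetical helper: returns the configured limit in bytes,
    // or the default when the key is missing or blank.
    static int resolveLimit(Map<String, String> conf) {
        String raw = conf.get(KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_LIMIT;
        }
        int limit = Integer.parseInt(raw.trim());
        if (limit <= 0) {
            // Assumption: a non-positive limit is treated as a configuration error.
            throw new IllegalArgumentException(KEY + " must be positive, got " + limit);
        }
        return limit;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(resolveLimit(conf)); // default: 104857600

        // Raise the limit to 256 MB for a file with a very large footer.
        conf.put(KEY, String.valueOf(256 * 1024 * 1024));
        System.out.println(resolveLimit(conf)); // 268435456
    }
}
```

With no override the sketch returns the 100 MB default, so existing behavior is unchanged; only users who set the key see a different limit.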

Are these changes tested?

Yes. Added a test case, TestParquetFileReaderMaxMessageSize.java.

Are there any user-facing changes?

Not by default. Users can set the config parquet.thrift.string.size.limit to a larger value to raise the limit as needed.

Closes #GH-3358

@cravani cravani force-pushed the GH-3358 branch 3 times, most recently from e8de6b5 to 2ea4f50 Compare November 19, 2025 18:20

@Fokko Fokko left a comment

Looks reasonable to me 👍

@Fokko Fokko merged commit ee2c751 into apache:master Nov 25, 2025
5 of 6 checks passed