@alamb alamb commented Nov 30, 2025

Draft until #18820 is merged

Which issue does this PR close?

Rationale for this change

The parquet 57.1.0 upgrade includes a new adaptive filter from @hhhizzz:

Our testing shows it is faster in all cases, but I want to provide an escape valve so people can turn it off if they hit an issue.

I had originally included this in #18820, but @rluvaton suggested in #18820 (review) that it would be easier to understand as its own PR.

What changes are included in this PR?

  1. Add a force_filter_selections config setting
  2. Add configuration guide
  3. Add tests

Are these changes tested?

Yes

Are there any user-facing changes?

A new boolean configuration flag, force_filter_selections, which defaults to false.

@github-actions github-actions bot added documentation Improvements or additions to documentation optimizer Optimizer rules core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) common Related to common crate proto Related to proto crate datasource Changes to the datasource crate labels Nov 30, 2025
/// pushdown_filters is enabled. If false, the reader will automatically
/// choose between a RowSelection and a Bitmap based on the number and
/// pattern of selected rows.
pub force_filter_selections: bool, default = false
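The doc comment above says that, unless forced, the reader chooses between a RowSelection and a Bitmap based on the number and pattern of selected rows. The following is a hypothetical, std-only sketch of that kind of decision, assuming selectors win when selected rows form long contiguous runs and a bitmap wins when they are fragmented. The names Selection, to_runs, choose and the min_avg_run threshold are illustrative only; they are not the parquet crate's API or its actual heuristic.

```rust
/// Hypothetical model of the two representations a filter result can take.
/// These are NOT the parquet crate's types; they only illustrate the trade-off.
#[derive(Debug, Clone, PartialEq)]
enum Selection {
    /// Run-length (skip, row_count) pairs, similar in spirit to a RowSelection.
    Selectors(Vec<(bool, usize)>),
    /// One boolean per row, similar in spirit to a bitmap mask.
    Mask(Vec<bool>),
}

/// Collapse per-row predicate results into (skip, row_count) runs.
fn to_runs(mask: &[bool]) -> Vec<(bool, usize)> {
    let mut runs: Vec<(bool, usize)> = Vec::new();
    for &selected in mask {
        let skip = !selected;
        // Extend the current run if it has the same skip/select state.
        if let Some(last) = runs.last_mut() {
            if last.0 == skip {
                last.1 += 1;
                continue;
            }
        }
        runs.push((skip, 1));
    }
    runs
}

/// Pick a representation for one batch of predicate results.
/// force_selections models force_filter_selections = true.
/// min_avg_run is an ASSUMED threshold, not the value parquet uses.
fn choose(mask: &[bool], force_selections: bool, min_avg_run: usize) -> Selection {
    let runs = to_runs(mask);
    if force_selections {
        // Escape valve: always use selectors, as the flag's name suggests.
        return Selection::Selectors(runs);
    }
    // Average run length: short runs mean a fragmented selection.
    let avg_run = if runs.is_empty() { 0 } else { mask.len() / runs.len() };
    if avg_run >= min_avg_run {
        Selection::Selectors(runs)
    } else {
        Selection::Mask(mask.to_vec())
    }
}
```

For example, with min_avg_run = 4, an alternating pattern such as [true, false, true, false] produces one run per row, so choose returns Mask unless force_selections is set, while ten selected rows followed by ten skipped rows collapse into two runs and stay as Selectors.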
Contributor Author

@rluvaton suggests in #18820 (comment):

What do you think of making it an enum instead to allow for future additions without breaking changes?

(That enum should also be #[non_exhaustive], so that adding a variant is not a breaking change.)

I also see that with_row_selection_policy already accepts an enum.

Making it an enum also allows forcing the mask or configuring the threshold in the auto policy. This is also useful for testing, to force a specific path when creating a reproduction test for a bug.

I personally think it is better as a flag (an escape valve), as I don't foresee any reason to try to tune the parameters, but I would be happy to hear other opinions.
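The enum alternative suggested in review could look roughly like the sketch below. FilterSelectionPolicy, its variants, the string spellings and the default threshold of 32 are all hypothetical placeholders, not DataFusion or parquet APIs; the From<bool> impl shows how the boolean escape valve would map onto such an enum.

```rust
use std::str::FromStr;

/// Hypothetical enum-valued alternative to a boolean flag.
/// #[non_exhaustive] forces downstream matches to include a wildcard arm.
#[non_exhaustive]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FilterSelectionPolicy {
    /// Let the reader choose; threshold is a tunable knob for the heuristic.
    Auto { threshold: usize },
    /// Always use run-length row selections.
    Selectors,
    /// Always use a bitmap mask.
    Mask,
}

impl Default for FilterSelectionPolicy {
    fn default() -> Self {
        // ASSUMED placeholder threshold; not the parquet crate's default.
        Self::Auto { threshold: 32 }
    }
}

impl From<bool> for FilterSelectionPolicy {
    /// Map a boolean "force selections" escape valve onto the enum.
    fn from(force_filter_selections: bool) -> Self {
        if force_filter_selections {
            Self::Selectors
        } else {
            Self::default()
        }
    }
}

impl FromStr for FilterSelectionPolicy {
    type Err = String;

    /// Parse a config string such as "auto", "selectors" or "mask".
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_lowercase().as_str() {
            "auto" => Ok(Self::default()),
            "selectors" => Ok(Self::Selectors),
            "mask" => Ok(Self::Mask),
            other => Err(format!("unknown filter selection policy: {other}")),
        }
    }
}
```

Because of #[non_exhaustive], code in other crates that matches on such an enum must include a wildcard arm, which is what lets a new variant be added later without a breaking change. The cost, compared with the boolean flag chosen here, is a larger configuration surface to document and test.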

@alamb alamb force-pushed the alamb/new_arrow_config_setting branch from 6ae5d68 to e5ef31f Compare December 1, 2025 12:03
@github-actions github-actions bot removed the optimizer Optimizer rules label Dec 1, 2025
@alamb alamb marked this pull request as ready for review December 1, 2025 12:04
// reads more than necessary from the cache as then another bitmap is applied
// See https://github.com/apache/datafusion/pull/18820 for setting and workaround
expected_records: 7,
expected_records: 7, // reads more than necessary from the cache as then another bitmap is applied
Contributor Author

This test shows the change in behavior after the parquet 57.1.0 upgrade. The previous result with 57.0.0 was 4.

The old result can now be obtained by setting force_filter_selections to true.

@alamb alamb changed the title Add force_filter_selections to restore pushdown_filters behavior Add force_filter_selections to restore pushdown_filters behavior prior to parquet 57.1.0 upgrade Dec 1, 2025
Member

@rluvaton rluvaton left a comment

LGTM


Labels

common Related to common crate core Core DataFusion crate datasource Changes to the datasource crate documentation Improvements or additions to documentation proto Related to proto crate sqllogictest SQL Logic Tests (.slt)
