-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Add force_filter_selections to restore pushdown_filters behavior prior to parquet 57.1.0 upgrade
#19003
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| /// pushdown_filters is enabled. If false, the reader will automatically | ||
| /// choose between a RowSelection and a Bitmap based on the number and | ||
| /// pattern of selected rows. | ||
| pub force_filter_selections: bool, default = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rluvaton suggests in #18820 (comment):
What do you think of making it an enum instead to allow for future additions without breaking changes?
(that enum should also be non exhaustive to avoid adding a variant a breaking change)
I also see that the
with_row_selection_policyalready accept enum.making it an enum also allow to force mask or configure the threshold in the auto policy. this is also useful for testing to force specific path when creating a reproduction test for a bug
I personally think it is better as a flag (escape valve) as I don't forsee any reason to try and tune the parameters but would be happy to hear other opinions
6ae5d68 to
e5ef31f
Compare
| // reads more than necessary from the cache as then another bitmap is applied | ||
| // See https://github.com/apache/datafusion/pull/18820 for setting and workaround | ||
| expected_records: 7, | ||
| expected_records: 7, // reads more than necessary from the cache as then another bitmap is applied |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this test shows the change in behavior after the parquet 57.1.0 upgrade. The previous result with 57.0.0 was 4
The old result can now be obtained by setting force_filter_selections to true
force_filter_selections to restore pushdown_filters behaviorforce_filter_selections to restore pushdown_filters behavior prior to parquet 57.1.0 upgrade
rluvaton
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Draft until #18820 is mergedWhich issue does this PR close?
arrow,parquetto57.1.0#18820Rationale for this change
The parquet 57.1.0 upgrade includes a new adaptive filter from @hhhizzz :
Our testing shows this is faster in all cases, but I want to have an escape valve for people to turn it off if they hit some issue.
I had originally included this in #18820 but @rluvaton suggested it would be easier to understand as its own PR in #18820 (review)
What changes are included in this PR?
force_filter_selectionsconfig settingAre these changes tested?
Yes
Are there any user-facing changes?
A new boolean flag