Batch filtering and resampling #270
Open
Description of proposed changes
This PR implements batch filtering and resampling. I'm making this PR because I have been using these features in my own work for some time and thought they would be useful to others. Also, filtering has been requested in #158 and #162.
In both operations, the user provides a function that either accepts or rejects a batch (for filtering) or assigns a sample weight to each batch (for resampling). Both functions take the dataset and the dict of slice objects for the batch, so the user can write them to touch only the data they need and minimize computation on dask arrays.
The changes are all in `BatchGenerator`, since it seemed like `BatchSchema` is primarily intended as a representation of windowing parameters.
I was not able to get the `asv` benchmarks to work in my development environment, but the original behavior is unchanged when `resample_fn` and `filter_fn` are not provided, so I do not expect a performance penalty.
Filtering and resampling happen independently, but you could approximate doing both in one pass by having `resample_fn` return 0 for invalid batches. That would be a little faster than "checking" each batch twice in two separate functions.
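To make the two callback shapes concrete, here is a minimal standalone sketch of the idea, not the code in this PR. Assumptions: a dict of 1-D NumPy arrays stands in for the xarray Dataset, each batch is a dict mapping a dimension name to a `slice`, and the helper names (`make_selectors`, `apply_filter`, `apply_resample`, `no_missing`, `weight_or_zero`) are hypothetical. Resampling is modeled as drawing batches with replacement in proportion to their weights via `random.choices`, which is also an assumption about how the weights are used. The last function shows the one-pass trick of returning weight 0 for invalid batches.

```python
import random
from typing import Callable, Dict, List

import numpy as np

# Hypothetical stand-ins: a dict of 1-D NumPy arrays plays the role of the
# xarray Dataset, and each batch is a dict mapping dimension name -> slice.
Dataset = Dict[str, np.ndarray]
Selector = Dict[str, slice]


def make_selectors(length: int, batch_size: int, dim: str = "time") -> List[Selector]:
    """Non-overlapping windows along one dimension (simplified windowing)."""
    return [
        {dim: slice(start, start + batch_size)}
        for start in range(0, length - batch_size + 1, batch_size)
    ]


def apply_filter(
    ds: Dataset,
    selectors: List[Selector],
    filter_fn: Callable[[Dataset, Selector], bool],
) -> List[Selector]:
    """Keep only the batches that filter_fn accepts."""
    return [sel for sel in selectors if filter_fn(ds, sel)]


def apply_resample(
    ds: Dataset,
    selectors: List[Selector],
    resample_fn: Callable[[Dataset, Selector], float],
    n_samples: int,
    seed: int = 0,
) -> List[Selector]:
    """Draw n_samples batches with replacement, in proportion to resample_fn."""
    weights = [resample_fn(ds, sel) for sel in selectors]
    rng = random.Random(seed)
    # Batches with weight 0 are never drawn; at least one weight must be positive.
    return rng.choices(selectors, weights=weights, k=n_samples)


def no_missing(ds: Dataset, sel: Selector) -> bool:
    # Only the sliced window is inspected, so a callback written this way
    # need not compute the whole (possibly dask-backed) array.
    return not np.isnan(ds["x"][sel["time"]]).any()


def weight_or_zero(ds: Dataset, sel: Selector) -> float:
    # One-pass variant: invalid batches get weight 0 instead of being filtered.
    window = ds["x"][sel["time"]]
    if np.isnan(window).any():
        return 0.0
    return float(window.mean())


if __name__ == "__main__":
    ds = {"x": np.arange(10, dtype=float)}
    ds["x"][3] = np.nan  # makes the window time=2:4 invalid
    selectors = make_selectors(10, batch_size=2)

    kept = apply_filter(ds, selectors, no_missing)
    print(len(selectors), len(kept))  # 5 4

    drawn = apply_resample(ds, selectors, weight_or_zero, n_samples=20)
    print(any(sel["time"] == slice(2, 4) for sel in drawn))  # False
```

Filtering shrinks the list of candidate batches, while the zero-weight variant keeps every window as a candidate but never draws the invalid ones, so each batch is evaluated by only one callback.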