Support multi-column cursors #1333
Open
What
This change adds support for multi-column cursors.
Fixes #1226
Going forward, cursors for new runs are encoded as JSON. Existing runs will continue to use the old cursor encoding mechanism, which simply coerced the cursor value to a string via ActiveRecord.
Upon resuming a job, JSON cursors from the run model are parsed before being handed back to the enumerator.
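As a rough illustration of the encode/parse round trip described above, here is a minimal sketch in plain Ruby. The CursorCodec module and its json: keyword are hypothetical names used only for this example (they are not the gem's API), and to_s stands in for the string coercion that ActiveRecord performs on the legacy path.

```ruby
require "json"

# Hypothetical helper for illustration only; not part of maintenance_tasks.
module CursorCodec
  # Encode a cursor for storage in the string `cursor` column.
  # json: true  -> new runs, JSON-encoded
  # json: false -> existing runs, plain string coercion (stand-in for ActiveRecord)
  def self.dump(cursor, json:)
    return nil if cursor.nil?

    json ? JSON.generate(cursor) : cursor.to_s
  end

  # Decode a stored cursor before handing it back to the enumerator.
  # Legacy cursors are returned as the stored string, unchanged.
  def self.load(stored, json:)
    return nil if stored.nil?

    json ? JSON.parse(stored) : stored
  end
end

stored = CursorCodec.dump([42, "tenant-a"], json: true)
CursorCodec.load(stored, json: true) # => [42, "tenant-a"]

legacy = CursorCodec.dump(42, json: false)
CursorCodec.load(legacy, json: false) # => "42"
```

The point of the flag is that a stored string such as "42" is ambiguous on its own: it could be a legacy cursor or a JSON-encoded integer, so the run has to record which encoding was used.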
How
This is accomplished by tracking how the cursor is encoded on the run model via a new boolean column, cursor_is_json. For existing runs, this value will be false-y and for new runs, we're automatically setting it to true via an after_initialize callback.
Why
Multi-column cursors can occur when:
#cursor_columns, or
Since #cursor_columns is a documented feature, it seems like there's some impetus to support multi-column cursor values.
Rails added support for models with composite primary keys back in 7.1. As someone that works on a multi-tenant app, being able to iterate over models with composite primary keys is a very important feature 😄
I just added support for batching over ActiveRecord relations with composite primary keys in the job-iteration gem (see Shopify/job-iteration#650). If we can get multi-column support in the maintenance_tasks gem, then it should "just work" with models that use composite primary keys once that job iteration change lands.
Risks
This change could introduce issues for cursor values that are not serializable as JSON, or that lose data/precision after going through the encode/decode process.
If this risk is a deal breaker, I have a few ideas to mitigate it:
If you want me to implement these mitigations (or others), let me know!
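To make the round-trip risk concrete, here is a small sketch using only Ruby's standard json library (no ActiveSupport). The json_round_trip and lossless? helpers are hypothetical and exist only for this illustration.

```ruby
require "json"

# Hypothetical helpers for illustration only.

# Encode a cursor to JSON and parse it back, as would happen across a pause/resume.
def json_round_trip(cursor)
  JSON.parse(JSON.generate(cursor))
end

# True when the cursor comes back equal to what went in.
def lossless?(cursor)
  json_round_trip(cursor) == cursor
end

lossless?([1, "abc"])        # => true  (integers and strings survive)
lossless?([:active, 1])      # => false (symbols come back as strings)
lossless?([Time.at(0.5), 1]) # => false (times come back as strings, without sub-second precision)
```

With ActiveSupport loaded the serialized form of some types differs, so the exact results in a Rails app may vary; the general concern about types that JSON cannot represent natively still applies.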
It's worth noting that this change increases the risk of producing a cursor value that is too large to fit into the cursor string column on the maintenance_tasks_runs table. All databases are different, but I believe that databases like MySQL have a hard character limit on string columns. By supporting multi-column cursors, we're enabling larger cursor values to be produced. There are probably going to be edge cases where a multi-column cursor value will not fit into a 255 character column.
I don't have any great ideas to mitigate this risk other than (once again) requiring apps to opt in to JSON-encoded cursors via a configuration option. An opt-in pattern would make it easier to educate people about the risks and caveats before they enable it.
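For a sense of how the column-size concern could show up in practice, here is a minimal sketch that checks whether a JSON-encoded cursor fits a given limit. The cursor_fits? helper and the CURSOR_COLUMN_LIMIT constant are hypothetical, and the 255 figure is just the example limit mentioned above; the real limit depends on the database and schema.

```ruby
require "json"

# Assumed limit for illustration; the actual limit depends on the database and schema.
CURSOR_COLUMN_LIMIT = 255

# Hypothetical check: does the JSON-encoded cursor fit in the string column?
def cursor_fits?(cursor, limit: CURSOR_COLUMN_LIMIT)
  JSON.generate(cursor).length <= limit
end

cursor_fits?([42, "tenant-a"])       # => true
cursor_fits?(["x" * 200, "y" * 100]) # => false (the encoded form is 307 characters)
```

Note that JSON adds overhead of its own (brackets, quotes, commas, escapes), so a set of values whose combined length is just under the limit can still overflow once encoded.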