ClaytonPassmore commented Nov 30, 2025

What

This change adds support for multi-column cursors.

Fixes #1226

Moving forward, cursors are encoded as JSON for new runs. Existing runs continue to use the old cursor encoding mechanism, which simply coerced the cursor value to a string via ActiveRecord.

Upon resuming a job, JSON cursors from the run model are parsed before being handed back to the enumerator.
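
A minimal sketch of the two paths, in plain Ruby with the standard json library. The helper names encode_cursor and decode_cursor are hypothetical and are not the gem's actual methods. Returning the raw stored string for legacy runs is an assumption about the old behavior:

```ruby
require "json"

# Hypothetical helpers, not the gem's implementation.

# New runs: serialize the cursor (scalar or multi-column array) as JSON.
def encode_cursor(cursor)
  JSON.generate(cursor)
end

# On resume: JSON cursors are parsed back into Ruby values; legacy cursors
# are handed back as the raw string that was stored.
def decode_cursor(stored, cursor_is_json:)
  return stored if stored.nil? || !cursor_is_json

  JSON.parse(stored)
end

stored = encode_cursor([42, 1001])            # multi-column cursor
decode_cursor(stored, cursor_is_json: true)   # => [42, 1001]
decode_cursor("1001", cursor_is_json: false)  # => "1001" (legacy string)
```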

How

This is accomplished by tracking how the cursor is encoded on the run model via a new boolean column, cursor_is_json.

For existing runs, this value will be falsey; for new runs, it is set to true automatically via an after_initialize callback.
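
Roughly how the flag could route encoding on the run model. SketchRun is a hypothetical plain-Ruby stand-in (no ActiveRecord): its constructor plays the role of the after_initialize callback, and the new_record: keyword mimics ActiveRecord's new_record?:

```ruby
require "json"

# Hypothetical stand-in for the run model; not the gem's Run class.
class SketchRun
  attr_reader :cursor_is_json, :stored_cursor

  def initialize(new_record:, cursor_is_json: nil, stored_cursor: nil)
    @cursor_is_json = cursor_is_json # persisted value; nil for legacy rows
    @stored_cursor = stored_cursor
    # Like the after_initialize callback: only brand-new runs opt in.
    @cursor_is_json = true if new_record
  end

  # Write path: JSON for flagged runs, plain string coercion otherwise.
  def cursor=(value)
    @stored_cursor = cursor_is_json ? JSON.generate(value) : value&.to_s
  end

  # Read path used when resuming.
  def cursor
    return @stored_cursor unless cursor_is_json && @stored_cursor

    JSON.parse(@stored_cursor)
  end
end

new_run = SketchRun.new(new_record: true)
new_run.cursor = [7, 123]
new_run.cursor        # => [7, 123]

legacy = SketchRun.new(new_record: false, stored_cursor: "123")
legacy.cursor_is_json # => nil (falsey)
legacy.cursor         # => "123"
```

In this sketch, the legacy path turns [7, 123] into the single string "[7, 123]", which illustrates how a multi-column cursor would come back as a string on resume without the JSON flag.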

Why

Multi-column cursors can occur when:

  • A task specifies multiple fields via #cursor_columns, or
  • Iterating over an ActiveRecord collection that has a composite primary key
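
To illustrate why the cursor needs every column, here's a plain-Ruby sketch of keyset-style resumption over rows ordered by a hypothetical composite key (tenant_id, id). ROWS, CURSOR_COLUMNS, cursor_for, and rows_after are made up for illustration; in practice the iteration library would express the comparison as a SQL condition:

```ruby
# Hypothetical data ordered by (tenant_id, id). Note that `id` alone is not
# unique: both tenants have id 2, so a single-column cursor is ambiguous.
ROWS = [
  { tenant_id: 1, id: 1 },
  { tenant_id: 1, id: 2 },
  { tenant_id: 2, id: 1 },
  { tenant_id: 2, id: 2 },
].freeze

CURSOR_COLUMNS = %i[tenant_id id].freeze

# The cursor is the list of cursor-column values for a row.
def cursor_for(row)
  CURSOR_COLUMNS.map { |col| row.fetch(col) }
end

# Resume strictly after the cursor, comparing column values
# lexicographically with Array#<=>.
def rows_after(rows, cursor)
  sorted = rows.sort_by { |row| cursor_for(row) }
  return sorted if cursor.nil?

  sorted.select { |row| (cursor_for(row) <=> cursor) == 1 }
end

cursor = cursor_for(ROWS[1]) # => [1, 2]
rows_after(ROWS, cursor)     # => the two rows for tenant 2
```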

Since #cursor_columns is a documented feature, there's a good case for supporting multi-column cursor values.

Rails added support for models with composite primary keys back in 7.1. As someone who works on a multi-tenant app, being able to iterate over models with composite primary keys is a very important feature 😄

I just added support for batching over ActiveRecord relations with composite primary keys in the job-iteration gem (see Shopify/job-iteration#650). If we can get multi-column support in the maintenance_tasks gem, then it should "just work" with models that use composite primary keys once that job iteration change lands.

Risks

This change could introduce issues for cursor values that are not serializable as JSON, or that lose data/precision after going through the encode/decode process.
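
For example, with Ruby's standard json library, a Symbol or a Time in the cursor comes back as a String after a round trip (json_round_trip is a hypothetical helper for illustration):

```ruby
require "json"

# Encode then decode, the way a cursor would travel through the run model.
def json_round_trip(value)
  JSON.parse(JSON.generate(value))
end

json_round_trip([42, 1001])      # => [42, 1001] (unchanged)
json_round_trip([:tenant, 1001]) # => ["tenant", 1001] (Symbol becomes String)
json_round_trip([Time.utc(2025, 11, 30), 7])
# The Time comes back as a String, not a Time.
```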

If this risk is a dealbreaker, I have a few ideas to mitigate it:

  1. Add a configuration option to the gem that allows apps to opt-in to JSON-encoded cursors.
  2. Do something similar to what the Sidekiq gem did when it started enforcing that job arguments be serializable as JSON: add code that compares the cursor value against the result of encoding and decoding it. If the values differ, raise an exception.
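
A rough sketch of what both mitigations might look like together. CursorEncoding, json_cursors_enabled, and NonRoundTrippableCursor are hypothetical names, not existing gem configuration:

```ruby
require "json"

# Hypothetical module; none of these names exist in maintenance_tasks.
module CursorEncoding
  class NonRoundTrippableCursor < StandardError; end

  class << self
    # Mitigation 1: apps opt in to JSON cursors (off by default).
    attr_accessor :json_cursors_enabled
  end
  self.json_cursors_enabled = false

  # Mitigation 2: refuse cursors that don't survive an encode/decode
  # round trip unchanged.
  def self.encode(cursor)
    return cursor&.to_s unless json_cursors_enabled

    encoded = JSON.generate(cursor)
    unless JSON.parse(encoded) == cursor
      raise NonRoundTrippableCursor,
            "cursor #{cursor.inspect} does not round-trip through JSON"
    end
    encoded
  end
end

CursorEncoding.json_cursors_enabled = true
CursorEncoding.encode([42, 1001]) # => "[42,1001]"

begin
  CursorEncoding.encode([:tenant, 1]) # Symbol would come back as a String
rescue CursorEncoding::NonRoundTrippableCursor => e
  puts e.message
end
```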

If you want me to implement these mitigations (or others), let me know!

It's worth noting that this change increases the risk of producing a cursor value too large to fit in the cursor string column on the maintenance_tasks_runs table. Limits vary by database, but I believe some, such as MySQL, enforce a hard character limit on string columns. Because multi-column cursors produce larger values, there will probably be edge cases where a multi-column cursor value doesn't fit in a 255-character column.
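
One possible guard is to check the encoded cursor's length before saving. This is a hypothetical sketch: CURSOR_COLUMN_LIMIT, CursorTooLong, and encode_cursor_checked are made up, and the 255 limit simply mirrors the column size mentioned above. The real limit depends on the database and the migration:

```ruby
require "json"

# Assumed limit; adjust to the actual column definition.
CURSOR_COLUMN_LIMIT = 255

class CursorTooLong < StandardError; end

# Encode the cursor as JSON and fail loudly if it won't fit in the column.
def encode_cursor_checked(cursor, limit: CURSOR_COLUMN_LIMIT)
  encoded = JSON.generate(cursor)
  if encoded.length > limit
    raise CursorTooLong,
          "encoded cursor is #{encoded.length} characters (limit #{limit})"
  end
  encoded
end

encode_cursor_checked([42, 1001]) # fits easily

long_cursor = ["x" * 200, "y" * 100] # e.g. two long string key columns
begin
  encode_cursor_checked(long_cursor)
rescue CursorTooLong => e
  warn e.message
end
```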

I don't have any great ideas to mitigate this risk other than (once again) requiring apps to opt-in to JSON-encoded cursors via a configuration option. An opt-in pattern would make it easier to educate people about the risks and caveats before they enable it.

Development

Successfully merging this pull request may close these issues.

Cursor is not deserialized correctly with multiple columns