Skip to content

Conversation

@bkirwi
Copy link
Contributor

@bkirwi bkirwi commented Nov 4, 2025

This PR:

  • changes the semantics of the Persist shard's version field... instead of indicating the highest version that has ever touched a bit of state, it indicates the lowest version which is guaranteed to be compatible with the existing state.
  • changes the version checks: we now disallow versions from the future, and cap the number of versions from the past we will maintain write compatibility with.
  • moves the version field into StateCollections, to make it available to individual Persist commands. This will allow future Persist changes to check the version before making changes, which is essential for flexible backwards compatibility.
  • adds a new command to upgrade the compatibility version of the shard. This is not yet invoked everywhere it should be; I intend to add the full set of usages in a followup PR.
  • adds metrics that report whether we're interacting with an older version of state or not, which (among other things) will help me track whether we're reliably upgrading shards or not.

Motivation

https://github.com/MaterializeInc/database-issues/issues/9870

@bkirwi bkirwi force-pushed the versioning branch 10 times, most recently from 751819c to e3a343c Compare November 6, 2025 23:02
Copy link
Contributor

@teskje teskje left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, at least as far as I'm grokking the persist details. There is still the piece missing where we call upgrade_version on all existing shards in the environment during leader startup, right?

@def-
Copy link
Contributor

def- commented Nov 7, 2025

I think this Cargo test failure might be caused by this PR? At least I haven't seen it before: https://buildkite.com/materialize/test/builds/111438:

        FAIL [   4.451s] mz-catalog::open test_persist_open
thread 'test_persist_open' panicked at src/catalog/tests/open.rs:549:9:
assertion `left == right` failed

@bkirwi
Copy link
Contributor Author

bkirwi commented Nov 7, 2025

I think this Cargo test failure might be caused by this PR?

Probably related and probably not concerning - looks like a snapshot test that doesn't fully reflect the state of the merged branch - but thanks for linking; will keep an eye on it!

A downside of repurposing the applier_version as the actual state
version is that we can no longer determine what the actual version of
the code was that made the change withought a serious investigation.
Adding it to this string means it will be included in various places in
state.
@bkirwi bkirwi changed the title [persist] Versioning [persist] Compatibility versioning Nov 7, 2025
@bkirwi bkirwi marked this pull request as ready for review November 7, 2025 22:24
@bkirwi bkirwi requested review from a team as code owners November 7, 2025 22:24
@bkirwi bkirwi requested a review from aljoscha November 7, 2025 22:24
@bkirwi
Copy link
Contributor Author

bkirwi commented Nov 7, 2025

This doesn't yet include all the necessary upgrade_version calls, but it seems to contain enough to make CI happy. Hunting down everywhere that we open a shard may take a while, though, so I'd love to get the first batch merged and unblock some other folks.

@bkirwi bkirwi requested a review from DAlperin November 7, 2025 22:26
Copy link
Contributor

@teskje teskje left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Might want a second review from someone with more persist knowledge.

Copy link
Member

@DAlperin DAlperin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks fine to me. I have one question and I don't think we are using the new apis everywhere we need to but you mentioned thats for a followup PR so I'm fine with that. These are exciting changes!

Comment on lines +397 to +398
// FIXME: this passes the state version but the method requires the build version.
let diff = StateDiff::<T>::decode(&self.collections.version, data);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we pass the applier_version here? That is set from the build version, no? In many cases they will be equal but will diverge during upgrades.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a field move - can be seen most clearly in the individual commit.

(And yeah, generally the applier version would match the build version, but not always - eg. if we've just loaded up a state from an old version and haven't modified it yet. I will add fixing this fixme to the list of things I'll try and get in for Thursday...)

@bkirwi bkirwi merged commit c0558e4 into MaterializeInc:main Nov 10, 2025
129 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants