
@antiguru
Member

@antiguru antiguru commented Nov 13, 2025

Signed-off-by: Moritz Hoffmann <[email protected]>
@bkirwi
Contributor

bkirwi commented Nov 13, 2025

Overall looks great! Things I'm wondering:

  • Can we specify the intended semantics of an altered materialized view / in terms of the formalism?
  • How are GIDs handled? (Once a MV is altered, does it maintain the old or new GID, or both so we can separately refer to the collection and the dataflow?)

I can somewhat reverse engineer these from the draft PR etc. but it'd be nice to have them documented here.

Signed-off-by: Moritz Hoffmann <[email protected]>
@ggevay
Contributor

ggevay commented Nov 13, 2025

I have a question that explores the design space a bit: Instead of a "replacement materialized view" being its own separate first class concept in the catalog, could we just make the replacement (almost) the same thing as a normal materialized view, and let the interesting action happen only at the ALTER MATERIALIZED VIEW mv_name APPLY REPLACEMENT replacement_name? So, the first command that the user runs would be instead of
CREATE REPLACEMENT replacement_name FOR MATERIALIZED VIEW mv_name AS SELECT ...,
just
CREATE MATERIALIZED VIEW replacement_name AS SELECT ...
And when it's hydrated, then
ALTER MATERIALIZED VIEW mv_name APPLY REPLACEMENT replacement_name
(as in the current design).

This would have the advantage that fewer new commands would be needed:

  • Instead of DROP REPLACEMENT replacement_name, the user would just use the existing DROP MATERIALIZED VIEW replacement_name
  • Instead of SHOW [REDACTED] CREATE REPLACEMENT <replacement_name>, the user would just use the existing SHOW [REDACTED] CREATE MATERIALIZED VIEW <replacement_name>
  • (We'd lose the functionality of SHOW REPLACEMENTS, though, because the replacement wouldn't be tied to the old MV in any way before the ALTER command is run by the user, i.e., the system wouldn't yet know that the new MV is intended as a replacement to that old MV.)

Also, regarding #34032 (comment), the user could just use the already existing introspection to know "(a) when the replacement is hydrated and caught up and (b) how many resources it roughly requires compared to the old version". This means less implementation work, and maybe also fewer concepts for the user to keep in mind.

Importantly, this could also eliminate a lot of the code duplication that is currently in #34032.

However, one reason for wanting to not treat the replacement as a completely normal materialized view is that maybe we want to do some schema compatibility validation already when creating it, so that the user is not surprised by a schema incompatibility when running the ALTER command. But this schema check could just be a lightweight thing added to the existing CREATE MATERIALIZED VIEW command. E.g., instead of the design doc's

CREATE REPLACEMENT replacement_name FOR MATERIALIZED VIEW mv_name AS SELECT ...

we'd have

CREATE MATERIALIZED VIEW replacement_name REPLACING mv_name AS SELECT ...

which would create replacement_name in almost the same way as a normal materialized view, except that it would do a schema compatibility check against mv_name during planning time. (With this, we could also recover the functionality of SHOW REPLACEMENTS, because the system could again know about replacement_name being intended as a replacement of mv_name.)
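
A minimal Python sketch of what such a planning-time check could look like. The compatibility rule used here (same column names, types, and nullability, in the same order) is an assumption for illustration only, since this thread does not spell out the actual rules; `Column` and `check_replacement_schema` are hypothetical names, not part of any existing code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    sql_type: str
    nullable: bool


def check_replacement_schema(old: list[Column], new: list[Column]) -> list[str]:
    """Return human-readable incompatibilities; an empty list means compatible.

    Assumed rule: columns must match exactly, position by position.
    """
    errors = []
    if len(old) != len(new):
        errors.append(f"column count differs: {len(old)} vs {len(new)}")
    for position, (o, n) in enumerate(zip(old, new)):
        if o != n:
            errors.append(f"column {position} differs: {o} vs {n}")
    return errors


mv_schema = [Column("id", "bigint", False), Column("total", "numeric", True)]
same = [Column("id", "bigint", False), Column("total", "numeric", True)]
changed = [Column("id", "bigint", False), Column("total", "text", True)]

compatible_errors = check_replacement_schema(mv_schema, same)
changed_errors = check_replacement_schema(mv_schema, changed)
```

Running the check at CREATE time rather than at ALTER time is what would spare the user a late surprise: `changed_errors` is non-empty before any dataflow is hydrated.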

@ggevay
Contributor

ggevay commented Nov 13, 2025

Two more thoughts:

Can the user SELECT from a replacement materialized view? With the above suggestion of making the replacement be an (almost) normal materialized view, we'd get this for free.

With the above suggestion of modeling the replacement as a normal materialized view, we'd have the danger that the user creates some further objects that depend on the replacement before doing the ALTER. We'd probably want to disallow this: either make the ALTER fail if there are already dependents on the replacement, or, if the CREATE MATERIALIZED VIEW command lets the system know that it's intended as a replacement for another MV, don't let dependencies be created on it.

@antiguru
Member Author

> Instead of a "replacement materialized view" being its own separate first class concept in the catalog, could we just make the replacement (almost) the same thing as a normal materialized view, and let the interesting action happen only at the ALTER MATERIALIZED VIEW mv_name APPLY REPLACEMENT replacement_name?

Unfortunately, this doesn't work. I'll update the design to include a description of why, but the gist is that a materialized view names a persist shard, which downstream objects read. If we create a new materialized view, we create a new shard. Then we have no logic that would take two shards, apply the updates from the other to the first, and cut over to the new writer.

The MVP currently uses the read-only mode for replacement MVs, so they read the shard, but do not write any updates. Only after applying the replacement, they start writing. This solves the problem of "writing to the same shard", but, as you point out, it's now not possible to query the replacement MV.
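
A toy Python model of the read-only behavior described above, as a sketch under stated assumptions rather than the MVP's implementation. `Shard` and `MvWriter` are hypothetical names; a shard is reduced to a multiset of rows and time is ignored entirely.

```python
from collections import Counter


class Shard:
    """Toy stand-in for a persist shard: a multiset of rows."""

    def __init__(self):
        self.rows = Counter()

    def append(self, diffs):
        # diffs maps row -> signed multiplicity change
        for row, diff in diffs.items():
            self.rows[row] += diff
        # drop rows whose multiplicity reached zero
        self.rows = Counter({r: c for r, c in self.rows.items() if c != 0})


class MvWriter:
    def __init__(self, shard, query, read_only):
        self.shard = shard
        self.query = query  # function from inputs to a Counter of output rows
        self.read_only = read_only

    def step(self, inputs):
        """Compute the corrections that make the shard match the query output."""
        desired = self.query(inputs)
        rows = set(desired) | set(self.shard.rows)
        corrections = {
            r: desired[r] - self.shard.rows[r]
            for r in rows
            if desired[r] != self.shard.rows[r]
        }
        if not self.read_only:
            self.shard.append(corrections)
        return corrections


inputs = [1, 2, 3]
shard = Shard()
old = MvWriter(shard, lambda xs: Counter(x * 2 for x in xs), read_only=False)
new = MvWriter(shard, lambda xs: Counter(x * 3 for x in xs), read_only=True)

old.step(inputs)              # the old definition populates the shard
pending = new.step(inputs)    # the replacement computes diffs but writes nothing
before_apply = dict(shard.rows)

# "Apply": the old writer stops, the replacement leaves read-only mode.
old.read_only = True
new.read_only = False
new.step(inputs)
after_apply = dict(shard.rows)
```

Both writers target the same `Shard` object throughout, which is the point of the design: downstream readers never have to switch shards.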

Signed-off-by: Moritz Hoffmann <[email protected]>
@ggevay
Contributor

ggevay commented Nov 14, 2025

> Then we have no logic that would take two shards, apply the updates from the other to the first, and cut over to the new writer.

How about we don't reconcile only at cutover time, but instead continuously channel data from the new shard to the old shard, using the same read-only trick as the current PR? That is, we'd render

CREATE MATERIALIZED VIEW <replacement_mv_name> REPLACES <mv_name> AS SELECT ...

as a normal MV dataflow + a small extra dataflow fragment that is reading the replacement MV's normal output shard and writing into the old MV's shard with the read-only trick at first, and then for real after the replacement happens. This would allow SELECTing from the replacement MV before the cutover (from the new shard), but would also allow downstream consumers of the old MV to keep consuming the old shard (even after cutover).

A downside of this approach would be that we'd have two shards associated with the replacement MV after the cutover happens. But maybe this is fine: any new reference to the MV could use the new shard, and only old dependent dataflows would keep consuming the old shard. So, most code that looks up the shard for an MV would not need to be modified.

Also, we'd have somewhat larger resource requirements than just writing to one shard, because we'd have two MV sinks instead of one. But MV sinks have significant resource requirements only at system restarts? If yes, that should be fine, because all these extra dataflow fragments could disappear at system restart, because then all downstream consumers could switch to consuming from the latest shard.

(One more complication with this approach would be that if an MV m1 is replaced by m2, and then m2 is later replaced by m3, then m3's dataflow needs to have the above-mentioned extra dataflow fragment two times: once for writing into m2's shard, and once for writing into m1's shard.)
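
The chaining complication can be sketched in a few lines of Python. `forwarding_targets` and the `<name>_shard` naming are hypothetical and only illustrate the bookkeeping, assuming the replacement chain has no cycles.

```python
def forwarding_targets(mv, replaces):
    """List the shards of all earlier MVs that `mv` must keep feeding.

    `replaces` maps a replacement MV to the MV it replaced.
    """
    targets = []
    current = mv
    while current in replaces:
        current = replaces[current]
        targets.append(f"{current}_shard")
    return targets


replaces = {"m2": "m1", "m3": "m2"}
m3_targets = forwarding_targets("m3", replaces)
m2_targets = forwarding_targets("m2", replaces)
```

The number of extra dataflow fragments grows with the length of the chain, until a restart lets downstream consumers move to the latest shard.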

The advantages of this approach (if it's feasible) could be

  • much less new Adapter code (due to not introducing "replacement materialized view" as a new top-level concept in the catalog),
  • fewer new commands/introspection for users to learn about,
  • the replacement would be readable already before cutover, so users can check correctness. (Btw., with the original design where users can't SELECT from the replacement before cutover, that is considered ok because users could still check correctness by running the replacement MV's definition as an ad hoc SELECT?)

Signed-off-by: Moritz Hoffmann <[email protected]>
@jubrad
Contributor

jubrad commented Nov 14, 2025

Can a user create multiple replacements at the same time?
If so how would we feel about s/replacement/version?

Could one create a replacement for sources tables? I'm assuming indexes wouldn't work as one would ideally create a replacement on a separate cluster.

@bkirwi
Contributor

bkirwi commented Nov 14, 2025

Re: the above... I generally agree that it would be too painful to create a whole new shard and orchestrate a migration to it - the main value of alter-MV in my view is that it minimizes that sort of thing - but:

  • I think there may be value in treating a replacement MV as a "flavour" of materialized view instead of a totally separate concept. That has some of the benefits that Gábor mentions - minimal new syntax and catalog changes, less need for a bunch of new catalog collections - and it doesn't strike me as too hard to explain to users.
  • At any moment, even before cutover, the "correct" value for the replacement MV at some time is recoverable from the contents of the correction buffer plus the original shard at that time. So if we really wanted to support selecting from the candidate replacement MV, it's potentially feasible as long as we're running the query on the cluster that's building the replacement.
  • In the spirit of the principles and making things undoable, ALTER MATERIALIZED VIEW mv_name APPLY REPLACEMENT replacement_name is irrevocable which increases risk - if the new definition breaks some downstream we can't fix them without hydrating a new replacement with the old definition. But! You could imagine instead implementing a ALTER MATERIALIZED VIEW foo SWAP REPLACEMENT replacement_name which atomically promotes the replacement MV to write to the shard and disables writes from the old MV and turns it into a replacement MV; then, if it's bad for whatever reason, you could do another swap to undo your change and restore the old definition. (As long as the schema hasn't changed, I guess.)
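
For the second point above, a small Python illustration of the arithmetic, with rows modeled as multisets and the correction buffer as signed diffs. `replacement_contents` is a hypothetical helper; as far as this thread indicates, no such primitive exists in compute today.

```python
from collections import Counter


def replacement_contents(shard_rows, correction_buffer):
    """Combine the shard's rows with pending corrections (signed diffs)."""
    combined = Counter(shard_rows)
    for row, diff in correction_buffer.items():
        combined[row] += diff
    # keep only rows that are actually present after applying the diffs
    return {row: count for row, count in combined.items() if count != 0}


shard_at_t = {"a": 1, "b": 2}
corrections_at_t = {"b": -1, "c": 1}
candidate = replacement_contents(shard_at_t, corrections_at_t)
```

Both inputs must be taken at the same time for the sum to be meaningful, which is why the query would have to run on the cluster that holds the correction buffer.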

@antiguru
Member Author

Thanks for the feedback, this is super helpful. Keep it coming!

> Can a user create multiple replacements at the same time?

As the MVP stands right now, yes. I'm not sure what the right call here is, but there's no requirement from a correctness standpoint to only have one replacement in flight.

> If so how would we feel about s/replacement/version?

> Could one create a replacement for sources tables? I'm assuming indexes wouldn't work as one would ideally create a replacement on a separate cluster.

Indexes wouldn't work because they're even more tightly coupled to their definition than materialized views. For example, we might pick an index because of some optimizer decision that's not trivial for the user to understand. Changing the index definition would be very surprising to all parties, so that's not something I'd consider a path worth pursuing (plus a whole lot of decidability problems on the way, such as determining nullability, cf. Rice's theorem).

A crucial property of a replacement is that it can self-correct to appear consistent with its new definition. This is possible for materialized views. I think the only other objects for which this would be true are upsert sources, since they need to remember the current state of the world, which we could diff against the expected state.

However, it might be easier to slot in materialized views if one wants to be able to pivot from one source to another. For example, to switch from one Postgres source to another, the user could create a materialized view on the table, and then use the replacement to switch to another table.

> I think there may be value in treating a replacement MV as a "flavour" of materialized view instead of a totally separate concept. That has some of the benefits that Gábor mentions - minimal new syntax and catalog changes, less need for a bunch of new catalog collections - and it doesn't strike me as too hard to explain to users.

I disagree with the complexity argument. I tried this, and while it may work from a syntax perspective, we want the distinction in code. I found it hard to implement replacements as extensions of materialized views without their own catalog entry. Replacements need to behave differently in many cases, so a lot of code that matches on the item enum needs to distinguish behavior based on the context.

This doesn't mean we couldn't unify the syntax, for example by creating a replacement as part of an ALTER MATERIALIZED VIEW command instead of introducing a CREATE REPLACEMENT command. The benefit of CREATE is that it's easier to store in the catalog.

Also, all state of a catalog item needs to be stored in its create_sql statement, which adds another layer of complexity to making the replacement part of the materialized view.

These aren't great reasons, but I feel even on a conceptual level it's nice to have a distinction between a materialized view and its replacement.

> At any moment, even before cutover, the "correct" value for the replacement MV at some time is recoverable from the contents of the correction buffer plus the original shard at that time. So if we really wanted to support selecting from the candidate replacement MV, it's potentially feasible as long as we're running the query on the cluster that's building the replacement.

Totally, we could provide this feature in compute. A complication is that the primitives to do this don't exist. We can select from indexes, persist, or build a dataflow to read from either of them, but we can't surface the contents of the MV correction buffer. I'm open to suggestions here!

> In the spirit of the principles and making things undoable, ALTER MATERIALIZED VIEW mv_name APPLY REPLACEMENT replacement_name is irrevocable which increases risk - if the new definition breaks some downstream we can't fix them without hydrating a new replacement with the old definition. But! You could imagine instead implementing a ALTER MATERIALIZED VIEW foo SWAP REPLACEMENT replacement_name which atomically promotes the replacement MV to write to the shard and disables writes from the old MV and turns it into a replacement MV; then, if it's bad for whatever reason, you could do another swap to undo your change and restore the old definition. (As long as the schema hasn't changed, I guess.)

Yup, that could be interesting to explore. We'd need to add a transition from read-write to read-only, which seems doable. A problem might be that the new materialized view might overwhelm the original one, which has the potential to take down the replica it's running on. From this perspective, I'm not convinced it'd add a lot of operational confidence compared to decommissioning it immediately. For example, when the diff caused by the replacement took down downstream replicas, they'd restart and likely hydrate successfully, so switching back to the old MV would cause more stress for them.
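
A minimal Python sketch of the state transition a swap would need, assuming exactly one writer is read-write at any moment. `Writer`, `MaterializedView`, and `swap` are hypothetical names; the sketch models only the mode flip, not the load concerns raised above.

```python
from dataclasses import dataclass


@dataclass
class Writer:
    definition: str
    read_only: bool


@dataclass
class MaterializedView:
    active: Writer        # writes to the shard
    replacement: Writer   # computes, but stays read-only

    def swap(self):
        """Promote the replacement and demote the old writer in one step."""
        self.active.read_only = True
        self.replacement.read_only = False
        self.active, self.replacement = self.replacement, self.active


mv = MaterializedView(
    active=Writer("SELECT old", read_only=False),
    replacement=Writer("SELECT new", read_only=True),
)
mv.swap()
after_first = (mv.active.definition, mv.active.read_only, mv.replacement.read_only)
mv.swap()  # undo: the old definition writes again
after_second = (mv.active.definition, mv.active.read_only, mv.replacement.read_only)
```

The new piece relative to APPLY is the read-write to read-only transition for the demoted writer, which has to keep its dataflow running so that a second swap is cheap.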

@antiguru antiguru requested a review from Copilot November 17, 2025 09:12

Copilot AI left a comment

Pull Request Overview

This PR introduces a design document for replacing materialized views without requiring drop-and-recreate operations. The design proposes a stage-and-apply approach where users can create replacement definitions, validate them, and then apply them to existing materialized views while preserving dependencies.

Key changes:

  • Introduces a new "replacement" concept that decouples materialized view definitions from their storage shards
  • Proposes new SQL commands: CREATE REPLACEMENT, ALTER MATERIALIZED VIEW APPLY REPLACEMENT, and DROP REPLACEMENT
  • Defines compatibility rules and timestamp selection mechanisms for applying replacements

Signed-off-by: Moritz Hoffmann <[email protected]>
@teskje
Contributor

teskje commented Nov 17, 2025

> Replacements need to behave differently in many cases, so a lot of code that matches on the item enum needs to distinguish behavior based on the context.

Can you say a bit more about the places where we need to treat them differently? In my naive mind, a replacement MV is just a normal MV in read-only mode and pointing to a shard that's shared with another MV. I see some annoyance about dropping the replacement, you need to be careful to not drop the original MV's shard as well. But this seems like something the storage controller should transparently handle, since both would use different GlobalIds to refer to the shard.
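
A toy Python sketch of the bookkeeping this would imply: several collection IDs refer to one shard, and the shard is only released when the last reference goes away. `StorageController` here is a hypothetical stand-in and does not reflect the real storage controller's API.

```python
class StorageController:
    """Toy controller: several collection IDs may point at one shard."""

    def __init__(self):
        self.shard_of = {}      # collection id -> shard id
        self.finalized = set()  # shards whose data has been released

    def register(self, collection_id, shard_id):
        self.shard_of[collection_id] = shard_id

    def drop(self, collection_id):
        shard_id = self.shard_of.pop(collection_id)
        # Only release the shard once no other collection refers to it.
        if shard_id not in self.shard_of.values():
            self.finalized.add(shard_id)


controller = StorageController()
controller.register("u1", "shard-a")   # original MV
controller.register("u2", "shard-a")   # replacement sharing the shard
controller.drop("u2")
after_replacement_drop = set(controller.finalized)
controller.drop("u1")
after_mv_drop = set(controller.finalized)
```

Dropping the replacement (`u2`) leaves `shard-a` untouched because the original MV (`u1`) still refers to it.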

@teskje
Contributor

teskje commented Nov 17, 2025

I hacked something together trying out modeling the replacements as MVs as well: #34177. It doesn't feel too bad. Note that this is missing the crucial "apply" step and there are strange bugs. But on the other hand, all the EXPLAIN/SHOW stuff just works.

@antiguru
Member Author

> Can you say a bit more about the places where we need to treat them differently?

Yes! I think the following should be true:

  • It's not possible to select from a replacement. Unless we specify what selecting from it means (reading the pending diffs or reading the original MV are some options), I think we should not permit it.
  • No objects can depend on the replacement name. Once we apply the replacement, it should disappear from the catalog. In that sense it's not an alias for the MV it replaces.
  • Replacements form a strong dependency on the replaced object. This is more a policy question, but to me this seems most intuitive.
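
The three rules above can be sketched as a toy Python catalog. All names (`Catalog`, `CatalogError`, the method names) are hypothetical, and "strong dependency" is interpreted here as "dropping the MV also drops its replacements", which is an assumption about the intended policy.

```python
class CatalogError(Exception):
    pass


class Catalog:
    def __init__(self):
        self.mvs = {}            # mv name -> definition
        self.replacements = {}   # replacement name -> (target mv, definition)

    def create_mv(self, name, definition, depends_on=()):
        for dep in depends_on:
            if dep in self.replacements:
                raise CatalogError(f"cannot depend on replacement {dep}")
        self.mvs[name] = definition

    def create_replacement(self, name, target, definition):
        if target not in self.mvs:
            raise CatalogError(f"unknown materialized view {target}")
        self.replacements[name] = (target, definition)

    def select(self, name):
        if name in self.replacements:
            raise CatalogError(f"cannot select from replacement {name}")
        return self.mvs[name]

    def apply_replacement(self, target, name):
        replaced, definition = self.replacements[name]
        if replaced != target:
            raise CatalogError(f"{name} does not replace {target}")
        self.mvs[target] = definition
        del self.replacements[name]  # the replacement name disappears

    def drop_mv(self, name):
        del self.mvs[name]
        # assumed policy: replacements go away with their target
        self.replacements = {
            r: (t, d) for r, (t, d) in self.replacements.items() if t != name
        }


catalog = Catalog()
catalog.create_mv("mv", "SELECT 1")
catalog.create_replacement("r1", "mv", "SELECT 2")
catalog.create_replacement("r2", "mv", "SELECT 3")

try:
    catalog.select("r1")
    select_rejected = False
except CatalogError:
    select_rejected = True

catalog.apply_replacement("mv", "r1")
definition_after_apply = catalog.select("mv")
names_after_apply = sorted(catalog.replacements)
catalog.drop_mv("mv")
names_after_drop = sorted(catalog.replacements)
```

Two replacements are in flight at once here, which matches the MVP behavior described earlier in the thread; after `r1` is applied, `r2` remains until the MV itself is dropped.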


5 participants