fix(rpc): stop read-state syncer commit races under finalized-state lag by arya2 · Pull Request #10818 · ZcashFoundation/zebra

arya2 · 2026-06-26T03:33:24Z

Motivation

The co-located read-state syncer (zebra_rpc::sync::TrustedChainSync, used by Zaino's ReadStateService backend and other init_read_state_with_syncer consumers) tears down and re-subscribes to the non_finalized_state_change stream roughly once per second for as long as its secondary finalized state lags the primary, flooding logs and churning subscriptions (#10803).

The root cause is a race, not just backoff cadence. TrustedChainSync shares its secondary RocksDB instance with the update_finalized_chain_tip task. A RocksDB secondary's view only advances when try_catch_up_with_primary is called on it — but both the sync loop and that task call it on the same shared handle. While the secondary lags, update_finalized_chain_tip catches up ~1/sec, so the secondary's view can advance between sync()'s pre-commit check and the commit itself. When that advance finalizes the very block being committed, the non-finalized commit fails with a duplicate-effects error (ValidateContextError::Duplicate{Sprout,Sapling,Orchard}Nullifier { in_finalized_state: true }), the syncer drops the subscription, sleeps COMMIT_RETRY_DELAY, and re-subscribes.

Solution

Stop update_finalized_chain_tip as soon as sync() decodes its first parseable block from the non_finalized_state_change stream, via a dedicated watch::Sender<bool> signal. The task checks the flag at the top of its loop, select!s on it while parked on chain_tip_change.message() (so it stops promptly rather than at the next tip change), and re-checks before catching up / publishing.
From the first streamed block onward, sync() is the sole caller of try_catch_up_with_primary on the secondary, so its view can no longer advance between a check and a commit. This is also the earlier, cleaner handover point for the published chain tip: sync() owns the (higher) non-finalized tip from then on, so the finalized-tip task yielding earlier can't drag the reported tip backwards.
Skip re-inserting an emptied side chain into the chain set during finalization (NonFinalizedState::finalize), which could otherwise leave a zero-length chain behind.
use greater of finalized or non-finalized tip height in fill_finalized_gap() to decide the next block height to get

This removes the concurrent-task source of the race. The in-call fill_finalized_gap catch-up during bootstrap is a separate, narrower path; with the sync task now the single writer of the secondary view, a finalized-check placed immediately before commit() is race-free and can close it in a follow-up if needed. (AI got this wrong, it's fixed in this PR)

Testing

cargo fmt --all -- --check
cargo clippy -p zebra-state -p zebra-rpc --lib -- -D warnings
cargo test -p zebra-state --lib non_finalized (32 passed)
Syncer behavior is exercised end-to-end by the cached-state read-state integration tests.

Related: #10803 (and the prior frequency-reduction work in #10741, #10776).

AI disclosure

Used Claude Code to analyze the race, draft the implementation, and write this PR description. The maintainer reviewed and edited the change (including the non_finalized_state.rs fix). The contributor is the responsible author.

The co-located read-state syncer (`TrustedChainSync`, used by Zaino's `ReadStateService` backend) shares its secondary RocksDB instance with the `update_finalized_chain_tip` task. While the secondary lags the primary, that task calls `try_catch_up_with_primary` ~1/sec, advancing the secondary's view concurrently with `sync()`'s commits. A block can become finalized between `sync()`'s pre-commit check and the commit itself, so the commit fails as a duplicate-effects error (`Duplicate*Nullifier { in_finalized_state: true }`), drops the subscription, and re-subscribes — the per-second churn in #10803. Stop `update_finalized_chain_tip` as soon as `sync()` receives its first parseable block from the `non_finalized_state_change` stream, via a dedicated watch signal. From then on `sync()` is the sole caller of catch-up on the secondary, so its view can't advance between a check and a commit. The task also yields the published chain tip to `sync()`'s (higher) non-finalized tip at that point, so the handover no longer drags the reported tip backwards. Also skip re-inserting an emptied side chain into the chain set during finalization, which could otherwise leave a zero-length chain behind.

Copilot

Pull request overview

This PR fixes a subscription-churn/log-flood problem (#10803) in the co-located read-state syncer (zebra_rpc::sync::TrustedChainSync). The root cause is a race on the shared secondary RocksDB handle: while the secondary finalized state lags the primary, both the sync() loop and the update_finalized_chain_tip task call try_catch_up_with_primary, so the secondary's view can advance between sync()'s pre-commit check and its commit, finalizing the very block being committed and failing it with a duplicate-effects error. The syncer then drops and re-subscribes ~1/sec. The fix makes sync() the sole catch-up caller from its first parseable block onward by stopping the finalized-tip task via a dedicated watch::Sender<bool> signal, moving the chain-tip handover earlier, and skipping re-insertion of an emptied side chain during finalization.

Changes:

Add a started_sync watch channel; sync() flips it on its first parseable streamed block, and update_finalized_chain_tip stops promptly (top-of-loop check, select! on changed() while parked on message(), and a re-check before catch-up/publish).
Replace the prior "non-finalized best chain is populated" handover signal with this earlier "first parseable block" signal, so the finalized-tip task can no longer advance the secondary concurrently with sync()'s commits or drag the reported tip backwards.
In NonFinalizedState::finalize, guard side-chain re-insertion with !side_chain.is_empty() (mirroring the existing best_chain guard) to avoid leaving a zero-length chain; in fill_finalized_gap, use .max(finalized_tip_height()) instead of .or_else(...) for the highest known height.

Risk scan: The change is concurrency-sensitive (task handover on a shared secondary DB) and touches consensus-adjacent finalization logic in zebra-state. A narrow one-shot window remains between send_replace(true) and the task actually returning (the task could be mid-spawn_try_catch_up_with_primary at line 167), but this is at most a single residual occurrence on startup and is explicitly scoped as follow-up in the PR description, rather than the prior persistent ~1/sec churn. No broken references, serialization changes, or protocol/RPC shape changes were found; the .max change is semantically equivalent-or-more-defensive than the prior .or_else in reachable states.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
`zebra-rpc/src/sync.rs`	Adds `started_sync` watch channel and uses it to stop `update_finalized_chain_tip` on the first parseable streamed block, making `sync()` the sole secondary catch-up caller; switches `fill_finalized_gap`'s highest-height computation to `.max(finalized_tip_height())`.
`zebra-state/src/service/non_finalized_state.rs`	Skips re-inserting an emptied side chain during `finalize`, consistent with the existing `best_chain` guard.

Copilot AI review requested due to automatic review settings June 26, 2026 03:33

Copilot started reviewing on behalf of arya2 June 26, 2026 03:33 View session

arya2 mentioned this pull request Jun 26, 2026

fix(rpc): stop read-state syncer re-subscribing ~1/sec under finalized lag #10805

Closed

6 tasks

Copilot AI reviewed Jun 26, 2026

View reviewed changes

arya2 linked an issue Jun 26, 2026 that may be closed by this pull request

read-state syncer re-subscribes ~1/sec under finalized-state lag, flooding INFO logs with 'client disconnected, dropping non_finalized_state_change task' #10803

Open

arya2 mentioned this pull request Jun 26, 2026

read-state syncer re-subscribes ~1/sec under finalized-state lag, flooding INFO logs with 'client disconnected, dropping non_finalized_state_change task' #10803

Open

andres-pcg assigned andres-pcg and unassigned andres-pcg Jun 26, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(rpc): stop read-state syncer commit races under finalized-state lag#10818

fix(rpc): stop read-state syncer commit races under finalized-state lag#10818
arya2 wants to merge 1 commit into
mainfrom
fix-trusted-chain-sync-finalized-churn

arya2 commented Jun 26, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

arya2 commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Solution

Testing

AI disclosure

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

arya2 commented Jun 26, 2026 •

edited

Loading