
fix(rpc): stop read-state syncer re-subscribing ~1/sec under finalized lag #10805

Closed
nuttycom wants to merge 2 commits into ZcashFoundation:main from nuttycom:fix/readstate-syncer-resubscribe-churn

Conversation

@nuttycom

Contributor

Motivation

A co-located read-state follower (zebra_rpc::sync::TrustedChainSync, used by Zaino's ReadStateService backend and other init_read_state_with_syncer consumers) re-subscribes to the non_finalized_state_change indexer stream roughly once per second for the entire time its finalized (secondary) state lags the primary. Each teardown makes the node log one INFO line:

INFO zebra_rpc::indexer::methods: client disconnected, dropping non_finalized_state_change task

so a multi-minute catch-up window produces thousands of them.

The cadence is the consumer's commit-retry backoff, not many clients: sync() subscribes → receives one block → try_commit fails (the secondary's finalized state hasn't caught up, so the streamed block has no parent — ValidateContextError::NotReadyToBeCommitted) → drops the subscription and sleeps COMMIT_RETRY_DELAY (1s) → re-subscribes. Re-subscribing buys nothing: it replays the same backlog from the consumer's unchanged chain tips.
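The cycle above can be modelled in a few lines. This is only a sketch of the counting argument, not the code in zebra-rpc/src/sync.rs: the function name `resubscribe_per_failure` and its parameter are hypothetical, and the real loop is async and sleeps between attempts.

```rust
/// Hypothetical model of the pre-fix loop: one subscription per commit attempt.
/// `failed_commits` is how many times `try_commit` fails before the finalized
/// gap closes. Returns (subscriptions opened, teardowns logged by the server).
fn resubscribe_per_failure(failed_commits: u32) -> (u32, u32) {
    let mut subscriptions = 0;
    let mut teardowns = 0;
    let mut remaining_failures = failed_commits;
    loop {
        subscriptions += 1; // sync() subscribes and receives one block
        if remaining_failures == 0 {
            return (subscriptions, teardowns); // try_commit succeeded
        }
        remaining_failures -= 1; // try_commit failed: parent not yet finalized
        teardowns += 1; // subscription dropped; the server logs one INFO line
        // The real loop sleeps COMMIT_RETRY_DELAY (1s) here before re-subscribing.
    }
}

fn main() {
    let (subscriptions, teardowns) = resubscribe_per_failure(120);
    println!("{subscriptions} subscriptions, {teardowns} teardown log lines");
}
```

In this model every failed commit costs exactly one teardown, so the log volume grows linearly with the length of the catch-up window.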

Closes #10803.

Solution

  • zebra-rpc/src/sync.rs: on a commit failure, retry the same block in place (keeping the subscription open) up to MAX_IN_PLACE_COMMIT_RETRIES (30) × COMMIT_RETRY_DELAY (1s). try_commit drives try_catch_up_with_primary + fill_finalized_gap on every attempt, so the block becomes committable as the finalized gap closes — without churning the connection. Re-subscribe only as a bounded backstop, kept under the server's 60s non-finalized send timeout so the syncer resets before the server would drop it as a slow consumer. This is not the reorg path: a healthy syncer commits reorg blocks as they arrive on the open subscription, so the retry loop isn't entered.
  • zebra-rpc/src/indexer/methods.rs: demote the three indexer stream-teardown logs (client disconnected, dropping … task in chain_tip_change, non_finalized_state_change, mempool_change) from info! to debug! — a consumer disconnect is a normal lifecycle event.

Net effect: during a finalized-state catch-up window, re-subscriptions drop from ~1/sec to at most ~1/30s (and only when a block is genuinely stuck), and the residual teardown lines no longer appear at default INFO.
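A minimal synchronous sketch of the retry-in-place loop with its bounded backstop, under stated assumptions: the two constants use the values quoted above, while `SERVER_SEND_TIMEOUT`, `CommitOutcome`, and `commit_in_place` are hypothetical names, and the commit and sleep steps are passed in as closures rather than calling the real async `try_commit`. Whether the budget counts 30 retries after the first attempt (as here) or 30 attempts in total is also an assumption.

```rust
use std::time::Duration;

// Values quoted in the description above.
const MAX_IN_PLACE_COMMIT_RETRIES: u32 = 30;
const COMMIT_RETRY_DELAY: Duration = Duration::from_secs(1);
// Hypothetical name for the server's 60s non-finalized send timeout.
const SERVER_SEND_TIMEOUT: Duration = Duration::from_secs(60);

/// What the caller should do after the retry loop returns.
#[derive(Debug, PartialEq)]
enum CommitOutcome {
    /// The block committed after this many in-place retries.
    Committed { retries: u32 },
    /// The retry budget ran out: drop the stream and re-subscribe.
    Resubscribe,
}

/// Retries the same commit without touching the subscription.
fn commit_in_place<E>(
    mut try_commit: impl FnMut() -> Result<(), E>,
    mut sleep: impl FnMut(Duration),
) -> CommitOutcome {
    let mut retries = 0;
    loop {
        if try_commit().is_ok() {
            return CommitOutcome::Committed { retries };
        }
        if retries == MAX_IN_PLACE_COMMIT_RETRIES {
            return CommitOutcome::Resubscribe;
        }
        retries += 1;
        sleep(COMMIT_RETRY_DELAY);
    }
}

fn main() {
    // Worst-case time spent sleeping before the backstop fires.
    let budget = COMMIT_RETRY_DELAY * MAX_IN_PLACE_COMMIT_RETRIES;
    assert!(budget < SERVER_SEND_TIMEOUT);

    // A commit that fails twice, then succeeds.
    let mut failures_left = 2;
    let outcome = commit_in_place(
        || {
            if failures_left > 0 {
                failures_left -= 1;
                Err("not ready")
            } else {
                Ok(())
            }
        },
        |_delay| {}, // no real sleeping in this sketch
    );
    println!("{outcome:?}");
}
```

The happy path returns on the first call without sleeping, and the worst case sleeps 30 × 1s = 30s, which stays under the 60s send timeout.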

Tests

  • cargo fmt -p zebra-rpc -- --check — clean.
  • cargo clippy -p zebra-rpc --lib -- -D warnings — clean.
  • cargo test -p zebra-rpc --lib indexer — passes (indexer decode tests + server spawn).
  • The happy path is unchanged (the first try_commit returns Ok and the loop breaks immediately), so the existing zebrad/tests/e2e/trusted_chain.rs tip-change assertions are unaffected.

The failure/retry path is not yet covered by an automated test — there's no mock-indexer harness for TrustedChainSync, and the existing e2e test only exercises the happy path. See Follow-up Work.

Specifications & References

Follow-up Work

  • A focused integration test for the retry-in-place + backstop path (a mock read-state that fails N commits then succeeds) would be valuable but needs a new harness.
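One possible shape for such a mock, as a sketch only: `FlakyReadState`, `NotReady`, and `sync_block` are hypothetical names, none of them exist in zebra-rpc, and the driver loop is deliberately simplified (synchronous, no sleeping) so that only the subscription count is exercised.

```rust
/// Hypothetical stand-in for a read-state whose commits fail until the
/// finalized gap closes.
struct FlakyReadState {
    failures_left: u32,
    committed: Vec<u32>,
}

/// Local stand-in error for "parent not yet in the finalized state".
#[derive(Debug, PartialEq)]
struct NotReady;

impl FlakyReadState {
    fn new(failures_before_success: u32) -> Self {
        Self { failures_left: failures_before_success, committed: Vec::new() }
    }

    /// Fails `failures_left` times, then records the block height.
    fn try_commit(&mut self, height: u32) -> Result<(), NotReady> {
        if self.failures_left > 0 {
            self.failures_left -= 1;
            return Err(NotReady);
        }
        self.committed.push(height);
        Ok(())
    }
}

/// Simplified driver: returns how many subscriptions were opened before the
/// block at `height` committed.
fn sync_block(state: &mut FlakyReadState, height: u32, max_retries: u32) -> u32 {
    let mut subscriptions = 0;
    loop {
        subscriptions += 1; // (re-)subscribe to the block stream
        // One initial attempt plus up to `max_retries` in-place retries.
        for _attempt in 0..=max_retries {
            if state.try_commit(height).is_ok() {
                return subscriptions;
            }
        }
        // Retry budget exhausted: fall through to the backstop re-subscribe.
    }
}

fn main() {
    // In-place path: two failures are absorbed by a single subscription.
    let mut state = FlakyReadState::new(2);
    assert_eq!(sync_block(&mut state, 1, 30), 1);
    assert_eq!(state.committed, vec![1]);

    // Backstop path: 70 failures exhaust two budgets of 31 attempts each.
    let mut stuck = FlakyReadState::new(70);
    assert_eq!(sync_block(&mut stuck, 1, 30), 3);
}
```

The single-subscription assertion is the part that would tell the in-place retry apart from a re-subscribe-per-failure loop.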

AI Disclosure

  • AI tools were used: Claude Code (Opus 4.8) for investigation, implementation, and drafting this description. The contributor reviewed and is responsible for all changes.

PR Checklist

  • The PR title follows conventional commits format: type(scope): description
  • The PR follows the contribution guidelines.
  • This change was discussed in an issue or with the team beforehand.
  • The solution is tested.
  • The documentation and changelogs are up to date.

nuttycom and others added 2 commits June 24, 2026 19:06
…d lag

`TrustedChainSync` tore down its non-finalized block subscription and
re-subscribed on every commit failure, backing off 1s (`COMMIT_RETRY_DELAY`).
While the secondary's finalized state lags the primary the streamed block can't
attach, so this repeated once per second for the whole catch-up window, and each
teardown made the server log `client disconnected, dropping
non_finalized_state_change task` at INFO — thousands of lines. Re-subscribing
buys nothing here: it replays the same backlog from our unchanged chain tips.

Retry the same block in place instead, keeping the subscription open;
`try_commit` advances the secondary's finalized state on each attempt, so the
block becomes committable as the gap closes. Re-subscribe only as a bounded
backstop (`MAX_IN_PLACE_COMMIT_RETRIES`) so a genuinely stuck block (e.g. a
primary reorg our forward-only stream can't observe) still resets the stream.

Also demote the three indexer stream-teardown logs (`client disconnected,
dropping … task`) from info to debug: a consumer going away is a normal
lifecycle event.

Refs ZcashFoundation#10803.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Stands up a fully-custom mock indexer gRPC server and a genesis-only follower
db, then drives the "finalized state behind the primary" scenario: the streamed
block's parent (the gap block) is fetched via `get_block`, which fails twice
before succeeding.

Asserts the syncer subscribes exactly once — retrying the streamed block in
place rather than re-subscribing per failure — and commits it once the gap
becomes fillable. The single-subscription assertion is what distinguishes the
new in-place retry from the old re-subscribe-per-failure behavior.

The early mainnet block vectors carry V1/V2 transactions that the non-finalized
state rejects, so the test re-emits each block's coinbase as V4 and re-links the
chain, mirroring zebra-state's own continuous-block test helpers.

Also reword the two `fill_finalized_gap` log messages ("will retry" rather than
"on the next subscription") to match the in-place retry behavior.

Refs ZcashFoundation#10803.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@arya2

arya2 commented Jun 26, 2026

Contributor

superseded by #10818

@arya2 arya2 closed this Jun 26, 2026