Skip to content

perf(network): reduce block sync request bookkeeping#285

Merged
p0mvn merged 22 commits into
feat/pre-release-mainfrom
review/blocksync-request-bookkeeping
Jun 27, 2026
Merged

perf(network): reduce block sync request bookkeeping#285
p0mvn merged 22 commits into
feat/pre-release-mainfrom
review/blocksync-request-bookkeeping

Conversation

@evan-forbes

@evan-forbes evan-forbes commented Jun 27, 2026

Copy link
Copy Markdown

Motivation

This is the second block-sync PR in the stack. It builds on the size-hint range-packing slice and focuses on reducing per-request bookkeeping overhead before changing backlog or congestion behavior.

Solution

  • Reduce allocations in block-sync request tracking.
  • Track received range bodies with compact bitmap state.
  • Update peer routine, reactor, request, state, and focused tests around request lifecycle behavior.

Scope boundary:

  • No header-sync changes.
  • No raw body retention, floor-priority shedding, congestion control, or throughput default tuning; those are stacked above this PR.

Tests

Passed locally on the rebuilt split stack:

  • git diff --check origin/review/headersync-roots..review/blocksync-throughput-defaults
  • cargo fmt --all -- --check
  • cargo test -p zebra-network zakura::block_sync --lib (146 passed on the final stack tip)
  • Focused PR review tests for request/window behavior passed.

Attempted but blocked by the local toolchain before test execution:

  • cargo test -p zebrad --test acceptance latest_config_is_stored -- --nocapture
  • Blocker: bundled librocksdb-sys C++ compilation fails on RocksDB headers using uint64_t without <cstdint>.

Specifications & References

Stack context:

Follow-up Work

Raw-body reorder retention, floor-priority scheduling, congestion control, and throughput-default tuning are separate stacked PRs above this one.

AI Disclosure

  • No AI tools were used in this PR
  • AI tools were used: Codex was used to reconstruct the stacked branch, run sequential review agents, fix review findings, draft/update PR text, and run validation commands.

PR Checklist

  • The PR title follows conventional commits format: type(scope): description
  • The PR follows the contribution guidelines.
  • This change was discussed in an issue or with the team beforehand.
  • The solution is tested.
  • The documentation and changelogs are up to date.

@evan-forbes evan-forbes force-pushed the review/blocksync-pre-apply branch from b093378 to b01a423 Compare June 27, 2026 04:56
@evan-forbes evan-forbes force-pushed the review/blocksync-request-bookkeeping branch from a7ac6a0 to def6072 Compare June 27, 2026 04:56
@evan-forbes evan-forbes force-pushed the review/blocksync-pre-apply branch from b01a423 to 9a57aad Compare June 27, 2026 05:01
@evan-forbes evan-forbes force-pushed the review/blocksync-request-bookkeeping branch from def6072 to aa54be2 Compare June 27, 2026 05:01
p0mvn and others added 2 commits June 26, 2026 23:29
…ck writer (#128)

* perf(state): serialize raw transactions in parallel when writing blocks

* perf(state): compute block size in parallel + run block-write batch prep in dedicated pool

* comment
…s reads (#140)

* Update zebra-state/src/request.rs

Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>

* Update zebra-state/src/request.rs

Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>

* perf(state): parallelize and de-duplicate the committer's UTXO/address reads

Before building the write batch, the checkpoint committer reads every transparent
input's UTXO and every changed address's balance from RocksDB, one `zs_get` at a
time on the writer thread. In the transparent-heavy ranges (~100-330K) these
cache-served but serial point lookups dominate the per-block write time while the
other cores sit idle (CPU ~2/8). The spent-UTXO path also re-derives each input's
transaction location twice: once directly and once inside `utxo()`.

Two changes in `write_block`:

- Read the output location once and reuse it via `utxo_by_location` instead of
  letting `utxo()` look it up again (3 reads/input -> 2).
- Fan the spent-UTXO and address-balance reads across the rayon pool (the writer
  already runs inside COMMIT_COMPUTE_POOL) once a block has enough inputs/addresses
  to amortize the fork-join cost, gated by PARALLEL_BLOCK_READ_THRESHOLD (16).

The reads are read-only and land in order-independent maps, so the committed batch
is byte-identical to the sequential path.

Measured over a full mainnet genesis sync, comparing the same binary with and
without this change, per-100K committer-thread metrics (peer-independent):

  range  prep_reads          write_block_total
  100k   7.57 -> 2.64 ms     15.71 -> 10.38 ms
  200k   8.94 -> 3.75 ms     19.01 -> 14.30 ms
  300k  10.89 -> 3.52 ms     20.32 -> 13.07 ms
  400k   2.33 -> 1.05 ms      4.84 ->  3.05 ms

prep_reads drops 55-68% and write_block_total 25-37% across the transparent band,
moving the bottleneck there onto rocksdb commit. No effect on low-input blocks
(gated to sequential) or the heavy shielded region (few transparent inputs).

* clean up and tests

* comment

* clean up comment

* fix(state): remove duplicate finalized block import

---------

Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>
p0mvn and others added 18 commits June 26, 2026 23:30
…ResponseService (#183)

The core of the verified-commitment-trees peer source (increment 6a) transport: the
`tree_aux` Zakura stream as a one-shot request/response service. A client sends
`GetRoots{start,count}` and the server answers `Roots{...}` from local state.

- `tree_aux/wire.rs`: `TreeAuxMessage` (Status, GetRoots/Roots, RangeUnavailable) + byte
  codec, DoS bounds (max roots/request, max message bytes), stream kind 7 / capability
  `1<<4`. Roots-only — the final frontier is embedded in the binary, not on the wire.
- `tree_aux/service.rs`: `TreeAuxService` implementing `RequestResponseService`, serving
  `GetRoots` from a `TreeAuxStatePort` (a trait the node implements over `zebra-state`'s
  `produce_block_roots`, so `zebra-network` keeps no dependency on `zebra-state`).

Templated on `legacy_gossip`/`header_sync` but far smaller: a request/response service
needs no ordered-stream reactor or scheduler.

Tests: wire round-trip for every message; over-limit and trailing-byte rejection; the
service serves a held range and reports an unheld range unavailable. fmt + clippy clean.

Still to wire (follow-up): register the service in the handler, the client-side driver
that fills `PeerSource` (header-sync-aligned), startup/config, and the two-node run.
…aux (#185)

Wire the tree_aux serving service into a working peer exchange and prove it with a
two-node integration test — the "proof of peers" for the verified-commitment-trees
peer source (increment 6a).

- Make the outbound request/response path (`write_outbound_request_frame_inner`)
  stream-kind-aware: the legacy request stream keeps its legacy-message-specific
  response budget, while generic streams (tree_aux) read response frames bounded only
  by the stream frame cap and a small response-frame count. Previously every outbound
  request was validated as a legacy request, so a tree_aux `GetRoots` was rejected as
  an "unsupported legacy request message type".
- Teach `app_frame_cap_for_stream_kind` about tree_aux (kind 7) so larger roots
  responses are not capped to the control-frame limit.

Test: `two_nodes_exchange_roots_over_tree_aux` stands up two real Zakura nodes over the
loopback transport, negotiates the tree_aux capability, and has the client fetch
`GetRoots` from the server — asserting the roots received over the wire match the
server's holdings. fmt + clippy clean; the legacy request/response path is unchanged
(a logical no-op for non-tree_aux streams).
…o-node test (#186)

The client side of the verified-commitment-trees peer source: fetch_roots pulls a
height range of verified per-block commitment roots from connected peers (bounded
GetRoots requests, advancing by what each peer returns) and delivers each contiguous
batch to a sink. The node wires that sink to a PeerSource so the fast committer reads
peer-fetched roots through the existing seam; the committer re-verifies every root
against its own headers, so the fetch carries no trust.

Test: client_driver_fetches_a_root_range_over_tree_aux drives fetch_roots over the real
loopback transport against a serving peer and asserts the collected range matches the
server's holdings. fmt + clippy clean.
* perf(zebrad): fast-sync commitment roots from peers via tree_aux

Wire the verified-commitment-trees peer source into a running node and make
it the default committer source on networks with embedded final frontiers.

- network: make TreeAuxStatePort async and thread an optional port through
  init_with_zakura_header_sync -> spawn_zakura_endpoint_with_header_sync_driver
  -> service_registry, registering TreeAuxService under the Zakura sync path.
- state: expose the PeerSource write handle (TreeAuxRootsWriter) via a
  process-global so the driver and committer share one root cache; default
  VctState::from_config to the peer source where embedded frontiers exist
  (Mainnet), keeping explicit VCT_FAST/VCT_CAPTURE overrides and a VCT_LEGACY
  opt-out. A height the peers cannot supply stays bit-identical to legacy.
- zebrad: add StateTreeAuxPort over ReadRequest::BlockRoots and a one-shot
  tree_aux driver spawned alongside the header-sync driver.

Make the note-precompute skip per-next-block (vct_fast_will_apply) so legacy
fallback blocks keep their precompute overlap instead of a coarse fast flag.

* tests

* perf(state): test the source-mode precedence and add fast/legacy commit metrics

Review follow-ups for the tree_aux wiring:

- Factor the from_config source precedence into a pure select_source_mode and
  unit-test it (locks the peer-source-default flip, the VCT_LEGACY opt-out, the
  no-embedded-frontier legacy path, and the fixture/capture overrides) without
  touching process env or the embedded files.
- Make StateTreeAuxPort generic over the read service and unit-test the serve
  mapping: BlockRoots passthrough, and read-error / wrong-response both degrade
  to an empty (unavailable) range.
- Add live observability counters for the commit path: state.vct.fast.block.count,
  state.vct.legacy.block.count, and state.vct.prevalidated.block.count, so the
  fast-vs-legacy ratio is visible at runtime (previously only an in-memory count
  in a VCT_DIGEST shutdown log).
- Drop stale dead_code allows on PeerSource now that it is wired in.
…Regtest frontier override (#191)

* test(zebrad): integration-test tree_aux serving over the wire; add a Regtest frontier override

Closes the one unverified seam in the verified-commitment-trees `tree_aux` peer
source: a real node serving per-block roots from local state over the wire. The
transport (zebra-network two-node + codec) and the committer (zebra-state
PeerSource + handoff) were already unit-tested; this proves the production serving
stack over the real transport on real state.

Integration test `tree_aux_serves_real_state_roots_over_the_wire`
(`zebrad/.../zakura/tree_aux_driver.rs`): a real `populated_state` finalized DB
serves roots through the production `StateTreeAuxPort` -> `TreeAuxService` over the
real loopback Zakura transport (`ZakuraTestNode`); a peer's `fetch_roots` receives
exactly what the state serves via `ReadRequest::BlockRoots`. A negative case fetches
a range above the tip: `fetch_roots` errors, so the committer keeps that range on the
legacy path (safe by construction, never wrong state).

Also adds a Regtest handoff-frontier override so the fast path can be exercised
deterministically on Regtest (whose checkpoint list is derived at runtime, so there
is no committed frontier to embed):

- Loader: `embedded_final_frontiers` gains a Regtest-only arm that loads the frontier
  from the `VCT_REGTEST_FRONTIER` file, validated against the Regtest checkpoint
  height. Mainnet still uses the embedded constant and never reads the env.
- Producer: `VCT_CAPTURE_FRONTIER` (+ `VCT_CAPTURE_FRONTIER_HEIGHT`) dumps the tip
  treestate frontier on the legacy commit path, so a synced node can generate the
  fixture the loader reads. `FinalFrontiers::to_bytes` is now compiled outside tests.

Unit tests cover the loader round-trip and the height-mismatch rejection. The
fast-path-engaged signal needed for the higher e2e layers already exists
(`state.vct.fast.block.count`). The full two-process docker regtest e2e (a node
fast-syncing from a peer over the network) is the production-grade follow-up, now
unblocked by this Regtest override.

* better naming
`rollback_finalized_state` rolled back the block/tx/UTXO/tree/nullifier CFs
but left the Zakura header store (`zakura_header_*`) untouched. Because the
header store races ahead of the body chain and is keyed independently, a
rolled-back database kept header rows -- and a `BestHeaderTip` -- far above
the new body tip.

That inconsistency stalls Zakura block (body) sync on the resulting node:
`missing_block_bodies` only offers heights that already have a stored header,
so the contiguous floor body (`target_height + 1`) is never requestable, the
reorder buffer never drains, the verified tip is frozen, and after the
5-minute body-sync stall timeout the node falls back to legacy ChainSync.

Roll the Zakura header store back too: delete every `zakura_header_*` entry
above `target_height`, scanning from the (possibly higher) Zakura header tip
down. After this a rolled-back DB's `BestHeaderTip` is <= the body tip,
header-sync re-validates contiguously from `target_height + 1`, the floor body
is requestable, and block-sync advances.
…stfmt (#202)

PR #198 added `delete_zakura_headers_above` (rolling the Zakura header store
back with the body chain) without unit coverage, and its CF-handle lines were
left unformatted (`cargo fmt --check` failed on rollback.rs:864).

Add two unit tests against an ephemeral state DB:
- the populated case asserts heights above the target are removed from all four
  zakura_header_* CFs, including the hash->height index, while heights at or
  below the target are retained;
- the empty-store case asserts truncation is a no-op and does not panic on the
  empty-tip lookup.

Run rustfmt over the function so the crate is fmt-clean again.
…is (#201)

The tree_aux driver hard-coded its root fetch to begin at genesis
(`fetch_roots(.., Height(1), ..)`). The committer only ever looks up a
fast root for blocks it is about to commit, i.e. the range
`[verified_tip + 1, checkpoint]`. Heights at or below the verified tip are
already committed and their roots are never queried.

On a node that starts well above genesis (e.g. from a snapshot), fetching
from genesis spends the whole fetch streaming already-committed roots and
never reaches the window the committer actually needs before it commits
those blocks. Every block then falls back to the legacy note-commitment
recompute path (`vct_fast_blocks = 0`), defeating the fast path.

Read the verified tip (`max(finalized_tip, best_tip)`) once at startup and
begin the fetch at `verified_tip + 1`. A genesis-empty node still yields
`Height(1)`, so the from-genesis behavior falls out exactly when the node
really is at genesis. The fetched range stays a superset of what the
committer commits even if the tip advances during the fetch (extra cached
roots below its position are harmless).

Verified end-to-end against a local archive peer from a mid-chain snapshot:
the fetch starts at `from_height = verified_tip + 1`, completes in ~1s, and
the committer reports `vct_fast = 20001, vct_legacy = 0` (previously 0 fast
/ all legacy).
…aders

tree_aux_roots_for_served_header_range returned an empty vec whenever the
available roots did not cover every requested header height. Because served
headers normally run ahead of committed/provisional roots, that empty-on-gap
behavior meant no roots were ever served over the header-sync path, silently
disabling header-carried tree-aux roots. Stop at the first missing or
misaligned height and return the aligned prefix collected so far, matching
the served_header_tree_aux_roots_require_a_complete_aligned_prefix test.
Header-carried tree-aux roots (the header-sync Headers message plus
CommitHeaderRange persistence to zakura_header_commitment_roots_by_height)
fully replace the old separate tree_aux request/response stream, so remove it.

- zebra-network: delete the zakura/tree_aux client/server module, its stream
  kind, capability registration, and the tree_aux_port plumbing through
  init_with_zakura_header_sync / spawn_zakura_endpoint / service_registry.
- zebrad: delete tree_aux_driver (StateTreeAuxPort serving + run_tree_aux_driver
  fetch loop) and its start.rs wiring; drop the now-dead tree_aux_roots_writer
  argument from drive_zakura_header_sync_actions.
- zebra-state: remove TreeAuxRootsWriter, PeerSourceWriter, PeerSourceHandle, and
  the peer-root refetch signal; drop the writer from zebra_state::init's return.
  PeerSource stays as the DB-backed reader the committer uses. A missing or
  rejected root now waits for header sync to deliver a replacement via the
  in-place commit retry, or (on an archive node) recomputes from the per-height
  trees; it no longer refetches over a separate stream.

zakura-commit-bench --with-roots is disabled pending a re-port onto the
CommitHeaderRange path.

(cherry picked from commit 979254f3edcd6d382d986f6a15473cb2cdaa4da6)
@p0mvn p0mvn force-pushed the review/blocksync-pre-apply branch from 9a57aad to 5dddfac Compare June 27, 2026 05:50
@p0mvn p0mvn force-pushed the review/blocksync-request-bookkeeping branch from aa54be2 to 36c2bae Compare June 27, 2026 05:54
@p0mvn p0mvn changed the base branch from review/blocksync-pre-apply to feat/pre-release-main June 27, 2026 06:16
@p0mvn p0mvn marked this pull request as ready for review June 27, 2026 06:17

@p0mvn p0mvn left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Noticing some tests fail but then recalled you mentioned that the splits are such that intermeriaries might not compile, so we can just check that everything passes at the last PR

@p0mvn p0mvn merged commit 06f4c32 into feat/pre-release-main Jun 27, 2026
16 of 18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants