feat(network): header-carried tree-aux roots in header sync#254
feat(network): header-carried tree-aux roots in header sync#254evan-forbes wants to merge 16 commits into
Conversation
(cherry picked from commit faa17f6)
…ot rejection The reactor already rejects header-sync responses that carry tree-aux roots on a non-finalized range (`UnrequestedTreeAuxRoots` at decode, `MalformedMessage` at the reactor), but the tests still sent roots on those ranges. Switch non-finalized test messages to roots-free builders, add the `finalized_*` opt-in builders for finalized ranges, and add guard tests: - `decode_rejects_tree_aux_roots_when_not_requested` - `non_finalized_response_carrying_tree_aux_roots_is_malformed` Also wait on the backfilled headers landing (not the pre-set finalized height) before asserting the backward checkpoint-range commit trace. (cherry picked from commit 0872488)
…hits
Add observability for the header-carried tree-aux-roots feature:
- header-sync `headers_served`/`headers_received` rows now carry
`want_tree_aux_roots` and `tree_aux_roots_len`, and the header-sync
driver's commit-state rows carry `tree_aux_roots_len`
- new `state.vct.fast_path.{hit,miss}` counters record whether a finalized
commit consumed peer-supplied roots to skip the note-commitment rebuild
- `insert_bool` trace helper and a small `root_bytes` if/else cleanup
(cherry picked from commit 37dfbb8)
…aders tree_aux_roots_for_served_header_range returned an empty vec whenever the available roots did not cover every requested header height. Because served headers normally run ahead of committed/provisional roots, that empty-on-gap behavior meant no roots were ever served over the header-sync path, silently disabling header-carried tree-aux roots. Stop at the first missing or misaligned height and return the aligned prefix collected so far, matching the served_header_tree_aux_roots_require_a_complete_aligned_prefix test.
Header-carried tree-aux roots (the header-sync Headers message plus CommitHeaderRange persistence to zakura_header_commitment_roots_by_height) fully replace the old separate tree_aux request/response stream, so remove it. - zebra-network: delete the zakura/tree_aux client/server module, its stream kind, capability registration, and the tree_aux_port plumbing through init_with_zakura_header_sync / spawn_zakura_endpoint / service_registry. - zebrad: delete tree_aux_driver (StateTreeAuxPort serving + run_tree_aux_driver fetch loop) and its start.rs wiring; drop the now-dead tree_aux_roots_writer argument from drive_zakura_header_sync_actions. - zebra-state: remove TreeAuxRootsWriter, PeerSourceWriter, PeerSourceHandle, and the peer-root refetch signal; drop the writer from zebra_state::init's return. PeerSource stays as the DB-backed reader the committer uses. A missing or rejected root now waits for header sync to deliver a replacement via the in-place commit retry, or (on an archive node) recomputes from the per-height trees; it no longer refetches over a separate stream. zakura-commit-bench --with-roots is disabled pending a re-port onto the CommitHeaderRange path. (cherry picked from commit 979254f3edcd6d382d986f6a15473cb2cdaa4da6)
eb312f9 to
8ba535c
Compare
| /// Whether the requester wants all-or-nothing tree-aux roots. | ||
| want_tree_aux_roots: bool, |
There was a problem hiding this comment.
There are 2 cases when we expect these to be false:
- Syncing in non-checkpoint mode
- Beyond the last checkpoint.
I'd rather always require this for initial simplicity of the design and to avoid having conditions that are error-prone. I think that this is easy to misuse
We could track the optimization via a Linear task
There was a problem hiding this comment.
I also noticed that we are requesting these in non-checkpoint mode, which is redundant
| /// at version 1; a peer naming any other version of a known kind is rejected. | ||
| const ZAKURA_STREAM_VERSION_1: u16 = 1; | ||
| const ZAKURA_STREAM_VERSION_2: u16 = 2; | ||
| const ZAKURA_STREAM_VERSION_4: u16 = 4; |
There was a problem hiding this comment.
Do we want to downgrade this before the final release?
| if start_height > tip { | ||
| Vec::new() |
There was a problem hiding this comment.
Might be worth having traces/metrics here
| if roots.is_empty() && !state.db.is_vct_synced() { | ||
| roots = |
There was a problem hiding this comment.
| if height < start { | ||
| return Vec::new(); | ||
| } |
There was a problem hiding this comment.
I am a bit worried about the number of these silent Vec::new() conditions.
It is very likely that some nodes are not serving roots because of something silly, but we cannot see it due to the lack of observability/error propagation
Record a one-time `vct_upgrade_height` marker `U` (the lowest height this binary commits, and the lowest height in the `commitment_roots_by_height` serving index) in a new `vct_upgrade_metadata` column family. Written once on the first committed block and never moved. Root serving (`ReadRequest::BlockRoots`) now stitches the per-height trees below `U` with the serving index at and above `U`, so a node that upgraded mid-chain serves a range crossing `U` as one gap-free batch instead of the short index-only prefix that stalled the fetch client's minimum-progress check. A pre-index archive node (no `U`) still derives the whole range from the trees. Historical note-commitment tree availability is now the band `[U, H)` (H = checkpoint handoff) via a new `vct_tree_absent` helper: trees are present below `U` (pre-upgrade) and at/above `H` (semantic sync), absent only in between. For a genesis fast-sync (`U = 0`) this reduces exactly to the prior `height < H` behaviour.
Regenerate the column_family_names and per-CF raw-data/empty snapshots to reflect the vct_sync_metadata rename, the zakura_header_commitment_roots_by_height feature CF, and the new vct_upgrade_metadata CF added by the upgrade-height stitch.
p0mvn
left a comment
There was a problem hiding this comment.
LGTM.
Let's start updating documentation (e.g., docs/design/verified-commitment-trees.md) and add comments to critical areas.
I committed low-risk nits and doc/comment updates directly to not block cycles on this
|
Having problems with GitHub failing to merge. Looking into |
|
Okay, it got merged into |
Motivation
Carry the per-height Sapling/Orchard commitment roots (
tree_aux) inside theheader-sync
Headersmessage instead of fetching them over a separate stream,and remove the old separate stream now that it's redundant. A VCT
fast-syncing node receives provisional roots alongside the headers it already
requests, persists them ahead of body commit, and skips the note-commitment
frontier rebuild when the roots verify.
Scoped to that feature; targets PR #189 (
perf-note-commit-tree).What's in scope
zakura/header_sync/*,handler.rs):Headerscarries anoptional all-or-nothing
tree_aux_rootsvector parallel toheaders;GetHeadersgainswant_tree_aux_roots; header-sync stream version bump;byte-budget accounting, root validation, new error variants.
zebra-state/*): a newzakura_header_commitment_roots_by_heightCF persists provisional header-aheadroots;
CommitHeaderRangeacceptstree_aux_roots;BlockRootsservingappends provisional header-ahead roots after committed roots;
PeerSource/VctStateread provisional roots from the DB.header_sync_driver.rs,start.rs): serve roots forfinalized ranges, route received roots into
CommitHeaderRange.trace.rs):want_tree_aux_roots/tree_aux_roots_lenonheader-sync rows,
tree_aux_roots_lenon commit-state rows, andstate.vct.fast_path.{hit,miss}counters.zakura/tree_auxclient/servermodule, its stream kind + capability registration, the
tree_aux_driver(serving port + fetch loop), and the state-side
TreeAuxRootsWriter/PeerSourceWriter/PeerSourceHandle/ peer-root refetch signal.PeerSourcestays as the DB-backed reader. A missing/rejected root now waits for header sync
to deliver a replacement via the in-place commit retry (or, on an archive node,
recomputes from the per-height trees) — no separate-stream refetch.
Commits
refactor!: move aux tree to the headersync message— wire + state + wiring(de-bundled; see below).
test(network): align header-sync tests with non-finalized tree-aux-root rejectionfeat(network): trace header-carried tree-aux roots and vct fast-path hitsfix(zebrad): serve the aligned tree-aux root prefix when roots lag headersrefactor!: remove the separate tree_aux fetch streamDe-bundling note
The source commit for (1) also contained unrelated block-sync work (sequencer
round-robin fairness,
clear_submitted_applies_through, apply-limit tuning, etc.).All of that was excluded here —
block_sync/is byte-identical to the #189 base.Bug fixed (commit 4)
tree_aux_roots_for_served_header_rangereturned an empty vec whenever theavailable roots didn't cover every requested header height. Since served
headers normally run ahead of roots, that meant no roots were ever served over
the header path — silently disabling the feature. Fixed to return the aligned
prefix.
Known caveats (draft)
Test evidence
cargo fmt --all -- --check,cargo check,cargo clippy --all-targets— clean(only pre-existing valar-base warnings:
tx_v6cfg inzebra-chain,clone_on_copyinzebra-rpc).cargo test -p zebra-network --lib zakura::*— 537 passed.cargo test -p zebrad --lib zakura_header_sync_driver_tests— 35 passed.cargo test -p zebra-state --lib --features proptest-impl finalized_state— 174passed, 2 pre-existing failures unrelated to this PR (a stale
column_family_namessnapshot from the valarfast_sync_metadata→vct_sync_metadatarename + the new CF, and
vct_frozen_frontier_survives_reopen).AI disclosure
Used Claude Code to extract/de-bundle the source commits onto the #189 base,
remove the old tree_aux stream, resolve conflicts, and author the prefix bug fix.