Skip to content

feat(network): header-carried tree-aux roots in header sync#254

Closed
evan-forbes wants to merge 16 commits into
perf-note-commit-treefrom
evan/perf-plus-download-headersync-only
Closed

feat(network): header-carried tree-aux roots in header sync#254
evan-forbes wants to merge 16 commits into
perf-note-commit-treefrom
evan/perf-plus-download-headersync-only

Conversation

@evan-forbes

@evan-forbes evan-forbes commented Jun 25, 2026

Copy link
Copy Markdown

Motivation

Carry the per-height Sapling/Orchard commitment roots (tree_aux) inside the
header-sync Headers message
instead of fetching them over a separate stream,
and remove the old separate stream now that it's redundant. A VCT
fast-syncing node receives provisional roots alongside the headers it already
requests, persists them ahead of body commit, and skips the note-commitment
frontier rebuild when the roots verify.

Scoped to that feature; targets PR #189 (perf-note-commit-tree).

What's in scope

  • Wire/protocol (zakura/header_sync/*, handler.rs): Headers carries an
    optional all-or-nothing tree_aux_roots vector parallel to headers;
    GetHeaders gains want_tree_aux_roots; header-sync stream version bump;
    byte-budget accounting, root validation, new error variants.
  • State persistence/serving (zebra-state/*): a new
    zakura_header_commitment_roots_by_height CF persists provisional header-ahead
    roots; CommitHeaderRange accepts tree_aux_roots; BlockRoots serving
    appends provisional header-ahead roots after committed roots;
    PeerSource/VctState read provisional roots from the DB.
  • Driver wiring (header_sync_driver.rs, start.rs): serve roots for
    finalized ranges, route received roots into CommitHeaderRange.
  • Observability (trace.rs): want_tree_aux_roots/tree_aux_roots_len on
    header-sync rows, tree_aux_roots_len on commit-state rows, and
    state.vct.fast_path.{hit,miss} counters.
  • Old-stream removal (final commit): delete the zakura/tree_aux client/server
    module, its stream kind + capability registration, the tree_aux_driver
    (serving port + fetch loop), and the state-side TreeAuxRootsWriter /
    PeerSourceWriter / PeerSourceHandle / peer-root refetch signal. PeerSource
    stays as the DB-backed reader. A missing/rejected root now waits for header sync
    to deliver a replacement via the in-place commit retry (or, on an archive node,
    recomputes from the per-height trees) — no separate-stream refetch.

Commits

  1. refactor!: move aux tree to the headersync message — wire + state + wiring
    (de-bundled; see below).
  2. test(network): align header-sync tests with non-finalized tree-aux-root rejection
  3. feat(network): trace header-carried tree-aux roots and vct fast-path hits
  4. fix(zebrad): serve the aligned tree-aux root prefix when roots lag headers
  5. refactor!: remove the separate tree_aux fetch stream

De-bundling note

The source commit for (1) also contained unrelated block-sync work (sequencer
round-robin fairness, clear_submitted_applies_through, apply-limit tuning, etc.).
All of that was excluded here — block_sync/ is byte-identical to the #189 base.

Bug fixed (commit 4)

tree_aux_roots_for_served_header_range returned an empty vec whenever the
available roots didn't cover every requested header height. Since served
headers normally run ahead of roots, that meant no roots were ever served over
the header path
— silently disabling the feature. Fixed to return the aligned
prefix.

Known caveats (draft)

  • Header-sync stream version jumps 2 → 4 on this base (no v3 in between).

Test evidence

  • cargo fmt --all -- --check, cargo check, cargo clippy --all-targets — clean
    (only pre-existing valar-base warnings: tx_v6 cfg in zebra-chain,
    clone_on_copy in zebra-rpc).
  • cargo test -p zebra-network --lib zakura::* — 537 passed.
  • cargo test -p zebrad --lib zakura_header_sync_driver_tests — 35 passed.
  • cargo test -p zebra-state --lib --features proptest-impl finalized_state — 174
    passed, 2 pre-existing failures unrelated to this PR (a stale
    column_family_names snapshot from the valar fast_sync_metadatavct_sync_metadata
    rename + the new CF, and vct_frozen_frontier_survives_reopen).
  • Full workspace suite not run (draft).

AI disclosure

Used Claude Code to extract/de-bundle the source commits onto the #189 base,
remove the old tree_aux stream, resolve conflicts, and author the prefix bug fix.

…ot rejection

The reactor already rejects header-sync responses that carry tree-aux roots
on a non-finalized range (`UnrequestedTreeAuxRoots` at decode,
`MalformedMessage` at the reactor), but the tests still sent roots on those
ranges. Switch non-finalized test messages to roots-free builders, add the
`finalized_*` opt-in builders for finalized ranges, and add guard tests:

- `decode_rejects_tree_aux_roots_when_not_requested`
- `non_finalized_response_carrying_tree_aux_roots_is_malformed`

Also wait on the backfilled headers landing (not the pre-set finalized
height) before asserting the backward checkpoint-range commit trace.

(cherry picked from commit 0872488)
…hits

Add observability for the header-carried tree-aux-roots feature:

- header-sync `headers_served`/`headers_received` rows now carry
  `want_tree_aux_roots` and `tree_aux_roots_len`, and the header-sync
  driver's commit-state rows carry `tree_aux_roots_len`
- new `state.vct.fast_path.{hit,miss}` counters record whether a finalized
  commit consumed peer-supplied roots to skip the note-commitment rebuild
- `insert_bool` trace helper and a small `root_bytes` if/else cleanup

(cherry picked from commit 37dfbb8)
…aders

tree_aux_roots_for_served_header_range returned an empty vec whenever the
available roots did not cover every requested header height. Because served
headers normally run ahead of committed/provisional roots, that empty-on-gap
behavior meant no roots were ever served over the header-sync path, silently
disabling header-carried tree-aux roots. Stop at the first missing or
misaligned height and return the aligned prefix collected so far, matching
the served_header_tree_aux_roots_require_a_complete_aligned_prefix test.
Header-carried tree-aux roots (the header-sync Headers message plus
CommitHeaderRange persistence to zakura_header_commitment_roots_by_height)
fully replace the old separate tree_aux request/response stream, so remove it.

- zebra-network: delete the zakura/tree_aux client/server module, its stream
  kind, capability registration, and the tree_aux_port plumbing through
  init_with_zakura_header_sync / spawn_zakura_endpoint / service_registry.
- zebrad: delete tree_aux_driver (StateTreeAuxPort serving + run_tree_aux_driver
  fetch loop) and its start.rs wiring; drop the now-dead tree_aux_roots_writer
  argument from drive_zakura_header_sync_actions.
- zebra-state: remove TreeAuxRootsWriter, PeerSourceWriter, PeerSourceHandle, and
  the peer-root refetch signal; drop the writer from zebra_state::init's return.
  PeerSource stays as the DB-backed reader the committer uses. A missing or
  rejected root now waits for header sync to deliver a replacement via the
  in-place commit retry, or (on an archive node) recomputes from the per-height
  trees; it no longer refetches over a separate stream.

zakura-commit-bench --with-roots is disabled pending a re-port onto the
CommitHeaderRange path.

(cherry picked from commit 979254f3edcd6d382d986f6a15473cb2cdaa4da6)
@evan-forbes evan-forbes force-pushed the evan/perf-plus-download-headersync-only branch from eb312f9 to 8ba535c Compare June 25, 2026 06:58
@evan-forbes evan-forbes marked this pull request as ready for review June 25, 2026 07:28
Comment on lines +72 to +73
/// Whether the requester wants all-or-nothing tree-aux roots.
want_tree_aux_roots: bool,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 2 cases when we expect these to be false:

  1. Syncing in non-checkpoint mode
  2. Beyond the last checkpoint.

I'd rather always require this for initial simplicity of the design and to avoid having conditions that are error-prone. I think that this is easy to misuse

We could track the optimization via a Linear task

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also noticed that we are requesting these in non-checkpoint mode, which is redundant

/// at version 1; a peer naming any other version of a known kind is rejected.
const ZAKURA_STREAM_VERSION_1: u16 = 1;
const ZAKURA_STREAM_VERSION_2: u16 = 2;
const ZAKURA_STREAM_VERSION_4: u16 = 4;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to downgrade this before the final release?

Comment on lines +1564 to +1565
if start_height > tip {
Vec::new()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth having traces/metrics here

Comment thread zebra-state/src/service.rs Outdated
Comment on lines +1572 to +1573
if roots.is_empty() && !state.db.is_vct_synced() {
roots =

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still has a bug I explained yesterday in DM:

#259

I will rebase that PR after merging this

Comment on lines +835 to +837
if height < start {
return Vec::new();
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit worried about the number of these silent Vec::new() conditions.

It is very likely that some nodes are not serving roots because of something silly, but we cannot see it due to the lack of observability/error propagation

p0mvn and others added 6 commits June 25, 2026 11:25
Record a one-time `vct_upgrade_height` marker `U` (the lowest height this
binary commits, and the lowest height in the `commitment_roots_by_height`
serving index) in a new `vct_upgrade_metadata` column family. Written once on
the first committed block and never moved.

Root serving (`ReadRequest::BlockRoots`) now stitches the per-height trees
below `U` with the serving index at and above `U`, so a node that upgraded
mid-chain serves a range crossing `U` as one gap-free batch instead of the
short index-only prefix that stalled the fetch client's minimum-progress
check. A pre-index archive node (no `U`) still derives the whole range from
the trees.

Historical note-commitment tree availability is now the band `[U, H)` (H =
checkpoint handoff) via a new `vct_tree_absent` helper: trees are present
below `U` (pre-upgrade) and at/above `H` (semantic sync), absent only in
between. For a genesis fast-sync (`U = 0`) this reduces exactly to the prior
`height < H` behaviour.
Regenerate the column_family_names and per-CF raw-data/empty snapshots to
reflect the vct_sync_metadata rename, the zakura_header_commitment_roots_by_height
feature CF, and the new vct_upgrade_metadata CF added by the upgrade-height stitch.

@p0mvn p0mvn left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Let's start updating documentation (e.g., docs/design/verified-commitment-trees.md) and add comments to critical areas.

I committed low-risk nits and doc/comment updates directly to not block cycles on this

@p0mvn p0mvn closed this Jun 25, 2026
@p0mvn

p0mvn commented Jun 25, 2026

Copy link
Copy Markdown

Having problems with GitHub failing to merge. Looking into

@p0mvn

p0mvn commented Jun 25, 2026

Copy link
Copy Markdown

Okay, it got merged into perf-note-commit-tree. I think there is a GitHub bug

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants