feat(perf): add private Zakura cohort deploy and bench tooling by p0mvn · Pull Request #291 · valargroup/zebra

p0mvn · 2026-06-27T07:09:58Z

Motivation

Sync-perf debugging from the 1.8M snapshot was unreliable when bench nodes peered with the shared public Zakura fleet: serving peers could change mid-run as other engineers redeployed nodes, making results hard to compare. This PR brings the previously merged private Zakura cohort, deployer, and benchmark harness work onto ironwood-main so an isolated, operator-controlled benchmark can be run against a deterministic serving cohort.

This also restores a local, ad-hoc multi-node zebrad deploy path. The existing automated deploy path targets a single hard-coded compatibility host in CI; the new deployer can build a chosen commit, push it to a fleet, run it as a service, and fetch or follow node logs.

Solution

Add private Zakura dev-network cohort support and documentation, including cohort identity checks in the Zakura handshake so isolated test cohorts do not mix with public/default peers.
Add deploy/deployer/, a dependency-free Python 3.11+ CLI that builds zebrad once per unique commit SHA, deploys binaries/config/systemd units to multiple SSH targets, reports status, and fetches or follows deterministic log files.
Add deploy/runner/ benchmark tooling for the private-cohort lifecycle: seed serving nodes, render peer configs, freeze serving nodes, run local sync benchmarks, analyze CSV bottleneck metrics, show a live dashboard, verify isolation, and collect logs.
Add make perf-* wrappers and extend the deployer renderer for [network.zakura], storage mode, V2/legacy P2P toggles, metrics endpoints, tracing filters, and running commit reporting.
Add commit-metrics-gated state instrumentation for commit pipeline phases and batch bytes, consumed by the benchmark analyzer/dashboard while leaving default builds unaffected.

Tests

Source PR validation, carried over from #264 and #267:

Deployed two live archive serving nodes, formed a private Zakura cohort, and verified zakura.p2p.conn.* metrics with zero wrong_network/wrong_chain rejects.
Ran an isolated bench that synced from the cohort only (legacy peers = 0, VCT fast path active), producing CSV output and analyze results.
Validated deployer behavior end-to-end: cold build produced zebrad 5.0.0-rc.3, a second build reused the cache, deploy succeeded to both nodes, status reported active services, logs fetch copied deterministic log files, and logs follow streamed live output.
Unit-checked deployer render paths by parsing rendered TOML for seed/freeze phases and confirming existing fleets without new fields render unchanged.
make -n confirmed each perf target maps to the intended perf.sh command.

* feat(network): add private Zakura dev-network cohorts

  Zakura (v2) dev nodes bootstrap from a few peers, but discovery and gossip then pull in the rest of the network, so concurrent experiments by different team members collide. This adds an opt-in way to run an isolated v2 overlay on top of unchanged mainnet consensus.

  Add an optional `[network.zakura] dev_network` cohort tag. When set, a node only forms Zakura connections with peers sharing the same tag: its `ZakuraHandshakeConfig` advertises `ZakuraNetworkId::Configured` and a cohort-derived `chain_id` (`derive_dev_chain_id` = domain-separated blake2b over the real genesis hash and the tag).

  Both fields are already validated in the Zakura control handshake, the legacy->Zakura upgrade prelude, and signed discovery records (records copy `handshake.chain_id`), so isolation propagates with no wire-format change and no new reject code:
  - a public mainnet node (`network_id = Mainnet`) and a dev node reject each other with `WrongNetwork` and stay on legacy;
  - different cohorts (both `Configured`, different `chain_id`) reject with `WrongChain`, and cross-cohort discovery records fail import;
  - same-tag peers match and form the private overlay.

  The tag only scopes the Zakura v2 overlay. Genesis, network magic, and activation heights are unchanged, so a cohort node validates the real chain. `chain_id` here is a Zakura peer-matching id only; block validation uses the unchanged network parameters. Has no effect unless `v2_p2p` is enabled.

  The legacy->Zakura upgrade path rebuilds the handshake config from scratch, so it now also threads the cohort tag; otherwise a tagged node would advertise the cohort id on its native endpoint but the plain id during upgrades and could not upgrade with its own cohort.
* docs(network): add Zakura dev-network README Add a developer-facing README for the `[network.zakura] dev_network` cohort feature next to the code (`zebra-network/src/zakura/README.md`): what it does, the network_id/chain_id mechanism, the code map, and how to test. Complements the operator-facing book guide.
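The cohort mechanism described in the first commit above can be sketched in a few lines. This is an illustrative Python model, not the Rust implementation: the personalization string, the input encoding, the 32-byte digest size, the use of the genesis hash as the untagged `chain_id`, and the names `Handshake`, `handshake_config`, and `check_peer` are all assumptions made for the example. Only `derive_dev_chain_id`, the `Mainnet`/`Configured` network ids, and the `WrongNetwork`/`WrongChain` rejects come from the commit message.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum
from typing import Optional

# Hypothetical domain-separation tag; the real value is not given in the PR.
PERSON = b"ZakuraDevChain"  # blake2b allows at most 16 bytes here


class NetworkId(Enum):
    MAINNET = "Mainnet"
    CONFIGURED = "Configured"


class Reject(Enum):
    WRONG_NETWORK = "WrongNetwork"
    WRONG_CHAIN = "WrongChain"


@dataclass(frozen=True)
class Handshake:
    network_id: NetworkId
    chain_id: bytes


def derive_dev_chain_id(genesis_hash: bytes, tag: str) -> bytes:
    """Domain-separated blake2b over the genesis hash and the cohort tag."""
    h = hashlib.blake2b(digest_size=32, person=PERSON)
    h.update(genesis_hash)
    h.update(tag.encode("utf-8"))
    return h.digest()


def handshake_config(genesis_hash: bytes, dev_network: Optional[str] = None) -> Handshake:
    """Untagged nodes advertise Mainnet; tagged nodes advertise Configured."""
    if dev_network is None:
        # Assumption: the plain chain id is the genesis hash itself.
        return Handshake(NetworkId.MAINNET, genesis_hash)
    return Handshake(NetworkId.CONFIGURED, derive_dev_chain_id(genesis_hash, dev_network))


def check_peer(local: Handshake, remote: Handshake) -> Optional[Reject]:
    """Return the reject reason, or None when the peers may form the overlay."""
    if local.network_id != remote.network_id:
        return Reject.WRONG_NETWORK
    if local.chain_id != remote.chain_id:
        return Reject.WRONG_CHAIN
    return None


GENESIS = bytes(32)  # placeholder, not the real mainnet genesis hash
public = handshake_config(GENESIS)
cohort_a = handshake_config(GENESIS, "cohort-a")
cohort_b = handshake_config(GENESIS, "cohort-b")
print(check_peer(public, cohort_a), check_peer(cohort_a, cohort_b))
```

Because both fields are compared by checks that already exist, the model needs no third reject reason, which matches the commit's claim that isolation requires no wire-format change.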

Add deploy/deployer/, a dependency-free Python CLI for deploying zebrad to a fleet of nodes and collecting their logs. It reuses the build -> scp -> install-with-.bak -> systemctl restart -> rollback pattern from deploy-zcashd-compat.yml, generalized to a dynamic multi-node TOML config (per-node name / ssh_string / commit).

- build: resolve each node's commit to a SHA (origin/<ref> fallback) and build zebrad once per unique SHA into a cache, using a throwaway detached git worktree so the caller's working tree is untouched. Honors CARGO_TARGET_DIR.
- deploy: distribute the binary + a rendered zebrad.toml (deterministic [tracing] log_file) + a systemd unit, install with a .bak backup, restart, and roll back on failure. Nodes run in parallel.
- status: per-node service state + version.
- logs fetch / logs follow: copy or live-tail the deterministic log file by name.

Remote scripts are fed on stdin (bash -s) rather than as ssh args, since ssh flattens argv and would collapse `bash -c '<multi-word>'` to its first word.
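Two ideas from the deployer description lend themselves to a short sketch: building once per unique SHA, and feeding remote scripts on stdin. The Python below is a hedged illustration, not code from deploy.py. The `Node` fields mirror the per-node name / ssh_string / commit config, but `group_by_sha`, `remote_argv`, `run_remote`, and `as_seen_by_remote_shell` are hypothetical helper names, and the example commands are made up. The last helper only models how ssh joins its arguments with spaces before the remote shell re-parses them.

```python
import shlex
import subprocess
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    ssh_string: str
    commit: str  # assumed to be already resolved to a SHA in this sketch


def group_by_sha(nodes):
    """Map each unique SHA to the nodes that need it, so each SHA is built once."""
    builds = defaultdict(list)
    for node in nodes:
        builds[node.commit].append(node.name)
    return dict(builds)


def remote_argv(ssh_string):
    """ssh command line that makes the remote bash read its script from stdin."""
    return ["ssh", ssh_string, "bash", "-s"]


def run_remote(ssh_string, script):
    """Run a multi-line script on the remote host, passing it on stdin."""
    return subprocess.run(
        remote_argv(ssh_string),
        input=script,
        text=True,
        capture_output=True,
        check=False,
    )


def as_seen_by_remote_shell(argv):
    """Model of ssh flattening: join argv with spaces, then re-split like a shell."""
    return shlex.split(" ".join(argv))


fleet = [
    Node("node-a", "ops@node-a.example", "abc123"),
    Node("node-b", "ops@node-b.example", "abc123"),
    Node("node-c", "ops@node-c.example", "def456"),
]
print(group_by_sha(fleet))

# After flattening, `bash -c` receives only the first word as its command string.
print(as_seen_by_remote_shell(["bash", "-c", "systemctl restart zebrad"]))
```

In the second print, the word after `-c` is just `systemctl`; the remaining words become positional parameters, which is the failure mode the stdin approach avoids.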

…267)

Tooling for deterministic, isolated sync-perf benchmarking against a private Zakura cohort of operator-controlled serving nodes.

- deploy/runner/: perf.sh is the single entry point over the seed -> peers -> freeze -> run -> analyze lifecycle; feed_run.sh forks a snapshot and samples five-category bottleneck metrics into a CSV; feed_analyze.py does steady-state bottleneck attribution; a tokenized bench config and serving-fleet template; the live metrics dashboard; and a runbook. Host-specific paths and the cohort identity live in an untracked cohort.env.
- deploy/deployer/deploy.py: render a fleet-wide [network.zakura] cohort block, storage_mode / v2_p2p / legacy_p2p, and an optional metrics endpoint plus tracing filter; `status` now reports the running git commit and configured ref.
- make/perf.mk and Makefile: `make perf-*` wrappers (build-local, run, analyze, dashboard, verify-isolation, seed/peers/freeze/status).
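To illustrate how a renderer can add the cohort block while leaving existing fleets unchanged, here is a minimal Python sketch. It is not the code in deploy.py: `render_config`, `toml_str`, the `log_dir` default, and the `<node>.log` naming are assumptions for the example. Only the `[tracing] log_file` and `[network.zakura] dev_network` keys are taken from the PR text.

```python
from typing import Optional


def toml_str(value: str) -> str:
    """Quote a value as a TOML basic string (escapes backslash and double quote)."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def render_config(
    node_name: str,
    dev_network: Optional[str] = None,
    log_dir: str = "/var/log/zebrad",  # hypothetical location
) -> str:
    """Render a config fragment with a log file path derived from the node name.

    The [network.zakura] block is emitted only when a cohort tag is set, so a
    fleet that never sets one gets the same output as before.
    """
    lines = [
        "[tracing]",
        "log_file = " + toml_str(f"{log_dir}/{node_name}.log"),
    ]
    if dev_network is not None:
        lines += [
            "",
            "[network.zakura]",
            "dev_network = " + toml_str(dev_network),
        ]
    return "\n".join(lines) + "\n"


print(render_config("serve-1", dev_network="perf-cohort"))
```

Deriving the log path from the node name is what would let a `logs fetch` style command locate the file without querying the node first.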

…refinements (#271)

Adds per-phase commit-pipeline instrumentation (behind the `commit-metrics` feature) and the perf-harness refinements that consume it, on top of the private-cohort bench harness (#267).

State instrumentation (commit-metrics gated; default builds unaffected):
- finalized_state.rs: time the history-tree MMR push (history_push phase).
- block.rs: time spent-UTXO reads, address-balance reads, and batch build, and record committed batch size in bytes (write-throughput / per-block size).
- disk_db.rs: DiskWriteBatch::size_in_bytes() accessor for the above; gated to its sole caller so a default build does not flag it dead.

Perf harness (deploy/runner + make/perf.mk):
- feed_run.sh: stop a prior same-label run before re-forking, so a rerun can't collide on the fork/ports (RocksDB "multiple active instances" -> instant exit).
- perf.sh: add a `logs` subcommand (drift-spam filtered; RAW=1 for all).
- zebra-metrics-dashboard.py: smooth throughput over a 20s trailing window so batched checkpoint commits don't alias into false spike/zero; add --smooth.
- feed_analyze.py: commit-pipeline attribution over the new metric set.
- make/perf.mk: perf-logs target; default stop height -> 1.9M; wider analyze window.

Verified: `cargo check -p zebra-state` passes with and without commit-metrics; rustfmt clean. Instrumentation exercised live by the bench binary emitting these metrics during cohort sync.

AI disclosure: drafted with Claude Code (Claude Opus): instrumentation, harness changes, and this description.
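The dashboard smoothing is easiest to see with numbers. The sketch below is an assumed reconstruction in Python, not the code in zebra-metrics-dashboard.py: `instant_rates` and `trailing_rates` are hypothetical names, and the sample data (one 400-block batch every 4 seconds, sampled once per second) is invented to mimic batched checkpoint commits.

```python
from collections import deque


def instant_rates(samples):
    """Blocks per second between consecutive (time, height) samples."""
    return [
        (h1 - h0) / (t1 - t0)
        for (t0, h0), (t1, h1) in zip(samples, samples[1:])
    ]


def trailing_rates(samples, window_secs=20.0):
    """Blocks per second measured over a trailing window ending at each sample."""
    window = deque()
    rates = []
    for t, height in samples:
        window.append((t, height))
        # Keep the newest sample at or before the window start as the anchor.
        while len(window) > 1 and window[1][0] <= t - window_secs:
            window.popleft()
        t0, h0 = window[0]
        rates.append((height - h0) / (t - t0) if t > t0 else 0.0)
    return rates


# Height only moves when a batch lands, so per-sample rates alternate 0 and 400.
samples = [(float(t), 400 * (t // 4)) for t in range(40)]
print(sorted(set(instant_rates(samples))))
print(trailing_rates(samples)[-1])
```

With these inputs the per-sample rate flips between zero and a spike, while the 20-second trailing rate settles at the true average once the window is full.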

v12-auditor · 2026-06-27T07:10:06Z

Note

Complete: Audit complete. V12 found one issue worth reviewing.

Open the full results here.

| Finding | Severity | Details |
| --- | --- | --- |
| F-93627 | 🟠 `High` | **Ironwood state is unenforced.** V6 transactions contain an `ironwood_shielded_data` field and expose Ironwood nullifiers and note commitments, while `ironwood.rs` states that Ironwood has distinct note-commitment and nullifier state. The state layer still enforces only Sprout, Sapling, and Orchard: finalized duplicate-nullifier checks, mempool/best-chain duplicate checks, anchor checks, nullifier persistence, and note-commitment tree persistence all omit Ironwood. The finalized write path calls `prepare_shielded_transaction_batch` and `prepare_trees_batch`, but those helpers write only Sprout/Sapling/Orchard nullifiers and trees. On builds where V6 is enabled and NU6.3/NU7 is active, a malicious block or mempool transaction can therefore reuse an Ironwood nullifier across transactions or rely on Ironwood commitments and anchors that are never added to consensus state. The defect is conditional on Ironwood activation, but under that configuration it is a consensus-critical validation gap. |

And one more auto-invalidated finding.

Analyzed six files, diff f2bf436...9cfc07a.

p0mvn · 2026-06-27T20:32:11Z

@cursor review

cursor

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

Bugbot Autofix is ON. A cloud agent has been kicked off to fix the reported issue.

Reviewed by Cursor Bugbot for commit 151bd15.

p0mvn added 4 commits June 27, 2026 01:08

p0mvn marked this pull request as draft June 27, 2026 17:55

reverts and lints

151bd15

cursor Bot reviewed Jun 27, 2026

Comment thread deploy/runner/perf.sh Outdated

p0mvn added 2 commits June 27, 2026 14:35

cursor comment

6712e17

revert changelogs

6aca992

p0mvn marked this pull request as ready for review June 27, 2026 20:56

p0mvn merged commit 8934575 into ironwood-main Jun 27, 2026
54 checks passed

p0mvn merged 7 commits into ironwood-main from roman/ironwood-zakura-perf-deploy
