Roman/Evan/perf plus download fixes #263

Closed
p0mvn wants to merge 4 commits into evan/perf-plus-download-fixes from roman/evan/perf-plus-download-fixes

* feat(network): add private Zakura dev-network cohorts

Zakura (v2) dev nodes bootstrap from a few peers, but discovery and gossip
then pull in the rest of the network, so concurrent experiments by different
team members collide. This adds an opt-in way to run an isolated v2 overlay on
top of unchanged mainnet consensus.

Add an optional `[network.zakura] dev_network` cohort tag. When set, a node
only forms Zakura connections with peers sharing the same tag: its
`ZakuraHandshakeConfig` advertises `ZakuraNetworkId::Configured` and a
cohort-derived `chain_id` (`derive_dev_chain_id` = domain-separated blake2b
over the real genesis hash and the tag). Both fields are already validated in
the Zakura control handshake, the legacy->Zakura upgrade prelude, and signed
discovery records (records copy `handshake.chain_id`), so isolation propagates
with no wire-format change and no new reject code:

- a public mainnet node (`network_id = Mainnet`) and a dev node reject each
  other with `WrongNetwork` and stay on legacy;
- different cohorts (both `Configured`, different `chain_id`) reject with
  `WrongChain`, and cross-cohort discovery records fail import;
- same-tag peers match and form the private overlay.
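
The three outcomes above can be sketched as follows. This is a minimal illustration, not the code in this PR: the personalization string, the 32-byte digest size, the input layout, and the use of the raw genesis hash as the untagged `chain_id` are all assumptions, and `HandshakeConfig`, `handshake_config`, and `check_peer` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

# Assumption: illustrative domain-separation string (blake2b allows up to 16 bytes).
DEV_CHAIN_PERSONAL = b"ZakuraDevCohort"


def derive_dev_chain_id(genesis_hash: bytes, tag: str) -> bytes:
    """Domain-separated blake2b over the real genesis hash and the cohort tag."""
    h = hashlib.blake2b(digest_size=32, person=DEV_CHAIN_PERSONAL)
    h.update(genesis_hash)
    h.update(tag.encode("utf-8"))
    return h.digest()


@dataclass(frozen=True)
class HandshakeConfig:
    network_id: str  # "Mainnet" or "Configured"
    chain_id: bytes


def handshake_config(genesis_hash: bytes, dev_network: Optional[str]) -> HandshakeConfig:
    """Build the advertised config; an untagged node keeps the plain identity."""
    if dev_network is None:
        # Assumption: the plain chain_id is modelled here as the genesis hash itself.
        return HandshakeConfig("Mainnet", genesis_hash)
    return HandshakeConfig("Configured", derive_dev_chain_id(genesis_hash, dev_network))


def check_peer(local: HandshakeConfig, remote: HandshakeConfig) -> str:
    """Compare network_id first, then chain_id, mirroring the two reject codes."""
    if local.network_id != remote.network_id:
        return "WrongNetwork"
    if local.chain_id != remote.chain_id:
        return "WrongChain"
    return "Accepted"
```

Because the tag only changes the two values the handshake already compares, no new field or reject code is needed in this model.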

The tag only scopes the Zakura v2 overlay. Genesis, network magic, and
activation heights are unchanged, so a cohort node validates the real chain.
`chain_id` here is a Zakura peer-matching id only; block validation uses the
unchanged network parameters. The tag has no effect unless `v2_p2p` is enabled.

The legacy->Zakura upgrade path rebuilds the handshake config from scratch, so
it now also threads the cohort tag. Without this, a tagged node would advertise
the cohort id on its native endpoint but the plain id during upgrades, and so
could not upgrade connections to peers in its own cohort.

* docs(network): add Zakura dev-network README

Add a developer-facing README for the `[network.zakura] dev_network` cohort
feature next to the code (`zebra-network/src/zakura/README.md`): what it does,
the network_id/chain_id mechanism, the code map, and how to test. Complements
the operator-facing book guide.
v12-auditor Bot commented Jun 25, 2026

Warning

Insufficient credits for auto-review. Keep at least $5.00 of available balance to start a run. Please add credits to continue.

p0mvn changed the title from "Roamn/Evan/perf plus download fixes" to "Roman/Evan/perf plus download fixes" on Jun 25, 2026
p0mvn added 3 commits June 25, 2026 18:13
Add deploy/deployer/, a dependency-free Python CLI for deploying zebrad to a
fleet of nodes and collecting their logs. It reuses the build -> scp ->
install-with-.bak -> systemctl restart -> rollback pattern from
deploy-zcashd-compat.yml, generalized to a dynamic multi-node TOML config
(per-node name / ssh_string / commit).

- build: resolve each node's commit to a SHA (origin/<ref> fallback) and build
  zebrad once per unique SHA into a cache, using a throwaway detached git
  worktree so the caller's working tree is untouched. Honors CARGO_TARGET_DIR.
- deploy: distribute the binary + a rendered zebrad.toml (deterministic
  [tracing] log_file) + a systemd unit, install with a .bak backup, restart,
  and roll back on failure. Nodes run in parallel.
- status: per-node service state + version.
- logs fetch / logs follow: copy or live-tail the deterministic log file by name.
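
The "build once per unique SHA" step in the list above amounts to grouping nodes by their resolved commit. A minimal sketch, assuming each node is a dict with the `name` and `commit` keys from the TOML config; `plan_builds` and the injected `resolve` callable are hypothetical names, and the real tool presumably resolves refs with git rather than a lookup table.

```python
from typing import Callable, Dict, List


def plan_builds(nodes: List[dict], resolve: Callable[[str], str]) -> Dict[str, List[str]]:
    """Group node names by resolved commit SHA so each SHA is built only once."""
    plan: Dict[str, List[str]] = {}
    for node in nodes:
        sha = resolve(node["commit"])  # e.g. a branch name resolved to a full SHA
        plan.setdefault(sha, []).append(node["name"])
    return plan


# Usage with a stand-in resolver (placeholder SHAs, not real commits).
fake_refs = {"main": "1111", "feature": "2222"}
nodes = [
    {"name": "a", "ssh_string": "user@a", "commit": "main"},
    {"name": "b", "ssh_string": "user@b", "commit": "main"},
    {"name": "c", "ssh_string": "user@c", "commit": "feature"},
]
plan = plan_builds(nodes, fake_refs.__getitem__)
```

Here `plan` has two keys, so two builds would serve three nodes.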

Remote scripts are fed on stdin (bash -s) rather than as ssh args, since ssh
flattens argv and would collapse `bash -c '<multi-word>'` to its first word.
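
A minimal sketch of the stdin approach, assuming `ssh_string` is a single ssh destination; `run_remote_script` and its injectable `run` parameter are hypothetical and only illustrate the idea. The remote command stays the fixed `bash -s`, so nothing multi-word has to survive ssh's argument joining.

```python
import subprocess
from typing import Callable


def run_remote_script(ssh_string: str, script: str, run: Callable = subprocess.run):
    """Run a multi-line script on a remote host by piping it to `bash -s`."""
    # The script travels on stdin, not in argv, so its quoting is never re-parsed.
    argv = ["ssh", ssh_string, "bash", "-s"]
    return run(argv, input=script, text=True, capture_output=True, check=False)
```

Passing a stand-in for `run` makes the call testable without a reachable host.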
…267)

Tooling for deterministic, isolated sync-perf benchmarking against a private
Zakura cohort of operator-controlled serving nodes.

- deploy/runner/: perf.sh is the single entry point over the
  seed -> peers -> freeze -> run -> analyze lifecycle; feed_run.sh forks a
  snapshot and samples five-category bottleneck metrics into a CSV;
  feed_analyze.py does steady-state bottleneck attribution; a tokenized bench
  config and serving-fleet template; the live metrics dashboard; and a runbook.
  Host-specific paths and the cohort identity live in an untracked cohort.env.
- deploy/deployer/deploy.py: render a fleet-wide [network.zakura] cohort block,
  storage_mode / v2_p2p / legacy_p2p, and an optional metrics endpoint plus
  tracing filter; `status` now reports the running git commit and configured ref.
- make/perf.mk and Makefile: `make perf-*` wrappers (build-local, run, analyze,
  dashboard, verify-isolation, seed/peers/freeze/status).
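
For orientation, the rendered cohort block might look like the output of the sketch below. Only the `[network.zakura]` table and its `dev_network` key come from this PR's description; `render_zakura_block` is a hypothetical helper, and the other rendered settings (`storage_mode`, `v2_p2p`, `legacy_p2p`) are omitted because their table locations are not stated here.

```python
def render_zakura_block(dev_network: str) -> str:
    """Render a [network.zakura] table carrying the fleet-wide cohort tag."""
    # Escape backslashes and double quotes so the tag is a valid TOML basic string.
    escaped = dev_network.replace("\\", "\\\\").replace('"', '\\"')
    return f'[network.zakura]\ndev_network = "{escaped}"\n'


block = render_zakura_block("team-a")
```

Rendering the same block for every node in the fleet is what keeps the serving nodes and the bench node in one cohort.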
…refinements (#271)

Adds per-phase commit-pipeline instrumentation (behind the `commit-metrics`
feature) and the perf-harness refinements that consume it, on top of the
private-cohort bench harness (#267).

State instrumentation (commit-metrics gated; default builds unaffected):
- finalized_state.rs: time the history-tree MMR push (history_push phase).
- block.rs: time spent-UTXO reads, address-balance reads, and batch build, and
  record committed batch size in bytes (write-throughput / per-block size).
- disk_db.rs: DiskWriteBatch::size_in_bytes() accessor for the above; gated to
  its sole caller so a default build does not flag it dead.

Perf harness (deploy/runner + make/perf.mk):
- feed_run.sh: stop a prior same-label run before re-forking, so a rerun can't
  collide on the fork/ports (RocksDB "multiple active instances" -> instant exit).
- perf.sh: add a `logs` subcommand (drift-spam filtered; RAW=1 for all).
- zebra-metrics-dashboard.py: smooth throughput over a 20s trailing window so
  batched checkpoint commits don't alias into false spikes and zeros; add --smooth.
- feed_analyze.py: commit-pipeline attribution over the new metric set.
- make/perf.mk: perf-logs target; default stop height -> 1.9M; wider analyze window.
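
The dashboard smoothing in the list above can be illustrated with a trailing-window rate. This is a sketch under assumptions, not the dashboard's code: it assumes samples are `(timestamp_seconds, height)` pairs, and `TrailingRate` is a hypothetical name.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class TrailingRate:
    """Blocks per second measured over a trailing time window."""

    def __init__(self, window_secs: float = 20.0):
        self.window_secs = window_secs
        self.samples: Deque[Tuple[float, int]] = deque()

    def add(self, t: float, height: int) -> Optional[float]:
        self.samples.append((t, height))
        # Drop old samples, keeping the newest one at or before the window start
        # as the baseline for the rate.
        while len(self.samples) > 2 and self.samples[1][0] <= t - self.window_secs:
            self.samples.popleft()
        t0, h0 = self.samples[0]
        if t <= t0:
            return None  # not enough history yet
        return (height - h0) / (t - t0)


# Commits land in batches of 400 blocks every 10s, sampled every 5s.
rate = TrailingRate(20.0)
observed = [(0, 0), (5, 0), (10, 400), (15, 400), (20, 800), (25, 800)]
rates = [rate.add(t, h) for t, h in observed]
```

A per-sample rate on this input would alternate between 0 and 80 blocks/s, while the windowed rate settles at 40 blocks/s once a full window of history is available.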

Verified: `cargo check -p zebra-state` passes with and without commit-metrics;
rustfmt clean. Instrumentation exercised live by the bench binary emitting these
metrics during cohort sync.

AI disclosure: drafted with Claude Code (Claude Opus) — instrumentation, harness
changes, and this description.
p0mvn closed this on Jun 27, 2026