fix(network): block-sync stack review follow-ups#295
Open
evan-forbes wants to merge 1 commit into
Open
Conversation
Author
|
happy to revert this as well fwiw
might cause issues with existing deployments |
7 tasks
Author
|
fixing conflicts now |
Follow-ups from review of the merged block-sync throughput stack (#284, #286, #287, #288, #289): - sequencer: drop the redundant 500ms timer-driven floor-starvation shed. Floor rescue is already demand-driven (inline after each accepted body, plus the synchronous FundFloorReservation path), so the periodic backstop was a pure fixed-cadence poll. - reactor: make the floor watchdog event-driven. Arm a sleep to the earliest outstanding floor-claim deadline instead of polling on a fixed tick, mirroring the per-peer routine's own-timeout arm. Removes the floor_watchdog_tick config knob. - consensus/state: rename consensus.disable_vct_fast_sync -> consensus.vct_fast_sync (default true) to remove the double negative. Mirror updated in state config, docs, and CHANGELOG. - work_queue: debug_assert the take_in_range_budgeted precondition (positive count, low <= high) instead of silently returning empty. - block_sync: reword refactor-historical comments ("ported from", "verbatim", "matches the previous", "used to") to describe current behavior. AI-assisted: implemented with Claude Code (Opus 4.8).
c8db64e to
bfdfe76
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation
Review follow-ups ("flups") collected from the merged block-sync throughput stack
(#284, #286, #287, #288, #289). These were filed as deferred during stack review and
are bundled here as a single follow-up PR against
feat/pre-release-main.The two largest themes:
exists — two backstops remained after the stack landed (the fix(network): add block sync congestion control #288 merge kept both
the demand-driven floor-funding path and the old timer backstop).
disable_vct_fast_sync).Solution
sequencer: drop the redundant timer-driven floor-starvation shed (F1, fix(network): prioritize block sync floor requests #287).The 500ms
FLOOR_STARVATION_SHED_INTERVAL/shed_tickarm is deleted. Floorrescue is already fully demand-driven: inline after each accepted body, plus the
synchronous
SequencerControlInput::FundFloorReservationpath a floor requesterdrives when it cannot reserve budget. The periodic backstop was a pure
fixed-cadence poll on top of those.
reactor: make the floor watchdog event-driven (F2). Instead of polling on afixed
floor_watchdog_tick, the reactor arms asleepto the earliestoutstanding floor-claim deadline (a new non-allocating
PeerRegistry::earliest_outstanding_deadline_at), mirroring the per-peerroutine's existing own-timeout arm. The cancellation logic (
run_floor_watchdog)is unchanged; only when it fires moves from a 1s poll to the exact deadline.
Removes the
floor_watchdog_tickconfig knob (+ accessor, default, const, andboth stored test configs).
consensus/state: renameconsensus.disable_vct_fast_sync→consensus.vct_fast_sync(F3, fix(network): tune block sync throughput defaults #289). Default flips fromfalsetotrue(same behavior, no double negative). Inverted at every consumer: the state-config
mirror,
select_source_mode,finalized_stateresume guard, thezebradstartup wiring, unit/prop tests, the VCT design doc, and CHANGELOG. No
serde(alias)— this is a clean pre-release rename (the old name never shipped ina stable release).
work_queue: assert thetake_in_range_budgetedprecondition (F4, perf(network): pack block sync ranges by size hint #284).Adds
debug_assert!(max_count > 0 && low <= high, ...)so an invalid call is aloud caller bug in debug/test builds, while release still degrades to an empty
Vecrather than panicking a live node.block_sync: reword refactor-historical comments (F5, perf(network): retain raw block bodies in reorder backlog #286). Comments framedas "ported from / verbatim from reactor.rs / matches the previous
run_peer / used to …" required knowing the pre-refactor code and carried
rot-prone line numbers. Reworded across the module to describe current behavior;
genuine "why it's shaped this way" comments were kept.
Dropped after discussion (left as-is): bumping
MAX_BS_INFLIGHT_REQUESTSand the32k
DEFAULT_BS_MAX_INFLIGHTdefault.Tests
cargo fmt --all -- --check— clean.cargo clippy -p zebra-network -p zebra-consensus -p zebra-state -p zebrad --all-targets -- -D warnings— clean.cargo nextest run -p zebra-network -p zebra-consensus -p zebra-state— 1149/1150pass. The one failure (
peer::handshake::tests::mutual_p2p_v2_legacy_upgrade_forms_zakura_connection)is a pre-existing concurrency flake unrelated to this change — it passes 3/3 in
isolation and touches handshake code, not block-sync/config/VCT.
vct::tests::source_mode_precedence(rewritten forthe inverted flag), the consensus/state default tests, the stored-config TOML
parse/round-trip, and the block-sync
shed_top_*/ admission tests.Specifications & References
Originating review threads: #284, #286, #287, #288, #289.
Follow-up Work
peer_routine'sOUTBOUND_FULL_POLL_INTERVAL(10ms re-check while a peer'soutbound stream is full) is the one remaining poll left in place: converting it
needs a session-level "outbound capacity freed" notification that may not exist
yet, and the
recvgating is subtle. Left untouched for safety and noted here.AI Disclosure
removal, event-driven watchdog, config rename, comment sweep) and ran the local
fmt/clippy/test validation.
PR Checklist
type(scope): description