fix(zakura): harden auto block-sync memory ceiling#226

Draft
evan-forbes wants to merge 1 commit into ironwood-main from auto-mem

@evan-forbes

Motivation

The block-sync ceiling on in-flight block-body bytes defaulted to a flat value, and runtime paths that accepted MemoryLimit::Auto without resolving it against real system memory could bypass it. On cgroup-limited hosts, especially containers, this can overestimate the memory actually available and risk an OOM kill (SIGKILL) during startup or sync.

This PR implements the auto memory provisioning plan for Zakura block sync and addresses the follow-up review findings on cgroup discovery, resolver invariants, and the resolved-config boundary.

Solution

  • Add MemoryLimit::{Auto, Bytes} config support and resolve it once at startup through an injectable MemoryProbe.
  • Add a Linux production memory probe in zebrad that resolves the process cgroup path from /proc/self/cgroup and /proc/self/mountinfo, walks ancestors, and clamps host memory by effective cgroup limits/headroom.
  • Clamp auto and explicit ceilings against safe total memory, available memory, MIN_CEILING, MAX_CEILING, and the factor-aware effective bound.
  • Add warnings for clamped explicit values, under-provisioned hosts, and available-memory-starved auto resolution.
  • Split parsed ZakuraBlockSyncConfig from runtime ResolvedZakuraBlockSyncConfig so MemoryLimit::Auto does not reach the block-sync runtime/ByteBudget path.
  • Thread the resolved config through zebrad startup, peer-set initialization, Zakura endpoint setup, service registration, block-sync startup, testkit, and tests.
  • Update changelogs/config fixture for the new default and sysinfo dependency.
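
The resolve-once-and-clamp flow described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the MemoryLimit variants and the MemoryProbe, MIN_CEILING, and MAX_CEILING names come from the description, while the constant values, the probe method names, the one-quarter auto fraction, the way the factor divides the bound, and the FixedProbe and ResolvedCeiling types are all assumptions made for the example.

```rust
/// Parsed config value: derive the ceiling, or use an explicit byte count.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum MemoryLimit {
    Auto,
    Bytes(u64),
}

/// Injectable source of memory facts, so tests never touch the real host.
/// The method names are assumptions for this sketch.
pub trait MemoryProbe {
    /// Total memory the process may use, already clamped by cgroup limits.
    fn safe_total_bytes(&self) -> u64;
    /// Memory currently available to the process.
    fn available_bytes(&self) -> u64;
}

// Hypothetical bounds; the real values are not stated in the description.
pub const MIN_CEILING: u64 = 64 * 1024 * 1024;
pub const MAX_CEILING: u64 = 4 * 1024 * 1024 * 1024;
// Hypothetical policy: Auto asks for one quarter of safe total memory.
const AUTO_DIVISOR: u64 = 4;

/// Runtime-side value: always concrete bytes, never `Auto`.
#[derive(Debug, PartialEq)]
pub struct ResolvedCeiling {
    pub ceiling_bytes: u64,
    /// True when the requested value had to be adjusted (worth a warning).
    pub clamped: bool,
}

/// Resolve a parsed limit into a concrete ceiling, once, at startup.
pub fn resolve(limit: MemoryLimit, probe: &impl MemoryProbe, factor: f64) -> ResolvedCeiling {
    // Assumption: the factor can only shrink the bound; bad input falls back to 1.0.
    let factor = if factor.is_finite() && factor >= 1.0 { factor } else { 1.0 };
    let system_bound = probe.safe_total_bytes().min(probe.available_bytes());
    let effective_bound = (system_bound as f64 / factor) as u64;
    // The upper bound never drops below MIN_CEILING, so `clamp` cannot panic.
    let upper = effective_bound.min(MAX_CEILING).max(MIN_CEILING);
    let requested = match limit {
        MemoryLimit::Auto => probe.safe_total_bytes() / AUTO_DIVISOR,
        MemoryLimit::Bytes(bytes) => bytes,
    };
    let ceiling_bytes = requested.clamp(MIN_CEILING, upper);
    ResolvedCeiling { ceiling_bytes, clamped: requested != ceiling_bytes }
}

/// Test double that returns fixed numbers.
pub struct FixedProbe {
    pub safe_total: u64,
    pub available: u64,
}

impl MemoryProbe for FixedProbe {
    fn safe_total_bytes(&self) -> u64 {
        self.safe_total
    }
    fn available_bytes(&self) -> u64 {
        self.available
    }
}
```

Because the probe is a trait, a test can pass a FixedProbe describing a small container and check the resolved ceiling without depending on the machine running the test. Returning a separate resolved type mirrors the parsed/resolved config split: code that only accepts the resolved value cannot be handed an unresolved Auto.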

Tests

Passed:

cargo fmt --all -- --check
cargo test -p zebra-network provisioning_tests --locked
cargo check -p zebra-network --locked --features zakura-testkit
git diff --check
git diff --cached --check

Attempted but blocked before test execution:

cargo test -p zebrad memory_probe --locked

This failed while compiling the librocksdb-sys v0.16.0+8.10.0 C++ sources, with errors such as "uint64_t has not been declared", before any memory_probe tests ran. The failure appears to be a local toolchain/RocksDB build issue rather than a failure in these tests.

Specifications & References

  • Implements MEMORY_PROVISIONING_PLAN.md Layer 0: derive and validate the block-sync in-flight memory ceiling from cgroup-aware system memory at startup.
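
As a rough illustration of what cgroup-aware derivation involves, the sketch below finds the process cgroup path and takes the tightest memory limit along its ancestors. It assumes a cgroup v2 unified hierarchy, where /proc/self/cgroup contains a line of the form 0::/path and each cgroup directory may hold a memory.max file containing either max or a byte count. All function names are hypothetical, the mount root is passed in rather than discovered from /proc/self/mountinfo, and the probe in this PR may handle more cases (cgroup v1, headroom) than shown here.

```rust
use std::fs;
use std::path::Path;

/// Parse the contents of a cgroup v2 `memory.max` file.
/// Returns `None` for "max" (unlimited) or for unparsable input.
pub fn parse_memory_max(contents: &str) -> Option<u64> {
    let value = contents.trim();
    if value == "max" {
        None
    } else {
        value.parse().ok()
    }
}

/// Extract the cgroup v2 path from `/proc/self/cgroup` contents.
/// The unified hierarchy is the line that starts with "0::".
pub fn parse_cgroup_v2_path(proc_self_cgroup: &str) -> Option<&str> {
    proc_self_cgroup
        .lines()
        .find_map(|line| line.strip_prefix("0::"))
}

/// Clamp host memory by every limit found along the ancestor chain.
/// A limit set on any ancestor also constrains the process.
pub fn effective_limit(
    host_total: u64,
    ancestor_limits: impl IntoIterator<Item = Option<u64>>,
) -> u64 {
    ancestor_limits
        .into_iter()
        .flatten() // skip cgroups with no limit
        .fold(host_total, u64::min)
}

/// Walk from the process cgroup up to the hierarchy root, reading
/// `memory.max` at each level. Missing or unreadable files count as unlimited.
pub fn read_effective_limit(mount_root: &Path, cgroup_path: &str, host_total: u64) -> u64 {
    let relative = cgroup_path.trim_start_matches('/');
    let limits = Path::new(relative).ancestors().map(|dir| {
        fs::read_to_string(mount_root.join(dir).join("memory.max"))
            .ok()
            .and_then(|contents| parse_memory_max(&contents))
    });
    effective_limit(host_total, limits)
}
```

Keeping the parsing and the min-over-ancestors step as pure functions means they can be tested with literal strings, while only read_effective_limit touches the filesystem.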

Follow-up Work

  • Integrate the sibling advertised-block-size oversubscription factor once that plan lands; this PR already accepts a factor parameter and currently passes the standalone default of 1.0.

AI Disclosure

  • No AI tools were used in this PR
  • AI tools were used: OpenAI Codex was used for implementation, review follow-up fixes, and PR description drafting.

PR Checklist

  • The PR title follows conventional commits format: type(scope): description
  • The PR follows the contribution guidelines.
  • This change was discussed in an issue or with the team beforehand.
  • The solution is tested.
  • The documentation and changelogs are up to date.
