Skip to content

Feature/parallel txs optimise prestate load#11495

Merged
Marchhill merged 5 commits intobal-devnet-3-benchmarksfrom
feature/parallel-txs-optimise-prestate-load
May 6, 2026
Merged

Feature/parallel txs optimise prestate load#11495
Marchhill merged 5 commits intobal-devnet-3-benchmarksfrom
feature/parallel-txs-optimise-prestate-load

Conversation

@Marchhill
Copy link
Copy Markdown
Contributor

Fixes Closes Resolves #

Please choose one of the keywords above to refer to the issue this PR solves followed by the issue number (e.g. Fixes #000). If no issue number, remove the line. Also, remove everything marked optional that is not applicable. Remove this note after reading.

Changes

  • List the changes

Types of changes

What types of changes does your code introduce?

  • Bugfix (a non-breaking change that fixes an issue)
  • New feature (a non-breaking change that adds functionality)
  • Breaking change (a change that causes existing functionality not to work as expected)
  • Optimization
  • Refactoring
  • Documentation update
  • Build-related changes
  • Other: Description

Testing

Requires testing

  • Yes
  • No

If yes, did you write tests?

  • Yes
  • No

Notes on testing

Optional. Remove if not applicable.

Documentation

Requires documentation update

  • Yes
  • No

If yes, link the PR to the docs update or the issue with the details labeled docs. Remove if not applicable.

Requires explanation in Release Notes

  • Yes
  • No

If yes, fill in the details here. Remove if not applicable.

Remarks

Optional. Remove if not applicable.

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 5, 2026

Claude finished @Marchhill's task in 8m 21s —— View job


PR Review: Feature/parallel-txs-optimise-prestate-load

  • Read repository rules and guidelines
  • Analyze PR diff and changed files
  • Review for correctness, edge cases, and regressions
  • Review for security and performance
  • Post findings

Summary

This PR defers per-account prestate loading from PrepareForProcessing into slot 0 of the parallel execution loop, allowing tx workers (slots 1..N) to begin executing transactions while the loader fans out the DB reads account-by-account. A ManualResetEventSlim gate per ReadOnlyAccountChanges blocks workers only when they race ahead of the loader. The int sentinel (-1) for prestate indices is also widened to uint.MaxValue via Eip7928Constants.PrestateIndex.

The core concurrency design is sound: Set() provides a release fence and Wait() an acquire fence, so workers observing a set gate are guaranteed to see all mutations made before SignalPrestateLoaded(). The PrestateAwareIndexComparer correctly preserves binary-search invariants across all hot paths (TryGetLastBefore, GetExact, HasExactIndex). The exception-path fallback in LoadPreStateToSuggestedBlockAccessList (signal all remaining gates on throw) prevents deadlocks.


Findings

Medium

No tests for the concurrent gate mechanism [src/Nethermind/Nethermind.State.Test/BlockAccessListBasedWorldStateTests.cs]

The test changes only update the hardcoded sentinel from -1 to PrestateIndex in existing sequential tests. There are no new tests covering:

  • A worker blocking on WaitForPrestate() until slot 0 signals it
  • The exception path (all gates signaled on LoadPreStateToSuggestedBlockAccessList throwing)
  • Re-prepared block idempotency (gates are already set, workers don't re-block)

Given the non-trivial concurrency design, regression protection for these paths would be valuable.


ManualResetEventSlim never disposed [src/Nethermind/Nethermind.Core/BlockAccessLists/ReadOnlyAccountChanges.cs:285]

public void EnablePrestateGate() => _prestateGate ??= new ManualResetEventSlim(false);

ManualResetEventSlim implements IDisposable. While Dispose() is a no-op in practice here (the kernel handle is only allocated if .WaitHandle is accessed, which nothing in this path does), it is still a correctness issue. The slim objects accumulate one-per-account-per-block in the managed heap. For blocks with large BALs this adds GC pressure, and disposing them after FinalizeBlockProcessing would be cleaner. Fix this →


Low

Stale -1 sentinel references in comments (three locations)

ReadOnlyAccountChanges.cs line 25 (class XML doc):

ReadOnlyAccountChanges.cs line 249:

// Prestate is keyed at index -1, smaller than every real change, so we prepend (one realloc)
// to preserve the sorted-by-index invariant.

ReadOnlySlotChanges.cs line 15 (class XML doc):

/// <see cref="Get(int)"/> can binary-search via <see cref="System.MemoryExtensions"/>.

Get(int)Get(uint) since the method signature was widened. Fix this →%20ReadOnlyAccountChanges.cs%20line%2025%3A%20change%20%22additions%20at%20index%20%3Cc%3E-1%3C%2Fc%3E%22%20to%20reference%20Eip7928Constants.PrestateIndex%20(uint.MaxValue).%202)%20ReadOnlyAccountChanges.cs%20line%20249%3A%20update%20%22Prestate%20is%20keyed%20at%20index%20-1%22%20to%20explain%20PrestateIndex%20sentinel%20is%20uint.MaxValue%20sorted%20first%20by%20PrestateAwareIndexComparer.%203)%20ReadOnlySlotChanges.cs%20line%2015%3A%20change%20cref%20from%20Get(int)%20to%20Get(uint).)


IndexedChangeDecoder.Encode silently truncates Index to ushort [src/Nethermind/Nethermind.Serialization.Rlp/Eip7928/IndexedChangeDecoder.cs:40]

stream.Encode((ushort)item.Index);

The comment correctly documents the invariant ("Real indices bounded by ushort.MaxValue, PrestateIndex never encoded"), but a silent truncation would produce a wrong-index encoding if the invariant is ever violated. A Debug.Assert(item.Index <= ushort.MaxValue || item.Index == Eip7928Constants.PrestateIndex) would catch this at development time.


WaitForPrestate() is public with no deadlock documentation [src/Nethermind/Nethermind.Core/BlockAccessLists/ReadOnlyAccountChanges.cs:289]

The method is safe to call from workers (slots 1..N) and from consumers after the parallel block completes, but calling it from slot 0 (the loader) before SignalPrestateLoaded() would deadlock. The existing gate comment explains the overall protocol but doesn't warn callers of the loader-must-not-call-before-signal constraint.


Unfilled PR description

The PR description is the raw template with no changes listed, no type-of-change checkboxes ticked, and no testing/documentation sections filled. This makes it hard to assess scope and intent during review.


Correctness notes (no action required)

  • Binary search ordering: PrestateAwareIndexComparer correctly places uint.MaxValue before all real indices. All affected binary-search paths (GetExact, TryGetLastBefore, HasExactIndex in both ReadOnlyAccountChanges and ReadOnlySlotChanges) use IndexKey<T> which delegates to this comparer — invariant preserved ✓
  • Wire format: Decoded indices come from ctx.DecodeUShort() → max 65535 != uint.MaxValue, so PrestateIndex can never appear in decoded data ✓
  • Sequential path: WaitForPrestate() short-circuits immediately when _prestateGate is null (gate never enabled in non-parallel path) ✓
  • Exception safety: LoadPreStateToSuggestedBlockAccessList signals all gates in its catch block before re-throwing, so workers never deadlock on loader failure ✓
  • _lastLoadedBal double-prepare guard: moved correctly into LoadPreStateToSuggestedBlockAccessList; gates are re-enabled idempotently via ??= on re-prepare ✓
  • Memory visibility: ManualResetEventSlim.Set() is a release fence; Wait() is an acquire fence — mutations to _balanceChanges, ExistedBeforeBlock, etc. in slot 0 are visible to workers after Wait() returns ✓


/// <summary>Storage value as visible at the start of <paramref name="blockAccessIndex"/> (i.e. last change strictly before the index).</summary>
public byte[] Get(int blockAccessIndex)
public byte[] Get(uint blockAccessIndex)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class XML doc (line 15) still references <see cref="Get(int)"/> — the method signature was widened to uint in this PR. The cref will produce a broken XML doc link.

Suggested change
public byte[] Get(uint blockAccessIndex)
public byte[] Get(uint blockAccessIndex)

(The fix is in the class-level <summary> — change Get(int)Get(uint).)

// Real indices are bounded by Eip7928Constants.MaxTxs == ushort.MaxValue, and the
// PrestateIndex sentinel is never encoded (it is only added to the suggested BAL on
// the receiving side via LoadPreStateToSuggestedBlockAccessList).
stream.Encode((ushort)item.Index);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silent truncation if Index is in (ushort.MaxValue, uint.MaxValue). The invariant holds by construction today (all real indices come from ctx.DecodeUShort()), but a Debug.Assert would make violations loud:

Suggested change
stream.Encode((ushort)item.Index);
Debug.Assert(item.Index <= ushort.MaxValue || item.Index == Eip7928Constants.PrestateIndex,
"PrestateIndex is never encoded; real indices must fit in ushort");
stream.Encode((ushort)item.Index);

// block on a serial pre-pass, PrepareForProcessing flips the gate on each account, slot 0
// of the parallel loop loads prestate per-account and signals as soon as the account is
// done, and worker reads above wait per-account. Idempotent: an already-set gate stays set
// so a re-prepared block (already loaded) doesn't block workers.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ManualResetEventSlim implements IDisposable. While Dispose() is effectively a no-op here (the kernel handle is only allocated if .WaitHandle is accessed, which nothing in this path does), the objects accumulate — one per account per block — in the managed heap. Disposing them after the parallel block completes (FinalizeBlockProcessing or equivalent) would reduce GC pressure and follow the IDisposable contract.

@Marchhill Marchhill merged commit 2444605 into bal-devnet-3-benchmarks May 6, 2026
379 of 510 checks passed
@Marchhill Marchhill deleted the feature/parallel-txs-optimise-prestate-load branch May 6, 2026 14:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant