Feature/parallel txs optimise prestate load by Marchhill · Pull Request #11495 · NethermindEth/nethermind

Marchhill · 2026-05-05T13:41:54Z

Fixes Closes Resolves #

Please choose one of the keywords above to refer to the issue this PR solves followed by the issue number (e.g. Fixes #000). If no issue number, remove the line. Also, remove everything marked optional that is not applicable. Remove this note after reading.

Changes

List the changes

Types of changes

What types of changes does your code introduce?

Bugfix (a non-breaking change that fixes an issue)
New feature (a non-breaking change that adds functionality)
Breaking change (a change that causes existing functionality not to work as expected)
Optimization
Refactoring
Documentation update
Build-related changes
Other: Description

Testing

Requires testing

Yes
No

If yes, did you write tests?

Yes
No

Notes on testing

Optional. Remove if not applicable.

Documentation

Requires documentation update

Yes
No

If yes, link the PR to the docs update or the issue with the details labeled docs. Remove if not applicable.

Requires explanation in Release Notes

Yes
No

If yes, fill in the details here. Remove if not applicable.

Remarks

Optional. Remove if not applicable.

claude · 2026-05-05T13:42:24Z

Claude finished @Marchhill's task in 8m 21s —— View job

PR Review: Feature/parallel-txs-optimise-prestate-load

Read repository rules and guidelines
Analyze PR diff and changed files
Review for correctness, edge cases, and regressions
Review for security and performance
Post findings

Summary

This PR defers per-account prestate loading from PrepareForProcessing into slot 0 of the parallel execution loop, allowing tx workers (slots 1..N) to begin executing transactions while the loader fans out the DB reads account-by-account. A ManualResetEventSlim gate per ReadOnlyAccountChanges blocks workers only when they race ahead of the loader. The int sentinel (-1) for prestate indices is also widened to uint.MaxValue via Eip7928Constants.PrestateIndex.

The core concurrency design is sound: Set() provides a release fence and Wait() an acquire fence, so workers observing a set gate are guaranteed to see all mutations made before SignalPrestateLoaded(). The PrestateAwareIndexComparer correctly preserves binary-search invariants across all hot paths (TryGetLastBefore, GetExact, HasExactIndex). The exception-path fallback in LoadPreStateToSuggestedBlockAccessList (signal all remaining gates on throw) prevents deadlocks.

Findings

Medium

No tests for the concurrent gate mechanism [src/Nethermind/Nethermind.State.Test/BlockAccessListBasedWorldStateTests.cs]

The test changes only update the hardcoded sentinel from -1 to PrestateIndex in existing sequential tests. There are no new tests covering:

A worker blocking on WaitForPrestate() until slot 0 signals it
The exception path (all gates signaled on LoadPreStateToSuggestedBlockAccessList throwing)
Re-prepared block idempotency (gates are already set, workers don't re-block)

Given the non-trivial concurrency design, regression protection for these paths would be valuable.

ManualResetEventSlim never disposed [src/Nethermind/Nethermind.Core/BlockAccessLists/ReadOnlyAccountChanges.cs:285]

public void EnablePrestateGate() => _prestateGate ??= new ManualResetEventSlim(false);

ManualResetEventSlim implements IDisposable. While Dispose() is a no-op in practice here (the kernel handle is only allocated if .WaitHandle is accessed, which nothing in this path does), it is still a correctness issue. The slim objects accumulate one-per-account-per-block in the managed heap. For blocks with large BALs this adds GC pressure, and disposing them after FinalizeBlockProcessing would be cleaner. Fix this →

Low

Stale -1 sentinel references in comments (three locations)

ReadOnlyAccountChanges.cs line 25 (class XML doc):

ReadOnlyAccountChanges.cs line 249:

// Prestate is keyed at index -1, smaller than every real change, so we prepend (one realloc)
// to preserve the sorted-by-index invariant.

ReadOnlySlotChanges.cs line 15 (class XML doc):

/// <see cref="Get(int)"/> can binary-search via <see cref="System.MemoryExtensions"/>.

Get(int) → Get(uint) since the method signature was widened. Fix this →%20ReadOnlyAccountChanges.cs%20line%2025%3A%20change%20%22additions%20at%20index%20%3Cc%3E-1%3C%2Fc%3E%22%20to%20reference%20Eip7928Constants.PrestateIndex%20(uint.MaxValue).%202)%20ReadOnlyAccountChanges.cs%20line%20249%3A%20update%20%22Prestate%20is%20keyed%20at%20index%20-1%22%20to%20explain%20PrestateIndex%20sentinel%20is%20uint.MaxValue%20sorted%20first%20by%20PrestateAwareIndexComparer.%203)%20ReadOnlySlotChanges.cs%20line%2015%3A%20change%20cref%20from%20Get(int)%20to%20Get(uint).)

IndexedChangeDecoder.Encode silently truncates Index to ushort [src/Nethermind/Nethermind.Serialization.Rlp/Eip7928/IndexedChangeDecoder.cs:40]

stream.Encode((ushort)item.Index);

The comment correctly documents the invariant ("Real indices bounded by ushort.MaxValue, PrestateIndex never encoded"), but a silent truncation would produce a wrong-index encoding if the invariant is ever violated. A Debug.Assert(item.Index <= ushort.MaxValue || item.Index == Eip7928Constants.PrestateIndex) would catch this at development time.

WaitForPrestate() is public with no deadlock documentation [src/Nethermind/Nethermind.Core/BlockAccessLists/ReadOnlyAccountChanges.cs:289]

The method is safe to call from workers (slots 1..N) and from consumers after the parallel block completes, but calling it from slot 0 (the loader) before SignalPrestateLoaded() would deadlock. The existing gate comment explains the overall protocol but doesn't warn callers of the loader-must-not-call-before-signal constraint.

Unfilled PR description

The PR description is the raw template with no changes listed, no type-of-change checkboxes ticked, and no testing/documentation sections filled. This makes it hard to assess scope and intent during review.

Correctness notes (no action required)

Binary search ordering: PrestateAwareIndexComparer correctly places uint.MaxValue before all real indices. All affected binary-search paths (GetExact, TryGetLastBefore, HasExactIndex in both ReadOnlyAccountChanges and ReadOnlySlotChanges) use IndexKey<T> which delegates to this comparer — invariant preserved ✓
Wire format: Decoded indices come from ctx.DecodeUShort() → max 65535 != uint.MaxValue, so PrestateIndex can never appear in decoded data ✓
Sequential path: WaitForPrestate() short-circuits immediately when _prestateGate is null (gate never enabled in non-parallel path) ✓
Exception safety: LoadPreStateToSuggestedBlockAccessList signals all gates in its catch block before re-throwing, so workers never deadlock on loader failure ✓
_lastLoadedBal double-prepare guard: moved correctly into LoadPreStateToSuggestedBlockAccessList; gates are re-enabled idempotently via ??= on re-prepare ✓
Memory visibility: ManualResetEventSlim.Set() is a release fence; Wait() is an acquire fence — mutations to _balanceChanges, ExistedBeforeBlock, etc. in slot 0 are visible to workers after Wait() returns ✓

claude · 2026-05-05T13:50:26Z


    /// <summary>Storage value as visible at the start of <paramref name="blockAccessIndex"/> (i.e. last change strictly before the index).</summary>
-    public byte[] Get(int blockAccessIndex)
+    public byte[] Get(uint blockAccessIndex)


The class XML doc (line 15) still references <see cref="Get(int)"/> — the method signature was widened to uint in this PR. The cref will produce a broken XML doc link.

Suggested change

public byte[] Get(uint blockAccessIndex)

public byte[] Get(uint blockAccessIndex)

(The fix is in the class-level <summary> — change Get(int) → Get(uint).)

claude · 2026-05-05T13:50:34Z

+        // Real indices are bounded by Eip7928Constants.MaxTxs == ushort.MaxValue, and the
+        // PrestateIndex sentinel is never encoded (it is only added to the suggested BAL on
+        // the receiving side via LoadPreStateToSuggestedBlockAccessList).
+        stream.Encode((ushort)item.Index);


Silent truncation if Index is in (ushort.MaxValue, uint.MaxValue). The invariant holds by construction today (all real indices come from ctx.DecodeUShort()), but a Debug.Assert would make violations loud:

Suggested change

stream.Encode((ushort)item.Index);

Debug.Assert(item.Index <= ushort.MaxValue || item.Index == Eip7928Constants.PrestateIndex,

"PrestateIndex is never encoded; real indices must fit in ushort");

stream.Encode((ushort)item.Index);

claude · 2026-05-05T13:50:42Z

+    // block on a serial pre-pass, PrepareForProcessing flips the gate on each account, slot 0
+    // of the parallel loop loads prestate per-account and signals as soon as the account is
+    // done, and worker reads above wait per-account. Idempotent: an already-set gate stays set
+    // so a re-prepared block (already loaded) doesn't block workers.


ManualResetEventSlim implements IDisposable. While Dispose() is effectively a no-op here (the kernel handle is only allocated if .WaitHandle is accessed, which nothing in this path does), the objects accumulate — one per account per block — in the managed heap. Disposing them after the parallel block completes (FinalizeBlockProcessing or equivalent) would reduce GC pressure and follow the IDisposable contract.

…mise-prestate-load

Marchhill added 2 commits May 5, 2026 11:18

sentinel change

8da7cfb

initial impl

971d754

github-actions Bot added the state+storage label May 5, 2026

use task

8f55a3d

claude Bot reviewed May 5, 2026

View reviewed changes

Marchhill added 2 commits May 6, 2026 15:21

pre-execution in parallel thread

f5cc015

Merge branch 'bal-devnet-3-benchmarks' into feature/parallel-txs-opti…

2444605

…mise-prestate-load

Marchhill merged commit 2444605 into bal-devnet-3-benchmarks May 6, 2026
379 of 510 checks passed

Marchhill deleted the feature/parallel-txs-optimise-prestate-load branch May 6, 2026 14:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature/parallel txs optimise prestate load#11495

Feature/parallel txs optimise prestate load#11495
Marchhill merged 5 commits intobal-devnet-3-benchmarksfrom
feature/parallel-txs-optimise-prestate-load

Marchhill commented May 5, 2026

Uh oh!

claude Bot commented May 5, 2026 •

edited

Loading

Uh oh!

claude Bot May 5, 2026

Uh oh!

claude Bot May 5, 2026

Uh oh!

claude Bot May 5, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	public byte[] Get(uint blockAccessIndex)
	public byte[] Get(uint blockAccessIndex)

Conversation

Marchhill commented May 5, 2026

Changes

Types of changes

What types of changes does your code introduce?

Testing

Requires testing

If yes, did you write tests?

Notes on testing

Documentation

Requires documentation update

Requires explanation in Release Notes

Remarks

Uh oh!

claude Bot commented May 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: Feature/parallel-txs-optimise-prestate-load

Summary

Findings

Medium

Low

Correctness notes (no action required)

Uh oh!

claude Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude Bot commented May 5, 2026 •

edited

Loading