Skip to content

Parallel transaction execution with BALs#9182

Merged
Marchhill merged 840 commits into
masterfrom
feature/parallel-txs
Apr 27, 2026
Merged

Parallel transaction execution with BALs#9182
Marchhill merged 840 commits into
masterfrom
feature/parallel-txs

Conversation

@Marchhill
Copy link
Copy Markdown
Contributor

@Marchhill Marchhill commented Aug 20, 2025

Depends on BALs #9114

Types of changes

What types of changes does your code introduce?

  • Bugfix (a non-breaking change that fixes an issue)
  • New feature (a non-breaking change that adds functionality)
  • Breaking change (a change that causes existing functionality not to work as expected)
  • Optimization
  • Refactoring
  • Documentation update
  • Build-related changes
  • Other: Description

Testing

Requires testing

  • Yes
  • No

If yes, did you write tests?

  • Yes
  • No

Notes on testing

Documentation

Requires documentation update

  • Yes
  • No

Requires explanation in Release Notes

  • Yes
  • No

@Marchhill Marchhill mentioned this pull request Aug 21, 2025
16 tasks
@Marchhill Marchhill mentioned this pull request Oct 7, 2025
16 tasks
@Marchhill Marchhill force-pushed the feature/parallel-txs branch from a84593b to 692e22b Compare November 13, 2025 14:12
@Marchhill Marchhill changed the base branch from feature/block-level-access-lists to bal-devnet-1 December 16, 2025 16:33
@Marchhill Marchhill changed the base branch from bal-devnet-1 to bal-devnet-2-bals-only January 30, 2026 11:24
@flcl42 flcl42 force-pushed the bal-devnet-2-bals-only branch 6 times, most recently from 76a8434 to de48b8c Compare February 24, 2026 10:39
@Marchhill Marchhill changed the base branch from bal-devnet-2-bals-only to bal-devnet-2 March 2, 2026 19:35
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Mar 6, 2026

EVM Opcode Benchmark Diff

Aggregated runs: base=1, pr=1

No significant regressions or improvements detected.

Base automatically changed from bal-devnet-2 to master March 9, 2026 17:55
@LukaszRozmej
Copy link
Copy Markdown
Member

@Marchhill let me handle last few open comments

@Marchhill
Copy link
Copy Markdown
Contributor Author

@Marchhill let me handle last few open comments

Thanks but I think all addressed now except for ref readonly, and maybe using composition which we could talk through and make a separate PR. Just need to check nothing broke from the changes

- BlockAccessListManager: hybrid wrap of parallel-worker exceptions —
  InvalidTransactionException re-wrapped (preserving Reason), plain
  InvalidBlockException re-wrapped, others rethrown via ExceptionDispatchInfo
- BlockCachePreWarmer: extract ShouldPreWarm/IsBalReadWarmingEnabled and
  WarmupTransactionAddresses/WarmupBalAccountAt helpers
- Engine module tests: parametrize EngineModuleTests + AuRaMergeEngineModuleTests
  with [TestFixture(true)]/[TestFixture(false)], delete Sequential* derived classes
@benaadams
Copy link
Copy Markdown
Member

@claude re-review

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 27, 2026

Claude finished @benaadams's task in 6m 30s —— View job


Re-Review: Parallel Transaction Execution with BALs

  • Load repository rules
  • Read key changed files
  • Review open issues from previous reviews
  • Check for new issues from April 24–27 commits
  • Post final review

Fixed Since April 20 Re-Review

  • NullReferenceException risk in BlockAccessListBasedWorldState.Get/GetOriginal — null checks now added ✓
  • Exception wrapping via ParallelExecutionException class preserving Reason — addresses the stack-trace concern ✓
  • BlockCachePreWarmerShouldPreWarm(), IsBalReadWarmingEnabled(), WarmupFromBal(), WarmupBalAccount() helpers extracted ✓
  • Test parametrization — EngineModuleTests and AuRaMergeEngineModuleTests both use [TestFixture(true)] / [TestFixture(false)], Sequential* derived classes deleted ✓
  • Enabled=false → ParallelExecutionEnabled=false invariant holds: ParallelExecutionEnabled = Enabled && ... is mathematically impossible to be true when Enabled is false, so LukaszRozmej's query is answered ✓

New Issue

Medium — Thread-pool thread leak when non-InvalidBlockException escapes parallel worker

BlockProcessor.ParallelBlockValidationTransactionsExecutor.cs:102–118

The lambda inside ParallelUnbalancedWork.For catches only InvalidBlockException. If any other exception escapes (NullReferenceException, OperationCanceledException, StackOverflowException etc.), gasResults[txIndex] is never completed. IncrementalValidation is then permanently blocked at the inner gasResults[j].Task.GetAwaiter().GetResult() loop, leaking a thread-pool thread indefinitely. Because ParallelUnbalancedWork.For propagates the original exception on the calling thread, incrementalValidationTask.GetAwaiter().GetResult() is never reached, leaving the stuck task with no cancellation path.

Fix: add a bare catch after catch (InvalidBlockException) that calls TrySetCanceled() and rethrows — see inline suggestion. Fix this →


Still Open — Acknowledged as Future TODOs

High — Enumerable.SequenceEqual + ChangeAtIndex.SlotChanges yield-return iterator

BlockAccessListManager.cs:338 and BlockAccessList.cs:379 (inline comment added)

Both still present. Acknowledged by the author as "same as master + will be resolved in future todo" and benaadams has approved. Tracking here for completeness — the feature defaults to false so the hot path is only exercised when explicitly enabled.

Medium — _lastLoadedBal guard semantics

BlockAccessListManager.cs:63, 89–93_lastLoadedBal = Hash256.Zero on construction; BlockAccessListManager is AddScoped (one per DI scope = one per block, per BlockProcessingModule.cs:67). The cache guard always passes on first call. If PrepareForProcessing can be called multiple times within the same scope (e.g. a retry), this is intentional — but that invariant is not documented. Either add a comment explaining the retry scenario, or remove the guard and call LoadPreStateToSuggestedBlockAccessList unconditionally.

Low / Tracked as TODO

  • N VirtualMachine allocations per block (BlockAccessListManager.cs:487) — // todo: will optimize to allocate ~16 worldstates upfront present
  • GetNonce/GetBalance O(N) linear scan (AccountChanges.cs:271) — // todo: binary search present
  • SlotChanges.Get allocates byte[] per storage read (SlotChanges.cs:67, 73) — [.. tmp.WithoutLeadingZeros()] still allocates per SLOAD in parallel path
  • AccountChanges.Equals/SlotChanges.Equals use LINQ SequenceEqual — not a hot path but violates the no-LINQ rule
  • IBlockAccessListManager split// todo: maybe split into smaller classes still in interface; 12 methods spanning tx processing, BAL tracking, system contracts, gas validation, AuRa preprocessing
  • ref readonly on GetBalance/GetNonce — LukaszRozmej's analysis (April 27 09:48) proposed a List<T> + CollectionsMarshal.AsSpan approach; acknowledged as out of scope for this PR

Summary

The April 27 commit by LukaszRozmej addressed the last round of outstanding comments cleanly. The one new issue is the thread-pool leak when a non-InvalidBlockException escapes a parallel worker (Medium, one-line fix suggested inline). The two hot-path allocation issues remain acknowledged as future work. The PR is close to ready — the thread leak fix is the only unresolved item that should land before merge.

- ParallelBlockValidationTransactionsExecutor now composes an inner
  IBlockProcessor.IBlockTransactionsExecutor instead of inheriting from
  BlockValidationTransactionsExecutor; BAL-disabled path delegates to inner
- BlockProcessingModule wires inner via AddDecorator (matches the
  SimulateBlockValidationTransactionsExecutor pattern)
- ThrowInvalidTransactionException protected -> internal so the decorator
  reuses it instead of duplicating
- Fix thread-pool hang when a non-InvalidBlockException escapes the parallel
  worker: TrySetCanceled the gasResults slot before rethrowing so
  IncrementalValidation unblocks; wrap parallel-for in try/catch that
  observes the background task before propagating the original exception

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@LukaszRozmej
Copy link
Copy Markdown
Member

@claude re-review. Everything with TODO's is out of scope of this PR and will be done in next ones

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 27, 2026

Claude finished @LukaszRozmej's task in 6m 9s —— View job


Re-Review: Parallel Transaction Execution with BALs

  • Load repository rules
  • Review recent commits since last review
  • Check thread-pool leak fix status
  • Review LukaszRozmej's recent changes (composition, exception wrapping, test parametrization)
  • Check for new issues
  • Post final review

Fixed Since April 27 Review

  • Thread-pool leak: Fixed ✓ — TrySetCanceled() in the bare catch block unblocks IncrementalValidation, and the outer catch observes the background task before rethrowing the original exception (BlockProcessor.ParallelBlockValidationTransactionsExecutor.cs:117–134)
  • Composition over inheritance: Implemented ✓ — ParallelBlockValidationTransactionsExecutor now takes IBlockProcessor.IBlockTransactionsExecutor inner and delegates via AddDecorator in BlockProcessingModule (matching SimulateBlockValidationTransactionsExecutor pattern)
  • Exception wrapping: Improved ✓ — ParallelExecutionException now extends InvalidTransactionException preserving the Reason field from the underlying InvalidTransactionException (or synthesising a fallback reason for plain InvalidBlockException)
  • Test parametrization: Cleaned up ✓ — [TestFixture(true)]/[TestFixture(false)] on base classes, Sequential* derived classes deleted
  • BlockCachePreWarmer helper methods: Extracted ✓ — ShouldPreWarm, IsBalReadWarmingEnabled, WarmupFromBal, WarmupBalAccount helpers

Minor Observations

1. BranchProcessor.ShouldPreWarmOrBatchRead name is inverted

BranchProcessor.cs:203, 213

private Task? PreWarmTransactions(...) =>
    ShouldPreWarmOrBatchRead(suggestedBlock, spec)
        ? null             // <-- null = skip prewarming
        : preWarmer?.PreWarmCaches(...);

private bool ShouldPreWarmOrBatchRead(Block suggestedBlock, IReleaseSpec spec)
    => suggestedBlock.Transactions.Length < 3 && !(_parallelExecutionBatchRead && ...);

The method returns true when we should skip prewarming (fewer than 3 txs + no BAL batch read). The name implies the opposite — "should prewarm or batch read". Consider renaming to ShouldSkipPreWarming or inverting the ternary.

2. _lastLoadedBal caching intent is undocumented

BlockAccessListManager.cs:63, 89–93 — The _lastLoadedBal guard correctly prevents redundant LoadPreStateToSuggestedBlockAccessList calls when PrepareForProcessing is called multiple times for the same block within the same DI scope (e.g. on retry). However, the comment says only "check last loaded bal to avoid loading prestate again" without explaining when that scenario occurs. Worth adding a one-liner about the retry case so future readers don't remove it as "dead code" (as three prior reviews did).


Acknowledged as Future TODOs (out of scope per trigger comment)

  • Enumerable.SequenceEqual on ChangeAtIndex.SlotChanges (BlockAccessListManager.cs:338)
  • ChangeAtIndex.SlotChanges yield-return iterator allocation per account (BlockAccessList.cs:379)
  • GetNonce/GetBalance O(N) linear scan (AccountChanges.cs:272)
  • N VirtualMachine + TxProcessorWithWorldState allocations per block (BlockAccessListManager.cs:487)
  • IBlockAccessListManager split into smaller interfaces
  • LINQ SequenceEqual in AccountChanges.Equals / SlotChanges.Equals
  • SlotChanges.Get byte[] allocation per SLOAD

Summary

The two most recent commits (20ae9675, 14668967) cleanly address all the open blocking items from the April 27 review. The thread-pool leak fix is correct and complete, composition over inheritance is properly wired through AddDecorator, and the test suite correctly exercises both parallel and sequential paths via [TestFixture(true)]/[TestFixture(false)] parametrization. The two observations above are low-severity naming/documentation nits, not blockers. With parallel execution defaulting to false and both benaadams and LukaszRozmej having approved, the PR is mergeable.

LukaszRozmej and others added 3 commits April 27, 2026 15:22
Distinguish self-induced TaskCanceledException (from worker TrySetCanceled,
pure noise) from any other independent secondary fault in IncrementalValidation
(BAL validator, gas check, etc.) — the latter is logged at Error since the
original parallel-for exception is what we rethrow, so this branch is the only
place the secondary fault is observable.

Adds ILogManager ctor param; test sites pass LimboLogs.Instance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- BranchProcessor.ShouldPreWarmOrBatchRead -> ShouldSkipPreWarming.
  Old name implied "do prewarm/batch-read"; method actually returns true
  when prewarming should be skipped. Comment also rewritten to describe
  the BAL batch-read carve-out for tiny blocks.
- BlockAccessListManager._lastLoadedBal: add field comment explaining the
  guard prevents double-mutation of the suggested block's BAL by
  LoadPreStateToSuggestedBlockAccessList (which appends -1 prestate
  entries in place) when PrepareForProcessing is re-invoked for the same
  block within the same DI scope (manager is scoped to the main
  processing context, not per block). Three prior reviews mistook the
  field for dead code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
BranchProcessor was duplicating the BAL read-warming predicate
(_parallelExecutionBatchRead && spec.BlockLevelAccessListsEnabled) that
already lives in BlockCachePreWarmer. Expose IsBalReadWarmingEnabled on
IBlockCachePreWarmer so BranchProcessor can ask rather than re-derive,
and split ShouldSkipPreWarming into two helpers — one for the tx-count
+ BAL carve-out, one delegating the spec/config check to the prewarmer.

Drops the now-unused IBlocksConfig ctor dep and _parallelExecutionBatchRead
field from BranchProcessor; test sites adjusted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@LukaszRozmej
Copy link
Copy Markdown
Member

@claude one more review

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 27, 2026

Claude finished @LukaszRozmej's task in 7m 6s —— View job


Re-Review: Parallel Transaction Execution with BALs

  • Load repository rules
  • Read key changed files (post-April 27 commits)
  • Check thread-pool leak fix and recent LukaszRozmej changes
  • Review remaining open issues
  • Check for new issues
  • Post final review

Fixed Since April 27 13:07 Review

  • Thread-pool leak: bare catch { TrySetCanceled(); throw; } is in place; outer catch observes the background task and logs independent secondary faults at Error ✓
  • Composition over inheritance: ParallelBlockValidationTransactionsExecutor now takes IBlockProcessor.IBlockTransactionsExecutor inner and delegates via AddDecorator in BlockProcessingModule (StandardBlockValidationModule) ✓
  • ShouldPreWarmOrBatchReadShouldSkipPreWarming: renamed, logic clarified, split into ShouldSkipPreWarming + ShouldBalReadWarm named helpers ✓
  • _lastLoadedBal comment: documents that PrepareForProcessing can be called multiple times in one DI scope on the same block (e.g. retry) — explains why the guard is not dead code ✓
  • BranchProcessor removes duplicated BAL predicate: delegates to preWarmer.IsBalReadWarmingEnabled(spec) (now exposed on IBlockCachePreWarmer) rather than re-deriving internally; drops unused _parallelExecutionBatchRead field ✓
  • SystemContractHandler sealed
  • BlockAccessListSystemContractHandler sealed, directly implements ISystemContractHandler (no intermediate class hierarchy) ✓

Minor Issues

1. BlockValidationTransactionsExecutor.SetBlockAccessListManager is dead code

BlockValidationTransactionsExecutor.cs:30

public virtual void SetBlockAccessListManager(in IBlockAccessListManager balManager) { }

This virtual no-op is never called — BlockProcessor.cs only calls SetBlockExecutionContext on the executor. It is also not on IBlockProcessor.IBlockTransactionsExecutor. The corresponding stub in SimulateBlockValidationTransactionsExecutor.cs:60 throws NotImplementedException. If the method is dead code, both should be deleted; if it has a future use, remove the NotImplementedException stub. Fix this →

2. Restore method inconsistency — balance uses change.BlockAccessIndex, code/nonce/storage use Index

BlockAccessList.cs:323, 332, 341, 351

accountChanges.PopBalanceChange(change.BlockAccessIndex, out _);  // line 323
accountChanges.PopCodeChange(Index, out _);                       // line 332
accountChanges.PopNonceChange(Index, out _);                      // line 341
slotChanges.TryPopStorageChange(Index, out _);                    // line 351

change.BlockAccessIndex and Index are equal during a restore (you only restore changes from the current tx), but the inconsistency is a latent trap for future maintainers. Standardise to change.BlockAccessIndex throughout. Fix this →


Acknowledged as Future TODOs (out of scope per LukaszRozmej)

  • Enumerable.SequenceEqual on ChangeAtIndex.SlotChanges (BlockAccessListManager.cs:344) — yield-return iterator allocations per matched account per ValidateBlockAccessList call
  • ChangeAtIndex.SlotChanges field stores IEnumerable<SlotChanges> from SlotChangesAtIndex(index) (BlockAccessList.cs:379) — iterator state machine allocated per account unconditionally in GetChangesAtIndex
  • N VirtualMachine + TxProcessorWithWorldState allocations per block (BlockAccessListManager.cs:493) — // todo: will optimize to allocate ~16 worldstates upfront
  • GetNonce/GetBalance O(N) linear scan (AccountChanges.cs:271) — // todo: binary search
  • IBlockAccessListManager God object — // todo: maybe split into smaller classes

Summary

All prior blocking items are resolved. The two minor items above (#1 dead code, #2 Restore inconsistency) are clean-up nits that do not affect correctness or safety. The acknowledged performance TODOs are tracked in-code and deferred. With ParallelExecution defaulting to false, the feature is safely gated. The PR is mergeable.

…t.Restore

- Remove BlockValidationTransactionsExecutor.SetBlockAccessListManager
  (virtual no-op, never called, not on the IBlockTransactionsExecutor
  interface) and the corresponding NotImplementedException stub on
  SimulateBlockValidationTransactionsExecutor.
- BlockAccessList.Restore: balance branch was the odd one out using
  change.BlockAccessIndex while code/nonce/storage all use Index. They
  are equal at restore time (snapshot/restore is per-tx, Index doesn't
  change between them), so normalize to Index for consistency.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@LukaszRozmej
Copy link
Copy Markdown
Member

@claude one more review

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 27, 2026

Claude finished @LukaszRozmej's task in 3m 31s —— View job


Re-Review: Parallel Transaction Execution with BALs

  • Load repository rules
  • Read recent commits and changed files
  • Check status of issues from previous review
  • Look for new issues
  • Post final review

Fixed Since April 27 13:40 Review

All two minor items from that review are confirmed fixed:

  • SetBlockAccessListManager dead code — virtual no-op removed from BlockValidationTransactionsExecutor.cs, corresponding NotImplementedException stub removed from SimulateBlockValidationTransactionsExecutor.cs
  • BlockAccessList.Restore inconsistency — balance branch was the odd one out using change.BlockAccessIndex; all four cases now uniformly use Index (correct since snapshot/restore is per-tx and Index doesn't change within a snapshot boundary) ✓

Acknowledged TODOs (out of scope per LukaszRozmej)

These items persist but are explicitly deferred to follow-up PRs:

  • Enumerable.SequenceEqual on ChangeAtIndex.SlotChanges (BlockAccessListManager.cs:344) — two yield-return enumerator allocations per matched account per ValidateBlockAccessList call
  • ChangeAtIndex.SlotChanges stores IEnumerable<SlotChanges> from SlotChangesAtIndex(index) (BlockAccessList.cs:379) — iterator state machine allocated per account unconditionally in GetChangesAtIndex
  • N VirtualMachine + TxProcessorWithWorldState allocations per block (BlockAccessListManager.cs:493)
  • GetNonce/GetBalance O(N) linear scan (AccountChanges.cs:271, 286)
  • IBlockAccessListManager split (IBlockAccessListManager.cs:15)
  • SlotChanges.Get byte[] allocation per SLOAD (SlotChanges.cs:67, 73)
  • AccountChanges.Equals/SlotChanges.Equals using LINQ SequenceEqual

No New Issues Found

This pass found no new correctness, safety, or performance concerns beyond the acknowledged TODOs.


Overall Assessment

All blocking and minor items raised across multiple review rounds are now resolved. The two items fixed in today's commits (SetBlockAccessListManager removal, Restore normalization) were the last outstanding non-deferred findings. With ParallelExecution defaulting to false, the hot-path TODOs are safely gated behind a feature flag. The PR is mergeable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants