
perf(sync-service): overlap WASM compilation with Aura proof downloads #3225

Closed
replghost wants to merge 2 commits into main from worktree-perf-overlap-wasm-compile


replghost commented Apr 22, 2026

Contributor

Summary

During parachain bootstrap, the three P2P requests (:code/:heappages storage proof, AuraApi_slot_duration call proof, AuraApi_authorities call proof) were fetched sequentially. WASM compilation (~200-300ms) also blocked the Aura proof downloads even though the downloads don't depend on the compiled VM.

This PR fires all three requests in parallel using future::join3. The storage proof branch bundles download + decode + WASM compilation, so compilation overlaps with the Aura proof network fetches. After all three complete, the Aura runtime calls are executed sequentially against the compiled VM.

Note: join3 does not cancel on first failure — if the storage proof fails, the two Aura requests still run to completion (up to 16s timeout). This is acceptable for a bootstrap path that retries in a 5s loop; the wasted requests are against a peer already known to be bad and the cost is negligible compared to the retry delay.
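The expected gain can be reasoned about with a simple critical-path model. The sketch below is illustrative only: `BootstrapTimings` and both functions are hypothetical names, not code from this PR, and the model ignores the Aura runtime calls that run after all three branches finish.

```rust
/// Durations in milliseconds for one bootstrap attempt (illustrative model only).
#[derive(Clone, Copy)]
pub struct BootstrapTimings {
    /// Download + decode of the :code/:heappages storage proof.
    pub storage_proof_ms: u64,
    /// WASM compilation of the decoded runtime.
    pub wasm_compile_ms: u64,
    /// Download of the AuraApi_slot_duration call proof.
    pub slot_duration_proof_ms: u64,
    /// Download of the AuraApi_authorities call proof.
    pub authorities_proof_ms: u64,
}

/// Previous flow: every step waits for the one before it.
pub fn sequential_ms(t: BootstrapTimings) -> u64 {
    t.storage_proof_ms + t.wasm_compile_ms + t.slot_duration_proof_ms + t.authorities_proof_ms
}

/// join3 flow: the three branches run concurrently, so the total is the
/// slowest branch. Compilation is bundled into the storage proof branch.
pub fn overlapped_ms(t: BootstrapTimings) -> u64 {
    (t.storage_proof_ms + t.wasm_compile_ms)
        .max(t.slot_duration_proof_ms)
        .max(t.authorities_proof_ms)
}
```

In this model the saving is bounded by the time spent in the two Aura downloads plus whatever part of compilation they hide, so a shorter compile step directly shrinks the possible benefit.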

Changes

  • Refactored bootstrap_parachain_consensus to use future::join3 for concurrent P2P requests
  • Each async branch captures its own clones of peer_id and network_service
  • All error paths preserved identically
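As a rough illustration of the "own clones per branch" pattern, here is a std-only sketch that uses threads instead of futures so it runs without extra crates. `PeerId`, `NetworkService`, `request`, and `fetch_all` are hypothetical stand-ins, not the smoldot types or the code in this PR; the real change uses async blocks with future::join3.

```rust
use std::sync::Arc;
use std::thread;

// Hypothetical stand-ins for the real types; not the smoldot API.
#[derive(Clone, Debug, PartialEq)]
pub struct PeerId(pub String);

pub struct NetworkService;

impl NetworkService {
    // Pretend request: returns a label instead of touching the network.
    pub fn request(&self, peer: &PeerId, what: &str) -> String {
        format!("{} from {}", what, peer.0)
    }
}

pub fn fetch_all(network: Arc<NetworkService>, peer: PeerId) -> (String, String, String) {
    // One clone of each input per branch, so every branch owns what it uses
    // and nothing is borrowed across branches.
    let spawn_branch = |what: &'static str| {
        let network = Arc::clone(&network);
        let peer = peer.clone();
        thread::spawn(move || network.request(&peer, what))
    };

    let storage = spawn_branch("storage proof");
    let slot = spawn_branch("slot_duration proof");
    let auth = spawn_branch("authorities proof");

    // Wait for all three branches, with no early exit (join3-like behaviour).
    (
        storage.join().expect("storage branch panicked"),
        slot.join().expect("slot_duration branch panicked"),
        auth.join().expect("authorities branch panicked"),
    )
}
```

Cloning an `Arc` only bumps a reference count, so giving each branch its own handle is cheap compared with the requests themselves.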

Benchmark (smolbench, 3 runs, cold start, debug build)

Polkadot (avg of 3 runs)

| Chain | Baseline | Branch | Change |
|---|---|---|---|
| Asset Hub | 20,224ms | 15,111ms | -25% |
| Bridge Hub | 34,941ms | 15,199ms | -57% |
| Collectives | 21,984ms | 14,366ms | -35% |
| Coretime | 21,231ms | 14,501ms | -32% |
| People | 21,223ms | 15,026ms | -29% |

Paseo (avg of 3 runs)

| Chain | Baseline | Branch | Change |
|---|---|---|---|
| Asset Hub | 12,808ms | 15,608ms | noisy |
| Bulletin | 11,032ms | 13,547ms | noisy |
| People | 10,735ms | 12,962ms | noisy |

Paseo results are network-variance dominated (testnet, fewer peers). Polkadot with its more stable peer population shows the improvement clearly.
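For reference, the Change column is the relative difference between the two averages, rounded to the nearest percent. A minimal sketch of that arithmetic (`percent_change` is a hypothetical helper, not part of smolbench):

```rust
/// Percent change from baseline to branch, rounded to the nearest integer.
/// Negative values mean the branch was faster.
pub fn percent_change(baseline_ms: u32, branch_ms: u32) -> i32 {
    let delta = f64::from(branch_ms) - f64::from(baseline_ms);
    (100.0 * delta / f64::from(baseline_ms)).round() as i32
}
```

Applied to the Paseo Asset Hub averages, the same formula gives about +22%, which is the regression the table labels as noisy.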

Repro

cd wasm-node/javascript
npm install && node prepare.mjs --debug && rm -rf dist && npm run buildModules
node demo/smolbench.mjs --networks paseo,polkadot --runs 3 --init-only --timeout 300

Closes #3220
Closes #3219

Fire all three P2P requests (storage proof, slot_duration call proof,
authorities call proof) in parallel using future::join3. The storage
proof branch includes decoding and WASM compilation, which now overlaps
with the Aura proof network fetches. After all three complete, the Aura
calls are executed sequentially against the compiled VM.

On cold start this saves 1-2 seconds because WASM compilation (~200-300ms)
no longer blocks the Aura proof downloads. The two Aura downloads also
run in parallel with each other, saving an additional round-trip.
join3 waits for all three futures even when one has already failed.
try_join3 cancels the remaining requests on first error, avoiding
wasted peer bandwidth and up to 60s of unnecessary timeout delay.
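The difference between the two combinators on a failing attempt can be sketched as a small timing model. This is an illustration under assumptions, not code from either commit: `Branch`, `join_all_ms`, and `try_join_ms` are hypothetical names, and each request is reduced to a duration plus a success flag.

```rust
/// Outcome of one simulated request: how long it took and whether it succeeded.
#[derive(Clone, Copy)]
pub struct Branch {
    pub duration_ms: u64,
    pub ok: bool,
}

/// join3-style: always waits for every branch, so the attempt takes as long
/// as the slowest one, regardless of failures.
pub fn join_all_ms(branches: [Branch; 3]) -> u64 {
    branches.iter().map(|b| b.duration_ms).max().unwrap_or(0)
}

/// try_join3-style: resolves at the earliest failure and drops the rest;
/// if nothing fails it behaves like join3.
pub fn try_join_ms(branches: [Branch; 3]) -> u64 {
    let first_failure = branches
        .iter()
        .filter(|b| !b.ok)
        .map(|b| b.duration_ms)
        .min();
    first_failure.unwrap_or_else(|| join_all_ms(branches))
}
```

When every request succeeds the two functions return the same value, so the choice only matters on the failure path.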

lexnv commented Apr 23, 2026

Contributor

> Benchmark (smolbench, 3 runs, cold start, debug build)

Do we have a way of testing with release builds as well?

Similarly, is the storage_proof_request download time heavily influenced by which peers are targeted? Could we deterministically connect to the same set of peers on every run?

@replghost

Contributor Author

Re-ran with release builds (not debug), 10 runs on Polkadot mainnet:

| Build | n | p50 | mean | min | max |
|---|---|---|---|---|---|
| Baseline (v3.1.1) | 6 | 10,810ms | 11,090ms | 6,501ms | 16,226ms |
| PR (join3) | 10 | 14,440ms | 16,217ms | 8,237ms | 29,581ms |

No improvement in release. WASM compilation is ~200-300ms in release vs ~1.5s in debug, so it's never on the critical path. The original debug-build benchmarks were misleading. Closing.

replghost closed this Apr 28, 2026

Development

Successfully merging this pull request may close these issues.

  • perf(sync-service): overlap WASM compilation with Aura proof downloads
  • perf(sync-service): parallelize Aura call proofs during parachain bootstrap
