perf(fuzz): clone only assigned corpus shard #15106

Merged
0xKarl98 merged 3 commits into master from mablr/fuzz_perf_fix on Jun 9, 2026

@mablr mablr commented Jun 8, 2026

Member

Problem

PR #15070 shards the warmed corpus with corpus_seed.clone().for_worker(..). That means each parallel worker first materializes a full cloned seed, then filters it down to the entries assigned to that worker.

Fix

corpus_seed.clone_for_worker(..) borrows the campaign seed and clones only the entries assigned to the target worker. Shared warmed coverage and optimization state are still copied per worker, and corpus-content metrics are recomputed for the shard.
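The difference between the two shapes can be sketched as follows. This is a minimal illustration, not Foundry's implementation: `CorpusSeed`, `CorpusEntry`, the `coverage` field, and the round-robin `is_assigned` rule are hypothetical stand-ins, and only the method names `for_worker` and `clone_for_worker` come from the description above.

```rust
// Hypothetical stand-ins; apart from the method names `for_worker` and
// `clone_for_worker`, nothing here is taken from Foundry's actual code.
#[derive(Clone, Debug, PartialEq)]
struct CorpusEntry {
    id: usize,
    calls: Vec<u8>, // stands in for a long call sequence
}

#[derive(Clone, Debug)]
struct CorpusSeed {
    entries: Vec<CorpusEntry>,
    coverage: Vec<u8>, // shared warmed state, copied per worker in both shapes
}

impl CorpusSeed {
    /// Assumed assignment rule for this sketch: round-robin by entry index.
    fn is_assigned(index: usize, worker: usize, workers: usize) -> bool {
        index % workers == worker
    }

    /// Old shape: takes an already fully cloned seed and filters it down.
    /// The caller pays for cloning every entry before most are dropped.
    fn for_worker(mut self, worker: usize, workers: usize) -> Self {
        let mut index = 0;
        self.entries.retain(|_| {
            let keep = Self::is_assigned(index, worker, workers);
            index += 1;
            keep
        });
        self
    }

    /// New shape: borrows the seed and clones only the assigned entries.
    fn clone_for_worker(&self, worker: usize, workers: usize) -> Self {
        let entries = self
            .entries
            .iter()
            .enumerate()
            .filter(|(i, _)| Self::is_assigned(*i, worker, workers))
            .map(|(_, entry)| entry.clone())
            .collect();
        Self { entries, coverage: self.coverage.clone() }
    }
}
```

Under these assumptions, `seed.clone().for_worker(w, n)` and `seed.clone_for_worker(w, n)` produce the same shard; the second simply never allocates copies of the entries that belong to other workers.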

Methodology

End-to-end real invariant campaign: 370-entry persisted corpus, depth-2000 sequences, --invariant-workers 8, 3 interleaved iterations per binary, peak RSS via /usr/bin/time -l.

Three forge binaries built from the parent commit, #15070, and this fix, run on the same persisted corpus. Source: https://gist.github.com/mablr/cb84076c6c0712ce4b2cb11efc39df0b

Results (8 workers, 370 entries)

| Variant | Wall-clock | Peak RSS |
| --- | --- | --- |
| Parent (clone()) | 66.6–70.4 s | 4.0–5.5 GB |
| #15070 (clone().for_worker) | 67.0–68.7 s | 5.7–6.7 GB |
| Fix (clone_for_worker) | 64.3–68.7 s | 2.79–2.80 GB |

Conclusion

Wall-clock was comparable across variants.

Peak RSS is the important metric for the memory issue #15070 targeted.

In this benchmark, #15070 increased peak RSS relative to the parent (5.7–6.7 GB vs 4.0–5.5 GB), while this follow-up reduced peak RSS to ~2.8 GB, roughly 2.0–2.4x lower than #15070.

figtracer commented Jun 9, 2026

Member

I ran a focused local benchmark for the changed sharding path, comparing the old clone().for_worker shape against this PR's clone_for_worker shape with the PR-sized corpus setup: 370 corpus entries, depth 2000, 8 workers. This benchmark directly measures the corpus sharding operation, not the full end-to-end invariant campaign.

| Mode | Measured sharding time | Max RSS |
| --- | --- | --- |
| old clone().filter | 361 ms | 1.77 GB |
| new clone_for_worker | 66 ms | 0.47 GB |

TLDR: 5.5x faster for the sharding operation and about 3.7x lower process peak RSS in this focused benchmark.
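A focused measurement of this kind could be sketched roughly as below. This is not the benchmark that produced the numbers above: the corpus is synthetic (plain byte vectors), the round-robin assignment is an assumption, and all function names are hypothetical. It only times the sharding step for both shapes and does not measure RSS.

```rust
use std::time::{Duration, Instant};

/// Builds a synthetic corpus: `entries` sequences of `depth` bytes each.
fn synthetic_corpus(entries: usize, depth: usize) -> Vec<Vec<u8>> {
    (0..entries).map(|i| vec![(i % 251) as u8; depth]).collect()
}

/// Old shape: clone the whole corpus, then keep only the assigned entries.
fn shard_clone_then_filter(corpus: &[Vec<u8>], worker: usize, workers: usize) -> Vec<Vec<u8>> {
    corpus
        .to_vec() // full clone of every entry
        .into_iter()
        .enumerate()
        .filter(|(i, _)| *i % workers == worker)
        .map(|(_, entry)| entry)
        .collect()
}

/// New shape: clone only the entries assigned to this worker.
fn shard_clone_assigned(corpus: &[Vec<u8>], worker: usize, workers: usize) -> Vec<Vec<u8>> {
    corpus
        .iter()
        .enumerate()
        .filter(|(i, _)| *i % workers == worker)
        .map(|(_, entry)| entry.clone())
        .collect()
}

/// Times building one shard per worker; returns elapsed time and the
/// total number of entries across all shards (as a sanity check).
fn time_all_workers(
    corpus: &[Vec<u8>],
    workers: usize,
    shard: fn(&[Vec<u8>], usize, usize) -> Vec<Vec<u8>>,
) -> (Duration, usize) {
    let start = Instant::now();
    let shards: Vec<Vec<Vec<u8>>> = (0..workers).map(|w| shard(corpus, w, workers)).collect();
    let elapsed = start.elapsed();
    let total: usize = shards.iter().map(Vec::len).sum();
    (elapsed, total)
}
```

Calling `time_all_workers(&synthetic_corpus(370, 2000), 8, shard_clone_then_filter)` and the same with `shard_clone_assigned` gives two durations to compare. Absolute timings will differ from the table, since real corpus entries are far heavier than byte vectors.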

@0xKarl98 0xKarl98 merged commit 3b561f7 into master Jun 9, 2026
19 checks passed
@github-project-automation github-project-automation Bot moved this to Done in Foundry Jun 9, 2026
@0xKarl98 0xKarl98 deleted the mablr/fuzz_perf_fix branch June 9, 2026 13:47
@grandizzy grandizzy restored the mablr/fuzz_perf_fix branch June 9, 2026 13:47
@figtracer figtracer deleted the mablr/fuzz_perf_fix branch June 9, 2026 14:14