Use sorted vectors for DWARF address maps — reduce memory overhead #336
lewing wants to merge 3 commits into dotnet/main
Conversation
Pull request overview
This PR reduces peak memory usage in DWARF debug processing by replacing address-to-expression/function std::unordered_map lookups with a compact sorted-vector map (SortedMap) that is built once and then queried read-only.
Changes:
- Introduces SortedMap<K, V> (sorted std::vector + binary search) for address lookups.
- Replaces AddrExprMap and FuncAddrMap internal std::unordered_map storage with SortedMap.
- Adds up-front reservation and a post-build sort() step for the new maps.
Replace std::unordered_map with SortedMap (sorted vector + binary search) for AddrExprMap and FuncAddrMap. These maps are built once during construction and only used for read-only lookups, making them ideal for this pattern.

- std::unordered_map: ~64 bytes/entry (hash buckets, linked list nodes)
- SortedMap (sorted vector): ~16 bytes/entry (contiguous, cache-friendly)

SortedMap tracks a finalized flag to assert lookups aren't performed before sort(). Duplicate keys are de-duplicated after sorting (keeps first) to handle cases like FuncAddrMap where start == declarations. DelimiterLocations reservation now counts actual non-zero entries rather than just the number of delimiter location arrays.

Output is byte-for-byte identical, confirming functional correctness.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Force-pushed from 3b37476 to 11d339b.
lewing left a comment:
Addressed all three review comments:
- delimCount under-reservation — now iterates each DelimiterLocations entry and counts actual non-zero offsets instead of just using delimiterLocations.size().
- Duplicate keys after sort() — sort() now de-duplicates adjacent equal keys using std::unique (keeps first). This handles legitimate duplicates in FuncAddrMap where funcLocation.start == funcLocation.declarations for some functions.
Pull request overview
This PR reduces memory usage when processing wasm binaries with DWARF debug info by replacing per-address std::unordered_map lookups with a sorted std::vector-backed map and binary search, targeting OOM issues in large debug symbol workloads.
Changes:
- Introduce a SortedMap<K, V> helper (vector + sort + binary search) for read-only address lookups.
- Replace AddrExprMap and FuncAddrMap internal std::unordered_map structures with SortedMap, including pre-reservation and finalization (sort()).
- Update lookup code paths to use the new find() API and pointer-based returns.
- Remove SortedMap::count() — no callers remain after removing pre-sort assertions, and it could be misused during the build phase.
- Document that duplicate keys (e.g. FuncAddrMap start == declarations) always map to the same value, so de-dup order is irrelevant.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR reduces peak memory usage when processing DWARF debug info by replacing address→node/function hash maps with a sorted-vector-based lookup structure in wasm-debug.cpp, leveraging the fact that these maps are constructed once and then queried read-only during DWARF rewriting.
Changes:
- Introduce a SortedMap (sorted std::vector + binary search) for address-keyed lookups.
- Replace std::unordered_map usages in AddrExprMap and FuncAddrMap with SortedMap.
- Pre-reserve vector capacity and finalize maps via sorting/deduplication after construction.
Add debug-time validation in SortedMap::sort() that duplicate keys map to the same value (assertUniqueValues=true by default). FuncAddrMap passes assertUniqueValues=false because contiguous functions legitimately share boundary addresses (func1.end == func2.start), matching the old unordered_map overwrite behavior. AddrExprMap uses the default (true) to catch debug info issues early. Also adds operator== to DelimiterInfo for the assertion.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
lewing left a comment:
Both remaining comments addressed in 68b89a6:
- sort() duplicate validation (line 392): sort() now takes an assertUniqueValues param (default true). In debug builds, it asserts that duplicate keys map to the same value before de-duplicating. AddrExprMap uses the default strict check; FuncAddrMap passes false since contiguous functions legitimately share boundary addresses (func1.end == func2.start).
- AddrExprMap uniqueness (line 478): the strict assertion (assertUniqueValues=true) now validates during sort() that startMap and endMap entries have unique keys (or identical values for any duplicates), preserving the same invariant as the old pre-sort assertions.
I am curious, do you have some measurements for this? What is a typical size of the data — how many entries?
Problem
wasm-opt uses excessive memory when processing wasm files with DWARF debug symbols, causing OOM on .NET CI Helix agents (dotnet/runtime#125244, dotnet/runtime#125233). The AddrExprMap and FuncAddrMap structures in wasm-debug.cpp use std::unordered_map with ~64 bytes per entry overhead.

Fix

Replace std::unordered_map with a SortedMap (sorted std::vector + binary search). These maps are built once during construction and only used for read-only lookups, making them ideal for this pattern.

Why it works

- std::unordered_map: ~64 bytes/entry (hash buckets, linked list nodes, pointer chasing)
- SortedMap (sorted vector): ~16 bytes/entry (contiguous memory, cache-friendly)

Output is byte-for-byte identical, confirming functional correctness.
This is the first (and simplest) change from #335, split out for easier review.