feat: allocator driver + shadow-allocation pass - measures the wiring need on real benches (#209/#242) #280
Merged
…e wiring need on real benches (#209/#242)

Assemble the allocator entry point and run it (measure-only) on gale's real benches to break the "next is the wiring" stalemate with running, measuring code.

- `liveness::allocate_function(instrs, k, precolored) -> AllocationOutcome` (Allocated{coloring, remat_opportunities} | NeedsSpill(set) | Declined): interference graph → k-colouring with reserved regs precoloured → result, plus the #209 const-CSE/rematerialization headroom count. Pure; the call the virtual-register wiring will make.
- arm_backend SHADOW pass behind `SYNTH_SHADOW_ALLOC=1` (default-off, eprintln-only, zero codegen impact): runs the allocator on every real function and logs whether it colours within R0-R8 and the remat headroom.

CONCRETE FINDING (on-bench, the justification for virtual-register output): running the allocator on the existing PHYSICAL-register stream reports spurious spills (flat_flight "would spill R1") and 0 remat opportunities, because the greedy selector already overloaded each physical register: R1 is one interference node conflicting with every value it was ever reused for, and each redundant const shows as "not redundant" since its register was already clobbered for reuse. Both artifacts PROVE the allocator is blind until the selector emits VIRTUAL registers (one node per value). The shadow pass will quantify the real win the moment virtual-register output lands.

Safe: shadow pass off by default → all fixtures bit-identical (control_step 0x00210A55 / flight_seam 0x07FDF307 / div_const 338/338 verified). 316 lib tests (6 new: driver + clamp modeling); clippy clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
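The pipeline this commit describes (interference graph → k-colouring with reserved registers precoloured → outcome) can be illustrated with a minimal sketch. It is not the code from the PR. It assumes the interference graph is already available as an edge list, uses a plain greedy colouring in node order, does not model the remat count (the field is left at 0), and guesses that `Declined` covers inconsistent precolouring. `color_graph` and the concrete field types are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for the outcome type named in the commit message.
#[derive(Debug, PartialEq)]
pub enum AllocationOutcome {
    Allocated {
        coloring: BTreeMap<u32, u8>,
        remat_opportunities: usize,
    },
    NeedsSpill(BTreeSet<u32>),
    Declined,
}

/// Greedy k-colouring of an explicit interference graph (illustrative only).
/// `edges` lists interfering node pairs; `precolored` pins reserved nodes.
pub fn color_graph(
    edges: &[(u32, u32)],
    k: u8,
    precolored: &BTreeMap<u32, u8>,
) -> AllocationOutcome {
    // Assumption: decline when a precolour lies outside the k-register pool.
    if precolored.values().any(|&c| c >= k) {
        return AllocationOutcome::Declined;
    }
    let mut adj: BTreeMap<u32, BTreeSet<u32>> = BTreeMap::new();
    for &(a, b) in edges {
        if a == b {
            continue;
        }
        // Assumption: two interfering nodes pinned to one colour is unsolvable.
        if let (Some(ca), Some(cb)) = (precolored.get(&a), precolored.get(&b)) {
            if ca == cb {
                return AllocationOutcome::Declined;
            }
        }
        adj.entry(a).or_default().insert(b);
        adj.entry(b).or_default().insert(a);
    }

    let mut coloring = precolored.clone();
    let mut spills = BTreeSet::new();
    for (&node, neighbours) in &adj {
        if coloring.contains_key(&node) {
            continue; // precoloured nodes keep their colour
        }
        // Colours already taken by coloured neighbours.
        let used: BTreeSet<u8> = neighbours
            .iter()
            .filter_map(|n| coloring.get(n).copied())
            .collect();
        match (0..k).find(|c| !used.contains(c)) {
            Some(c) => {
                coloring.insert(node, c);
            }
            None => {
                spills.insert(node);
            }
        }
    }

    if spills.is_empty() {
        // The remat headroom count is not modelled in this sketch.
        AllocationOutcome::Allocated { coloring, remat_opportunities: 0 }
    } else {
        AllocationOutcome::NeedsSpill(spills)
    }
}
```

With three mutually interfering nodes and one of them precoloured, this sketch colours the graph for `k = 3` and reports one spill candidate for `k = 2`. A production allocator would normally pick a smarter colouring order than ascending node id.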
…e spurious (9 fits in 9) (#209/#242)

Add straight_line_peak_pressure / function_peak_pressure: the max number of distinct VALUES (def-to-last-use ranges) live at once. This is the true register requirement, vs the physical-register count the greedy selector inflates by reusing one register for many values. Wired into the shadow report.

CONCRETE RESULT on flat_flight (SYNTH_SHADOW_ALLOC=1): physical-graph would spill {R1}, but peak value-pressure is 9 (<=9 => spurious; fits once virtually allocated). That is, flat_flight's true register need is exactly 9 = the R0-R8 pool, so it fits with ZERO spills under virtual-register allocation; gale's 17 greedy spills are almost entirely spurious, eliminable by the allocator. A measured projection of the allocator win (≈17 str/ldr removed) on top of the const-CSE headroom, not a guess.

Unwired analysis; shadow pass off by default → fixtures bit-identical (control_step 0x00210A55 / flight_seam 0x07FDF307 / div_const 338/338 verified). 2 new pressure tests (counts values not pregs; reuse-invariant). clippy/fmt clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
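A rough sketch of a peak-pressure measurement for one straight-line segment follows. It is an illustration under assumptions, not the PR's `straight_line_peak_pressure`: `Instr` is a hypothetical minimal instruction with at most one defined register, and the function uses an ordinary backward liveness scan, counting a value as live from just after its def through its last use.

```rust
use std::collections::BTreeSet;

/// Hypothetical minimal instruction: at most one defined register,
/// plus the registers it reads.
pub struct Instr {
    pub def: Option<u8>,
    pub uses: Vec<u8>,
}

/// Max number of values live at once in a straight-line segment.
/// In straight-line code each register holds one value at a time, so the
/// size of the live register set at a point equals the number of live values.
pub fn peak_pressure(segment: &[Instr]) -> usize {
    let mut live: BTreeSet<u8> = BTreeSet::new();
    let mut peak = 0;
    // Walk backwards: `live` is the set of registers live after `instr`.
    for instr in segment.iter().rev() {
        if let Some(d) = instr.def {
            // Just after the instruction the defined value occupies a
            // register even if it is never read again (dead def).
            let after = live.len() + usize::from(!live.contains(&d));
            peak = peak.max(after);
            live.remove(&d);
        }
        // Just before the instruction its operands must all be live.
        live.extend(instr.uses.iter().copied());
        peak = peak.max(live.len());
    }
    peak
}
```

Because the count follows values and not register names, renaming a reused register to a fresh one leaves the result unchanged, which is the reuse-invariance property the commit's tests mention.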
…280) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…cation consumes (#209/#242)

straight_line_value_ranges(segment) -> Vec<ValueRange{vreg, reg, def, last_use}>: splits each physical register's def-use chains into distinct virtual registers (value ranges). Upgrades the peak-pressure COUNT into the actual per-value ASSIGNMENT: the renaming a re-allocation pass colours and rewrites. Sound for straight-line code (the reaching def of every use is the most recent def, unambiguous); cross-block web merging is the next step.

The number of ranges a physical register splits into IS the overloading that inflated the physical interference graph and produced the spurious spill, e.g. R1 splitting into a dozen ranges. Colouring these ranges instead is what removes the spurious spill (the 9-fits-in-9 finding made concrete per-value).

Unwired analysis; no codegen change. Test: a reused R1 splits into two distinct vregs with the right def/last-use bounds. clippy/fmt clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
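The splitting described here can be sketched as a single forward scan. The `ValueRange` field names follow the commit message, but the `Instr` type, the function body, and the handling of registers that are live on entry (ignored) are assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical minimal instruction: at most one defined register,
/// plus the registers it reads.
pub struct Instr {
    pub def: Option<u8>,
    pub uses: Vec<u8>,
}

/// One value: a def and its last use, with the physical register it lived in.
#[derive(Debug, Clone, PartialEq)]
pub struct ValueRange {
    pub vreg: usize,
    pub reg: u8,
    pub def: usize,
    pub last_use: usize,
}

/// Split each physical register's def-use chains into distinct value ranges.
/// Sound only for straight-line code, where the reaching def of a use is
/// always the most recent def of that register.
pub fn value_ranges(segment: &[Instr]) -> Vec<ValueRange> {
    let mut ranges: Vec<ValueRange> = Vec::new();
    // Physical register -> index of the range currently held in it.
    let mut current: BTreeMap<u8, usize> = BTreeMap::new();
    for (idx, instr) in segment.iter().enumerate() {
        // Operands are read before the result is written, so extend the
        // range of whatever value each used register holds right now.
        for r in &instr.uses {
            if let Some(&v) = current.get(r) {
                ranges[v].last_use = idx;
            }
            // Registers live on entry have no def here; this sketch skips them.
        }
        // A def always starts a fresh virtual register.
        if let Some(r) = instr.def {
            let vreg = ranges.len();
            ranges.push(ValueRange { vreg, reg: r, def: idx, last_use: idx });
            current.insert(r, vreg);
        }
    }
    ranges
}
```

Counting how many ranges share one `reg` gives the per-register overloading the commit message refers to.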
Breaking the staring contest: running, measuring allocator code
Instead of "next is the wiring," this assembles the allocator entry point and runs it (measure-only) on gale's real benches — turning the plan into concrete on-bench data.
What
- `liveness::allocate_function(instrs, k, precolored)` → `Allocated{coloring, remat_opportunities}` | `NeedsSpill(set)` | `Declined`. Runs interference graph → k-colouring (reserved regs precoloured) → result, plus the #209 const-CSE/rematerialization headroom count. The call the virtual-register wiring will make.
- Shadow pass in `arm_backend` behind `SYNTH_SHADOW_ALLOC=1` (default-off, `eprintln`-only, zero codegen impact): runs the allocator on every real function and logs whether it colours within R0–R8 and the remat headroom.
Concrete finding (the justification for virtual-register output)
Running the allocator on the existing physical-register stream reports spurious spills (`flat_flight` "would spill R1") and 0 remat opportunities, because the greedy selector already overloaded each physical register:
- `R1` is one interference-graph node conflicting with every value it was ever reused for → looks uncolourable.
- A redundant const such as `#0x7e` shows as "not redundant" because its register was already clobbered for reuse.
Both artifacts prove, on-bench, that the allocator is blind until the selector emits virtual registers (one node per value). This is no longer a guess; it's measured. The shadow pass will quantify the real win the moment virtual-register output lands.
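The second artifact (a redundant const reported as "not redundant") can be reproduced in a toy model. This is a simplified illustration, not the #209 headroom count from the PR: `Op` and `redundant_const_loads` are hypothetical, and a const load counts as redundant only if some register still holds the same constant when it is loaded.

```rust
use std::collections::BTreeMap;

/// Hypothetical two-op instruction set: load a constant, or any other
/// instruction that overwrites `dst`.
pub enum Op {
    Const { dst: u32, value: u32 },
    Other { dst: u32 },
}

/// Count const loads whose value is still held by some register.
pub fn redundant_const_loads(segment: &[Op]) -> usize {
    // Register -> constant it currently holds.
    let mut holds: BTreeMap<u32, u32> = BTreeMap::new();
    let mut redundant = 0;
    for op in segment {
        match *op {
            Op::Const { dst, value } => {
                if holds.values().any(|&v| v == value) {
                    redundant += 1;
                }
                holds.insert(dst, value);
            }
            // Any other write clobbers whatever constant `dst` held.
            Op::Other { dst } => {
                holds.remove(&dst);
            }
        }
    }
    redundant
}

/// Physical stream: register 1 is reused, so the first 0x7e is gone
/// by the time the second load appears.
pub fn physical_stream() -> Vec<Op> {
    vec![
        Op::Const { dst: 1, value: 0x7e },
        Op::Other { dst: 1 },
        Op::Const { dst: 1, value: 0x7e },
    ]
}

/// Same code with one virtual register per value: nothing is clobbered.
pub fn virtual_stream() -> Vec<Op> {
    vec![
        Op::Const { dst: 100, value: 0x7e },
        Op::Other { dst: 101 },
        Op::Const { dst: 102, value: 0x7e },
    ]
}
```

In this model the physical stream yields 0 redundant loads and the virtual stream yields 1, matching the direction of the on-bench observation. Whether the first value can actually stay in a register is then a decision for the allocator.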
Safety
Shadow pass off by default → all three differential fixtures bit-identical (`control_step` 0x00210A55, `flight_seam` 0x07FDF307, `div_const` 338/338, verified). 316 lib tests (6 new: driver + clamp `SelectMove`/`Select` modeling); clippy clean.
Next
Virtual-register selector output (step 3 of `docs/design/vcr-ra-allocator-wiring.md`), now with a measure-only harness to watch the spurious spills + 0-remat flip to real wins as it lands.
🤖 Generated with Claude Code