test(synthesis): pin the real codegen waste — redundant const, not dead stores (VCR-RA-001)#248
Closed
avrabe wants to merge 1 commit into
…ad stores (VCR-RA-001)

Evidence-driven: measured on REAL selector output, `(p & 0x7e) + (p & 0x7e)` lowers to `movw r1,#126; and r2,r0,r1; movw r3,#126; and r4,r0,r3; ...`. The selector RE-MATERIALIZES 0x7e into a fresh register while the first copy is still live. So on the shape gale measures (flat_flight's repeated clamps), the dominant waste is REDUNDANT MATERIALIZATION (const-CSE territory), NOT dead stores: analyze_function reports dead_defs=0, redundant_consts=1.

This redirects the transform priority. The dead-store pass (#246) is correct but a no-op on this waste; the delta-producing transform is const-CSE (drop the redundant movw, rewrite the consumer to the resident reg). And 0x7e is a valid Thumb-2 AND immediate, so the materialization is itself avoidable via immediate folding, an even larger latent win.

The test pins current suboptimal codegen and is EXPECTED TO FLIP when const-CSE / immediate folding lands (redundant → 0); the flip is the signal that the optimization works. Measure-before-transform, per the methodology.

Part of #242.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
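To make the dead_defs=0 / redundant_consts=1 reading concrete, here is a minimal sketch of how such a measurement could be computed on a straight-line block. Everything in it is an assumption for illustration: the toy `Inst` type, `Waste`, `analyze_block`, and `measured_shape` are hypothetical names, not the project's `analyze_function`, and the trailing `add` stands in for the part of the listing the commit message elides with `...`.

```rust
use std::collections::{HashMap, HashSet};

/// Toy instruction set (hypothetical), just enough for the measured shape.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Inst {
    /// movw rd, #imm
    Movw { rd: u8, imm: u16 },
    /// and rd, rn, rm
    And { rd: u8, rn: u8, rm: u8 },
    /// add rd, rn, rm
    Add { rd: u8, rn: u8, rm: u8 },
}

/// The two counters the commit message reports.
#[derive(Debug, Default, PartialEq)]
pub struct Waste {
    pub dead_defs: usize,
    pub redundant_consts: usize,
}

/// Analyze one straight-line block. `live_out` lists registers read afterwards.
pub fn analyze_block(code: &[Inst], live_out: &[u8]) -> Waste {
    let mut waste = Waste::default();

    // Forward pass: remember which register currently holds which constant.
    let mut resident: HashMap<u8, u16> = HashMap::new();
    for inst in code {
        match *inst {
            Inst::Movw { rd, imm } => {
                // Redundant if the same constant is already resident somewhere.
                if resident.values().any(|&c| c == imm) {
                    waste.redundant_consts += 1;
                }
                resident.insert(rd, imm);
            }
            // Any other definition overwrites whatever constant rd held.
            Inst::And { rd, .. } | Inst::Add { rd, .. } => {
                resident.remove(&rd);
            }
        }
    }

    // Backward pass: a definition whose target is not live is a dead def.
    let mut live: HashSet<u8> = live_out.iter().copied().collect();
    for inst in code.iter().rev() {
        let (rd, uses): (u8, Vec<u8>) = match *inst {
            Inst::Movw { rd, .. } => (rd, Vec::new()),
            Inst::And { rd, rn, rm } | Inst::Add { rd, rn, rm } => (rd, vec![rn, rm]),
        };
        if !live.remove(&rd) {
            waste.dead_defs += 1;
            continue; // operands of a dead def do not become live
        }
        live.extend(uses);
    }
    waste
}

/// The shape from the commit message; the final `add` is an assumed completion.
pub fn measured_shape() -> Vec<Inst> {
    vec![
        Inst::Movw { rd: 1, imm: 126 },
        Inst::And { rd: 2, rn: 0, rm: 1 },
        Inst::Movw { rd: 3, imm: 126 },
        Inst::And { rd: 4, rn: 0, rm: 3 },
        Inst::Add { rd: 5, rn: 2, rm: 4 },
    ]
}
```

Calling `analyze_block(&measured_shape(), &[5])` yields zero dead definitions and one redundant constant: every `movw` feeds a live `and`, so a dead-store pass has nothing to remove, while the second `movw #126` repeats a value that `r1` still holds.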
Codecov Report: ✅ All modified and coverable lines are covered by tests.
avrabe added a commit that referenced this pull request on Jun 4, 2026
…r bound (VCR-RA-001) (#250)

* test(encoder): pin AND-immediate byte-range encoding + document the fold bound (VCR-RA-001)

Investigating immediate folding (the biggest win the #248 evidence pointed to) surfaced an encoder limitation: the `And { Operand2::Imm }` path packs the low 12 bits straight into the `i:imm3:imm8` field WITHOUT ThumbExpandImm (the modified-immediate expansion). For imm <= 0xFF (gale's int8 clamps #0x7e/#0x7f) that is correct: `and r2,r0,#0x7e` encodes to the canonical `00 f0 7e 02`. For imm >= 0x100 the field needs a true rotation/replication pattern that is not implemented, so it would silently encode a different value.

This path is currently DEAD (the selector never emits And-imm), so there is no live bug, but it sets the precondition for the immediate-folding transform: **fold only imm <= 0xFF** until the encoder is hardened to ThumbExpandImm / Ok-or-Err (the "encoder must be Ok-or-Err, never silently wrong" principle, #180/#185). That bound covers the measured flat_flight waste. Pins the safe-range encoding as a regression guard; no codegen change.

Part of #242.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(selector): fold small constants into i32.and immediates (VCR-RA-001)

The FIRST delta-emitting codegen transform on the allocator track. Evidence (#248) showed the dominant flat_flight-shape waste is redundant const materialization: `i32.const C; i32.and` lowered to `movw rN,#C; and rD,rA,rN` even when C is a small constant the AND instruction can take as an immediate.

Fix: when the operand pushed immediately before `i32.and` is `i32.const C` with C in 0..=0xFF AND its `movw` is cleanly at the instruction tail (not spilled), fold to `and rD, rA, #C` and drop the materialization (foldable_bitwise_imm + drop_prev_const_materialization, mirroring the const-divisor pattern).
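The fold described in the Fix paragraph can be sketched as a tail peephole. This is a simplified illustration under stated assumptions, not the selector's code: `Inst` and `emit_and` are hypothetical names (the real helpers are `foldable_bitwise_imm` and `drop_prev_const_materialization`, whose signatures are not shown in this thread), and the sketch assumes the constant register has no consumer other than the `and` being emitted.

```rust
/// Toy output instructions (hypothetical), enough to show the fold.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Inst {
    /// movw rd, #imm
    Movw { rd: u8, imm: u16 },
    /// and rd, rn, rm
    AndReg { rd: u8, rn: u8, rm: u8 },
    /// and rd, rn, #imm  (imm restricted to 0..=0xFF by construction)
    AndImm { rd: u8, rn: u8, imm: u8 },
}

/// Emit `rd = rn & rm`. If `rm` was materialized by the instruction at the
/// tail of `out` and its constant fits 0..=0xFF, drop that `movw` and use
/// the immediate form instead.
///
/// Assumption: nothing else reads the constant register, which holds for a
/// stack-machine lowering where `i32.const` is consumed exactly once.
pub fn emit_and(out: &mut Vec<Inst>, rd: u8, rn: u8, rm: u8) {
    if let Some(&Inst::Movw { rd: const_reg, imm }) = out.last() {
        if const_reg == rm && imm <= 0xFF {
            out.pop(); // drop the materialization
            out.push(Inst::AndImm { rd, rn, imm: imm as u8 });
            return;
        }
    }
    // Out of range, or the constant is not at the tail: keep the register form.
    out.push(Inst::AndReg { rd, rn, rm });
}
```

With `out = [movw r1,#0x7e]`, `emit_and(&mut out, 2, 0, 1)` leaves a single `and r2,r0,#0x7e`. With `movw r1,#0x140` at the tail, the `movw` stays and the register form is emitted, which matches the out-of-range behaviour the commit's second test pins.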
Bounded to 0..=0xFF: the encoder's AND-immediate path is not yet ThumbExpandImm-complete (#249); larger modified immediates await an encoder hardening to Ok-or-Err. 0..=0xFF covers gale's int8 clamps.

Measured delta: the `(p & 0x7e) + (p & 0x7e)` pattern drops 8 → 6 instructions (both `movw #126` eliminated; each AND uses the immediate). Better than const-CSE: no materialization at all.

GATE (full, codegen change): 282 lib tests + 20 integration suites pass; the three frozen differential fixtures stay RESULT-identical (control_step 0x00210A55, flight_seam 0x07FDF307, div_const 338/338); tests i32_and_folds_small_const_into_immediate (folded shape) + i32_and_does_not_fold_out_of_range_const (0x140 stays a register operand, the encoder safety bound). CI fuzz adds totality.

Part of #242. Supersedes the #248 evidence (the redundancy it pinned is now folded away).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
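The 0..=0xFF bound and the Ok-or-Err principle can be illustrated with a small encoder sketch for the Thumb-2 `AND (immediate)` T1 form with S=0. This is an assumption-laden illustration, not the project's encoder: `encode_and_imm` and `EncodeError` are hypothetical names, and restricting registers to r0..r12 is a deliberately conservative choice of this sketch. For values up to 0xFF the `i:imm3` bits are all zero and `imm8` carries the value unchanged, so no ThumbExpandImm logic is needed; anything larger is rejected rather than packed.

```rust
/// Reasons this sketch refuses to encode (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum EncodeError {
    /// Immediate is outside the range this sketch can encode safely.
    ImmNotEncodable(u32),
    /// Register outside r0..r12 (conservative restriction of this sketch).
    BadRegister(u8),
}

/// Encode `and rd, rn, #imm` (Thumb-2, T1, S=0) as little-endian bytes.
/// Only imm <= 0xFF is accepted: there i:imm3 = 0 and imm8 = imm.
pub fn encode_and_imm(rd: u8, rn: u8, imm: u32) -> Result<[u8; 4], EncodeError> {
    if imm > 0xFF {
        // Ok-or-Err: never pack an immediate that needs ThumbExpandImm.
        return Err(EncodeError::ImmNotEncodable(imm));
    }
    for r in [rd, rn] {
        if r > 12 {
            return Err(EncodeError::BadRegister(r));
        }
    }
    // First halfword:  1111 0 i 0 0000 S Rn   with i = 0, S = 0.
    let hw1: u16 = 0xF000 | rn as u16;
    // Second halfword: 0 imm3 Rd imm8          with imm3 = 0.
    let hw2: u16 = ((rd as u16) << 8) | imm as u16;
    let [b0, b1] = hw1.to_le_bytes();
    let [b2, b3] = hw2.to_le_bytes();
    Ok([b0, b1, b2, b3])
}
```

`encode_and_imm(2, 0, 0x7e)` produces `00 f0 7e 02`, the canonical bytes quoted in the encoder commit, while `encode_and_imm(2, 0, 0x140)` returns an error instead of an instruction that would decode to a different value.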
Evidence-driven measurement that redirects the transform priority.
On real selector output, `(p & 0x7e) + (p & 0x7e)` lowers to `movw r1,#126; and r2,r0,r1; movw r3,#126; and r4,r0,r3; ...`.

The selector re-materializes 0x7e into a fresh register while the first copy is still live → `analyze_function`: dead_defs=0, redundant_consts=1.

What this tells us

* The dominant waste is redundant materialization, not dead stores. The delta-producing transform is const-CSE (drop the redundant `movw`, rewrite the consumer to the resident reg), exactly what `redundant_const_defs` detects.
* 0x7e is a valid Thumb-2 `AND` immediate → `and r2,r0,#126` needs no materialization at all (immediate folding, an even larger latent win).

The test pins current suboptimal codegen and is expected to flip when const-CSE / immediate folding lands (redundant → 0); the flip is the signal that the optimization works. Measure-before-transform.
Part of #242.
🤖 Generated with Claude Code