
test(synthesis): pin the real codegen waste — redundant const, not dead stores (VCR-RA-001)#248

Closed
avrabe wants to merge 1 commit into main from test/vcr-redundancy-evidence

avrabe commented Jun 4, 2026

Evidence-driven measurement that redirects the transform priority.

On real selector output, `(p & 0x7e) + (p & 0x7e)` lowers to:

`movw r1,#126; and r2,r0,r1; movw r3,#126; and r4,r0,r3; add ...`

The selector re-materializes 0x7e into a fresh register while the first copy is still live, so `analyze_function` reports dead_defs=0, redundant_consts=1.
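To make the two counters concrete, here is a minimal Rust sketch of how a redundant-constant count could be computed over straight-line code. Everything in it is hypothetical: `Inst` and `count_redundant_consts` are illustrative stand-ins, not the crate's `analyze_function`, and "still live" is approximated as "still resident", meaning another register holds the same constant and has not been overwritten since.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified view of selector output (register numbers only).
#[allow(dead_code)] // rn/rm are carried for realism but not inspected here
#[derive(Debug, Clone, Copy, PartialEq)]
enum Inst {
    Movw { rd: u8, imm: u16 },         // movw rd, #imm
    AndReg { rd: u8, rn: u8, rm: u8 }, // and rd, rn, rm
    AddReg { rd: u8, rn: u8, rm: u8 }, // add rd, rn, rm
}

impl Inst {
    /// The register this instruction writes.
    fn def(&self) -> u8 {
        match *self {
            Inst::Movw { rd, .. } | Inst::AndReg { rd, .. } | Inst::AddReg { rd, .. } => rd,
        }
    }
}

/// Counts `movw`s whose constant already sits in another register that has
/// not been overwritten since it was loaded (assumed meaning of "still live").
fn count_redundant_consts(code: &[Inst]) -> usize {
    let mut resident: HashMap<u8, u16> = HashMap::new(); // register -> constant it holds
    let mut redundant = 0;
    for inst in code {
        if let Inst::Movw { rd, imm } = *inst {
            if resident.iter().any(|(&r, &c)| r != rd && c == imm) {
                redundant += 1;
            }
        }
        // Any write clobbers the constant the destination held before.
        resident.remove(&inst.def());
        if let Inst::Movw { rd, imm } = *inst {
            resident.insert(rd, imm);
        }
    }
    redundant
}
```

Fed the sequence above (with the elided tail assumed to be `add r5,r2,r4`), this counts one redundant `movw`, and a sequence containing only the first `movw` counts zero.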

What this tells us

  • The dead-store pass (#246, "feat(synthesis): dead-store elimination pass (VCR-RA-001, ready-to-wire transform)") is a no-op on the real waste: DCE is not the lever.
  • The delta-producing transform is const-CSE (drop the redundant movw and rewrite its consumer to use the resident register), which is exactly what redundant_const_defs detects.
  • Bonus: 0x7e is a valid Thumb-2 AND immediate, so `and r2,r0,#126` needs no materialization at all (immediate folding, an even larger latent win).
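The const-CSE rewrite in the second bullet can be sketched as follows. This is a minimal illustration over a hypothetical instruction list, not the project's transform. It assumes SSA-like selector output in which every register is defined at most once (true of the fresh-register sequence above), so a resident constant is never overwritten while an alias to it is still in use. Real code with register reuse would need a liveness check before dropping a `movw`.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified selector output.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Inst {
    Movw { rd: u8, imm: u16 },
    AndReg { rd: u8, rn: u8, rm: u8 },
    AddReg { rd: u8, rn: u8, rm: u8 },
}

/// Follows an alias left behind by a dropped `movw`, if any.
fn resolve(alias: &HashMap<u8, u8>, r: u8) -> u8 {
    alias.get(&r).copied().unwrap_or(r)
}

/// Const-CSE over straight-line code. ASSUMES every register is defined at
/// most once, so the register that first loaded a constant still holds it.
fn const_cse(code: &[Inst]) -> Vec<Inst> {
    let mut holder: HashMap<u16, u8> = HashMap::new(); // constant -> resident register
    let mut alias: HashMap<u8, u8> = HashMap::new(); // dropped register -> resident register
    let mut out = Vec::with_capacity(code.len());
    for &inst in code {
        match inst {
            Inst::Movw { rd, imm } => {
                let existing = holder.get(&imm).copied();
                match existing {
                    // Redundant: drop the movw and redirect later readers of rd.
                    Some(resident) => {
                        alias.insert(rd, resident);
                    }
                    None => {
                        holder.insert(imm, rd);
                        out.push(inst);
                    }
                }
            }
            Inst::AndReg { rd, rn, rm } => out.push(Inst::AndReg {
                rd,
                rn: resolve(&alias, rn),
                rm: resolve(&alias, rm),
            }),
            Inst::AddReg { rd, rn, rm } => out.push(Inst::AddReg {
                rd,
                rn: resolve(&alias, rn),
                rm: resolve(&alias, rm),
            }),
        }
    }
    out
}
```

On the five-instruction sequence above (again assuming the elided tail is `add r5,r2,r4`), the second `movw` disappears and the second `and` reads `r1` instead of `r3`.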

The test pins the current suboptimal codegen and is expected to flip when const-CSE or immediate folding lands (redundant → 0); the flip is the signal that the optimization works. Measure-before-transform.

Part of #242.

🤖 Generated with Claude Code


Evidence-driven: measured on REAL selector output, (p & 0x7e) + (p & 0x7e) lowers
to `movw r1,#126; and r2,r0,r1; movw r3,#126; and r4,r0,r3; ...` — the selector
RE-MATERIALIZES 0x7e into a fresh register while the first copy is still live.

So on the shape gale measures (flat_flight's repeated clamps), the dominant waste
is REDUNDANT MATERIALIZATION (const-CSE territory), NOT dead stores: analyze_function
reports dead_defs=0, redundant_consts=1. This redirects the transform priority —
the dead-store pass (#246) is correct but a no-op on this waste; the delta-producing
transform is const-CSE (drop the redundant movw, rewrite the consumer to the resident
reg). And 0x7e is a valid Thumb-2 AND immediate, so the materialization is itself
avoidable via immediate folding — an even larger latent win.

The test pins current suboptimal codegen and is EXPECTED TO FLIP when const-CSE /
immediate folding lands (redundant → 0) — the flip is the signal the optimization
works. Measure-before-transform, per the methodology.

Part of #242.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

codecov Bot commented Jun 4, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.



avrabe commented Jun 4, 2026


Superseded by #250: the redundant-const waste this test pinned is now folded away by i32.and immediate folding (the 'expected to flip' outcome). #250 carries the measured 8→6 instruction delta and the encoder safety bound.

@avrabe avrabe closed this Jun 4, 2026
@avrabe avrabe deleted the test/vcr-redundancy-evidence branch June 4, 2026 20:27
avrabe added a commit that referenced this pull request Jun 4, 2026
…r bound (VCR-RA-001) (#250)

* test(encoder): pin AND-immediate byte-range encoding + document the fold bound (VCR-RA-001)

Investigating immediate folding (the biggest win the #248 evidence pointed to)
surfaced an encoder limitation: the `And { Operand2::Imm }` path packs the low
12 bits straight into the `i:imm3:imm8` field WITHOUT ThumbExpandImm (the
modified-immediate expansion). For imm <= 0xFF (gale's int8 clamps #0x7e/#0x7f)
that is correct — `and r2,r0,#0x7e` encodes to the canonical `00 f0 7e 02`. For
imm >= 0x100 the field needs a true rotation/replication pattern that is not
implemented, so it would silently encode a different value.
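A sketch of the limitation, written against the Thumb-2 T1 encoding of AND (immediate) (`11110 i 0 0000 S Rn` then `0 imm3 Rd imm8`, stored as two little-endian halfwords). The function name is hypothetical; it reproduces the behaviour described above by copying the low 12 bits of the value straight into `i:imm3:imm8` with no ThumbExpandImm step.

```rust
/// Hypothetical mirror of the unhardened path: packs the low 12 bits of
/// `imm` directly into i:imm3:imm8 with no ThumbExpandImm step.
/// Only correct when imm <= 0xFF.
fn encode_and_imm_unchecked(rd: u8, rn: u8, imm: u32) -> [u8; 4] {
    let imm12 = (imm & 0xFFF) as u16;
    let i = (imm12 >> 11) & 1; // imm12[11]
    let imm3 = (imm12 >> 8) & 0b111; // imm12[10:8]
    let imm8 = imm12 & 0xFF; // imm12[7:0]
    // First halfword: 11110 i 0 0000 S Rn, with opcode 0000 (AND) and S = 0.
    let hw1: u16 = 0xF000 | (i << 10) | (u16::from(rn) & 0xF);
    // Second halfword: 0 imm3 Rd imm8.
    let hw2: u16 = (imm3 << 12) | ((u16::from(rd) & 0xF) << 8) | imm8;
    // Each halfword is stored little-endian, first halfword first.
    let [a, b] = hw1.to_le_bytes();
    let [c, d] = hw2.to_le_bytes();
    [a, b, c, d]
}
```

For 0x7e this yields the canonical `00 f0 7e 02`. For 0x140 it emits the field imm12 = 0x140, which ThumbExpandImm decodes as 0x00400040 (the `0x00XY00XY` replication form), so the instruction would mask with a different value than requested.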

This path is currently DEAD (the selector never emits And-imm), so there is no live bug,
but it sets the precondition for the immediate-folding transform: **fold only
imm <= 0xFF** until the encoder is hardened to ThumbExpandImm / Ok-or-Err (the
"encoder must be Ok-or-Err, never silently wrong" principle, #180/#185). That
bound covers the measured flat_flight waste.
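One way the Ok-or-Err hardening could look, as a sketch only: decode candidate `imm12` fields with ThumbExpandImm (as defined in the Arm Architecture Reference Manual) and return `Err` when no field reproduces the requested value, instead of packing bits blindly. `NotEncodable`, `thumb_expand_imm`, `encode_modified_imm` and `encode_and_imm` are hypothetical names, and the brute-force search over all 4096 fields is chosen for obvious correctness rather than speed.

```rust
/// Error for values with no Thumb-2 modified-immediate encoding.
#[derive(Debug, PartialEq)]
struct NotEncodable(u32);

/// ThumbExpandImm; returns None for the UNPREDICTABLE imm8 == 0 cases.
fn thumb_expand_imm(imm12: u16) -> Option<u32> {
    let imm8 = u32::from(imm12 & 0xFF);
    if imm12 >> 10 == 0 {
        match (imm12 >> 8) & 0b11 {
            0b00 => Some(imm8), // 0x000000XY
            _ if imm8 == 0 => None,
            0b01 => Some((imm8 << 16) | imm8), // 0x00XY00XY
            0b10 => Some((imm8 << 24) | (imm8 << 8)), // 0xXY00XY00
            _ => Some(imm8 * 0x0101_0101), // 0xXYXYXYXY
        }
    } else {
        // '1':imm12[6:0] rotated right by imm12[11:7].
        let unrotated = 0x80 | u32::from(imm12 & 0x7F);
        Some(unrotated.rotate_right(u32::from((imm12 >> 7) & 0x1F)))
    }
}

/// Finds an imm12 field that expands to exactly `value`, or reports failure.
fn encode_modified_imm(value: u32) -> Result<u16, NotEncodable> {
    (0u16..0x1000)
        .find(|&imm12| thumb_expand_imm(imm12) == Some(value))
        .ok_or(NotEncodable(value))
}

/// AND (immediate) T1 that is Ok-or-Err, never silently wrong.
/// Register-number validation is omitted for brevity.
fn encode_and_imm(rd: u8, rn: u8, value: u32) -> Result<[u8; 4], NotEncodable> {
    let imm12 = encode_modified_imm(value)?;
    let i = (imm12 >> 11) & 1;
    let imm3 = (imm12 >> 8) & 0b111;
    let imm8 = imm12 & 0xFF;
    let hw1: u16 = 0xF000 | (i << 10) | (u16::from(rn) & 0xF);
    let hw2: u16 = (imm3 << 12) | ((u16::from(rd) & 0xF) << 8) | imm8;
    let [a, b] = hw1.to_le_bytes();
    let [c, d] = hw2.to_le_bytes();
    Ok([a, b, c, d])
}
```

Under this sketch 0x7e still encodes to `00 f0 7e 02`, 0x140 becomes encodable (imm12 = 0xFA0, i.e. 0xA0 rotated right by 31), and a value such as 0x101 is rejected with `Err` rather than mis-encoded.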

Pins the safe-range encoding as a regression guard; no codegen change.

Part of #242.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(selector): fold small constants into i32.and immediates (VCR-RA-001)

The FIRST delta-emitting codegen transform on the allocator track. Evidence
(#248) showed the dominant flat_flight-shape waste is redundant const
materialization: `i32.const C; i32.and` lowered to `movw rN,#C; and rD,rA,rN`
even when C is a small constant the AND instruction can take as an immediate.

Fix: when the operand pushed immediately before `i32.and` is `i32.const C` with
C in 0..=0xFF AND its `movw` is cleanly at the instruction tail (not spilled),
fold to `and rD, rA, #C` and drop the materialization (foldable_bitwise_imm +
drop_prev_const_materialization, mirroring the const-divisor pattern).
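A minimal Rust sketch of the fold rule as stated above, over a hypothetical stack-machine selector. The `Wasm` and `Arm` enums, `Slot` and `select` are illustrative, constants are narrowed to `u16` so a single `movw` holds them, and `foldable_bitwise_imm` here is only an analogue of the helper of that name (its real signature is not shown). It folds only when the operand pushed immediately before `i32.and` is a constant in 0..=0xFF whose `movw` is still the last emitted instruction; otherwise it falls back to the register form.

```rust
/// Hypothetical ARM-side instructions produced by the sketch selector.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Arm {
    Movw { rd: u8, imm: u16 },
    AndReg { rd: u8, rn: u8, rm: u8 },
    AndImm { rd: u8, rn: u8, imm: u8 },
    AddReg { rd: u8, rn: u8, rm: u8 },
}

/// Simplified wasm input (assumption: constants fit in u16).
#[derive(Debug, Clone, Copy)]
enum Wasm {
    LocalGetP, // push the parameter p, assumed to live in r0
    Const(u16),
    And,
    Add,
}

/// Operand-stack entry: the register holding the value and, if it came
/// straight from `i32.const`, the constant itself.
struct Slot {
    reg: u8,
    konst: Option<u16>,
}

/// Analogue of `foldable_bitwise_imm`: only the encoder-safe 0..=0xFF range folds.
fn foldable_bitwise_imm(c: u16) -> Option<u8> {
    if c <= 0xFF { Some(c as u8) } else { None }
}

fn select(ops: &[Wasm]) -> Vec<Arm> {
    let mut out: Vec<Arm> = Vec::new();
    let mut stack: Vec<Slot> = Vec::new();
    let mut next_reg = 1u8; // r0 is reserved for p
    let mut fresh = || {
        let r = next_reg;
        next_reg += 1;
        r
    };
    for &op in ops {
        match op {
            Wasm::LocalGetP => stack.push(Slot { reg: 0, konst: None }),
            Wasm::Const(c) => {
                let rd = fresh();
                out.push(Arm::Movw { rd, imm: c });
                stack.push(Slot { reg: rd, konst: Some(c) });
            }
            Wasm::And => {
                let rhs = stack.pop().expect("i32.and needs two operands");
                let lhs = stack.pop().expect("i32.and needs two operands");
                let rd = fresh();
                // Only drop the movw if it is still the last emitted instruction.
                let tail_movw = rhs.konst.map(|c| Arm::Movw { rd: rhs.reg, imm: c });
                match rhs.konst.and_then(foldable_bitwise_imm) {
                    Some(imm) if out.last() == tail_movw.as_ref() => {
                        out.pop(); // analogue of drop_prev_const_materialization
                        out.push(Arm::AndImm { rd, rn: lhs.reg, imm });
                    }
                    _ => out.push(Arm::AndReg { rd, rn: lhs.reg, rm: rhs.reg }),
                }
                stack.push(Slot { reg: rd, konst: None });
            }
            Wasm::Add => {
                let rhs = stack.pop().expect("i32.add needs two operands");
                let lhs = stack.pop().expect("i32.add needs two operands");
                let rd = fresh();
                out.push(Arm::AddReg { rd, rn: lhs.reg, rm: rhs.reg });
                stack.push(Slot { reg: rd, konst: None });
            }
        }
    }
    out
}
```

On `(p & 0x7e) + (p & 0x7e)` the sketch emits two `and rD, r0, #0x7e` plus one `add` instead of five instructions, the same two-`movw` saving as the measured 8 → 6 (the remaining instructions of the real count are outside this sketch), while `0x140` stays a register operand.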

Bounded to 0..=0xFF: the encoder's AND-immediate path is not yet ThumbExpandImm-
complete (#249) — larger modified immediates await an encoder hardening to
Ok-or-Err. 0..=0xFF covers gale's int8 clamps.

Measured delta: the (p & 0x7e) + (p & 0x7e) pattern drops 8 → 6 instructions
(both `movw #126` eliminated; each AND uses the immediate). Better than const-CSE
— no materialization at all.

GATE (full, codegen change): 282 lib tests + 20 integration suites pass; the
three frozen differential fixtures stay RESULT-identical (control_step
0x00210A55, flight_seam 0x07FDF307, div_const 338/338); tests
i32_and_folds_small_const_into_immediate (folded shape) +
i32_and_does_not_fold_out_of_range_const (0x140 stays a register operand — the
encoder safety bound). CI fuzz adds totality.

Part of #242. Supersedes the #248 evidence (the redundancy it pinned is now folded away).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>