feat(selector): fold constants into i32.add / i32.sub immediates (VCR-RA-001)#254

Merged: avrabe merged 1 commit into main from feat/vcr-add-sub-immediate-folding on Jun 4, 2026
@avrabe avrabe commented Jun 4, 2026

Contributor

Completes the arithmetic+bitwise immediate-folding family (and/or/xor #250/#252; now add/sub).

What

When the operand immediately before i32.add/i32.sub is i32.const C (C ∈ 0..=0xFFF) and the movw that materializes it sits cleanly at the tail of the emitted instructions, fold to add/sub rd, a, #C and drop the materialization.
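
A minimal sketch of the fold described above, for readers who want to see its shape. Everything here is hypothetical: the `Inst` and `ArithOp` types, `select_add_sub`, and `MAX_FOLD_IMM` are illustrative names, not the selector's real API. It also reads "cleanly at the tail" as "the last emitted instruction is the movw that wrote the operand register", which is an assumption.

```rust
// Hypothetical, simplified instruction forms (illustration only).
#[derive(Debug, Clone, PartialEq)]
enum Inst {
    Movw { rd: u8, imm: u16 },           // materializes a constant
    AddReg { rd: u8, rn: u8, rm: u8 },
    SubReg { rd: u8, rn: u8, rm: u8 },
    AddImm { rd: u8, rn: u8, imm: u16 }, // add rd, rn, #imm
    SubImm { rd: u8, rn: u8, imm: u16 }, // sub rd, rn, #imm
}

#[derive(Debug, Clone, Copy)]
enum ArithOp {
    Add,
    Sub,
}

// Upper bound of the foldable range stated in the PR (0..=0xFFF).
const MAX_FOLD_IMM: u32 = 0xFFF;

/// Emit `rd = rn op rm`. If `rm` holds `i32.const C` (passed as
/// `rhs_const`), C is in range, and the movw that produced it is the
/// last emitted instruction, drop the movw and emit the immediate form.
fn select_add_sub(
    out: &mut Vec<Inst>,
    op: ArithOp,
    rd: u8,
    rn: u8,
    rm: u8,
    rhs_const: Option<u32>,
) {
    if let Some(c) = rhs_const {
        let movw_at_tail = matches!(
            out.last(),
            Some(Inst::Movw { rd: r, imm }) if *r == rm && u32::from(*imm) == c
        );
        if c <= MAX_FOLD_IMM && movw_at_tail {
            out.pop(); // drop the materialization
            let imm = c as u16;
            out.push(match op {
                ArithOp::Add => Inst::AddImm { rd, rn, imm },
                ArithOp::Sub => Inst::SubImm { rd, rn, imm },
            });
            return;
        }
    }
    // Fallback: keep the register form.
    out.push(match op {
        ArithOp::Add => Inst::AddReg { rd, rn, rm },
        ArithOp::Sub => Inst::SubReg { rd, rn, rm },
    });
}
```

In this sketch a frame adjustment such as `movw r1, #16` followed by a register `sub` collapses to a single `SubImm { imm: 16 }`, while a constant of 0x1000 keeps its movw and the register form. The sketch does not check whether the constant's register has other readers.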

Range is the full 0..=0xFFF (4095) — wider than the bitwise 0..=0xFF — because the ADD/SUB immediate encoder now uses ADDW/SUBW (T4, plain imm12) for >0xFF (#253), verified correct before folding into it.
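
For context on why the plain imm12 form covers the whole 0..=0xFFF range, here is a small sketch of how a 12-bit value splits into the i:imm3:imm8 fields of the ADDW/SUBW T4 encoding. The bit layout is my reading of the ARMv7-M reference manual and should be checked against it. The function name and signature are hypothetical and are not the encoder touched by #253. Register restrictions (SP/PC special cases) are ignored.

```rust
/// Build the two halfwords of ADDW/SUBW (T4) for `rd = rn +/- imm`.
/// Returns None when the immediate does not fit in 12 bits or a
/// register number is out of range. Hypothetical helper, sketch only.
fn encode_addw_subw_t4(is_sub: bool, rd: u8, rn: u8, imm: u32) -> Option<(u16, u16)> {
    if imm > 0xFFF || rd > 15 || rn > 15 {
        return None;
    }
    // The plain imm12 is zero-extended; it is stored as i:imm3:imm8.
    let i = ((imm >> 11) & 0x1) as u16;
    let imm3 = ((imm >> 8) & 0x7) as u16;
    let imm8 = (imm & 0xFF) as u16;

    // First halfword: 11110 i 1 0000 0 Rn (ADDW) or 11110 i 1 0101 0 Rn (SUBW).
    let base: u16 = if is_sub { 0xF2A0 } else { 0xF200 };
    let hw1 = base | (i << 10) | u16::from(rn);
    // Second halfword: 0 imm3 Rd imm8.
    let hw2 = (imm3 << 12) | (u16::from(rd) << 8) | imm8;
    Some((hw1, hw2))
}
```

Unlike the modified-immediate forms, every value in 0..=0xFFF is representable here, which is what lets the add/sub fold use a wider range than the bitwise one.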

Fires on real code

This one actively fires on control_step: its frame sub #16 and const adds now fold, and the differential stays result-identical — confirming correctness across a real codegen change (not just a no-op on the fixtures). Updated test_237_stack_pointer_global_is_register_promoted to accept the frame-size 16 in its folded sub #16 form (still a plain scalar immediate, still not __synth_wasm_data-relocated — exactly the property it guards).

Gate

clippy clean; 284 lib tests; three frozen differentials result-identical (control_step 0x00210A55, flight_seam 0x07FDF307, div_const 338/338); test i32_add_sub_fold_const_into_immediate (byte + >0xFF ADDW-path values). CI fuzz adds totality.

Part of #242. Builds on #253 (ADDW/SUBW encoder fix), merged.

🤖 Generated with Claude Code

…-RA-001)

Completes the arithmetic+bitwise immediate-folding family (and/or/xor #250/#252,
now add/sub). When the operand before i32.add/i32.sub is `i32.const C` with C in
0..=0xFFF and its `movw` cleanly at the tail, fold to `add/sub rd, a, #C` and
drop the materialization.

Range is the full 0..=0xFFF (4095), wider than the bitwise 0..=0xFF, because the
ADD/SUB immediate encoder now uses ADDW/SUBW (T4, plain imm12) for >0xFF (#253) —
verified correct before folding into it.

This one actively fires on real fixtures: control_step's frame `sub #16` and its
const adds now fold (the differential stays result-identical, confirming
correctness across a codegen change). Updated
test_237_stack_pointer_global_is_register_promoted to accept the frame-size 16 in
its folded `sub #16` form (still
a plain scalar immediate, still not __synth_wasm_data-relocated — the property it
checks).

GATE: clippy clean; 284 lib tests; three frozen differentials RESULT-identical
(control_step 0x00210A55, flight_seam 0x07FDF307, div_const 338/338); test
i32_add_sub_fold_const_into_immediate (byte + >0xFF ADDW-path values). CI fuzz
adds totality.

Part of #242.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@avrabe avrabe merged commit 95e8521 into main Jun 4, 2026
11 of 12 checks passed
@avrabe avrabe deleted the feat/vcr-add-sub-immediate-folding branch June 4, 2026 21:39
avrabe added a commit that referenced this pull request Jun 5, 2026
…rectness fixes (#260)

Promote the accumulated v0.11.30 work into the CHANGELOG before tagging:
native-pointer ABI (#237) + the VCR-* constant-immediate folding (#250/#252/#254)
+ analysis foundation (#243/#245) + three latent-miscompile encoder fixes
(#251 ORR/EOR NOP, #253 ADD/SUB large-frame, #255 CMP/ADDS/SUBS ThumbExpandImm).
Adds a falsification statement covering the encoder correctness class.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
avrabe added a commit that referenced this pull request Jun 5, 2026
…#257) (#264)

Wires the fuse_mul_add pass (#263 foundation) into the backend, after instruction
selection and before branch resolution (the fusion removes instructions, shifting
byte offsets). This is the codegen change that emits gale's measured delta.

Refined the soundness condition so it fires on real (branchy) functions while
staying sound: the mul result must be read ONLY by the add anywhere in the
function (new op_may_use helper — call/branch-aware: a pure branch reads no GP
reg; a call may read R0-R3; Bx / i64-pair / FP are conservatively assumed to
read). The "between mul and add" check still blocks on any control flow (a branch
there breaks the linear mul→add dataflow).
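
The soundness condition above can be sketched roughly as follows. This is an illustration under stated assumptions, not the pass itself: the `Op` enum, `op_def`, `is_control_flow`, and the index-based `fuse_mul_add` signature are hypothetical, and only `op_may_use` borrows its name from the commit message. The sketch additionally refuses to fuse when an instruction between the mul and the add redefines a register the fused mla would read, which is my own conservative addition.

```rust
// Hypothetical, simplified ops (illustration only).
#[derive(Debug, Clone, PartialEq)]
enum Op {
    Mul { rd: u8, rn: u8, rm: u8 },
    Add { rd: u8, rn: u8, rm: u8 },
    Mla { rd: u8, rn: u8, rm: u8, ra: u8 }, // rd = rn * rm + ra
    Mov { rd: u8, rm: u8 },
    Branch, // pure branch: reads no GP register
    Call,   // may read the argument registers R0-R3
    Bx,     // conservatively assumed to read anything
}

/// May `op` read general-purpose register `reg`?
fn op_may_use(op: &Op, reg: u8) -> bool {
    match *op {
        Op::Mul { rn, rm, .. } | Op::Add { rn, rm, .. } => rn == reg || rm == reg,
        Op::Mla { rn, rm, ra, .. } => rn == reg || rm == reg || ra == reg,
        Op::Mov { rm, .. } => rm == reg,
        Op::Branch => false,
        Op::Call => reg <= 3,
        Op::Bx => true,
    }
}

/// Register written by `op`, if any (calls' clobbers are not modelled).
fn op_def(op: &Op) -> Option<u8> {
    match *op {
        Op::Mul { rd, .. } | Op::Add { rd, .. } | Op::Mla { rd, .. } | Op::Mov { rd, .. } => Some(rd),
        Op::Branch | Op::Call | Op::Bx => None,
    }
}

fn is_control_flow(op: &Op) -> bool {
    matches!(op, Op::Branch | Op::Call | Op::Bx)
}

/// Try to fuse the mul at `mul_idx` into the add at `add_idx`.
/// Returns the rewritten sequence, or None if the fusion is not allowed.
fn fuse_mul_add(ops: &[Op], mul_idx: usize, add_idx: usize) -> Option<Vec<Op>> {
    if mul_idx >= add_idx || add_idx >= ops.len() {
        return None;
    }
    let (t, m_rn, m_rm) = match ops[mul_idx] {
        Op::Mul { rd, rn, rm } => (rd, rn, rm),
        _ => return None,
    };
    // The add must read the product in exactly one operand.
    let (a_rd, acc) = match ops[add_idx] {
        Op::Add { rd, rn, rm } if rn == t && rm != t => (rd, rm),
        Op::Add { rd, rn, rm } if rm == t && rn != t => (rd, rn),
        _ => return None,
    };
    // Between mul and add: no control flow, no redefinition of t or the mul sources.
    for op in &ops[mul_idx + 1..add_idx] {
        if is_control_flow(op) {
            return None;
        }
        if let Some(d) = op_def(op) {
            if d == t || d == m_rn || d == m_rm {
                return None;
            }
        }
    }
    // Anywhere in the function: only the add may read the mul result.
    if ops.iter().enumerate().any(|(i, op)| i != add_idx && op_may_use(op, t)) {
        return None;
    }
    let mut fused = Vec::with_capacity(ops.len() - 1);
    for (i, op) in ops.iter().enumerate() {
        if i == mul_idx {
            continue; // the mul disappears
        }
        if i == add_idx {
            fused.push(Op::Mla { rd: a_rd, rn: m_rn, rm: m_rm, ra: acc });
        } else {
            fused.push(op.clone());
        }
    }
    Some(fused)
}
```

With these definitions a trailing pure branch does not block the fusion, while a later `Mov` that reads the product, a branch between the mul and the add, or a call when the product lives in R0-R3 all do.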

MEASURED (oracle repaired this session): flat_flight (flight_seam_flat) .o
1891 → 1819 bytes (~18 muls fused into mla). The three frozen differentials stay
RESULT-identical — flight_seam (which exercises the gyro*980+accel*20 filter)
stays 0x07FDF307 with the fusion firing; control_step 0x00210A55; div_const
338/338.

Also fixes test add_uses_correct_source_registers (semantic_correctness.rs):
`i32.const 10; i32.const 20; i32.add` folds the 20 into the ADD immediate since
#254 — the test predated that folding and still asserted a register operand. It
was missed because #254 was gated with `cargo test --lib` (this is a `tests/`
integration test); the full-suite run for this PR surfaced it.

Part of #242 / closes the lever-#2 portion of #257.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>