feat(selector): fold constants into i32.add / i32.sub immediates (VCR-RA-001) #254
Merged
…-RA-001) Completes the arithmetic+bitwise immediate-folding family (and/or/xor #250/#252, now add/sub).

When the operand before i32.add/i32.sub is `i32.const C` with C in 0..=0xFFF and its `movw` cleanly at the tail, fold to `add/sub rd, a, #C` and drop the materialization. Range is the full 0..=0xFFF (4095), wider than the bitwise 0..=0xFF, because the ADD/SUB immediate encoder now uses ADDW/SUBW (T4, plain imm12) for >0xFF (#253), which was verified correct before folding into it.

This one actively fires on real fixtures: control_step's frame `sub #16` and its const adds now fold (the differential stays result-identical, confirming correctness across a codegen change). Updated test_237_stack_pointer_global_is_register_promoted to accept the frame-size 16 in its folded `sub #16` form (still a plain scalar immediate, still not __synth_wasm_data-relocated, which is the property it checks).

GATE: clippy clean; 284 lib tests; three frozen differentials RESULT-identical (control_step 0x00210A55, flight_seam 0x07FDF307, div_const 338/338); test i32_add_sub_fold_const_into_immediate (byte + >0xFF ADDW-path values). CI fuzz adds totality.

Part of #242.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
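The fold described in this commit message can be illustrated with a minimal Rust sketch. Everything named here (`Inst`, `emit_add`, `MAX_FOLD_IMM`) is hypothetical and chosen for illustration; the selector's real types are not shown in this thread. The sketch also assumes the constant's register has no other reader, which holds when it is a stack temporary consumed by the add.

```rust
/// Hypothetical, simplified instruction type (not the project's real IR).
#[derive(Debug, Clone, PartialEq)]
enum Inst {
    /// Materialize a 16-bit constant into register `rd`.
    Movw { rd: u8, imm: u16 },
    /// Register-register add: rd = rn + rm.
    AddReg { rd: u8, rn: u8, rm: u8 },
    /// Add with a plain immediate: rd = rn + imm.
    AddImm { rd: u8, rn: u8, imm: u16 },
}

/// Largest constant the fold accepts (a plain 12-bit immediate).
const MAX_FOLD_IMM: u16 = 0xFFF;

/// Emit `rd = rn + rm`. If the last emitted instruction is the `movw`
/// that materialized `rm` and its constant fits, drop the `movw` and
/// emit the immediate form instead.
/// Assumption: `rm` is a temporary with no other reader.
fn emit_add(out: &mut Vec<Inst>, rd: u8, rn: u8, rm: u8) {
    if let Some(Inst::Movw { rd: c_reg, imm }) = out.last().cloned() {
        if c_reg == rm && rn != rm && imm <= MAX_FOLD_IMM {
            out.pop(); // drop the materialization
            out.push(Inst::AddImm { rd, rn, imm });
            return;
        }
    }
    // No foldable constant at the tail: keep the register form.
    out.push(Inst::AddReg { rd, rn, rm });
}
```

With `out` ending in `Movw { rd: 1, imm: 16 }`, calling `emit_add(&mut out, 0, 2, 1)` leaves a single `AddImm { rd: 0, rn: 2, imm: 16 }`; a constant above 0xFFF keeps the `movw` and falls back to `AddReg`. The `sub` case would follow the same shape.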
This was referenced Jun 4, 2026
avrabe added a commit that referenced this pull request Jun 5, 2026
…rectness fixes (#260) Promote the accumulated v0.11.30 work into the CHANGELOG before tagging: native-pointer ABI (#237) + the VCR-* constant-immediate folding (#250/#252/#254) + analysis foundation (#243/#245) + three latent-miscompile encoder fixes (#251 ORR/EOR NOP, #253 ADD/SUB large-frame, #255 CMP/ADDS/SUBS ThumbExpandImm). Adds a falsification statement covering the encoder correctness class. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
avrabe added a commit that referenced this pull request Jun 5, 2026
…#257) (#264) Wires the fuse_mul_add pass (#263 foundation) into the backend, after instruction selection and before branch resolution (the fusion removes instructions, shifting byte offsets). This is the codegen change that emits gale's measured delta. Refined the soundness condition so it fires on real (branchy) functions while staying sound: the mul result must be read ONLY by the add anywhere in the function (new op_may_use helper — call/branch-aware: a pure branch reads no GP reg; a call may read R0-R3; Bx / i64-pair / FP are conservatively assumed to read). The "between mul and add" check still blocks on any control flow (a branch there breaks the linear mul→add dataflow). MEASURED (oracle repaired this session): flat_flight (flight_seam_flat) .o 1891 → 1819 bytes (~18 muls fused into mla). The three frozen differentials stay RESULT-identical — flight_seam (which exercises the gyro*980+accel*20 filter) stays 0x07FDF307 with the fusion firing; control_step 0x00210A55; div_const 338/338. Also fixes test add_uses_correct_source_registers (semantic_correctness.rs): `i32.const 10; i32.const 20; i32.add` folds the 20 into the ADD immediate since #254 — the test predated that folding and still asserted a register operand. It was missed because #254 was gated with `cargo test --lib` (this is a `tests/` integration test); the full-suite run for this PR surfaced it. Part of #242 / closes the lever-#2 portion of #257. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
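The soundness condition described in this commit message can be sketched roughly as follows. This is an illustrative reconstruction, not the project's code: the `Op` enum and `can_fuse` are hypothetical, and although `op_may_use` borrows the helper name mentioned above, its body here is a guess. The sketch also treats a call between the mul and the add as blocking (a conservative assumption) and omits register-clobber checks.

```rust
/// Hypothetical, simplified op set; the real backend has many more ops.
#[derive(Debug, Clone, PartialEq)]
enum Op {
    Mul { rd: u8, rn: u8, rm: u8 },
    Add { rd: u8, rn: u8, rm: u8 },
    Branch,
    Call,
}

/// Conservative query: may `op` read general-purpose register `reg`?
fn op_may_use(op: &Op, reg: u8) -> bool {
    match op {
        Op::Mul { rn, rm, .. } | Op::Add { rn, rm, .. } => *rn == reg || *rm == reg,
        Op::Branch => false,  // a pure branch reads no GP register
        Op::Call => reg <= 3, // a call may read R0-R3
    }
}

/// May the mul at `mul_idx` and the add at `add_idx` be fused into one mla?
fn can_fuse(ops: &[Op], mul_idx: usize, add_idx: usize) -> bool {
    if mul_idx >= add_idx || add_idx >= ops.len() {
        return false;
    }
    let t = match &ops[mul_idx] {
        Op::Mul { rd, .. } => *rd,
        _ => return false,
    };
    // The add must read the product through exactly one operand.
    match &ops[add_idx] {
        Op::Add { rn, rm, .. } if (*rn == t) != (*rm == t) => {}
        _ => return false,
    }
    // Any control flow between the pair breaks the linear mul -> add dataflow.
    if ops[mul_idx + 1..add_idx]
        .iter()
        .any(|op| matches!(op, Op::Branch | Op::Call))
    {
        return false;
    }
    // The product must be read only by that add, anywhere in the function.
    ops.iter()
        .enumerate()
        .all(|(i, op)| i == add_idx || !op_may_use(op, t))
}
```

Scanning the whole function for other readers, rather than only the instructions after the add, is what keeps the check sound in branchy code, where a later block could be reached by a path the linear order does not reveal.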
Completes the arithmetic+bitwise immediate-folding family (and/or/xor #250/#252; now add/sub).

What

When the operand before `i32.add`/`i32.sub` is `i32.const C` (C ∈ 0..=0xFFF) with its `movw` cleanly at the tail, fold to `add/sub rd, a, #C` and drop the materialization.

Range is the full 0..=0xFFF (4095), wider than the bitwise 0..=0xFF, because the ADD/SUB immediate encoder now uses ADDW/SUBW (T4, plain imm12) for >0xFF (#253), verified correct before folding into it.

Fires on real code

This one actively fires on `control_step`: its frame `sub #16` and const adds now fold, and the differential stays result-identical, confirming correctness across a real codegen change (not just a no-op on the fixtures). Updated `test_237_stack_pointer_global_is_register_promoted` to accept the frame-size 16 in its folded `sub #16` form (still a plain scalar immediate, still not `__synth_wasm_data`-relocated, which is exactly the property it guards).

Gate

clippy clean; 284 lib tests; three frozen differentials result-identical (control_step 0x00210A55, flight_seam 0x07FDF307, div_const 338/338); test `i32_add_sub_fold_const_into_immediate` (byte + >0xFF ADDW-path values). CI fuzz adds totality.

Part of #242. Builds on #253 (ADDW/SUBW encoder fix), merged.
🤖 Generated with Claude Code