feat(selector): fold constants into i32.add / i32.sub immediates (VCR-RA-001) by avrabe · Pull Request #254 · pulseengine/synth

avrabe · 2026-06-04T21:37:43Z

Completes the arithmetic+bitwise immediate-folding family (and/or/xor #250/#252; now add/sub).

What

When the operand before i32.add/i32.sub is i32.const C (C ∈ 0..=0xFFF) with its movw cleanly at the tail, fold to add/sub rd, a, #C and drop the materialization.
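The fold described above can be sketched as a tail peephole. This is a minimal illustration, not the selector's actual code: the `ArmOp` enum, `MAX_FOLD_IMM`, and `emit_add_sub` are hypothetical names, and the sketch assumes the register holding the constant has no other reader once the `movw` is dropped.

```rust
/// Hypothetical, simplified instruction set for illustration only;
/// these are not the synth crate's real types.
#[derive(Debug, Clone, PartialEq)]
enum ArmOp {
    Movw { rd: u8, imm: u16 },
    AddReg { rd: u8, rn: u8, rm: u8 },
    SubReg { rd: u8, rn: u8, rm: u8 },
    AddImm { rd: u8, rn: u8, imm: u16 },
    SubImm { rd: u8, rn: u8, imm: u16 },
}

/// Largest constant folded into add/sub, per the PR description (0..=0xFFF).
const MAX_FOLD_IMM: u16 = 0xFFF;

/// Emit `rd = a + b` or `rd = a - b`. If the last emitted op is the `movw`
/// that materialized `b` with a constant in range, drop it and emit the
/// immediate form instead.
/// Assumption: register `b` is dead after this op (no other reader).
fn emit_add_sub(out: &mut Vec<ArmOp>, is_sub: bool, rd: u8, a: u8, b: u8) {
    // `a != b` guards the case where the first operand lives in the same
    // register as the constant; dropping the movw would then break `a`.
    let tail_const = match out.last() {
        Some(ArmOp::Movw { rd: m, imm }) if *m == b && a != b && *imm <= MAX_FOLD_IMM => {
            Some(*imm)
        }
        _ => None,
    };
    match tail_const {
        Some(imm) => {
            out.pop(); // drop the materialization
            out.push(if is_sub {
                ArmOp::SubImm { rd, rn: a, imm }
            } else {
                ArmOp::AddImm { rd, rn: a, imm }
            });
        }
        None => out.push(if is_sub {
            ArmOp::SubReg { rd, rn: a, rm: b }
        } else {
            ArmOp::AddReg { rd, rn: a, rm: b }
        }),
    }
}
```

With `movw r2, #16` at the tail, `emit_add_sub(&mut out, true, 0, 1, 2)` leaves a single `SubImm { rd: 0, rn: 1, imm: 16 }`; a constant above 0xFFF keeps the `movw` and falls back to the register form.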

Range is the full 0..=0xFFF (4095) — wider than the bitwise 0..=0xFF — because the ADD/SUB immediate encoder now uses ADDW/SUBW (T4, plain imm12) for >0xFF (#253), verified correct before folding into it.
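For reference, the plain imm12 form mentioned above can be sketched as follows. The bit layout is the Thumb-2 ADDW/SUBW encoding T4 from the Arm architecture manual; the function name `encode_addw_subw_t4` and the restriction to r0-r12 are assumptions made for this sketch, and it is not the encoder from #253.

```rust
/// Encode Thumb-2 `ADDW` / `SUBW` (encoding T4): Rd = Rn +/- imm12.
/// Returns the two halfwords (first, second), or None if an operand is
/// out of range. Hypothetical helper for illustration; SP and PC forms
/// are deliberately excluded to keep the sketch simple.
fn encode_addw_subw_t4(is_sub: bool, rd: u8, rn: u8, imm: u32) -> Option<(u16, u16)> {
    if imm > 0xFFF || rd > 12 || rn > 12 {
        return None;
    }
    // imm12 is split as i:imm3:imm8 across the two halfwords.
    let i = ((imm >> 11) & 0x1) as u16;
    let imm3 = ((imm >> 8) & 0x7) as u16;
    let imm8 = (imm & 0xFF) as u16;

    // First halfword: 11110 i 1 0 0000 Rn for ADDW, 11110 i 1 0 1010 Rn for SUBW.
    let base: u16 = if is_sub { 0xF2A0 } else { 0xF200 };
    let hw1 = base | (i << 10) | rn as u16;
    // Second halfword: 0 imm3 Rd imm8.
    let hw2 = (imm3 << 12) | ((rd as u16) << 8) | imm8;
    Some((hw1, hw2))
}
```

Unlike the modified-immediate (ThumbExpandImm) forms, T4 stores the 12-bit value directly, which is why every value in 0..=0xFFF is representable.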

Fires on real code

This one actively fires on control_step: its frame sub #16 and const adds now fold, and the differential stays result-identical — confirming correctness across a real codegen change (not just a no-op on the fixtures). Updated test_237_stack_pointer_global_is_register_promoted to accept the frame-size 16 in its folded sub #16 form (still a plain scalar immediate, still not __synth_wasm_data-relocated — exactly the property it guards).

Gate

clippy clean; 284 lib tests; three frozen differentials result-identical (control_step 0x00210A55, flight_seam 0x07FDF307, div_const 338/338); test i32_add_sub_fold_const_into_immediate (byte + >0xFF ADDW-path values). CI fuzz adds totality.

Part of #242. Builds on #253 (ADDW/SUBW encoder fix), merged.

🤖 Generated with Claude Code

…-RA-001)

Completes the arithmetic+bitwise immediate-folding family (and/or/xor #250/#252, now add/sub). When the operand before i32.add/i32.sub is `i32.const C` with C in 0..=0xFFF and its `movw` cleanly at the tail, fold to `add/sub rd, a, #C` and drop the materialization.

Range is the full 0..=0xFFF (4095), wider than the bitwise 0..=0xFF, because the ADD/SUB immediate encoder now uses ADDW/SUBW (T4, plain imm12) for >0xFF (#253), verified correct before folding into it.

This one actively fires on real fixtures: control_step's frame `sub #16` and its const adds now fold (the differential stays result-identical, confirming correctness across a codegen change). Updated test_237_stack_pointer_global_is_register_promoted to accept the frame-size 16 in its folded `sub #16` form (still a plain scalar immediate, still not __synth_wasm_data-relocated: the property it checks).

GATE: clippy clean; 284 lib tests; three frozen differentials RESULT-identical (control_step 0x00210A55, flight_seam 0x07FDF307, div_const 338/338); test i32_add_sub_fold_const_into_immediate (byte + >0xFF ADDW-path values). CI fuzz adds totality.

Part of #242.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…rectness fixes (#260)

Promote the accumulated v0.11.30 work into the CHANGELOG before tagging: native-pointer ABI (#237) + the VCR-* constant-immediate folding (#250/#252/#254) + analysis foundation (#243/#245) + three latent-miscompile encoder fixes (#251 ORR/EOR NOP, #253 ADD/SUB large-frame, #255 CMP/ADDS/SUBS ThumbExpandImm). Adds a falsification statement covering the encoder correctness class.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

…#257) (#264)

Wires the fuse_mul_add pass (#263 foundation) into the backend, after instruction selection and before branch resolution (the fusion removes instructions, shifting byte offsets). This is the codegen change that emits gale's measured delta.

Refined the soundness condition so it fires on real (branchy) functions while staying sound: the mul result must be read ONLY by the add anywhere in the function (new op_may_use helper, call/branch-aware: a pure branch reads no GP reg; a call may read R0-R3; Bx / i64-pair / FP are conservatively assumed to read). The "between mul and add" check still blocks on any control flow (a branch there breaks the linear mul→add dataflow).

MEASURED (oracle repaired this session): flat_flight (flight_seam_flat) .o 1891 → 1819 bytes (~18 muls fused into mla). The three frozen differentials stay RESULT-identical: flight_seam (which exercises the gyro*980+accel*20 filter) stays 0x07FDF307 with the fusion firing; control_step 0x00210A55; div_const 338/338.

Also fixes test add_uses_correct_source_registers (semantic_correctness.rs): `i32.const 10; i32.const 20; i32.add` folds the 20 into the ADD immediate since #254; the test predated that folding and still asserted a register operand. It was missed because #254 was gated with `cargo test --lib` (this is a `tests/` integration test); the full-suite run for this PR surfaced it.

Part of #242 / closes the lever-#2 portion of #257.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
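The two soundness conditions named in the #264 commit message (the mul result is read only by the add, and no control flow sits between the two) can be sketched as below. The `Op` enum and `can_fuse` are hypothetical; `op_may_use` borrows the helper's name from the commit message, but its body here is an assumption based only on that description. A real pass would need further checks (for example, intermediate writes to the operands) that this sketch omits.

```rust
/// Hypothetical, simplified op set for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Op {
    Mul { rd: u8, rn: u8, rm: u8 },
    Add { rd: u8, rn: u8, rm: u8 },
    Branch, // pure branch: reads no general-purpose register
    Call,   // may read the argument registers R0-R3
    Opaque, // anything else: conservatively assumed to read every register
}

/// Conservative "may this op read `reg`?" query.
fn op_may_use(op: &Op, reg: u8) -> bool {
    match op {
        Op::Mul { rn, rm, .. } | Op::Add { rn, rm, .. } => *rn == reg || *rm == reg,
        Op::Branch => false,
        Op::Call => reg <= 3,
        Op::Opaque => true,
    }
}

/// Check whether `ops[mul_idx]` and `ops[add_idx]` are a fusion candidate.
fn can_fuse(ops: &[Op], mul_idx: usize, add_idx: usize) -> bool {
    if mul_idx >= add_idx || add_idx >= ops.len() {
        return false;
    }
    // The mul's destination register carries the product.
    let t = match &ops[mul_idx] {
        Op::Mul { rd, .. } => *rd,
        _ => return false,
    };
    // The add must consume the product.
    let add_reads_t = matches!(&ops[add_idx], Op::Add { rn, rm, .. } if *rn == t || *rm == t);
    if !add_reads_t {
        return false;
    }
    // Any control flow between mul and add breaks the linear dataflow.
    if ops[mul_idx + 1..add_idx]
        .iter()
        .any(|op| matches!(op, Op::Branch | Op::Call))
    {
        return false;
    }
    // No other op anywhere in the function may read the product register.
    ops.iter()
        .enumerate()
        .all(|(k, op)| k == mul_idx || k == add_idx || !op_may_use(op, t))
}
```

The whole-function scan is deliberately coarse: it treats any possible read of the register as a read of the product, which rejects some fusible cases but never accepts an unsound one under the sketch's assumptions.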

avrabe merged commit 95e8521 into main Jun 4, 2026
11 of 12 checks passed

avrabe deleted the feat/vcr-add-sub-immediate-folding branch June 4, 2026 21:39

This was referenced Jun 4, 2026

perf: --relocatable direct selector bypasses synth-opt — general codegen optimization (research + stats tracking) #209 (Open)

docs(changelog): expand [0.11.30] for the release #260 (Merged)

avrabe mentioned this pull request Jun 5, 2026

feat: wire mul/add → mla fusion into codegen — gale flat_flight delta (#257) #264 (Merged)

avrabe merged 1 commit into main from feat/vcr-add-sub-immediate-folding
