feat(selector): fold small constants into i32.and immediates + encoder bound (VCR-RA-001) by avrabe · Pull Request #250 · pulseengine/synth

avrabe · 2026-06-04T20:26:43Z

The first delta-emitting codegen transform on the allocator track (VCR-RA-001, epic #242). Consolidates the encoder-bound finding (was #249) with the folding that uses it.

The waste (measured, #248)

`i32.const C; i32.and` lowered to `movw rN,#C; and rD,rA,rN` even when C is small enough to be an AND immediate: the redundant materialization gale measured on flat_flight.

The fix

When the operand before `i32.and` is `i32.const C` with C ∈ 0..=0xFF and its `movw` is cleanly at the instruction tail (not spilled), fold to `and rD, rA, #C` and drop the materialization (`foldable_bitwise_imm` + `drop_prev_const_materialization`, mirroring the const-divisor pattern).
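A minimal sketch of the fold guard described above, assuming a toy instruction list. The `Inst`, `Operand2`, `lower_i32_and`, and `MAX_FOLD_IMM` names are hypothetical and are not the selector's real types or helpers. It models "cleanly at the tail" as "the last emitted instruction", does not model spilling, and assumes the constant's register has no other consumer.

```rust
// Hypothetical mini-IR for illustration only; not the real selector types.
#[derive(Debug, Clone, PartialEq)]
enum Operand2 {
    Reg(u8),
    Imm(u32),
}

#[derive(Debug, Clone, PartialEq)]
enum Inst {
    Movw { rd: u8, imm: u32 },
    And { rd: u8, rn: u8, op2: Operand2 },
}

/// Largest constant the fold accepts (the encoder-safe byte range).
const MAX_FOLD_IMM: u32 = 0xFF;

/// Lower `i32.and` whose right-hand operand lives in register `rm`.
/// If the last emitted instruction is the `movw` that materialized a
/// small constant into `rm`, drop it and use the constant as an immediate.
fn lower_i32_and(out: &mut Vec<Inst>, rd: u8, rn: u8, rm: u8) {
    // Guard: the movw must be at the tail, target `rm`, and be in range.
    let foldable = match out.last() {
        Some(Inst::Movw { rd: r, imm }) if *r == rm && *imm <= MAX_FOLD_IMM => Some(*imm),
        _ => None,
    };
    match foldable {
        Some(c) => {
            out.pop(); // drop the now-redundant materialization
            out.push(Inst::And { rd, rn, op2: Operand2::Imm(c) });
        }
        // Out of range or not at the tail: keep the register operand.
        None => out.push(Inst::And { rd, rn, op2: Operand2::Reg(rm) }),
    }
}
```

With a preceding `Movw { rd: 1, imm: 0x7e }`, calling `lower_i32_and(&mut out, 2, 0, 1)` leaves a single `And` with an immediate operand; with `imm: 0x140` the `movw` survives and the `And` keeps its register operand.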

Encoder bound (was #249)

Bounded to 0..=0xFF because the encoder's AND-immediate path isn't yet ThumbExpandImm-complete (`and r2,r0,#0x7e` → `00 f0 7e 02` is correct; ≥0x100 would mis-encode). 0..=0xFF covers gale's int8 clamps. A pinning test documents the safe range; folding is guarded to it, so the encoder never sees an un-encodable immediate.

Measured delta

`(p & 0x7e) + (p & 0x7e)`: 8 → 6 instructions. Both `movw #126` are eliminated, and each AND uses the immediate. Better than const-CSE (no materialization at all).

Full gate (codegen change)

- 282 lib tests + 20 integration suites pass.
- Three frozen differentials stay result-identical: control_step 0x00210A55, flight_seam 0x07FDF307, div_const 338/338.
- Tests: `i32_and_folds_small_const_into_immediate` (folded shape), `i32_and_does_not_fold_out_of_range_const` (0x140 stays a register operand, the encoder safety bound), `and_immediate_encodes_correctly_in_byte_range...` (encoder).
- CI fuzz (`encoder_no_panic`, `wasm_ops_lower_or_error`) adds totality.

Supersedes #248 (the evidence it pinned is now folded away) and subsumes #249 (the encoder-bound test is included here).

Part of #242.

🤖 Generated with Claude Code

…old bound (VCR-RA-001)

Investigating immediate folding (the biggest win the #248 evidence pointed to) surfaced an encoder limitation: the `And { Operand2::Imm }` path packs the low 12 bits straight into the `i:imm3:imm8` field WITHOUT ThumbExpandImm (the modified-immediate expansion). For imm <= 0xFF (gale's int8 clamps #0x7e/#0x7f) that is correct: `and r2,r0,#0x7e` encodes to the canonical `00 f0 7e 02`. For imm >= 0x100 the field needs a true rotation/replication pattern that is not implemented, so it would silently encode a different value.

This path is currently DEAD (the selector never emits And-imm), so there is no live bug, but it sets the precondition for the immediate-folding transform: **fold only imm <= 0xFF** until the encoder is hardened to ThumbExpandImm / Ok-or-Err (the "encoder must be Ok-or-Err, never silently wrong" principle, #180/#185). That bound covers the measured flat_flight waste.

Pins the safe-range encoding as a regression guard; no codegen change.

Part of #242.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
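To make the silent mis-encoding concrete, here is a sketch that packs an immediate raw into the AND (immediate) T1 fields and then decodes the field the way the hardware does. `encode_and_imm_raw` and `thumb_expand_imm` are illustrative names, not functions from this repository; the decode follows the ThumbExpandImm pseudocode from the Arm architecture manual as I understand it.

```rust
/// Pack `imm12` straight into i:imm3:imm8 of Thumb-2 AND (immediate) T1,
/// with no ThumbExpandImm step. Returns the little-endian instruction bytes.
fn encode_and_imm_raw(rd: u8, rn: u8, imm12: u32) -> [u8; 4] {
    let i = (imm12 >> 11) & 1;
    let imm3 = (imm12 >> 8) & 0x7;
    let imm8 = imm12 & 0xFF;
    let hw1 = (0xF000 | (i << 10) | u32::from(rn & 0xF)) as u16;
    let hw2 = ((imm3 << 12) | (u32::from(rd & 0xF) << 8) | imm8) as u16;
    let [a0, a1] = hw1.to_le_bytes();
    let [b0, b1] = hw2.to_le_bytes();
    [a0, a1, b0, b1]
}

/// Value the hardware derives from a 12-bit modified-immediate field.
/// Returns `None` for the UNPREDICTABLE replication cases (imm8 == 0).
fn thumb_expand_imm(imm12: u32) -> Option<u32> {
    let imm8 = imm12 & 0xFF;
    if ((imm12 >> 10) & 0x3) == 0 {
        // Replication patterns selected by bits 9:8.
        match (imm12 >> 8) & 0x3 {
            0 => Some(imm8),
            _ if imm8 == 0 => None,
            1 => Some((imm8 << 16) | imm8),
            2 => Some((imm8 << 24) | (imm8 << 8)),
            _ => Some((imm8 << 24) | (imm8 << 16) | (imm8 << 8) | imm8),
        }
    } else {
        // Rotated form: 1:imm12[6:0] rotated right by imm12[11:7].
        let unrotated = 0x80 | (imm12 & 0x7F);
        Some(unrotated.rotate_right((imm12 >> 7) & 0x1F))
    }
}
```

Under this decode, `encode_and_imm_raw(2, 0, 0x7e)` yields `00 f0 7e 02`, and the byte range round-trips unchanged. A raw-packed 0x140 lands in the replication form and decodes as 0x00400040. The field that actually means 0x140 is the rotated pattern 0xFA0, which is why the fold stays at 0..=0xFF.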

…001)

The FIRST delta-emitting codegen transform on the allocator track. Evidence (#248) showed the dominant flat_flight-shape waste is redundant const materialization: `i32.const C; i32.and` lowered to `movw rN,#C; and rD,rA,rN` even when C is a small constant the AND instruction can take as an immediate.

Fix: when the operand pushed immediately before `i32.and` is `i32.const C` with C in 0..=0xFF AND its `movw` is cleanly at the instruction tail (not spilled), fold to `and rD, rA, #C` and drop the materialization (foldable_bitwise_imm + drop_prev_const_materialization, mirroring the const-divisor pattern).

Bounded to 0..=0xFF: the encoder's AND-immediate path is not yet ThumbExpandImm-complete (#249); larger modified immediates await an encoder hardening to Ok-or-Err. 0..=0xFF covers gale's int8 clamps.

Measured delta: the (p & 0x7e) + (p & 0x7e) pattern drops 8 → 6 instructions (both `movw #126` eliminated; each AND uses the immediate). Better than const-CSE: no materialization at all.

GATE (full, codegen change): 282 lib tests + 20 integration suites pass; the three frozen differential fixtures stay RESULT-identical (control_step 0x00210A55, flight_seam 0x07FDF307, div_const 338/338); tests i32_and_folds_small_const_into_immediate (folded shape) + i32_and_does_not_fold_out_of_range_const (0x140 stays a register operand, the encoder safety bound). CI fuzz adds totality.

Part of #242. Supersedes the #248 evidence (the redundancy it pinned is now folded away).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

avrabe · 2026-06-04T20:36:09Z

Built PR #250 and measured it on the G474RE — the first codegen-application delta, and it's correct on silicon:

| bench | before (cyc) | after #250 (cyc) | Δ (cyc) | selfcheck |
|---|---|---|---|---|
| `controller_step` | 169 | 168 | −1 | `0x05e33e81` ✅ |
| `flat_flight` | 262 | 261 | −1 | `0x07fdf307` ✅ |

Object-level: flat_flight 180→179 instrs (movw 33→32, .text 588→584 B); controller_step 120→119 (movw 27→26). So the fold fired and the result is bit-correct — the transform + my reflash loop are validated end-to-end. 🎉

One yield note: it folded exactly 1 AND-immediate per function, though both have ~3–4 `& 0xFF` byte-extractions in the packing. The "`movw` cleanly at tail, not spilled" guard is gating the rest (the other ANDs' `movw`s are spilled or mid-sequence). So the bigger AND-fold yield is coupled to the spill work (VCR-RA-001): once spill-under-pressure keeps those consts resident, more sites qualify. And per the 262→103 gap decomposition, the dominant levers are still ahead: const-CSE on the #0x7e/#0x7f clamp bounds (×6 each), mul+add→mla fusion, and tighter clamp lowering (18→6 IT-blocks).

Net: a clean, correct first step. Both microbenches stay frozen-and-staged; I'll post the delta for each subsequent transform as it lands.

…nt NOP (VCR-RA-001) (#251)

Verifying the prerequisite for extending immediate folding to i32.or/xor surfaced a latent silent-wrong path: the Thumb-2 `Orr`/`Eor` encoders handled only `Operand2::Reg`; `Operand2::Imm` fell through to `0xBF00` (NOP). Folding an or/xor immediate would have silently turned the operation into a no-op, which is a miscompile.

Fix (Ok-or-Err, #180/#185): encode the ORR.W / EOR.W T1 immediate for the zero-extended byte range (imm <= 0xFF), `orr r2,r0,#0x7e → 40 f0 7e 02` and `eor → 80 f0 7e 02`, and return an error for larger modified immediates (ThumbExpandImm not yet implemented) rather than emit a wrong/NOP encoding.

This path was DEAD (the selector never emits Orr/Eor-imm), so no existing codegen changes and the frozen fixtures are unaffected. It removes the silent-wrong path and is the precondition for safe i32.or/xor immediate folding (the #250 i32.and pattern, extended).

Test orr_eor_immediate_encode_in_byte_range_else_error pins the byte-range bytes + the out-of-range error.

Part of #242.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
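The Ok-or-Err shape described in this commit can be sketched roughly as follows. `BitOp`, `EncodeError`, and `encode_bitop_imm` are hypothetical names chosen for illustration and are not the crate's actual API. The opcode halfwords are taken from the Thumb-2 ORR/EOR (immediate) T1 layouts and match the bytes quoted in the message.

```rust
// Hypothetical types for illustration; not the crate's real encoder API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum BitOp {
    Orr,
    Eor,
}

#[derive(Debug, PartialEq)]
enum EncodeError {
    /// Immediate needs ThumbExpandImm, which this sketch does not implement.
    ImmediateOutOfRange(u32),
}

/// Encode `orr`/`eor rd, rn, #imm` (T1, S = 0) for the zero-extended byte
/// range only. Anything larger is an error, never a NOP or wrong value.
fn encode_bitop_imm(op: BitOp, rd: u8, rn: u8, imm: u32) -> Result<[u8; 4], EncodeError> {
    if imm > 0xFF {
        return Err(EncodeError::ImmediateOutOfRange(imm));
    }
    // First halfword: 11110 i 0 <op4> S Rn, with i = 0 and S = 0.
    let opcode: u16 = match op {
        BitOp::Orr => 0xF040,
        BitOp::Eor => 0xF080,
    };
    let hw1 = opcode | u16::from(rn & 0xF);
    // Second halfword: 0 imm3 Rd imm8, with imm3 = 0 in the byte range.
    let hw2 = (u16::from(rd & 0xF) << 8) | imm as u16;
    let [a0, a1] = hw1.to_le_bytes();
    let [b0, b1] = hw2.to_le_bytes();
    Ok([a0, a1, b0, b1])
}
```

Returning `Result` pushes the range decision to the caller: a selector that folds only 0..=0xFF never sees the error, while any future caller that passes a larger constant fails loudly instead of emitting a silently different instruction.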

codecov · 2026-06-04T20:58:19Z

Codecov Report

❌ Patch coverage is 95.09804% with 5 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/synth-synthesis/src/instruction_selector.rs | 94.50% | 5 Missing ⚠️ |


… (VCR-RA-001) (#252)

Extends #250's i32.and immediate folding to i32.or and i32.xor, now that the encoder's ORR/EOR immediate paths are correct + Ok-or-Err (#251). Same shape: when the operand before the op is `i32.const C` (C in 0..=0xFF) with its `movw` cleanly at the instruction tail, fold to `orr/eor rd, a, #C` and drop the materialization (foldable_bitwise_imm + drop_prev_const_materialization).

Bounded to 0..=0xFF by the encoder (ORR/EOR-imm > 0xFF returns Err until ThumbExpandImm lands, #251); the fold guard keeps it in range.

GATE (codegen change): clippy clean; 283 lib tests; the three frozen differential fixtures stay RESULT-identical (control_step 0x00210A55, flight_seam 0x07FDF307, div_const 338/338); test i32_or_xor_fold_small_const_into_immediate (both ops fold, no movw survives). CI fuzz adds totality.

Part of #242.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

…-RA-001) (#254)

Completes the arithmetic+bitwise immediate-folding family (and/or/xor #250/#252, now add/sub). When the operand before i32.add/i32.sub is `i32.const C` with C in 0..=0xFFF and its `movw` cleanly at the tail, fold to `add/sub rd, a, #C` and drop the materialization.

Range is the full 0..=0xFFF (4095), wider than the bitwise 0..=0xFF, because the ADD/SUB immediate encoder now uses ADDW/SUBW (T4, plain imm12) for >0xFF (#253), verified correct before folding into it.

This one actively fires on real fixtures: control_step's frame `sub #16` and its const adds now fold (the differential stays result-identical, confirming correctness across a codegen change). Updated test_237_stack_pointer_global_is_register_promoted to accept the frame-size 16 in its folded `sub #16` form (still a plain scalar immediate, still not __synth_wasm_data-relocated, which is the property it checks).

GATE: clippy clean; 284 lib tests; three frozen differentials RESULT-identical (control_step 0x00210A55, flight_seam 0x07FDF307, div_const 338/338); test i32_add_sub_fold_const_into_immediate (byte + >0xFF ADDW-path values). CI fuzz adds totality.

Part of #242.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
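The wider 0..=0xFFF range follows from choosing between two encodings. The sketch below assumes the byte range uses the T3 modified-immediate form and everything above it uses the T4 plain-imm12 form, as the message describes for #253. `AddSub` and `encode_addsub_imm` are hypothetical names, and special cases (for example `rn` = PC) are ignored.

```rust
// Hypothetical names for illustration; not the crate's real encoder API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum AddSub {
    Add,
    Sub,
}

/// Encode `add`/`sub rd, rn, #imm` without flag setting.
/// imm <= 0xFF uses the T3 form (byte value is its own modified immediate);
/// 0x100..=0xFFF uses ADDW/SUBW T4, whose imm12 is a plain integer.
fn encode_addsub_imm(op: AddSub, rd: u8, rn: u8, imm: u32) -> Result<[u8; 4], String> {
    if imm > 0xFFF {
        return Err(format!("immediate {imm:#x} does not fit imm12"));
    }
    let base: u32 = match (op, imm <= 0xFF) {
        (AddSub::Add, true) => 0xF100,  // ADD.W T3
        (AddSub::Sub, true) => 0xF1A0,  // SUB.W T3
        (AddSub::Add, false) => 0xF200, // ADDW T4
        (AddSub::Sub, false) => 0xF2A0, // SUBW T4
    };
    // Both forms place the immediate in the same i:imm3:imm8 fields.
    let (i, imm3, imm8) = ((imm >> 11) & 1, (imm >> 8) & 0x7, imm & 0xFF);
    let hw1 = (base | (i << 10) | u32::from(rn & 0xF)) as u16;
    let hw2 = ((imm3 << 12) | (u32::from(rd & 0xF) << 8) | imm8) as u16;
    let [a0, a1] = hw1.to_le_bytes();
    let [b0, b1] = hw2.to_le_bytes();
    Ok([a0, a1, b0, b1])
}
```

Because T4 treats imm12 as an ordinary integer, no rotation or replication logic is needed, which is why add/sub can fold a wider range than the bitwise ops before ThumbExpandImm support exists.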

…rectness fixes (#260)

Promote the accumulated v0.11.30 work into the CHANGELOG before tagging: native-pointer ABI (#237) + the VCR-* constant-immediate folding (#250/#252/#254) + analysis foundation (#243/#245) + three latent-miscompile encoder fixes (#251 ORR/EOR NOP, #253 ADD/SUB large-frame, #255 CMP/ADDS/SUBS ThumbExpandImm). Adds a falsification statement covering the encoder correctness class.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

avrabe and others added 2 commits June 4, 2026 21:50

avrabe merged commit d83abb0 into main Jun 4, 2026
12 checks passed

avrabe deleted the feat/vcr-and-immediate-folding branch June 4, 2026 20:37

avrabe mentioned this pull request Jun 4, 2026

feat(selector): fold small constants into i32.or / i32.xor immediates (VCR-RA-001) #252

Merged

avrabe mentioned this pull request Jun 4, 2026

feat(selector): fold constants into i32.add / i32.sub immediates (VCR-RA-001) #254

Merged
