feat(selector): fold small constants into i32.or / i32.xor immediates (VCR-RA-001)#252

Merged
avrabe merged 1 commit into main from feat/vcr-or-xor-immediate-folding
Jun 4, 2026

Conversation

@avrabe avrabe commented Jun 4, 2026

Contributor

Extends #250's i32.and immediate folding to i32.or and i32.xor, now that the encoder's ORR/EOR immediate paths are correct + Ok-or-Err (#251).

What

When the operand before i32.or/i32.xor is i32.const C (C ∈ 0..=0xFF) with its movw cleanly at the tail, fold to orr/eor rd, a, #C and drop the materialization — same shape as the And fold.

Bounded to 0..=0xFF by the encoder (ORR/EOR-imm > 0xFF returns Err until ThumbExpandImm lands, #251); the fold guard keeps it in range.
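
The fold described above can be sketched roughly as follows. This is a minimal illustration, not the selector's actual code: the `Inst` and `BitOp` types and the `foldable_imm` and `select_bitop` functions are hypothetical stand-ins, and the check that the `movw` sits "cleanly at the tail" is reduced to matching the last emitted instruction.

```rust
// Hypothetical, simplified instruction model; the real selector's types differ.
#[derive(Debug, Clone, PartialEq)]
enum Inst {
    Movw { rd: u8, imm: u16 },          // materializes an i32.const into rd
    OrrReg { rd: u8, rn: u8, rm: u8 },
    EorReg { rd: u8, rn: u8, rm: u8 },
    OrrImm { rd: u8, rn: u8, imm: u8 },
    EorImm { rd: u8, rn: u8, imm: u8 },
}

#[derive(Debug, Clone, Copy)]
enum BitOp {
    Or,
    Xor,
}

/// Fold guard: only constants in 0..=0xFF are allowed through, so the
/// encoder is never asked for an immediate it would reject.
fn foldable_imm(c: i32) -> Option<u8> {
    u8::try_from(c).ok()
}

/// Emit `op rd, a, b`. `b_const` is Some(C) when operand `b` came from `i32.const C`.
/// Assumption: if the movw for `b` is the last emitted instruction, nothing
/// else depends on it and it can be dropped.
fn select_bitop(out: &mut Vec<Inst>, op: BitOp, rd: u8, a: u8, b: u8, b_const: Option<i32>) {
    let folded = b_const.and_then(foldable_imm).filter(|&imm| {
        matches!(out.last(),
            Some(Inst::Movw { rd: r, imm: m }) if *r == b && *m == u16::from(imm))
    });
    match folded {
        Some(imm) => {
            out.pop(); // drop the constant materialization
            out.push(match op {
                BitOp::Or => Inst::OrrImm { rd, rn: a, imm },
                BitOp::Xor => Inst::EorImm { rd, rn: a, imm },
            });
        }
        None => out.push(match op {
            BitOp::Or => Inst::OrrReg { rd, rn: a, rm: b },
            BitOp::Xor => Inst::EorReg { rd, rn: a, rm: b },
        }),
    }
}
```

With this shape, a constant outside 0..=0xFF or a `movw` that is not at the tail simply falls through to the register form, so the guard can only ever skip an optimization, never produce an out-of-range immediate.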

Gate (codegen change)

  • clippy clean; 283 lib tests pass.
  • Three frozen differentials stay result-identical: control_step 0x00210A55, flight_seam 0x07FDF307, div_const 338/338.
  • Test i32_or_xor_fold_small_const_into_immediate (both ops fold, no movw survives).
  • CI fuzz adds totality.

Part of #242. Builds on #250 (fold pattern) + #251 (encoder hardening), both merged.

🤖 Generated with Claude Code

… (VCR-RA-001)

Extends #250's i32.and immediate folding to i32.or and i32.xor, now that the
encoder's ORR/EOR immediate paths are correct + Ok-or-Err (#251). Same shape:
when the operand before the op is `i32.const C` (C in 0..=0xFF) with its `movw`
cleanly at the instruction tail, fold to `orr/eor rd, a, #C` and drop the
materialization (foldable_bitwise_imm + drop_prev_const_materialization).

Bounded to 0..=0xFF by the encoder (ORR/EOR-imm > 0xFF returns Err until
ThumbExpandImm lands, #251) — the fold guard keeps it in range.

GATE (codegen change): clippy clean; 283 lib tests; the three frozen
differential fixtures stay RESULT-identical (control_step 0x00210A55, flight_seam
0x07FDF307, div_const 338/338); test i32_or_xor_fold_small_const_into_immediate
(both ops fold, no movw survives). CI fuzz adds totality.

Part of #242.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@avrabe avrabe merged commit 52ee43b into main Jun 4, 2026
12 checks passed
@avrabe avrabe deleted the feat/vcr-or-xor-immediate-folding branch June 4, 2026 21:03
avrabe commented Jun 4, 2026

Contributor Author

Built #252 and diffed against #250 on the gale workload: byte-identical for both flat_flight and controller_step — no i32.or/i32.xor-with-small-constant sites in the flight/engine algos. Their | is the byte-packing (a | b<<8 | c<<16 | d<<24), which ORs registers, not const-with-immediate, so the fold guard correctly doesn't fire. So #252 is sound infra (and #251's ORR/EOR-NOP fix closes a real latent miscompile), but it's a no-op on these benches — nothing to reflash.
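
To illustrate why the guard stays quiet on that workload, here is a small sketch contrasting the two source shapes. Both functions are hypothetical examples written for this comment, not code from the gale benches.

```rust
/// Byte-packing: every operand of `|` is a computed (shifted) value held in a
/// register, so there is no `i32.const` operand for the fold to consume.
fn pack_le(a: u8, b: u8, c: u8, d: u8) -> u32 {
    u32::from(a) | u32::from(b) << 8 | u32::from(c) << 16 | u32::from(d) << 24
}

/// OR with a small literal: the right-hand operand is a constant in 0..=0xFF,
/// which is the shape the or/xor immediate fold targets.
fn set_high_bit_of_low_byte(x: u32) -> u32 {
    x | 0x80
}
```

Only the second shape presents a small constant as a direct operand of the OR, which is why a codebase dominated by the first shape sees no change in output.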

That completes the immediate-fold family (AND/OR/XOR) with net −1 cyc on the gale benches (the single AND site from #250). Per the 262→103 decomposition, the needle-movers for flat_flight are still the three I measured: const-CSE on the #0x7e/#0x7f clamp bounds (×6 each), mul+add→mla fusion, and clamp lowering (18→6 IT-blocks). That's where the ~150-cyc gap toward native's 103 actually lives. Both microbenches stay staged; I'll post the delta the moment one of those lands.

codecov Bot commented Jun 4, 2026


Codecov Report

❌ Patch coverage is 92.30769% with 8 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/synth-synthesis/src/instruction_selector.rs | 92.30% | 8 Missing ⚠️ |


avrabe added a commit that referenced this pull request Jun 4, 2026
…-RA-001) (#254)

Completes the arithmetic+bitwise immediate-folding family (and/or/xor #250/#252,
now add/sub). When the operand before i32.add/i32.sub is `i32.const C` with C in
0..=0xFFF and its `movw` cleanly at the tail, fold to `add/sub rd, a, #C` and
drop the materialization.

Range is the full 0..=0xFFF (4095), wider than the bitwise 0..=0xFF, because the
ADD/SUB immediate encoder now uses ADDW/SUBW (T4, plain imm12) for >0xFF (#253) —
verified correct before folding into it.
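
As a rough illustration of the two ranges, the per-op guard could look like the
sketch below. `FoldOp`, `max_fold_imm` and `can_fold` are hypothetical names
used only for this example; the selector's real guard is not shown here.

```rust
/// Hypothetical tag for the ops that take part in immediate folding.
#[derive(Debug, Clone, Copy)]
enum FoldOp {
    And,
    Or,
    Xor,
    Add,
    Sub,
}

/// Largest constant the fold may hand to the encoder for each op,
/// following the ranges stated above.
fn max_fold_imm(op: FoldOp) -> i32 {
    match op {
        // Bitwise immediates stay within a byte.
        FoldOp::And | FoldOp::Or | FoldOp::Xor => 0xFF,
        // ADD/SUB can use the plain imm12 form, so 0..=0xFFF is allowed.
        FoldOp::Add | FoldOp::Sub => 0xFFF,
    }
}

/// True when constant `c` may be folded into the immediate field of `op`.
fn can_fold(op: FoldOp, c: i32) -> bool {
    (0..=max_fold_imm(op)).contains(&c)
}
```

Negative constants and anything above the per-op bound are rejected, so those
sites keep their `movw` materialization.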

This one actively fires on real fixtures: control_step's frame `sub #16` and its
const adds now fold (the differential stays result-identical, confirming
correctness across a codegen change). Updated
test_237_stack_pointer_global_is_register_promoted to accept the frame-size 16
in its folded `sub #16` form (still a plain scalar immediate, still not
__synth_wasm_data-relocated, which is the property it checks).

GATE: clippy clean; 284 lib tests; three frozen differentials RESULT-identical
(control_step 0x00210A55, flight_seam 0x07FDF307, div_const 338/338); test
i32_add_sub_fold_const_into_immediate (byte + >0xFF ADDW-path values). CI fuzz
adds totality.

Part of #242.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
avrabe added a commit that referenced this pull request Jun 5, 2026
…rectness fixes (#260)

Promote the accumulated v0.11.30 work into the CHANGELOG before tagging:
native-pointer ABI (#237) + the VCR-* constant-immediate folding (#250/#252/#254)
+ analysis foundation (#243/#245) + three latent-miscompile encoder fixes
(#251 ORR/EOR NOP, #253 ADD/SUB large-frame, #255 CMP/ADDS/SUBS ThumbExpandImm).
Adds a falsification statement covering the encoder correctness class.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>