Skip to content

fix(encoder): ADD/SUB large immediates use ADDW/SUBW — fixes large-frame miscompile (VCR-RA-001)#253

Merged
avrabe merged 1 commit into
mainfrom
fix/encoder-addw-subw-large-imm
Jun 4, 2026
Merged

fix(encoder): ADD/SUB large immediates use ADDW/SUBW — fixes large-frame miscompile (VCR-RA-001)#253
avrabe merged 1 commit into
mainfrom
fix/encoder-addw-subw-large-imm

Conversation

@avrabe

@avrabe avrabe commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Latent silent miscompile found while verifying the Add/Sub immediate path for folding.

The bug

encode_thumb32_add/sub emit ADD.W/SUB.W (T3), whose i:imm3:imm8 is a ThumbExpandImm modified immediate — but the code packed the raw bits without expansion. Correct only for imm ≤ 0xFF; for imm ≥ 0x100 it silently encoded a different value.

This path is live for frame alloc/dealloc: add sp, sp, #256 encoded as add sp, sp, #0 → a ≥256-byte frame (≥64 i32 locals, or fewer with spills) corrupted the stack with no error. No fixture hits it (they have small frames), so it was invisible — the kind of silent miscompile the verify-before-transform discipline exists to catch.

The fix

Gate

clippy clean; backend suite passes; three frozen differentials stay result-identical (≤0xFF frames → bit-identical): control_step 0x00210A55, flight_seam 0x07FDF307, div_const 338/338. Test pins add sp,sp,#256 → 0d f2 00 1d, sub → ad f2 00 1d, #5000 → Err.

Part of #242.

🤖 Generated with Claude Code

…ame miscompile (VCR-RA-001)

Latent silent miscompile found while verifying the Add/Sub immediate path for
folding: encode_thumb32_add/sub emit ADD.W/SUB.W (T3), whose `i:imm3:imm8` is a
ThumbExpandImm MODIFIED immediate — but the code packed the raw bits without
expansion. Correct only for imm <= 0xFF; for imm >= 0x100 it silently encoded a
different value. This path is LIVE for frame alloc/dealloc: `add sp, sp, #256`
encoded as `add sp, sp, #0` → a >=256-byte frame (>=64 i32 locals, or fewer with
spills) corrupted the stack with no error.

Fix: for imm in 0x100..=0xFFF use ADDW/SUBW (T4) — a PLAIN 12-bit immediate
(0..4095, no ThumbExpandImm) — which encodes any frame size in range correctly.
Keep T3 for imm <= 0xFF so existing encodings stay bit-identical. Error for
imm > 4095 (no single-instruction form) rather than silently mis-encode
(Ok-or-Err, #180/#185).

GATE: clippy clean; backend suite passes; the three frozen differential fixtures
stay RESULT-identical (they use <= 0xFF frames → bit-identical): control_step
0x00210A55, flight_seam 0x07FDF307, div_const 338/338. Test
add_sub_large_immediate_use_addw_subw_not_misencoded pins
`add sp,sp,#256 → 0d f2 00 1d`, `sub → ad f2 00 1d`, and `#5000 → Err`.

Part of #242.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@avrabe avrabe merged commit 5cc26a0 into main Jun 4, 2026
12 checks passed
@avrabe avrabe deleted the fix/encoder-addw-subw-large-imm branch June 4, 2026 21:14
@codecov

codecov Bot commented Jun 4, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 93.33333% with 3 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
crates/synth-backend/src/arm_encoder.rs 93.33% 3 Missing ⚠️

📢 Thoughts on this report? Let us know!

avrabe added a commit that referenced this pull request Jun 4, 2026
…-RA-001) (#254)

Completes the arithmetic+bitwise immediate-folding family (and/or/xor #250/#252,
now add/sub). When the operand before i32.add/i32.sub is `i32.const C` with C in
0..=0xFFF and its `movw` cleanly at the tail, fold to `add/sub rd, a, #C` and
drop the materialization.

Range is the full 0..=0xFFF (4095), wider than the bitwise 0..=0xFF, because the
ADD/SUB immediate encoder now uses ADDW/SUBW (T4, plain imm12) for >0xFF (#253) —
verified correct before folding into it.

This one actively fires on real fixtures: control_step's frame `sub #16` and its
const adds now fold (the differential stays result-identical, confirming
correctness across a codegen change). Updated test_237_stack_pointer_global_is_
register_promoted to accept the frame-size 16 in its folded `sub #16` form (still
a plain scalar immediate, still not __synth_wasm_data-relocated — the property it
checks).

GATE: clippy clean; 284 lib tests; three frozen differentials RESULT-identical
(control_step 0x00210A55, flight_seam 0x07FDF307, div_const 338/338); test
i32_add_sub_fold_const_into_immediate (byte + >0xFF ADDW-path values). CI fuzz
adds totality.

Part of #242.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
avrabe added a commit that referenced this pull request Jun 4, 2026
… silent-miscompile class (#255) (#256)

gale #255: encode_thumb32_cmp_imm / _adds / _subs packed the raw `i:imm3:imm8`
field with no ThumbExpandImm expansion — correct only for imm <= 0xFF, silently
mis-encoding any larger value (e.g. `cmp #N` comparing the wrong constant → wrong
branch). Same class as #251 (ORR/EOR NOP) and #253 (ADD/SUB frame).

Fix (gale's suggested shared helper): add `try_thumb_expand_imm(u32) -> Option<u32>`
— the correct ARMv7-M ThumbExpandImm reverse-encoder (zero-extended byte, the
three replicated-byte patterns, and the 8-bit-rotated form). CMP/ADDS/SUBS now
encode any valid modified immediate CORRECTLY and return Err on a genuinely
unrepresentable one (CMP/S-forms have no plain-imm12 fallback), forcing the
selector to materialize into a register rather than silently mis-compile.

Note: gale's example `cmp #1000` is actually representable (1000 = 0xFA ror 30,
field 0xF7A) — the old code mis-encoded it (raw 0x3E8); this encodes it right.
The genuine error case is e.g. `#0x101` (bits too far apart for an 8-bit window).

GATE: clippy clean; backend suite passes; three frozen differentials RESULT-
identical (control_step 0x00210A55, flight_seam 0x07FDF307, div_const 338/338) —
the fixed paths weren't exercised by fixtures (the comparison lowering uses
reg-reg CMP), so this is purely defensive hardening of a latent class. Tests
try_thumb_expand_imm_encodes_modified_immediates +
cmp_adds_subs_immediate_error_on_non_modified_imm.

Closes #255. Part of #242. (A follow-up can retrofit AND/ORR/EOR to share this
helper too, extending their fold range beyond 0xFF.)

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
avrabe added a commit that referenced this pull request Jun 5, 2026
…rectness fixes (#260)

Promote the accumulated v0.11.30 work into the CHANGELOG before tagging:
native-pointer ABI (#237) + the VCR-* constant-immediate folding (#250/#252/#254)
+ analysis foundation (#243/#245) + three latent-miscompile encoder fixes
(#251 ORR/EOR NOP, #253 ADD/SUB large-frame, #255 CMP/ADDS/SUBS ThumbExpandImm).
Adds a falsification statement covering the encoder correctness class.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
avrabe added a commit that referenced this pull request Jun 5, 2026
…ress class (#259) (#261)

gale #259: the eight thumb32 load/store immediate-offset encoders
(encode_thumb32_ldr/str/ldrb_imm/ldrsb_imm/ldrh_imm/ldrsh_imm/strb_imm/strh_imm)
masked `offset & 0xFFF` and emitted unconditionally — for offset >= 0x1000 the
access silently targeted the wrong address (offset & 0xFFF). Same class as #253
(ADD/SUB) and #255 (CMP/ADDS/SUBS).

Fix: a shared `check_ldst_imm12` guard — error for offset > 0xFFF (the imm12
range is 0..=4095), forcing the selector to use register-offset addressing,
rather than silently wrap. Closes the load/store wrong-address class.

Liveness: probing `i32.load offset=5000` compiles successfully (no Err) — the
selector already materializes large offsets and never feeds the imm12 encoders
>= 0x1000, so this is DEFENSIVE hardening (matching gale's note), not a live
breakage.

GATE: clippy clean; backend suite passes; the three frozen fixtures are
BYTE-IDENTICAL between this change and main (control_step / flight_seam_flat /
div_const .o compared with `cmp` — the guard only adds an error path for
offset > 0xFFF, unreachable by the fixtures). Test
ldst_imm12_offset_errors_when_out_of_range.

NOTE: the /tmp/armv differential venv lost its `wasmtime` module (tooling
regression, unrelated to this change) — verified via byte-comparison instead,
which is strictly stronger than the result-level differential here.

Closes #259. Part of #242.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
avrabe added a commit that referenced this pull request Jun 5, 2026
… the class (#266)

The last two unguarded data-processing immediate encoders:
- AND.W immediate raw-packed i:imm3:imm8 (correct only <=0xFF, silently wrong
  above — the fold guard kept it in range, but the encoder itself was unsafe).
- CMN.W immediate fell back to a silent 0xBF00 NOP for imm > 0xFF.

Both now go through the shared `try_thumb_expand_imm` (added in #255): encode any
valid modified immediate correctly, else `Err` (forcing register materialization).
This closes the silent-miscompile class across the whole data-proc immediate
family — AND/ORR/EOR/CMP/CMN/ADD/SUB/ADDS/SUBS now share one correct, Ok-or-Err
path (#251/#253/#255/#259 lineage).

Behavior-frozen: for imm <= 0xFF the encoding is byte-identical (try_thumb_expand_imm
returns the same field), so the cmp/cmn/and folding output is unchanged — the
three differentials stay result-identical (control_step 0x00210A55, flight_seam
0x07FDF307, div_const 338/338). Test and_cmn_immediate_thumb_expand_else_error:
byte-range unchanged, a replicated modified immediate now encodes, and 0x101
errors (AND was raw-pack, CMN was a NOP).

Part of #242.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant