fix(encoder): ADD/SUB large immediates use ADDW/SUBW — fixes large-frame miscompile (VCR-RA-001)#253
Merged
Merged
Conversation
…ame miscompile (VCR-RA-001) Latent silent miscompile found while verifying the Add/Sub immediate path for folding: encode_thumb32_add/sub emit ADD.W/SUB.W (T3), whose `i:imm3:imm8` is a ThumbExpandImm MODIFIED immediate — but the code packed the raw bits without expansion. Correct only for imm <= 0xFF; for imm >= 0x100 it silently encoded a different value. This path is LIVE for frame alloc/dealloc: `add sp, sp, #256` encoded as `add sp, sp, #0` → a >=256-byte frame (>=64 i32 locals, or fewer with spills) corrupted the stack with no error. Fix: for imm in 0x100..=0xFFF use ADDW/SUBW (T4) — a PLAIN 12-bit immediate (0..4095, no ThumbExpandImm) — which encodes any frame size in range correctly. Keep T3 for imm <= 0xFF so existing encodings stay bit-identical. Error for imm > 4095 (no single-instruction form) rather than silently mis-encode (Ok-or-Err, #180/#185). GATE: clippy clean; backend suite passes; the three frozen differential fixtures stay RESULT-identical (they use <= 0xFF frames → bit-identical): control_step 0x00210A55, flight_seam 0x07FDF307, div_const 338/338. Test add_sub_large_immediate_use_addw_subw_not_misencoded pins `add sp,sp,#256 → 0d f2 00 1d`, `sub → ad f2 00 1d`, and `#5000 → Err`. Part of #242. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
avrabe
added a commit
that referenced
this pull request
Jun 4, 2026
…-RA-001) (#254) Completes the arithmetic+bitwise immediate-folding family (and/or/xor #250/#252, now add/sub). When the operand before i32.add/i32.sub is `i32.const C` with C in 0..=0xFFF and its `movw` cleanly at the tail, fold to `add/sub rd, a, #C` and drop the materialization. Range is the full 0..=0xFFF (4095), wider than the bitwise 0..=0xFF, because the ADD/SUB immediate encoder now uses ADDW/SUBW (T4, plain imm12) for >0xFF (#253) — verified correct before folding into it. This one actively fires on real fixtures: control_step's frame `sub #16` and its const adds now fold (the differential stays result-identical, confirming correctness across a codegen change). Updated test_237_stack_pointer_global_is_ register_promoted to accept the frame-size 16 in its folded `sub #16` form (still a plain scalar immediate, still not __synth_wasm_data-relocated — the property it checks). GATE: clippy clean; 284 lib tests; three frozen differentials RESULT-identical (control_step 0x00210A55, flight_seam 0x07FDF307, div_const 338/338); test i32_add_sub_fold_const_into_immediate (byte + >0xFF ADDW-path values). CI fuzz adds totality. Part of #242. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
This was referenced Jun 4, 2026
avrabe
added a commit
that referenced
this pull request
Jun 4, 2026
… silent-miscompile class (#255) (#256) gale #255: encode_thumb32_cmp_imm / _adds / _subs packed the raw `i:imm3:imm8` field with no ThumbExpandImm expansion — correct only for imm <= 0xFF, silently mis-encoding any larger value (e.g. `cmp #N` comparing the wrong constant → wrong branch). Same class as #251 (ORR/EOR NOP) and #253 (ADD/SUB frame). Fix (gale's suggested shared helper): add `try_thumb_expand_imm(u32) -> Option<u32>` — the correct ARMv7-M ThumbExpandImm reverse-encoder (zero-extended byte, the three replicated-byte patterns, and the 8-bit-rotated form). CMP/ADDS/SUBS now encode any valid modified immediate CORRECTLY and return Err on a genuinely unrepresentable one (CMP/S-forms have no plain-imm12 fallback), forcing the selector to materialize into a register rather than silently mis-compile. Note: gale's example `cmp #1000` is actually representable (1000 = 0xFA ror 30, field 0xF7A) — the old code mis-encoded it (raw 0x3E8); this encodes it right. The genuine error case is e.g. `#0x101` (bits too far apart for an 8-bit window). GATE: clippy clean; backend suite passes; three frozen differentials RESULT- identical (control_step 0x00210A55, flight_seam 0x07FDF307, div_const 338/338) — the fixed paths weren't exercised by fixtures (the comparison lowering uses reg-reg CMP), so this is purely defensive hardening of a latent class. Tests try_thumb_expand_imm_encodes_modified_immediates + cmp_adds_subs_immediate_error_on_non_modified_imm. Closes #255. Part of #242. (A follow-up can retrofit AND/ORR/EOR to share this helper too, extending their fold range beyond 0xFF.) Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
This was referenced Jun 5, 2026
avrabe
added a commit
that referenced
this pull request
Jun 5, 2026
…rectness fixes (#260) Promote the accumulated v0.11.30 work into the CHANGELOG before tagging: native-pointer ABI (#237) + the VCR-* constant-immediate folding (#250/#252/#254) + analysis foundation (#243/#245) + three latent-miscompile encoder fixes (#251 ORR/EOR NOP, #253 ADD/SUB large-frame, #255 CMP/ADDS/SUBS ThumbExpandImm). Adds a falsification statement covering the encoder correctness class. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
avrabe
added a commit
that referenced
this pull request
Jun 5, 2026
…ress class (#259) (#261) gale #259: the eight thumb32 load/store immediate-offset encoders (encode_thumb32_ldr/str/ldrb_imm/ldrsb_imm/ldrh_imm/ldrsh_imm/strb_imm/strh_imm) masked `offset & 0xFFF` and emitted unconditionally — for offset >= 0x1000 the access silently targeted the wrong address (offset & 0xFFF). Same class as #253 (ADD/SUB) and #255 (CMP/ADDS/SUBS). Fix: a shared `check_ldst_imm12` guard — error for offset > 0xFFF (the imm12 range is 0..=4095), forcing the selector to use register-offset addressing, rather than silently wrap. Closes the load/store wrong-address class. Liveness: probing `i32.load offset=5000` compiles successfully (no Err) — the selector already materializes large offsets and never feeds the imm12 encoders >= 0x1000, so this is DEFENSIVE hardening (matching gale's note), not a live breakage. GATE: clippy clean; backend suite passes; the three frozen fixtures are BYTE-IDENTICAL between this change and main (control_step / flight_seam_flat / div_const .o compared with `cmp` — the guard only adds an error path for offset > 0xFFF, unreachable by the fixtures). Test ldst_imm12_offset_errors_when_out_of_range. NOTE: the /tmp/armv differential venv lost its `wasmtime` module (tooling regression, unrelated to this change) — verified via byte-comparison instead, which is strictly stronger than the result-level differential here. Closes #259. Part of #242. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
This was referenced Jun 5, 2026
avrabe
added a commit
that referenced
this pull request
Jun 5, 2026
… the class (#266) The last two unguarded data-processing immediate encoders: - AND.W immediate raw-packed i:imm3:imm8 (correct only <=0xFF, silently wrong above — the fold guard kept it in range, but the encoder itself was unsafe). - CMN.W immediate fell back to a silent 0xBF00 NOP for imm > 0xFF. Both now go through the shared `try_thumb_expand_imm` (added in #255): encode any valid modified immediate correctly, else `Err` (forcing register materialization). This closes the silent-miscompile class across the whole data-proc immediate family — AND/ORR/EOR/CMP/CMN/ADD/SUB/ADDS/SUBS now share one correct, Ok-or-Err path (#251/#253/#255/#259 lineage). Behavior-frozen: for imm <= 0xFF the encoding is byte-identical (try_thumb_expand_imm returns the same field), so the cmp/cmn/and folding output is unchanged — the three differentials stay result-identical (control_step 0x00210A55, flight_seam 0x07FDF307, div_const 338/338). Test and_cmn_immediate_thumb_expand_else_error: byte-range unchanged, a replicated modified immediate now encodes, and 0x101 errors (AND was raw-pack, CMN was a NOP). Part of #242. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Latent silent miscompile found while verifying the
Add/Subimmediate path for folding.The bug
encode_thumb32_add/subemit ADD.W/SUB.W (T3), whosei:imm3:imm8is a ThumbExpandImm modified immediate — but the code packed the raw bits without expansion. Correct only forimm ≤ 0xFF; forimm ≥ 0x100it silently encoded a different value.This path is live for frame alloc/dealloc:
add sp, sp, #256encoded asadd sp, sp, #0→ a ≥256-byte frame (≥64 i32 locals, or fewer with spills) corrupted the stack with no error. No fixture hits it (they have small frames), so it was invisible — the kind of silent miscompile the verify-before-transform discipline exists to catch.The fix
imm ∈ 0x100..=0xFFF→ ADDW/SUBW (T4), a plain 12-bit immediate (0–4095, no ThumbExpandImm) → encodes any in-range frame correctly.imm ≤ 0xFF→ keep T3 (existing encodings bit-identical).imm > 4095→ Err (no single-instruction form), not a silent mis-encode (Ok-or-Err, opt: repair + re-enable the optimized linear-memory path (root-cause the MemLoad ADD-register corruption) #180/arm encoder: panics (debug_assert) on PC/R15 operand instead of returning Err — fuzz-found, pre-existing #185).Gate
clippy clean; backend suite passes; three frozen differentials stay result-identical (≤0xFF frames → bit-identical): control_step
0x00210A55, flight_seam0x07FDF307, div_const338/338. Test pinsadd sp,sp,#256 → 0d f2 00 1d,sub → ad f2 00 1d,#5000 → Err.Part of #242.
🤖 Generated with Claude Code