
perf: in-place select — elide keep-val2 move (#209, VCR-SEL-002) #283

Merged: avrabe merged 1 commit into main from feat/vcr-sel-002-inplace-select on Jun 6, 2026


avrabe commented Jun 6, 2026


What

The stack selector's Select lowering emitted three instructions into a fresh dst:

cmp cond, #0
selectmove dst, val1, NE     ; cond != 0 -> dst = val1
selectmove dst, val2, EQ     ; cond == 0 -> dst = val2

Since select consumes all three operands, val2's register is dead after the op. We reuse it as dst, so the EQ "keep val2" branch becomes a no-op move of val2 onto itself and is elided. That leaves exactly one conditional move (the NE override), which is what native code emits for a clamp (x>k)?k:x. Each removed SelectMove is one fewer IT;MOV: −2 bytes and −1 cycle (the M4 still spends a cycle executing a MOV whose predicate is false).
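The difference between the two forms can be sketched with a toy instruction model. This is a minimal sketch, assuming hypothetical `Reg`, `Cond` and `Inst` types and a hypothetical `lower_select` function; it is not synth's actual IR or the code in `instruction_selector.rs`. Passing a fresh register reproduces the three-instruction form, while passing `None` reuses `val2` as `dst`, so the EQ move would be `val2 = val2` and is simply not emitted.

```rust
/// Hypothetical register handle (not synth's real type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Reg(pub u8);

/// Condition a conditional move is predicated on.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Cond {
    Ne,
    Eq,
}

/// Hypothetical instruction model for this illustration.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Inst {
    /// `cmp reg, #imm`: sets the flags.
    CmpImm { reg: Reg, imm: u32 },
    /// Conditional move: `dst = src` if `cond` holds on the current flags.
    SelectMove { dst: Reg, src: Reg, cond: Cond },
}

/// Lowers wasm `select` (cond != 0 ? val1 : val2).
/// `Some(fresh)` emits the original two-move form into `fresh`;
/// `None` writes the result into `val2`'s register and drops the EQ move.
pub fn lower_select(cond: Reg, val1: Reg, val2: Reg, fresh_dst: Option<Reg>) -> (Reg, Vec<Inst>) {
    let dst = fresh_dst.unwrap_or(val2);
    let mut out = vec![
        Inst::CmpImm { reg: cond, imm: 0 },
        // NE override: present in both forms.
        Inst::SelectMove { dst, src: val1, cond: Cond::Ne },
    ];
    // The EQ "keep val2" move is only needed when dst is not val2 itself.
    if dst != val2 {
        out.push(Inst::SelectMove { dst, src: val2, cond: Cond::Eq });
    }
    (dst, out)
}
```

With `Some(Reg(3))` the sketch returns three instructions ending in the EQ move; with `None` it returns two, and the result lives in `val2`'s register.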

Soundness

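In-place only when val2 is safely overwritable, else fall back to the original two-move form:

  • val2 is not a live param register (#193 param-clobber class)
  • val2 is distinct from cond
  • val2 is distinct from val1 (degenerate case; a fresh dst keeps it simple)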

No instruction sits between the CMP and the conditional move in the in-place path, so the flags survive (the same property the two-move form relied on to avoid a flag-clobbering MOVS).
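The guard can be sketched as a single predicate over the three conditions the commit message lists (val2 is not a live param register, is distinct from cond, and is distinct from val1). The names `can_select_in_place` and `select_dst`, the `Reg` type, and passing live params as a slice are assumptions for illustration, not synth's API.

```rust
/// Hypothetical register handle (not synth's real type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Reg(pub u8);

/// True when val2's register may be overwritten with the select result.
pub fn can_select_in_place(cond: Reg, val1: Reg, val2: Reg, live_params: &[Reg]) -> bool {
    // Not a live param register (#193 param-clobber class).
    !live_params.contains(&val2)
        // Distinct from cond.
        && val2 != cond
        // Distinct from val1 (degenerate case, left to the fresh-dst path).
        && val2 != val1
}

/// Uses val2 as dst when the guard holds, otherwise falls back to a fresh register.
pub fn select_dst(
    cond: Reg,
    val1: Reg,
    val2: Reg,
    live_params: &[Reg],
    alloc_fresh: impl FnOnce() -> Reg,
) -> Reg {
    if can_select_in_place(cond, val1, val2, live_params) {
        val2
    } else {
        alloc_fresh()
    }
}
```

Taking the allocator as a closure means the fallback only allocates a register when the in-place path is rejected.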

Oracle — RESULT-identical on every frozen differential

This is an optimization: bytes change, results do not.

  fixture               result               status
  control_step          13/13 → 0x00210A55   PASS
  flight_seam inlined   0x07FDF307           MATCH
  flight_seam flat      0x07FDF307           MATCH
  div_const             338/338              PASS

Measured .text reduction

  fixture               before    after     Δ
  control_step          378 B     354 B     −24 B (−6.3%)
  flight_seam inlined   1058 B    1020 B    −38 B (−3.6%)
  flight_seam flat      1294 B    1244 B    −50 B (−3.9%)
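The percentages follow directly from the byte counts. The sketch below recomputes them; `implied_elided_moves` additionally assumes that every saved byte came from an elided 2-byte move, so it is only a consistency check on the (all even) deltas, not a measured count.

```rust
/// (fixture, .text bytes before, .text bytes after), copied from the table above.
pub const MEASURED: [(&str, u32, u32); 3] = [
    ("control_step", 378, 354),
    ("flight_seam inlined", 1058, 1020),
    ("flight_seam flat", 1294, 1244),
];

/// Relative .text reduction in percent; assumes after <= before.
pub fn reduction_percent(before: u32, after: u32) -> f64 {
    f64::from(before - after) / f64::from(before) * 100.0
}

/// Number of moves that would explain the delta if each elision saved exactly
/// 2 bytes (an assumption); None when the delta is odd.
pub fn implied_elided_moves(before: u32, after: u32) -> Option<u32> {
    let delta = before - after;
    if delta % 2 == 0 {
        Some(delta / 2)
    } else {
        None
    }
}
```

Formatting `reduction_percent` with one decimal reproduces the −6.3%, −3.6% and −3.9% in the table.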

Tests

  • Updated test_control_flow_select / test_select_stack_mode_with_constants / test_select_with_global_values to assert the NE override (emitted in both forms; the EQ move is what in-place elides).
  • New test_select_in_place_elides_keep_move_209 pins one-NE-zero-EQ for a reusable val2.
  • 323/323 synth-synthesis lib tests pass, clippy clean.
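The shape the new test pins (one NE move, zero EQ moves) can be illustrated on a textual listing like the one under What. This is a hypothetical string-based helper; the real test presumably inspects synth's emitted instructions directly, and `count_selectmoves` / `is_in_place_shape` are illustrative names only.

```rust
/// Counts `selectmove` lines whose last operand is the given condition,
/// ignoring `;` comments. Illustrative only.
pub fn count_selectmoves(listing: &str, cond: &str) -> usize {
    listing
        .lines()
        // Drop trailing `; ...` comments and surrounding whitespace.
        .map(|line| line.split(';').next().unwrap_or("").trim())
        .filter(|line| line.starts_with("selectmove"))
        // The condition is the last comma-separated operand.
        .filter(|line| line.rsplit(',').next().map(str::trim) == Some(cond))
        .count()
}

/// The in-place shape: exactly one NE override and no EQ "keep val2" move.
pub fn is_in_place_shape(listing: &str) -> bool {
    count_selectmoves(listing, "NE") == 1 && count_selectmoves(listing, "EQ") == 0
}
```

The three-line listing under What counts one NE and one EQ move, so it fails `is_in_place_shape`; dropping the EQ line makes it pass.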

Falsification

This change is wrong if any module produces a result different from wasmtime's because val2's register was in fact still needed after the select, i.e. the live-param/aliasing guards missed a case. That is watched by the four differentials above plus gale on-target runs against the 5 frozen baselines.
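A toy differential makes the falsification criterion concrete. The sketch below, assuming a hypothetical four-register file indexed by `usize`, interprets the in-place sequence (`cmp cond, #0`, then a single NE move into val2's register) and compares it with wasm `select` semantics (`cond != 0 ? val1 : val2`). It is not the project's oracle, only an illustration of what the oracle checks.

```rust
/// Reference semantics of wasm `select`: cond != 0 picks val1, else val2.
pub fn reference_select(cond: u32, val1: u32, val2: u32) -> u32 {
    if cond != 0 { val1 } else { val2 }
}

/// Interprets the in-place lowering on a toy register file:
/// `cmp cond, #0` then `selectmove val2, val1, NE`. Returns val2's register.
pub fn run_in_place(regs: &mut [u32; 4], cond: usize, val1: usize, val2: usize) -> u32 {
    let ne = regs[cond] != 0; // flags from `cmp cond, #0`
    if ne {
        regs[val2] = regs[val1]; // the only conditional move left
    }
    regs[val2]
}

/// Compares both on a few edge values, with val2 treated as dead afterward.
pub fn differential_ok() -> bool {
    let samples = [0u32, 1, 0x8000_0000, u32::MAX];
    for &c in &samples {
        for &v1 in &samples {
            for &v2 in &samples {
                let mut regs = [c, v1, v2, 0];
                if run_in_place(&mut regs, 0, 1, 2) != reference_select(c, v1, v2) {
                    return false;
                }
            }
        }
    }
    true
}
```

The results agree whenever val2 is dead afterward. With registers `[5, 10, 20, 0]` and cond, val1, val2 in r0, r1, r2, the result is 10 and r2 no longer holds 20, so any later reader of val2 would see the wrong value; that is exactly the case the guards must exclude.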

Traceability: VCR-SEL-002 (filed in #282). Part of the #209 codegen-quality program.

🤖 Generated with Claude Code

…, VCR-SEL-002)

The stack selector's `Select` lowering emitted three instructions —
`cmp cond,#0; selectmove dst,val1,NE; selectmove dst,val2,EQ` — into a
freshly allocated `dst`. Because `select` *consumes* all three operands,
`val2`'s register is dead afterward, so we reuse it as `dst`: the EQ
"keep val2" branch is then a no-op move on itself and is elided. Exactly
one conditional move (the NE override) remains — what native emits for a
clamp `(x>k)?k:x`. Each removed `SelectMove` is one fewer IT;MOV: −2 bytes
and −1 cycle (the M4 still executes a predicated-false MOV).

Soundness — in-place only when `val2` is safely overwritable:
  - not a live param register  (#193 param-clobber class)
  - distinct from `cond`       (cmp consumed it, but a later read can't)
  - distinct from `val1`       (degenerate; fresh dst keeps it simple)
Otherwise falls back to the original fresh-dst two-move form. No
instruction sits between the CMP and the conditional move in the in-place
path, so the flags are intact (same property the two-move form relied on).

Gate — RESULT-identical on every frozen differential (this is an
optimization: bytes change, results do not):
  control_step      13/13   0x00210A55   PASS
  flight_seam inlined        0x07FDF307   MATCH
  flight_seam flat           0x07FDF307   MATCH
  div_const         338/338               PASS

Measured .text reduction:
  control_step          378 -> 354 B  (-24,  -6.3%)
  flight_seam inlined  1058 -> 1020 B  (-38,  -3.6%)
  flight_seam flat     1294 -> 1244 B  (-50,  -3.9%)

Tests: updated test_control_flow_select / _stack_mode_with_constants /
_with_global_values to assert the NE override (present in both forms);
new test_select_in_place_elides_keep_move_209 pins one-NE-zero-EQ.
323/323 synth-synthesis lib tests pass, clippy clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

avrabe commented Jun 6, 2026


On-target results for the in-place select (#283) — this is the biggest lever so far, and it's a clean non-regressing win (eliding moves only):

  bench             before (v0.11.34)   after #283   Δ          instrs        seam
  flat_flight       255                 241          −14 cyc    170→157       0x07FDF307
  controller_step   162                 150          −12 cyc    110→99        0x05e33e81
  control_step      158                 151          −7 cyc     121→113       2165333 (= 0x00210A55)
  filter_axis       37                  37           0          (no select)

The clamps were the dominant cost (your "18 IT-blocks vs native 6" from the gap decomposition), and eliding the keep-val2 move per Select cuts directly into it: −33 cyc across the three clamp-heavy benches (−14, −12, −7), far more than const-CSE (−1) or the modest gains expected from the allocator. This confirms your read that instruction selection (clamp/IT-block lowering), not the allocator, is the larger slice of the 255→103 gap.

flat_flight is now at 241 / 103 = 2.34× native (down from 2.54× a few days ago and 3.18× at v0.11.18). All seams are bit-identical. If this becomes the new default (no flag), I'd update the acceptance gate to flat_flight 241 / controller 150 / control_step 151 / filter 37 once it merges, and the const-CSE/allocator deltas then stack on top of this lower baseline. Nice, the select lowering was hiding a lot. 🎯
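The cycle arithmetic in this comment can be checked from the table. A small sketch, assuming the before/after columns are cycle counts (as the Δ column's `cyc` suffix indicates) and that 103 is the native flat_flight cycle count behind the 255→103 gap:

```rust
/// (bench, cycles at v0.11.34, cycles after #283), copied from the table above.
pub const ON_TARGET: [(&str, u32, u32); 4] = [
    ("flat_flight", 255, 241),
    ("controller_step", 162, 150),
    ("control_step", 158, 151),
    ("filter_axis", 37, 37),
];

/// Cycles saved, summed over every bench (filter_axis contributes 0).
pub fn total_saved() -> u32 {
    ON_TARGET.iter().map(|&(_, before, after)| before - after).sum()
}

/// Ratio of synth's cycle count to native's.
pub fn ratio_to_native(synth: u32, native: u32) -> f64 {
    f64::from(synth) / f64::from(native)
}
```

`total_saved()` yields the −33 cyc quoted above, and `ratio_to_native(241, 103)` rounds to the 2.34× figure.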


codecov Bot commented Jun 6, 2026

Codecov Report

❌ Patch coverage is 75.60976% with 10 lines in your changes missing coverage. Please review.

  Files with missing lines                              Patch %   Lines
  crates/synth-synthesis/src/instruction_selector.rs    75.60%    10 Missing ⚠️


avrabe merged commit a769300 into main Jun 6, 2026
13 of 14 checks passed
avrabe deleted the feat/vcr-sel-002-inplace-select branch June 6, 2026 08:29