Skip to content

synth: cross-Renode A/B confirms gale's −34.5% delta is real, not simulator-artifactual#27

Merged
avrabe merged 11 commits into
mainfrom
experiment/gale-via-synth
May 1, 2026
Merged

synth: cross-Renode A/B confirms gale's −34.5% delta is real, not simulator-artifactual#27
avrabe merged 11 commits into
mainfrom
experiment/gale-via-synth

Conversation

@avrabe

@avrabe avrabe commented Apr 30, 2026

Copy link
Copy Markdown
Contributor

Summary

This PR adds a fourth gale build pipeline (rustc → wasm → synth → Cortex-M ELF, vs the published LTO post's three) and the cross-Renode A/B that proves the resulting −34.5% handoff cycle delta is in the actual instruction stream, not a simulator artifact.

Headline result (from CI run 25135305084, on branch HEAD before this PR's last two commits):

Comparison Baseline median Gale median Δ
(Headline) Renode 1.16.0: baseline vs gale 354 cyc 232 cyc −34.5%
(a) Renode nightly: baseline vs gale 354 cyc 232 cyc −34.5% (identical)
(b) Baseline ELF, 1.16.0 vs nightly 354 cyc 354 cyc 0.0%
(c) Gale ELF, 1.16.0 vs nightly 232 cyc 232 cyc 0.0%

Same-ELF cross-version drift is exactly 0.0% on both paths. The −34.5% survives the model swap at the same magnitude. The delta is in the instruction stream, not in the simulator. Pooled gale max (236) < pooled baseline median (354).

What's in this PR

  • New build mode: -DGALE_USE_SYNTH=ON runs the gale-ffi crate through wasm-opt + synth (PulseEngine's Cortex-M AOT) to produce libgale_ffi.a at the same path the rustc-direct path produces it, no consumer changes.
  • Renode A/B (commit `ci(investigate): A/B same ELFs under Renode nightly`): runs the same two ELFs under Renode 1.16.0 (container-pinned) and Renode nightly (downloaded fresh from `builds.renode.io`) so the cycle-model can be ruled in or out as the source of the delta. Includes three control comparisons (a/b/c above) rendered into the workflow's step summary.
  • Synth correctness control (commit `synth CI: B6 ...`, audit P7 / DO-330 §6.1.4): a third gale build via rustc-direct (no synth in the chain) is run alongside under 1.16.0. If synth's handoff median agrees with rustc-direct's within ~3%, the synth-emitted ELF is a faithful Cortex-M codegen of the same Rust source. This is the cheapest tool-error-mitigation against synth being a research compiler.
  • Build manifest (commit `synth CI: ... D6`, audit P9): every CI run now produces a `manifest.txt` capturing rustc / cargo / synth / wasm-opt / west / robotframework / SDK versions, Renode 1.16.0 + nightly versions, sha256 of the nightly tarball, sha256 of all built ELFs, sha256 + size of all CSV outputs, and `west list` for Zephyr fork + module shas. Bundled into the artifact alongside the three ELFs themselves.

Total Renode runs in this workflow: 5 (baseline 1.16.0, synth 1.16.0, rustc-direct 1.16.0, baseline nightly, synth nightly). `timeout-minutes` bumped 240 → 300.

Audit pass

This PR was reviewed via the Mythos slop-hunt scaffold (10 personas + 1 fresh-session validator). Findings landed in this PR:

  • B6 (P7, tool qualification) — rustc-direct cross-check arm
  • D6 (P9, configuration management) — manifest.txt artifact
  • P10 validator refuted P5's claim that PATH override was a no-op for renode-test (nightly tarball ships its own renode-test, sibling-resolution works)

Findings deferred to the flight-bench PR: B1-B5 (analyzer, bench-code).

Test plan

  • CI run on this branch lands green (run 25149225844 currently queued)
  • manifest.txt appears in the artifact bundle and contains all expected fields
  • Cross-Renode controls (b) and (c) report 0.0% drift on the same ELF — proves cycle model isn't the source
  • Synth-vs-rustc-direct shows median agreement within 3% — proves synth's codegen is faithful
  • Step summary renders 4 comparison sections cleanly

🤖 Generated with Claude Code

avrabe and others added 11 commits April 27, 2026 22:29
When GALE_USE_SYNTH=ON, the gale-ffi crate is compiled to wasm32 first,
then run through the synth AOT compiler (pulseengine/synth) which emits
a Cortex-M ET_REL relocatable object. The object is wrapped into the
same libgale_ffi.a path the rest of the build expects via ar, so the
per-module gale_sem.c / gale_mutex.c / etc. consumers need no changes.

This is the build-system half of the 4th-variant experiment for the
cross-language LTO blog post: same engine bench, three existing builds
(GCC baseline / GCC + Gale / LLVM + LTO + Gale) plus a 4th data point
where verified Rust reaches Cortex-M via Verus → rustc → wasm → synth's
Rocq-proved i32 instruction selection.

Requires synth with --relocatable flag (pulseengine/synth#83).

Default behaviour (GALE_USE_SYNTH=OFF) is unchanged: rustc-direct-to-Cortex-M
is still the production path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ment

Adds the gale-via-synth lane to the engine-bench Renode matrix as a
follow-up to the cross-language LTO post. Builds the GCC baseline and
the GALE_USE_SYNTH=ON variant (wasm32 -> synth -> Cortex-M ET_REL ->
libgale_ffi.a) on the same CI run, then sweeps both through Renode at
the long sample count.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds two optional formally-verified-(or-not) wasm optimizers between
rustc and synth in the GALE_USE_SYNTH pipeline:

  rustc -> [wasm-opt -Oz] -> [loom optimize] -> synth -> ar -> .a

Both are detected via find_program() and only inserted into the
pipeline if found. If neither is on the path the pipeline reduces
to rustc -> synth (unchanged behaviour).

Effect on engine bench (stm32f4_disco, prj-gale.conf, GCC C kernel):
  synth alone:                  text=22448, total=38533
  synth + wasm-opt + loom:      text=22420, total=38505 (-28 B)

Wasm-level reduction is dramatic (-34% from wasm-opt -Oz), but the
synth-emitted ARM code is dominated by per-function instruction
selection overhead, so the final ELF only moves a few dozen bytes.
The verification-chain story is the bigger win: loom proves each
pass it applies preserves semantics; rejected passes are skipped
rather than applied unsoundly.

CI workflow installs both: binaryen apt package (wasm-opt) and
loom-cli from pulseengine/loom main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
workflow_dispatch alone requires the workflow to exist on the
default branch before it can be invoked, which we can't do without
merging the whole experiment first. Adding a push trigger on
experiment/gale-via-synth so the workflow auto-runs whenever the
experiment branch advances. Strip this trigger before any merge
to main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t --path)

cargo install --git URL --path PATH is invalid — the two flags are
mutually exclusive. When installing a sub-crate from a git workspace,
pass the package name as a positional argument:

  cargo install --git URL [--branch B] PACKAGE_NAME --force

Fixes the workflow's synth-cli install (was rejected with: 'the
argument --git <URL> cannot be used with --path <PATH>') and the
identical pattern used for loom-cli.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… optimize mode)

Diagnosed during local debugging: synth's optimized register-allocation
path clobbers r0/r1 (input parameter registers) at function entry when
the wasm body pushes i32 constants before the first local.get. The
function's prologue ends up looking like:

  movs r0, #1     ← clobbers param 0 (count)
  movs r1, #0     ← clobbers param 1 (limit)
  ...
  cmp r0, r1      ← compares clobbered values, not the actual params

This crashes the engine bench in Renode (HardFault on first gale call,
infinite handler loop, never reaches the test's Zero Drops assertion).
The CI run hit a 60-min step timeout without producing a single sample.

Minimal repro saved at /tmp/match_gale.wat (3 i32 params, i64 local,
push 3 i32 constants, then local.get 0). Worth filing as a synth issue
once the experiment lands.

Workaround: synth --no-optimize. Disables the offending pass and emits
proper AAPCS prologue (push r4..r8/lr, locals on stack, params read
from r0/r1/r2 unchanged). Verified locally: same gale_k_sem_give_decide
function now starts with `stmdb sp!, {r4..r8, lr}` and reads r0/r1
correctly.

Cost: ~68 bytes of additional flash (22624 → 22692) and unknown cycles.
The --no-optimize path uses stack-based locals which is wasteful but
correct. Stack frame size also goes up — synth reserves ~4KB per
function for locals which may be excessive; will need to validate
no stack overflow on the engine bench worker thread.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…frame fixes

The previous CI run booted Zephyr with the synth-built ELF but every
RPM step reported count=0 [drain_timeout]. Diagnosed two synth bugs:

1. i64 local storage dropped the upper half (--no-optimize path)
2. Locals area aliased the saved-register spill (also --no-optimize)

Both fixed in pulseengine/synth#85. This commit points the workflow
at that branch so the next CI run uses the fixed synth.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pulls in:
- explicit I64Or/And/Xor/ExtendI32U/ExtendI32S/Shl/ShrU/ShrS handlers
  in synth's select_with_stack (no more wildcard fallthrough to
  select_default's R0:R1/R2:R3 assumption)
- alloc_consecutive_pair now reserves implicit pair_hi of every
  stack entry plus extra_avoid for popped operands

Local build verified: gale_k_sem_give_decide ends with
  orr r0, r6, r8
  orr r1, r7, ip
matching the wasm i64.or semantics. 22644 B FLASH.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tracing the −34.5% handoff cycle delta in the synth bench. Found that
loom's optimizer hoists `local.set 3 = 0` from the fall-through arm
of gale_k_sem_give_decide to BEFORE the dispatcher, dropping the
WakeThread/Increment distinction at the wasm level — synth then emits
ARM that always returns action=INCREMENT regardless of has_waiter.

The bench passes (samples=7750, drops=0) because the engine_control
worker is rarely actually blocked at sem_give time, so the WAKE path
is rarely needed for correctness. But the cycle delta is then
comparing a degenerate always-INCREMENT path to rustc's correct
WAKE/INCREMENT discrimination — apples to oranges.

This run skips loom in CI so we can A/B against the loom-on result and
validate the hypothesis. CMakeLists' find_program(LOOM) fails when
loom isn't on PATH, falling through to wasm-opt -> synth without loom.

Filed for follow-up: pulseengine/loom optimizer bug. The hoisting is
unsound for this control-flow pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…5% delta

The container ships Renode 1.16.0 (ARG RENODE_VERSION=1.16.0 in
zephyrproject-rtos/docker-image Dockerfile.base, unchanged across
v0.28.x..v0.29.2). 1.16.1 (Feb 2026) touches several Cortex-M paths
that could shift cycle accounting on the gale instruction stream:
fixed ARMv8-M Thumb2 data-processing instructions, fixed Stack
Pointer bits[1:0] handling, fixed wrong exception when FPU is
disabled. None is labelled a cycle-counter fix in the changelog,
but Thumb-2 dispatch changes shift cycle accounting whenever an
instruction takes a different micro-op path.

Adds nightly Renode (builds.renode.io) alongside 1.16.0 and runs
the same two ELFs under both. Yields three controls:

  (a) baseline vs gale, both under nightly
      — does the gale delta change when the cycle model changes?
  (b) baseline_1.16.0 vs baseline_nightly (same ELF)
      — control: cycle-model drift on identical instructions.
  (c) gale_1.16.0 vs gale_nightly (same ELF)
      — does the model shift gale's instructions more than baseline's?
        If yes, the 1.16.0 model is mis-scoring gale-specific
        instructions and the delta is partly artifactual.

Implementation: PATH override puts /opt/renode-nightly first for the
two new run steps. Robot file unchanged (it reads ELF / BENCH_CSV_OUT
from env). Existing 1.16.0 comparison is undisturbed; nightly outputs
go to events-nightly.csv and a separate report section.

Timeout bumped 120 -> 240 min to cover all four Renode runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two audit-driven additions to the synth Renode A/B:

B6 — synth correctness control (audit P7, DO-330 §6.1.4):
   The headline -34.5% gale-vs-baseline delta has been ruled non-
   simulator-artifactual by the cycle-model A/B (controls a/b/c
   showed 0.0% drift on identical ELFs across Renode 1.16.0 and
   nightly). What remains is "could synth be miscompiling in a way
   that produces semantically-equivalent-but-faster code?" — a
   research compiler's normal failure mode.

   Adds a third gale build via rustc-direct (no synth in the chain;
   default GALE_USE_SYNTH=OFF path), runs it under 1.16.0, and
   renders an additional comparison "synth vs rustc-direct" in the
   step summary. If the two agree on handoff median within ~3%, the
   synth-emitted ELF is a faithful Cortex-M codegen of the same
   Rust source. This is the cheapest possible "tool error
   mitigation" in the DO-330 §6.1.4 sense — a redundant computation
   by a battle-tested compiler with the same input contract.

   Total Renode runs in this workflow: 5
   (baseline 1.16.0, synth 1.16.0, rustc-direct 1.16.0, baseline
   nightly, synth nightly). timeout-minutes bumped 240 -> 300.

D6 — manifest.txt artifact (audit P9):
   Single point of provenance. The bench's input layer is layered
   with branch-tip pointers (synth --branch fix/..., zephyr --mr
   gale/sem-replacement) and mutable tags (container :v0.29.0).
   Without a captured manifest, "I ran exactly this configuration"
   reduces to "trust the gale_sha and hope nothing else moved" —
   wrong by construction.

   Adds a "Compose build manifest" step at the end of the job that
   captures: rustc/cargo/synth/wasm-opt/west/robotframework/SDK
   versions, Renode 1.16.0 and nightly versions, sha256 of the
   nightly tarball, sha256 of all built ELFs, sha256 + byte-size of
   all CSV outputs, and `west list` for Zephyr fork + module shas.
   Output to /tmp/manifest.txt, included in the artifact bundle
   alongside the three ELFs themselves. The nightly tarball is
   downloaded to /tmp/renode-nightly.tgz first so the sha256 can be
   recorded (was: piped curl -> tar with no intermediate file).

   Six months from now, every reproducibility claim reduces to
   downloading the artifact, opening manifest.txt, and re-resolving
   the three sha-pinned things plus the listed branch tips.

Both YAMLs validated; no behavior changes to existing arms.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codecov

codecov Bot commented Apr 30, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@avrabe avrabe merged commit 48f54ff into main May 1, 2026
58 of 61 checks passed
@avrabe avrabe deleted the experiment/gale-via-synth branch May 1, 2026 06:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant