16 changes: 16 additions & 0 deletions benches/engine_control/README.md
@@ -86,6 +86,22 @@ measures and what it does NOT measure. That file is the source of
truth for any downstream copy (blog posts, reports). Do not embed
scope claims in published copy without first updating SCOPE.md.

## Silicon-anchor protocol

Renode is the CI workhorse; **silicon captures are manual**, periodic,
and recorded directly into the repo as immutable evidence. See
[`silicon/README.md`](silicon/README.md) for the procedure, board
notes, and the `capture.sh` wrapper. Per-board configs live under
`silicon/boards/`; recorded captures land under `silicon/runs/<dated>/`
with a manifest, the firmware ELF, and the tagged events CSV.

The first supported board is the NUCLEO-G474RE (STM32G474, Cortex-M4F,
170 MHz), the closest production-shape silicon to the
`stm32f4_disco` Renode target. The ratio `silicon_median /
renode_median` per RPM step is what the anchor establishes; once it
is consistent across multiple captures, it can be cited as the
Renode-silicon multiplier.

## Building

```sh
158 changes: 158 additions & 0 deletions benches/engine_control/silicon/README.md
@@ -0,0 +1,158 @@
# Silicon-anchor protocol — engine_control

CI runs Renode (deterministic, parallel-safe). **Silicon runs are
manual**, periodic, and hand-driven on a single shared board.
This directory contains the protocol for taking a silicon capture,
recording it as immutable evidence in the repo, and citing it as
the anchor for Renode-headlined published numbers.

## Why

Renode simulates instruction cost per translated block. It is not a
microarchitectural simulator: it models no cache, no memory
contention, and no pipeline. The cross-Renode A/B (1.16.0 vs nightly =
0.0% drift) ruled out simulator-version drift but did NOT rule out
Renode being systematically off vs real silicon by a fixed
multiplier. The silicon anchor settles that.

The relationship `silicon_cycles / renode_cycles = R` is what the
silicon anchor establishes. Once `R` is consistent across
multiple silicon captures over time, it can be cited as the
Renode-silicon multiplier for that bench/board combination.
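
What "consistent" means for `R` is not pinned down above. One possible reading, sketched below, is that every capture's `R` stays within a small relative band around the median. The sample values and the 2% tolerance are illustrative assumptions, not numbers from any recorded run.

```python
from statistics import median

def multiplier_spread(ratios):
    """Return (median R, largest relative deviation from that median)."""
    if not ratios:
        raise ValueError("need at least one capture ratio")
    mid = median(ratios)
    spread = max(abs(r - mid) / mid for r in ratios)
    return mid, spread

# Hypothetical R values from three captures of one bench/board pair.
captures = [1.04, 1.05, 1.03]
r_mid, r_spread = multiplier_spread(captures)

TOLERANCE = 0.02  # assumed threshold; the protocol does not fix one
print(f"R = {r_mid:.3f}, spread = {r_spread:.1%}, citable = {r_spread <= TOLERANCE}")
```

Using the median rather than the mean keeps a single outlier capture from shifting the cited multiplier.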

## Recorded-run-in-git protocol

Every silicon run lives in
`silicon/runs/<YYYY-MM-DD>-<board>-<gale-sha>-<variant>-<tick_source>/`
and contains:

- `output.csv` — the raw UART capture (firmware-emitted)
- `events.csv` — same data, tagged through `tag_events.py`
- `manifest.txt` — board, MCU, clock, rustc/cargo versions, gale
commit SHA, ELF sha256, capture timestamp
- `firmware.elf` — the exact binary that produced the capture
- `firmware.elf.sha256` — checksum file

These directories are **immutable** once committed. To re-run the
same capture, create a new dated directory; never overwrite an
existing one. This makes any silicon citation in a blog post or
report point to a stable git URL.
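
A reviewer could sanity-check a run directory before citing it. The sketch below checks that the five files listed above exist and that `firmware.elf` hashes to the recorded value. It assumes the checksum file uses the common `sha256sum` layout (hex digest as the first whitespace-separated field), which this README does not specify; `check_run_dir` is a hypothetical helper, not part of the repo.

```python
import hashlib
from pathlib import Path

REQUIRED_FILES = (
    "output.csv", "events.csv", "manifest.txt",
    "firmware.elf", "firmware.elf.sha256",
)

def check_run_dir(run_dir: Path) -> list[str]:
    """Return a list of problems; an empty list means the run looks intact."""
    problems = [f"missing {name}" for name in REQUIRED_FILES
                if not (run_dir / name).is_file()]
    if problems:
        return problems
    # Assumed format: "<hex digest>  firmware.elf", as written by sha256sum.
    fields = (run_dir / "firmware.elf.sha256").read_text().split()
    recorded = fields[0].lower() if fields else ""
    actual = hashlib.sha256((run_dir / "firmware.elf").read_bytes()).hexdigest()
    if recorded != actual:
        problems.append("firmware.elf does not match firmware.elf.sha256")
    return problems
```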

Captures are small (~50–500 KB per run; ~7,750 rows for a long
sweep). At one capture per board per major bench-relevant commit,
the repo growth is modest.

## Boards

| Board | Status | Anchors |
|---|---|---|
| `nucleo_g474re` (STM32G474RE, Cortex-M4F, 170 MHz) | scaffold ready | the existing Renode `stm32f4_disco` Cortex-M numbers |
| `esp32c3_devkit_rust1` (ESP32-C3, RV32IMC, 160 MHz) | not started | the *future* RISC-V Renode lane (separate work) |

## Capture procedure (NUCLEO-G474RE)

Hardware:
- Board: STMicroelectronics NUCLEO-G474RE
- Connection: USB to host (integrated ST-Link, virtual COM port at 115200 8N1)
- Programming: `west flash` via OpenOCD or pyOCD (ST-Link backend)

Host setup (one-time):
- Zephyr SDK with `arm-zephyr-eabi` toolchain
- OpenOCD or pyOCD installed (`brew install open-ocd` on macOS, or `apt install openocd`)
- Python with `pyserial` for the capture script: `pip3 install pyserial`

A publication-grade anchor on a given board is the **4-run matrix**:

| variant | tick_source | command |
|---|---|---|
| baseline | systick | `--variant baseline --tick-source systick` |
| baseline | lptim | `--variant baseline --tick-source lptim` |
| gale | systick | `--variant gale --tick-source systick` |
| gale | lptim | `--variant gale --tick-source lptim` |

The two tick-source variants exist because LPTIM has different jitter
and ISR-overhead characteristics than the Cortex-M default SysTick;
the `silicon / renode` multiplier is reported per `tick_source`.

```sh
cd "$GALE_ROOT"
for V in baseline gale; do
  for T in systick lptim; do
    bash benches/engine_control/silicon/capture.sh \
      --board nucleo_g474re \
      --variant "$V" \
      --tick-source "$T" \
      --sweep long
  done
done
```

For a smoke run (does the board even talk?), drop `--sweep long`,
omit `--tick-source` (defaults to `systick`), and pick one variant.

Both invocations:

1. Build the firmware locally (no Bazel; `west build -b <board>`).
2. Compute the firmware ELF sha256.
3. Flash via `west flash`.
4. Open the board's USB CDC serial port and read until `=== END ===`
(default timeout: 30 minutes for `--sweep long`).
5. Generate `manifest.txt` from the build environment + capture
metadata.
6. Tag the raw output through `tag_events.py` (run-id auto-derived
from the date + board).
7. Write everything into a new `silicon/runs/<dir>/`.
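
Step 4 can be pictured as a read loop with a deadline. The sketch below is not the actual capture script: `read_until_end` is a hypothetical helper that works on any object with a bytes-returning `readline()`, and it is demonstrated on an in-memory stream rather than real hardware.

```python
import io
import time

END_MARKER = b"=== END ==="

def read_until_end(stream, timeout_s, marker=END_MARKER):
    """Collect lines from `stream` until one contains `marker`.

    Raises TimeoutError if the marker has not arrived within `timeout_s`.
    """
    deadline = time.monotonic() + timeout_s
    lines = []
    while time.monotonic() < deadline:
        line = stream.readline()
        if not line:
            continue  # nothing read this round; keep polling until the deadline
        lines.append(line.rstrip(b"\r\n"))
        if marker in line:
            return lines
    raise TimeoutError(f"no {marker!r} within {timeout_s} s")

# Demo on an in-memory stream standing in for the UART.
demo = io.BytesIO(b"rpm,cycles\r\n1000,105\r\n=== END ===\r\n")
for captured in read_until_end(demo, timeout_s=1.0):
    print(captured.decode())
```

On real hardware the stream would typically be a pyserial port opened with a short read timeout, for example `serial.Serial(port, 115200, timeout=1)`, so that each empty `readline()` returns after a second instead of blocking forever.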

The capture script does not commit. After all four runs are
captured and you've eyeballed `output.csv` for sanity, commit
the whole 4-run set together so `analyze.py` can compute the
matrix in one pass:

```sh
git add benches/engine_control/silicon/runs/<YYYY-MM-DD>-nucleo_g474re-*-{baseline,gale}-{systick,lptim}/
git commit -m "silicon: NUCLEO-G474RE 4-run anchor at gale@<short-sha>"
```
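
Before committing, it can help to confirm that all four cells of the matrix are present. The sketch below parses run-directory names in the `<YYYY-MM-DD>-<board>-<gale-sha>-<variant>-<tick_source>` layout described earlier and reports missing cells. `missing_runs` is a hypothetical helper and the directory names in the example are made up.

```python
from itertools import product

VARIANTS = ("baseline", "gale")
TICK_SOURCES = ("systick", "lptim")

def missing_runs(dir_names):
    """Map each '<date>-<board>-<sha>' prefix to the matrix cells it lacks."""
    seen = {}
    for name in dir_names:
        # The last two hyphen-separated fields are variant and tick source.
        prefix, variant, tick = name.rstrip("/").rsplit("-", 2)
        seen.setdefault(prefix, set()).add((variant, tick))
    full = set(product(VARIANTS, TICK_SOURCES))
    return {p: sorted(full - cells) for p, cells in seen.items() if full - cells}

# Hypothetical directory names; the SHA and date are placeholders.
example = [
    "2025-01-15-nucleo_g474re-abc1234-baseline-systick",
    "2025-01-15-nucleo_g474re-abc1234-baseline-lptim",
    "2025-01-15-nucleo_g474re-abc1234-gale-systick",
]
print(missing_runs(example))
```

An empty result means every prefix has its full 4-run set.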

## Comparing silicon vs Renode

Once the baseline and gale run directories for a given tick source exist under `silicon/runs/`, run:

```sh
python3 benches/engine_control/analyze.py \
  --baseline benches/engine_control/silicon/runs/<dir-baseline>/events.csv \
  --gale benches/engine_control/silicon/runs/<dir-gale>/events.csv \
  --runs 1 \
  > /tmp/silicon-comparison.md
```

The analyzer renders the same baseline-vs-gale tables as for
Renode, but the metadata in the report header carries through the
silicon-run identifiers. Compare side-by-side with the Renode CI
output for the same gale SHA — the **ratio** `silicon_median /
renode_median` per RPM step is the calibration data.
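
Until the analyzer grows a side-by-side mode, the per-step ratio can be computed by hand. The sketch below groups cycle counts by RPM step, takes the median of each group, and divides silicon by Renode. The column names `rpm` and `cycles` are assumptions made for illustration; the real `events.csv` schema is defined by `tag_events.py` and may differ.

```python
import csv
import io
from collections import defaultdict
from statistics import median

def median_by_rpm(csv_text, rpm_col="rpm", cycles_col="cycles"):
    """Median cycle count per RPM step (column names are assumed)."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[int(row[rpm_col])].append(int(row[cycles_col]))
    return {rpm: median(vals) for rpm, vals in groups.items()}

def ratio_by_rpm(silicon_csv, renode_csv):
    """silicon_median / renode_median for every RPM step present in both."""
    sil = median_by_rpm(silicon_csv)
    ren = median_by_rpm(renode_csv)
    return {rpm: sil[rpm] / ren[rpm] for rpm in sorted(sil.keys() & ren.keys())}

# Made-up sample data, not taken from any recorded run.
silicon = "rpm,cycles\n1000,105\n1000,103\n1000,107\n2000,210\n"
renode = "rpm,cycles\n1000,100\n1000,100\n2000,200\n"
for rpm, ratio in ratio_by_rpm(silicon, renode).items():
    print(f"{rpm} rpm: ratio {ratio:.3f}")
```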

If you want a single-call Renode-vs-silicon side-by-side rendering,
that's a planned analyzer extension (`--silicon-anchor <events.csv>`)
to be added once the first capture exists to test against.

## Anchor cadence

- One silicon capture per board per major bench-relevant gale
commit (e.g., when overhead compensation lands, when synth
pipeline changes, when a primitive's hot-path is rewritten).
- Each Renode-headlined publication cites the most recent matching
anchor by stable git URL.
- Three to four anchor points per board per year is enough to
claim the Renode-silicon relationship is consistent over time.

## Don't

- Don't overwrite an existing `runs/<dated-dir>/` — start a new one.
- Don't combine pre-overhead-compensation and post-overhead-
compensation captures in the same comparison table; they're
different measurements (see `../SCOPE.md`).
- Don't claim WCET from silicon captures. Worst-case-observed is
not WCET. Same rule as the synthetic bench (see `../SCOPE.md`).
- Don't run silicon captures from a working tree with uncommitted
changes; such a run cannot be reproduced from the recorded SHA. The
manifest captures the working-tree state, not just HEAD.
99 changes: 99 additions & 0 deletions benches/engine_control/silicon/boards/nucleo_g474re/README.md
@@ -0,0 +1,99 @@
# NUCLEO-G474RE — silicon-anchor board notes

## Hardware

- **Board:** STMicroelectronics NUCLEO-G474RE
- **MCU:** STM32G474RET6 (Cortex-M4F with single-precision FPU and DSP extensions, 170 MHz)
- **Memory:** 512 KB Flash, 128 KB RAM
- **Cycle counter:** DWT_CYCCNT (same as Cortex-M4F on `stm32f4_disco`)
- **Programmer:** integrated ST-Link/V3E over USB; exposes virtual
COM port for stdout
- **Upstream Zephyr support:** `nucleo_g474re` (already in the tree)

## Why this board for the anchor

Cortex-M4F + FPU at 170 MHz is the closest production-shape silicon
to the simulated `stm32f4_disco` (also Cortex-M4F + FPU at 168 MHz).
The architectural variables held constant between the synthetic and
silicon measurements are:

- ARMv7E-M instruction set (Thumb-2)
- DWT_CYCCNT cycle counter (same width, same definition)
- 3-stage in-order pipeline
- Single-cycle MUL, hardware DIV, single-precision FPU

What differs:

- Real memory-system effects (the Cortex-M4 core has no cache of its
own, but flash wait states and prefetch behavior are visible)
- Real bus arbitration (negligible on this bench: no DMA, no
peripheral activity)
- Clock 170 vs 168 MHz (1.2%, accountable directly)

So the cycle ratio `silicon / renode` for `algo` and `handoff`
should be near 1.0 in steady state. Anything materially off is
information about Renode's cycle model, not about the silicon.

## Connection

USB cable from NUCLEO USB connector (CN1) to host. The ST-Link
virtual COM port appears as:

- macOS: `/dev/cu.usbmodem*`
- Linux: `/dev/ttyACM0`

Zephyr's default for this board uses LPUART1 for stdout, exposed
through ST-Link.

## Programming

`west flash` from a build directory works out of the box:

```sh
west flash -d /tmp/eng-nucleo-baseline
```

Default backend is OpenOCD. To force pyOCD:

```sh
west flash -d /tmp/eng-nucleo-baseline --runner pyocd
```

## Clock / cycle counter notes

On the G4 family, `k_cycle_get_32()` returns `DWT->CYCCNT`
directly, same as on F4. `sys_clock_hw_cycles_per_sec()` returns
the frequency the cycle counter ticks at. Verify that it matches
170 MHz at runtime by reading the boot banner before relying on
absolute ns conversions.
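
The reason to verify the clock first is that cycle counts are clock-independent but nanosecond figures are not. A minimal sketch of the conversion, with `cycles_to_ns` as a hypothetical helper:

```python
def cycles_to_ns(cycles, clock_hz):
    """Convert a cycle count to nanoseconds at the given clock frequency."""
    return cycles * 1e9 / clock_hz

G474_HZ = 170_000_000   # NUCLEO-G474RE core clock
F4_HZ = 168_000_000     # simulated stm32f4_disco clock

cycles = 1700
at_170 = cycles_to_ns(cycles, G474_HZ)
at_168 = cycles_to_ns(cycles, F4_HZ)
print(f"{cycles} cycles: {at_170:.1f} ns at 170 MHz, {at_168:.1f} ns at 168 MHz")
print(f"relative difference: {at_168 / at_170 - 1:.2%}")
```

Converting with the wrong assumed frequency shifts every absolute time by the same relative amount, which is why the ratio in cycles is the safer figure to compare.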

## Kernel tick sources

The silicon-anchor protocol captures both Cortex-M SysTick and STM32
LPTIM as kernel-tick sources, since each has a different jitter,
drift, and ISR-overhead profile that the published `silicon / renode`
multiplier may be sensitive to.

| `--tick-source` | Overlay file | Notes |
|---|---|---|
| `systick` (default) | none — Cortex-M default | DWT_CYCCNT-aligned tick, ~1700 cycles per 10 µs at 170 MHz |
| `lptim` | `prj-tick-lptim.conf` | STM32 LPTIM-based tick. See clock-source caveat below. |

### LPTIM clock-source caveat

Zephyr's default LPTIM clock is a ~32 kHz low-speed oscillator (LSE
at 32.768 kHz, or LSI at about 32 kHz, depending on the board DTS).
The bench's `CONFIG_SYS_CLOCK_TICKS_PER_SEC=100000` (10 µs
granularity) cannot run on a timer that slow. To make the LPTIM
variant apples-to-apples with SysTick, layer a device-tree overlay
that switches LPTIM1 onto PCLK1 (170 MHz / prescaler).
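
The arithmetic behind the caveat: a timer can only deliver a tick if it counts at least once per tick period. The sketch below checks that for the two clock sources discussed here. The prescaler of 1 is an assumption for illustration, since the overlay that would set it does not exist yet.

```python
def timer_counts_per_tick(timer_hz, ticks_per_sec):
    """How many timer counts fit into one kernel tick period."""
    return timer_hz / ticks_per_sec

TICKS_PER_SEC = 100_000      # bench CONFIG_SYS_CLOCK_TICKS_PER_SEC
LOW_SPEED_HZ = 32_768        # 32.768 kHz low-speed oscillator
PCLK1_HZ = 170_000_000       # PCLK1 before any prescaler
ASSUMED_PRESCALER = 1        # hypothetical value

for label, hz in (("low-speed clock", LOW_SPEED_HZ),
                  ("PCLK1", PCLK1_HZ / ASSUMED_PRESCALER)):
    counts = timer_counts_per_tick(hz, TICKS_PER_SEC)
    verdict = "ok" if counts >= 1 else "too slow for the requested tick rate"
    print(f"{label}: {counts:g} counts per tick ({verdict})")
```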

A starter `tick-lptim.overlay` is **not** committed yet — the exact
G4 device-tree binding for the `clocks` property needs verification
against `dts/arm/st/g4/stm32g474Xe.dtsi` before it ships. Until that
overlay lands, the LPTIM variant runs at LSE-derived rates and the
manifest's `tick_source: lptim` field is the user's signal that the
two captures are not numerically comparable.

## Known issues

None yet — populate as captures happen.
@@ -0,0 +1,33 @@
# NUCLEO-G474RE — alternate kernel-tick source: STM32 LPTIM
#
# Layered onto the bench prj.conf when capture.sh is invoked with
# `--tick-source lptim`. Compared against the SysTick variant
# (the Cortex-M default) to determine whether the silicon-anchor
# multiplier is sensitive to tick-source choice.
#
# Upstream nucleo_g474re.dts already labels &lptim1 as the
# `stm32_lp_tick_source` and sets status="okay" with LSI as its
# clock — no DT overlay needed.
#
# Per zephyr/drivers/timer/Kconfig.stm32_lptim, CONFIG_STM32_LPTIM_TIMER:
# depends on dt_nodelabel_exists(stm32_lp_tick_source) ← board DTS sets this
# depends on DT_HAS_ST_STM32_LPTIM_ENABLED ← board DTS sets status=okay
# depends on CLOCK_CONTROL && PM ← needs CONFIG_PM=y
# select TICKLESS_CAPABLE
# …so the only Kconfig fragment we actually need is `CONFIG_PM=y` (which
# auto-enables STM32_LPTIM_TIMER via its `default y`). We still set
# CORTEX_M_SYSTICK=n so the SysTick driver isn't compiled in alongside
# and racing for the system-clock-driver init slot.
#
# CAVEAT — tick rate.
# The bench prj.conf sets CONFIG_SYS_CLOCK_TICKS_PER_SEC=100000 (10 µs
# granularity). LPTIM-on-LSI runs at 32 kHz, well below 100 kHz, so
# Zephyr's tick subsystem will cap the achieved rate — the LPTIM
# variant is NOT apples-to-apples with SysTick at the bench's stated
# tick rate. The manifest's `tick_source: lptim` field is the user's
# signal that the two captures are not directly comparable; what they
# *do* show is how each tick-source's overhead/jitter profile differs
# in absolute cycles.

CONFIG_PM=y
CONFIG_CORTEX_M_SYSTICK=n
15 changes: 15 additions & 0 deletions benches/engine_control/silicon/boards/nucleo_g474re/prj.conf
@@ -0,0 +1,15 @@
# NUCLEO-G474RE — engine_control bench overlay
#
# Empty for now: Zephyr's nucleo_g474re defaults give us:
# - 170 MHz HCLK (PLL'd up)
# - LPUART1 console at 115200 8N1 via ST-Link VCP
# - DWT_CYCCNT enabled (Cortex-M4 default in Zephyr)
#
# Add overlay options here only if a future capture exposes a
# default that biases the measurement (e.g. interrupt priority of
# a peripheral we don't use; tickless idle behavior; etc.).
#
# Anything board-specific that *must* be on for the silicon
# measurement to be valid goes here. Anything project-wide
# (gale module enable, sweep size) stays in the main prj.conf
# overlay or the CMake invocation.