
feat(bench): wire autosize end-to-end + H22 comparator-parity #127

Merged
blove merged 8 commits into main from b2-followup-3-autosize
May 9, 2026


@blove blove commented May 9, 2026

Summary

Wires the autosize script end-to-end through the bench harness and adds the H22 comparator-parity hypothesis evaluator. Closes B2 follow-up #3 — the autosize gap captured in the 2026-05-08 milestone is now resolved.

  • Pipeline: query-state, bench-types, and packages/bench-runner accept autosize (gated to S2 and to pretable | ag-grid | mui; tanstack returns unsupported).
  • Helper: measureBenchAutosizeRun(root, adapterId, autosize) in bench-runtime.ts — single-event "call-to-paint" timing (await the callback, then one rAF), reports interaction_latency_ms. Mirrors measureBenchKeySequenceRun shape.
  • Adapters: Pretable (grid.autosizeColumns()), AG Grid (gridApi.autoSizeColumns(colIds, false)), and MUI (apiRef.current.autosizeColumns({ includeOutliers: true }) — async on v7+) each accept onAutosizeReady and call back with a closure over their native autosize API. bench-app.tsx captures it in autosizeApiRef and dispatches on the autosize script. AG Grid's old pre-emptive mount-time autosize branch is replaced by the callback. MUI now exposes apiRef via useGridApiRef().
  • H22: evaluateH22(runs) in scripts/bench-matrix.mjs. Pretable autosize must complete within a 60Hz frame (≤ 16 ms) and within 10% of the best ag-grid/mui comparator on S2. Reuses H1's tight-zone min-repeat gate via a now-shared module-level COMPARATOR_PARITY_MIN_REPEATS = 10 constant.
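
The H22 rule in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the actual `evaluateH22` in scripts/bench-matrix.mjs: the input shape (`AutosizeSummary`, `medianLatencyMs`), the order of the checks, the `reason` strings, and the function name `evaluateH22Sketch` are all assumptions. The 16 ms floor, the 10% parity band, the 0.9–1.2 tight zone, and the 10-repeat minimum are taken from this PR's description and commit messages.

```typescript
// Hypothetical sketch of an H22-style verdict. Types, field names, and
// check ordering are assumptions; only the thresholds come from the PR.
type Status = "satisfied" | "failing" | "insufficient" | "directional";

interface AutosizeSummary {
  adapterId: "pretable" | "ag-grid" | "mui";
  medianLatencyMs: number; // assumed aggregate of interaction_latency_ms
  repeats: number;
}

interface Verdict {
  status: Status;
  reason: string;
  ratio?: number;
}

const FRAME_BUDGET_MS = 16; // single 60Hz frame floor
const PARITY_BAND = 1.1; // "within 10%" of the best comparator
const TIGHT_ZONE = { min: 0.9, max: 1.2 }; // ratios that need more repeats
const COMPARATOR_PARITY_MIN_REPEATS = 10; // shared with H1

function evaluateH22Sketch(runs: AutosizeSummary[]): Verdict {
  const pretable = runs.find((r) => r.adapterId === "pretable");
  if (!pretable) return { status: "insufficient", reason: "no-pretable" };

  // Absolute floor: autosize must fit in one 60Hz frame.
  if (pretable.medianLatencyMs > FRAME_BUDGET_MS) {
    return { status: "failing", reason: "floor" };
  }

  const comparators = runs.filter((r) => r.adapterId !== "pretable");
  if (comparators.length === 0) {
    return { status: "directional", reason: "no-comparator" };
  }

  // "Best" comparator is the one with the lowest latency.
  const best = comparators.reduce((a, b) =>
    b.medianLatencyMs < a.medianLatencyMs ? b : a,
  );
  const ratio = pretable.medianLatencyMs / best.medianLatencyMs;

  // Inside the tight zone, refuse to resolve without enough repeats per side.
  const inTightZone = ratio >= TIGHT_ZONE.min && ratio <= TIGHT_ZONE.max;
  const enoughRepeats =
    Math.min(pretable.repeats, best.repeats) >= COMPARATOR_PARITY_MIN_REPEATS;
  if (inTightZone && !enoughRepeats) {
    return { status: "insufficient", reason: "tight-zone", ratio };
  }

  return ratio <= PARITY_BAND
    ? { status: "satisfied", reason: "parity", ratio }
    : { status: "failing", reason: "parity", ratio };
}
```

With the numbers reported in this PR (pretable 5.3 ms, MUI 11 ms, 3 repeats each), the ratio is about 0.482, which is outside the tight zone, so the sketch resolves without consulting the repeat gate.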

Matrix re-run

S2/hypothesis/Chromium, all 13 scripts including autosize, repeats=3, ~5 min wall-clock. Output: status/milestones/2026-05-09-b2-with-autosize.hypotheses.json (the original 2026-05-08-b2-comparative-bench.hypotheses.json is unchanged).

| Adapter | autosize interaction_latency_ms (n=3) |
| --- | --- |
| pretable | 5.3 ms |
| mui | 11 ms |
| ag-grid | (see milestone JSON) |
| tanstack | unsupported |

H22 status

satisfied: pretable 5.3 ms vs MUI 11 ms (ratio 5.3 / 11 ≈ 0.482, well below the 0.9–1.2 tight zone, so the ≥10-repeat gate does not apply to this n=3 run). A 20-repeat re-run is therefore not required.

Other status changes vs the 2026-05-08 milestone: H1 flipped from failing to satisfied (parity at n=3 with mui this run; matches the n=20 correction documented in the previous repo-memory entry). No other hypotheses changed status.

What's NOT in this PR

  • Column-width fidelity instrumentation (whether autosize actually fits the widest cell). Latency only for v1.
  • Post-autosize scroll measurement. The script measures the autosize event in isolation.
  • A 20-repeat autosize re-run for tight statistical confidence — not needed because H22 already resolved satisfied outside the tight zone at n=3.
  • Website /bench page changes — the page renders only H1 today.

Test plan

  • pnpm -w typecheck
  • pnpm -w test (all suites pass; bench-matrix tests cover the six H22 scenarios: satisfied / failing-floor / failing-parity / insufficient-tight-zone / insufficient-no-pretable / directional-no-comparator)
  • pnpm -w lint
  • pnpm format
  • pnpm bench:matrix end-to-end re-run (3 repeats, 13 scripts, ~5 min)

🤖 Generated with Claude Code

blove and others added 8 commits May 8, 2026 22:00
End-to-end autosize harness wiring (pretable + ag-grid + mui; tanstack
unsupported), with H22 comparator-parity hypothesis evaluator reusing
the min-repeat gate from PR #125, and a full B2 matrix re-run with
autosize included.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Six-task plan for wiring autosize through the bench harness end-to-end,
adding evaluateH22 with the min-repeat gate, and re-running the B2
matrix with autosize included.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds "autosize" to the bench-runner supportedScripts allowlist (gated
to S2 and to pretable | ag-grid | mui — tanstack remains unsupported
per the B2 spec), to the apps/bench query-state parser, and to the
BenchScriptName Extract narrow in bench-types.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
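
As a rough illustration of the gating described in the commit above, a support check might look like the sketch below. The names `autosizeSupport` and `AUTOSIZE_ADAPTERS` are hypothetical, and treating every scenario other than S2 as "unsupported" is an assumption; the actual allowlist lives in bench-runner's supportedScripts and may be shaped differently.

```typescript
// Hypothetical sketch of the autosize gate; not the bench-runner code.
type AdapterId = "pretable" | "ag-grid" | "mui" | "tanstack";
type Support = "supported" | "unsupported";

// Adapters the PR lists as wired for autosize.
const AUTOSIZE_ADAPTERS: ReadonlySet<AdapterId> = new Set<AdapterId>([
  "pretable",
  "ag-grid",
  "mui",
]);

function autosizeSupport(scenario: string, adapter: AdapterId): Support {
  // Assumption: autosize only runs on scenario S2; everything else is rejected.
  if (scenario !== "S2") return "unsupported";
  return AUTOSIZE_ADAPTERS.has(adapter) ? "supported" : "unsupported";
}
```

Resolving support before mounting the adapter matches the note elsewhere in this PR that tanstack is reported as unsupported without the adapter ever mounting.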
Adds a single-event autosize latency helper that awaits the adapter's
autosize callback and one rAF, reporting interaction_latency_ms as
"call-to-paint" timing. Mirrors the shape of measureBenchKeySequenceRun.

Also unblocks the now-accepted "autosize" script in the query-state
parser by retargeting the existing fallback-to-defaults test to an
unrelated bogus value.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
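
The "call-to-paint" timing described in the commit above can be sketched as follows. This is not the actual `measureBenchAutosizeRun(root, adapterId, autosize)`: the function name `measureAutosizeSketch` and the injected `TimingDeps` (clock and frame waiter) are assumptions made so the sketch runs outside a browser. Only the sequence (await the callback, wait one animation frame, report `interaction_latency_ms`) comes from the PR.

```typescript
// Hypothetical sketch of single-event call-to-paint timing.
type AutosizeFn = () => void | Promise<void>;

interface TimingDeps {
  now: () => number; // monotonic clock in milliseconds
  nextFrame: () => Promise<void>; // resolves after one animation frame
}

interface AutosizeMeasurement {
  interaction_latency_ms: number;
}

async function measureAutosizeSketch(
  autosize: AutosizeFn,
  deps: TimingDeps,
): Promise<AutosizeMeasurement> {
  const start = deps.now();
  await autosize(); // works for sync (AG Grid) and async (MUI v7+) adapters
  await deps.nextFrame(); // let the resized columns reach the next frame
  return { interaction_latency_ms: deps.now() - start };
}
```

In a browser, `now` could be `() => performance.now()` and `nextFrame` could wrap a single `requestAnimationFrame` call in a promise. Injecting both keeps the helper testable with a fake clock.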
Pretable, AG Grid, and MUI adapters now publish their autosize entry
point through a new onAutosizeReady callback. bench-app.tsx captures it
in autosizeApiRef and dispatches measureBenchAutosizeRun on the autosize
script, mirroring the updateApiRef + measureBenchUpdatesRun chain.

Replaces AG Grid's pre-emptive onGridReady autosize branch (which only
ran at mount) with a callback so autosize fires on bench-script
dispatch. MUI now exposes apiRef via useGridApiRef so the harness can
call apiRef.current.autosizeColumns({ includeOutliers: true }) — async
on v7+. TanStack accepts the prop for harness uniformity but the
bench-runner returns "unsupported" before the adapter ever mounts.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
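
The `onAutosizeReady` pattern from the commit above might look roughly like this outside React. The `AgGridApiLike` and `MuiApiRefLike` interfaces are minimal stand-ins for the real grid APIs, and `publishAgGridAutosize`, `publishMuiAutosize`, and the plain-object `autosizeApiRef` are hypothetical names for illustration. The two native calls and their arguments are the ones quoted in this PR.

```typescript
// Hypothetical sketch: adapters hand the harness a closure over their
// native autosize API, and the harness invokes it on the autosize script.
type AutosizeFn = () => void | Promise<void>;
type OnAutosizeReady = (autosize: AutosizeFn) => void;

// Stand-in for the part of AG Grid's grid API used here.
interface AgGridApiLike {
  autoSizeColumns(colIds: string[], skipHeader?: boolean): void;
}

// Stand-in for the ref returned by MUI's useGridApiRef().
interface MuiApiRefLike {
  current: {
    autosizeColumns(options: { includeOutliers: boolean }): Promise<void>;
  } | null;
}

function publishAgGridAutosize(
  api: AgGridApiLike,
  colIds: string[],
  onAutosizeReady: OnAutosizeReady,
): void {
  // Nothing is resized at mount; the closure runs only when dispatched.
  onAutosizeReady(() => api.autoSizeColumns(colIds, false));
}

function publishMuiAutosize(
  apiRef: MuiApiRefLike,
  onAutosizeReady: OnAutosizeReady,
): void {
  onAutosizeReady(async () => {
    // Async per the PR, so the harness must await the closure.
    await apiRef.current?.autosizeColumns({ includeOutliers: true });
  });
}

// Harness side: capture whichever closure the mounted adapter publishes.
const autosizeApiRef: { current: AutosizeFn | null } = { current: null };
const captureAutosize: OnAutosizeReady = (autosize) => {
  autosizeApiRef.current = autosize;
};
```

Deferring the call behind a closure is what lets the measurement start at script dispatch rather than at grid mount, which is the behavior change described for AG Grid.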
Adds H22 ("pretable autosize is within a single 60Hz frame and within
10% of the best ag-grid/mui comparator on S2"). Reuses the H1
comparator-parity pattern: 16 ms single-frame floor, 10% parity band,
≥10 repeats per side before resolving a tight-zone (0.9–1.2) ratio.

Hoists COMPARATOR_PARITY_MIN_REPEATS to module scope so H1 and H22
share a single source of truth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
S2/hypothesis/Chromium, all 13 scripts including autosize, repeats=3,
~5 min wall-clock. H22 satisfied: pretable autosize 5.3 ms vs MUI 11 ms
(ratio 0.482, outside the tight zone — gate does not apply).

H1 also flipped from failing → satisfied vs the 2026-05-08 milestone
(parity at n=3 with mui this run; matches the n=20 correction documented
in the previous repo-memory entry).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

vercel Bot commented May 9, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| pretable | Ready | Preview, Comment | May 9, 2026 5:23am |

@blove blove enabled auto-merge (squash) May 9, 2026 05:22
@blove blove merged commit cf12e5e into main May 9, 2026
13 checks passed
@blove blove deleted the b2-followup-3-autosize branch May 9, 2026 05:25

github-actions Bot commented May 9, 2026

Vercel preview ready

Preview: https://pretable-abfyptujy-cacheplane.vercel.app
Commit: 16ef3ffbea5dc07d218440891487fff6b26732c2

Updated automatically by the deploy-preview job.
