feat(bench): wire autosize end-to-end + H22 comparator-parity by blove · Pull Request #127 · cacheplane/pretable

blove · 2026-05-09T05:22:27Z

Summary

Wires the autosize script end-to-end through the bench harness and adds the H22 comparator-parity hypothesis evaluator. Closes B2 follow-up #3 — the autosize gap captured in the 2026-05-08 milestone is now resolved.

Pipeline: query-state, bench-types, and packages/bench-runner accept autosize (gated to S2 and to pretable | ag-grid | mui; tanstack returns unsupported).
Helper: measureBenchAutosizeRun(root, adapterId, autosize) in bench-runtime.ts — single-event "call-to-paint" timing (await the callback, then one rAF), reports interaction_latency_ms. Mirrors measureBenchKeySequenceRun shape.
Adapters: Pretable (grid.autosizeColumns()), AG Grid (gridApi.autoSizeColumns(colIds, false)), and MUI (apiRef.current.autosizeColumns({ includeOutliers: true }) — async on v7+) each accept onAutosizeReady and call back with a closure over their native autosize API. bench-app.tsx captures it in autosizeApiRef and dispatches on the autosize script. AG Grid's old pre-emptive mount-time autosize branch is replaced by the callback. MUI now exposes apiRef via useGridApiRef().
H22: evaluateH22(runs) in scripts/bench-matrix.mjs. Pretable autosize must complete within a 60Hz frame (≤ 16 ms) and within 10% of the best ag-grid/mui comparator on S2. Reuses H1's tight-zone min-repeat gate via a now-shared module-level COMPARATOR_PARITY_MIN_REPEATS = 10 constant.

Matrix re-run

S2/hypothesis/Chromium, all 13 scripts including autosize, repeats=3, ~5 min wall-clock. Output: status/milestones/2026-05-09-b2-with-autosize.hypotheses.json (the original 2026-05-08-b2-comparative-bench.hypotheses.json is unchanged).

Adapter	autosize interaction_latency_ms (n=3)
pretable	5.3 ms
mui	11 ms
ag-grid	(see milestone JSON)
tanstack	unsupported

H22 status

satisfied: pretable 5.3 ms vs MUI 11 ms (ratio 0.482, well below the 0.9–1.2 tight zone, so the ≥ 10-repeat minimum gate does not apply even at n=3). A 20-repeat re-run is not required because the verdict resolved outside the tight zone.
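The arithmetic behind this verdict can be checked in a few lines. This is a sketch that uses only the numbers quoted in this PR. It assumes the ratio is the pretable latency divided by the comparator latency and that "within 10%" means a ratio of at most 1.10; the constant names are illustrative and are not taken from scripts/bench-matrix.mjs.

```typescript
// Values quoted in this PR (n=3 run). The ag-grid figure is not quoted here,
// so only the MUI comparison is reproduced.
const pretableMs = 5.3;
const muiMs = 11;

// Thresholds as described for H22; names are illustrative assumptions.
const FRAME_BUDGET_MS = 16;
const PARITY_BAND = 0.1;
const TIGHT_ZONE = { low: 0.9, high: 1.2 };

// Assumption: ratio = pretable / comparator (lower is better for pretable).
const ratio = pretableMs / muiMs;
const withinFrame = pretableMs <= FRAME_BUDGET_MS;
const withinParity = ratio <= 1 + PARITY_BAND;
const inTightZone = ratio >= TIGHT_ZONE.low && ratio <= TIGHT_ZONE.high;

console.log(ratio.toFixed(3), withinFrame, withinParity, inTightZone);
// prints: 0.482 true true false
```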

Other status changes vs the 2026-05-08 milestone: H1 flipped from failing to satisfied (parity at n=3 with mui this run; matches the n=20 correction documented in the previous repo-memory entry). No other hypotheses changed status.

What's NOT in this PR

Column-width fidelity instrumentation (whether autosize actually fits the widest cell). Latency only for v1.
Post-autosize scroll measurement. The script measures the autosize event in isolation.
A 20-repeat autosize re-run for tight statistical confidence — not needed because H22 already resolved satisfied outside the tight zone at n=3.
Website /bench page changes — the page renders only H1 today.

Test plan

pnpm -w typecheck
pnpm -w test (all suites pass; bench-matrix tests cover six H22 scenarios: satisfied / failing-floor / failing-parity / insufficient-tight-zone / insufficient-no-pretable / directional-no-comparator)
pnpm -w lint
pnpm format
pnpm bench:matrix end-to-end re-run (3 repeats, 13 scripts, ~5 min)

🤖 Generated with Claude Code

End-to-end autosize harness wiring (pretable + ag-grid + mui; tanstack unsupported), with H22 comparator-parity hypothesis evaluator reusing the min-repeat gate from PR #125, and a full B2 matrix re-run with autosize included. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Six-task plan for wiring autosize through the bench harness end-to-end, adding evaluateH22 with the min-repeat gate, and re-running the B2 matrix with autosize included. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Adds "autosize" to the bench-runner supportedScripts allowlist (gated to S2 and to pretable | ag-grid | mui — tanstack remains unsupported per the B2 spec), to the apps/bench query-state parser, and to the BenchScriptName Extract narrow in bench-types. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
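The gating described in this commit could look roughly like the following. It is a minimal sketch, not the bench-runner's actual code: the type names, the `autosizeSupport` function, and the assumption that every scenario other than S2 reports "unsupported" are all hypothetical.

```typescript
// Hypothetical names; the real allowlist lives in packages/bench-runner.
type AdapterId = "pretable" | "ag-grid" | "mui" | "tanstack";
type Support = "supported" | "unsupported";

// Autosize is gated to one scenario and three adapters, per the PR text.
const AUTOSIZE_SCENARIO = "S2";
const AUTOSIZE_ADAPTERS = new Set<AdapterId>(["pretable", "ag-grid", "mui"]);

// Assumption: anything outside the gate is reported as "unsupported"
// before an adapter is mounted.
function autosizeSupport(scenario: string, adapter: AdapterId): Support {
  const ok = scenario === AUTOSIZE_SCENARIO && AUTOSIZE_ADAPTERS.has(adapter);
  return ok ? "supported" : "unsupported";
}

console.log(autosizeSupport("S2", "mui")); // supported
console.log(autosizeSupport("S2", "tanstack")); // unsupported
```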

Adds a single-event autosize latency helper that awaits the adapter's autosize callback and one rAF, reporting interaction_latency_ms as "call-to-paint" timing. Mirrors the shape of measureBenchKeySequenceRun. Also unblocks the now-accepted "autosize" script in the query-state parser by retargeting the existing fallback-to-defaults test to an unrelated bogus value. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
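The "call-to-paint" timing described here (await the callback, then wait one frame) might be sketched as below. This is not the body of measureBenchAutosizeRun: the function name, the `TimingDeps` interface, and the injected clock and frame scheduler are assumptions made so the sketch runs outside a browser.

```typescript
// Hypothetical types; the real helper in bench-runtime.ts may differ.
type AutosizeFn = () => void | Promise<void>;

interface TimingDeps {
  now: () => number; // clock in milliseconds
  nextFrame: (cb: () => void) => void; // schedules cb for the next frame
}

// In a browser the deps would plausibly be:
//   { now: () => performance.now(),
//     nextFrame: (cb) => requestAnimationFrame(() => cb()) }

async function measureAutosizeSketch(
  autosize: AutosizeFn,
  deps: TimingDeps,
): Promise<{ interaction_latency_ms: number }> {
  const start = deps.now();
  await autosize(); // covers adapters whose autosize call is async
  await new Promise<void>((resolve) => deps.nextFrame(resolve)); // one frame
  return { interaction_latency_ms: deps.now() - start };
}

// Deterministic usage with a fake clock: 5 ms of autosize work, 3 ms to frame.
let t = 0;
const fakeDeps: TimingDeps = {
  now: () => t,
  nextFrame: (cb) => {
    t += 3;
    cb();
  },
};
measureAutosizeSketch(async () => {
  t += 5;
}, fakeDeps).then((r) => console.log(r.interaction_latency_ms)); // 8
```

Injecting the clock and frame scheduler keeps the measurement logic testable without a DOM; the real helper may call the browser APIs directly.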

Pretable, AG Grid, and MUI adapters now publish their autosize entry point through a new onAutosizeReady callback. bench-app.tsx captures it in autosizeApiRef and dispatches measureBenchAutosizeRun on the autosize script, mirroring the updateApiRef + measureBenchUpdatesRun chain. Replaces AG Grid's pre-emptive onGridReady autosize branch (which only ran at mount) with a callback so autosize fires on bench-script dispatch. MUI now exposes apiRef via useGridApiRef so the harness can call apiRef.current.autosizeColumns({ includeOutliers: true }) — async on v7+. TanStack accepts the prop for harness uniformity but the bench-runner returns "unsupported" before the adapter ever mounts. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
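The callback-and-ref handoff described in this commit can be illustrated without React or a real grid. The sketch below is an assumption-laden stand-in: `FakeGridApi`, `mountFakeAdapter`, and `dispatchAutosize` are hypothetical names, and only `onAutosizeReady` and `autosizeApiRef` echo identifiers mentioned in the PR.

```typescript
type AutosizeFn = () => void | Promise<void>;
type OnAutosizeReady = (autosize: AutosizeFn) => void;

// Stand-in for a grid's native API (hypothetical; not AG Grid or MUI).
class FakeGridApi {
  calls = 0;
  async autosizeColumns(): Promise<void> {
    this.calls += 1;
  }
}

// Adapter side: publish a closure over the native API once mounted,
// instead of autosizing pre-emptively at mount time.
function mountFakeAdapter(api: FakeGridApi, onAutosizeReady: OnAutosizeReady): void {
  onAutosizeReady(() => api.autosizeColumns());
}

// Harness side: hold the closure in a ref and call it on script dispatch.
const autosizeApiRef: { current: AutosizeFn | null } = { current: null };

async function dispatchAutosize(): Promise<void> {
  const autosize = autosizeApiRef.current;
  if (!autosize) throw new Error("autosize API not ready");
  await autosize();
}

const api = new FakeGridApi();
mountFakeAdapter(api, (fn) => {
  autosizeApiRef.current = fn;
});
console.log(api.calls); // 0: mounting alone does not autosize
dispatchAutosize().then(() => console.log(api.calls)); // 1
```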

Adds H22 ("pretable autosize is within a single 60Hz frame and within 10% of the best ag-grid/mui comparator on S2"). Reuses the H1 comparator-parity pattern: 16 ms single-frame floor, 10% parity band, ≥10 repeats per side before resolving a tight-zone (0.9–1.2) ratio. Hoists COMPARATOR_PARITY_MIN_REPEATS to module scope so H1 and H22 share a single source of truth. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
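One way to read the H22 rules above is the decision function below. It is a sketch, not the evaluateH22 in scripts/bench-matrix.mjs: the `Sample` shape, the order of the checks, the ratio definition (pretable divided by the fastest comparator), and the mapping to four statuses (inferred from the test scenario names in this PR) are all assumptions.

```typescript
type Status = "satisfied" | "failing" | "insufficient" | "directional";

// Hypothetical input shape: one aggregated latency per adapter plus its n.
interface Sample {
  latencyMs: number;
  repeats: number;
}

const FRAME_BUDGET_MS = 16; // single 60Hz frame floor
const PARITY_BAND = 0.1; // within 10% of the best comparator
const TIGHT_ZONE = { low: 0.9, high: 1.2 };
const COMPARATOR_PARITY_MIN_REPEATS = 10;

function evaluateH22Sketch(pretable: Sample | undefined, comparators: Sample[]): Status {
  if (!pretable) return "insufficient"; // no pretable run to judge
  if (pretable.latencyMs > FRAME_BUDGET_MS) return "failing"; // floor
  if (comparators.length === 0) return "directional"; // floor only
  // Best comparator = lowest latency among the supplied runs.
  const best = comparators.reduce((a, b) => (b.latencyMs < a.latencyMs ? b : a));
  const ratio = pretable.latencyMs / best.latencyMs;
  const inTightZone = ratio >= TIGHT_ZONE.low && ratio <= TIGHT_ZONE.high;
  const enoughRepeats =
    Math.min(pretable.repeats, best.repeats) >= COMPARATOR_PARITY_MIN_REPEATS;
  // Close calls need enough repeats on both sides before resolving.
  if (inTightZone && !enoughRepeats) return "insufficient";
  return ratio <= 1 + PARITY_BAND ? "satisfied" : "failing";
}

const mui = { latencyMs: 11, repeats: 3 };
console.log(evaluateH22Sketch({ latencyMs: 5.3, repeats: 3 }, [mui])); // satisfied
console.log(evaluateH22Sketch({ latencyMs: 10.5, repeats: 3 }, [mui])); // insufficient
```

The first call uses the figures reported in this PR; the second uses a made-up 10.5 ms value to show a ratio that lands inside the tight zone at n=3.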

S2/hypothesis/Chromium, all 13 scripts including autosize, repeats=3, ~5 min wall-clock. H22 satisfied: pretable autosize 5.3 ms vs MUI 11 ms (ratio 0.482, outside the tight zone — gate does not apply). H1 also flipped from failing → satisfied vs the 2026-05-08 milestone (parity at n=3 with mui this run; matches the n=20 correction documented in the previous repo-memory entry). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

vercel · 2026-05-09T05:22:31Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
pretable	Ready	Preview, Comment	May 9, 2026 5:23am

github-actions · 2026-05-09T05:28:54Z

Vercel preview ready

Preview: https://pretable-abfyptujy-cacheplane.vercel.app
Commit: 16ef3ffbea5dc07d218440891487fff6b26732c2

Updated automatically by the deploy-preview job.

blove and others added 8 commits May 8, 2026 22:00

chore(format): prettier formatting for B2 follow-up #3

16ef3ff

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

blove enabled auto-merge (squash) May 9, 2026 05:22

blove merged commit cf12e5e into main May 9, 2026
13 checks passed

blove deleted the b2-followup-3-autosize branch May 9, 2026 05:25
