docs(research): pretable vs MUI scroll perf diagnostic #124
Merged
Phase C of B2 follow-up #1. Verdict: the gap is noise. The high-repeat S2/hypothesis/scroll rerun shows no meaningful MUI advantage and recommends tightening the H1-sensitive repeat protocol instead of scoping a perf-fix PR.

Spec: docs/superpowers/specs/2026-05-09-b2-followup-perf-diagnostic-design.md

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
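The "noise" verdict rests on comparing the mean scroll-frame gap to a 2σ noise floor. A minimal sketch of that style of comparison, assuming a floor built from the standard error of the difference of two sample means; the memo's exact floor construction isn't shown here and may differ, so treat this as illustrative only:

```javascript
// Illustrative sketch only: one common way to decide "noise vs real"
// for a gap between two benchmark means. The memo's exact noise-floor
// formula isn't reproduced here and may differ.
function noiseVerdict(a, b) {
  const meanDiff = a.mean - b.mean;
  // Standard error of the difference of two independent sample means
  const sigmaDiff = Math.sqrt(a.sigma ** 2 / a.n + b.sigma ** 2 / b.n);
  const floor = 2 * sigmaDiff; // 2-sigma noise floor
  return {
    meanDiff,
    floor,
    verdict: Math.abs(meanDiff) <= floor ? "noise" : "real",
  };
}

// With the rerun's numbers (pretable 9.07 ± 0.20, MUI 9.14 ± 0.19, n = 20),
// the ~-0.07 ms mean gap sits inside the floor, so the verdict is "noise".
```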
blove added a commit that referenced this pull request on May 9, 2026:
#125)

PR #124 (perf-diag rerun at n=20) showed the B2 H1 "failing" verdict was a low-sample artifact: pretable 9.07 ms ± 0.20 vs MUI 9.14 ms ± 0.19, a mean diff of −0.065 ms, inside the 2σ noise floor of 0.40 ms. The original n=3 ratio of 1.115 was sample noise, not a real regression.

Five targeted corrections:

- Add status/milestones/2026-05-09-b2-h1-high-repeat-correction.json overlaying the original B2 evidence with the n=20 result and correctedH1.status = "satisfied". The original B2 milestone is left intact.
- Rewrite the apps/website/app/bench/page.tsx prose to a parity framing at high repeats. verdictFor now respects a parityAdapters set so the table doesn't crown a "fastest" off n=3 noise; the H1 status reflects the corrected verdict.
- Add a min-repeat gate to scripts/bench-matrix.mjs evaluateH1: when the pretable / best-full-grid frame-p95 ratio is in the tight zone (0.9 ≤ r ≤ 1.2) AND either adapter has < 10 repeats, return insufficient with guidance to re-run at --repeats=10+. Outside the tight zone the existing behavior is unchanged.
- Add a new test covering the insufficient case; the existing failing test is rewritten to use a clearly out-of-zone ratio (1.6) so the failing path stays exercised.
- Append a 2026-05-09 entry to docs/research/repo-memory.md overturning the H1 flip narrative, with the new evaluator gate documented.

AG Grid (16.7 ms p95, 1 blank gap, 2 px row-height drift) and TanStack (16.7 ms p95, 1 blank gap) status from the B2 n=3 runset is not corrected here: both are >50% above pretable, well outside the noise zone. They remain ~1.7× pretable's scroll_frame_p95_ms with quality gaps that pretable does not have.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
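The min-repeat gate described above can be sketched as follows. The function and field names (`gateH1`, `frameP95Ms`, `repeats`) are hypothetical stand-ins, since the real `evaluateH1` in `scripts/bench-matrix.mjs` isn't shown here:

```javascript
// Hypothetical sketch of the min-repeat gate; names are stand-ins
// for whatever scripts/bench-matrix.mjs actually uses.
function gateH1(pretable, bestFullGrid) {
  const ratio = pretable.frameP95Ms / bestFullGrid.frameP95Ms;
  const tightZone = ratio >= 0.9 && ratio <= 1.2;
  const lowRepeats = pretable.repeats < 10 || bestFullGrid.repeats < 10;
  if (tightZone && lowRepeats) {
    // Too close to call at this sample size: demand more repeats
    // instead of emitting a pass/fail verdict off noise.
    return {
      status: "insufficient",
      guidance: "ratio is inside 0.9-1.2; re-run with --repeats=10 or higher",
    };
  }
  // Outside the tight zone (or with enough repeats) the original
  // behavior applies; a simple ratio threshold stands in for it here.
  return { status: ratio <= 1.2 ? "satisfied" : "failing" };
}
```

With the n=3 numbers from B2 (ratio ≈ 0.99, 3 repeats each) this returns `insufficient`, while the rewritten failing test's out-of-zone ratio of 1.6 still exercises the `failing` path.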
blove added a commit that referenced this pull request on May 11, 2026:
…rdict) (#133)

* docs(specs): pretable scroll-with-render perf diagnostic design

  Three-phase research PR mirroring PR #124's pattern: high-repeat (n=20) re-run, conditional Playwright trace capture, research memo. Diagnoses whether the PR #130 cheap-render anomaly (16.4 ms vs 10.3 ms for format and heavy-render) is real or a low-sample artifact.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(plans): pretable scroll-with-render perf diagnostic plan

  Seven-task plan mirroring PR #124's three-phase pattern: n=20 matrix re-run, conditional Playwright trace capture, research memo.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(research): pretable scroll-with-render perf diagnostic memo

  Verdict: noise. The PR #130 cheap-render anomaly (16.4 ms vs ~10.3 ms for format and heavy-render at n=3) was a sampling artifact. At higher repeats, scroll-with-render is at parity with (in fact marginally faster than) the other two:

  | Script                   |   n | mean (ms) | σ (ms) |
  | ------------------------ | --: | --------: | -----: |
  | scroll-with-format       |   8 |      9.36 |   0.80 |
  | scroll-with-render       |   7 |      8.97 |   0.35 |
  | scroll-with-heavy-render |   6 |      9.15 |   0.13 |

  Both 2σ pairs (cheap-vs-format, cheap-vs-heavy) are well within the noise floor. Same shape as PR #124's finding at larger magnitude.

  The matrix run completed only ~36% of planned repeats (Playwright flake; not investigated), but the observed σ values make the verdict unambiguous: PR #130's 6 ms gap is ~21σ away from the observed distribution. No perf-fix PR needed.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore: prettier-format perf-diag artifacts

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
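The ~21σ figure above is reproducible from the memo's own numbers, assuming the anomaly is measured against the high-repeat scroll-with-render distribution:

```javascript
// PR #130's n=3 cheap-render reading vs the high-repeat distribution
// for scroll-with-render (mean 8.97 ms, sigma 0.35 ms, n = 7).
const anomalyMs = 16.4;
const meanMs = 8.97;
const sigmaMs = 0.35;
const distanceSigma = (anomalyMs - meanMs) / sigmaMs;
// (16.4 - 8.97) / 0.35 ≈ 21.2 -> the n=3 reading sits ~21 sigma out
```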
blove added a commit that referenced this pull request on May 13, 2026:
…get on pretable filter-text) (#134)

* docs(specs, plans): interaction borderline perf diagnostic

  Tightens PR #131's two borderline numbers (pretable filter-text 17.7 ms, tanstack vs pretable filter-metadata 15.7/16.0 ms) via an n=20 re-run. Mirrors PR #124 / PR #133's pattern.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(research): interaction borderline perf diagnostic memo

  Verdict: pretable filter-text is real-over-budget (16.79 ± 0.31 ms at n=20); tanstack vs pretable filter-metadata is noise-tied (1.6 ms mean diff vs a 23 ms 2σ noise floor; tanstack's σ at n=8 is 11.6 ms).

  Incidental finding: pretable filter-metadata is also over budget at the mean (17.51 ± 2.44 ms). PR #131's 16.0 ms n=3 reading was a low-end sample. The homepage prose claims filter-metadata is "clear of" the single-frame budget; that's no longer accurate.

  Three recommendations queued (editorial cleanup + perf-fix investigation); see the memo for details.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore: prettier-format borderline diag artifacts

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
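One plausible reading of the "real-over-budget" call above is that even the 2σ low end of the filter-text distribution clears the 16 ms budget. A sketch under that assumption; the memo's actual decision rule isn't shown here and may differ:

```javascript
// Hedged sketch: call a result "real-over-budget" if even the
// 2-sigma low end of the measured distribution exceeds the frame
// budget. The memo's actual decision rule may differ.
function isOverBudget(stat, budgetMs) {
  const lowEnd = stat.mean - 2 * stat.sigma;
  return lowEnd > budgetMs;
}

// pretable filter-text at n=20: 16.79 +/- 0.31 ms vs a 16 ms budget.
// Low end = 16.79 - 0.62 = 16.17 ms -> over budget even at 2 sigma.
```

Under the same rule, filter-metadata's wider spread (17.51 ± 2.44 ms) would not clear the bar at 2σ, which is consistent with the memo flagging it only as over budget "at the mean".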
blove added a commit that referenced this pull request on May 13, 2026:
* docs(specs): pretable wrapped-text filter perf diagnostic design

  Three-phase pattern (trace + analyze + memo) with a conditional fix in the same PR. Mirrors PR #124 / PR #133 with a Phase D escape hatch for single-cause low-risk fixes. Targets pretable's interaction scripts landing 1-2 ms over the 16 ms single-frame budget on Chromium S2/hypothesis (PR #134 finding).

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(plans): pretable wrapped-text filter perf diagnostic plan

  Five-task plan: trace capture, manual analysis, memo, conditional fix, gates + PR. Auto-merge if memo-only; hold if a fix ships.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(bench): Playwright trace for pretable filter-text perf diagnostic

* docs(research): pretable wrapped-text filter perf diagnostic memo

* style(docs): prettier format perf diagnostic memo + plan + spec

* chore: remove force-added trace binary; project convention is no committed traces

  The trace was captured locally per the spec but force-added past .gitignore; the project's standing pattern is that `status/traces/*.zip` is gitignored. Removing the binary; the memo is updated to note the trace is local-only and that Playwright's default action-trace format doesn't capture per-function timeline data anyway (the real blocker for this investigation, separately documented in the memo's "bench-harness gap" follow-up).

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(research): memo references local-only trace; no committed binary

  The trace path reference in the memo was pointing at a binary that was force-added past .gitignore and then removed in the prior commit. Update the reference to note the trace is local-only and that the Playwright action-trace format wouldn't have given flame-graph data anyway.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Summary
B2 follow-up #1: diagnosed the 1 ms scroll-frame-p95 gap between pretable and MUI X DataGrid Community on S2/hypothesis.
- Ran `pnpm bench:matrix` at 20 repeats for pretable + mui only.
- Artifact: `status/milestones/2026-05-09-perf-diag-high-repeat.scroll.json`.
- Verdict: noise.
- Memo: `docs/research/2026-05-09-pretable-vs-mui-scroll-perf.md`.

Verdict
noise
Pretable averaged 9.07 ms; MUI averaged 9.14 ms. The mean gap was -0.065 ms (pretable - mui) against a 2σ noise floor of 0.401 ms, so the original B2 n=3 MUI advantage did not survive high-repeat measurement.

What's NOT in this PR
Test plan
- `pnpm --filter @pretable/app-bench build`
- `pnpm bench:matrix --project=chromium --adapters=pretable,mui --scenarios=S2 --scripts=scroll --scale=hypothesis --repeats=20`
- `pnpm -w typecheck`
- `pnpm -w test`
- `pnpm -w lint`
- `pnpm format`