feat: add S7 pinned-inspection scenario with H9-H12 hypotheses #1

Merged
blove merged 10 commits into main from feat/pinned-inspection-scenario
Apr 20, 2026


blove commented Apr 20, 2026

Summary

  • Add S7 benchmark scenario (40 cols, 3 pinned left, 3 wrapped, variable-height, multilingual) with same row counts as S2
  • Wire S7 into bench app query parsing, bench-runner validation, and matrix runner
  • Refactor evaluateH1 and evaluateH6-H8 to accept an explicit scenarioId parameter; add H9-H12 as thin wrappers that evaluate S7 with identical thresholds
  • 29 bench-matrix tests passing (7 new for H9-H12), plus new tests in scenario-data, query-state, and bench-runner
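
For readers skimming the refactor, here is a rough sketch of the shape it takes: an evaluator that receives the scenario explicitly, plus a thin wrapper that pins it to S7. Only the names evaluateH1, H9, S2, S7, and scenarioId come from this PR; the ScrollRun type, the p95 frame-time metric, and both thresholds are assumptions made up for illustration and are not the repo's actual values.

```typescript
// Hypothetical types: the real report shapes are not shown in this PR.
type ScenarioId = "S2" | "S7";
type Verdict = "satisfied" | "failing" | "insufficient";

interface ScrollRun {
  scenarioId: ScenarioId;
  p95FrameMs: number; // assumed metric, for illustration only
}

// Assumed thresholds; the PR only states that S7 reuses the existing ones.
const MIN_RUNS = 3;
const MAX_P95_FRAME_MS = 16.7;

// The evaluator takes the scenario as a parameter instead of hard-coding it.
function evaluateH1(runs: ScrollRun[], scenarioId: ScenarioId): Verdict {
  const relevant = runs.filter((run) => run.scenarioId === scenarioId);
  if (relevant.length < MIN_RUNS) return "insufficient";
  const allWithinBudget = relevant.every(
    (run) => run.p95FrameMs <= MAX_P95_FRAME_MS,
  );
  return allWithinBudget ? "satisfied" : "failing";
}

// Thin wrapper: same logic and thresholds, evaluated against S7.
function evaluateH9(runs: ScrollRun[]): Verdict {
  return evaluateH1(runs, "S7");
}
```

Keeping the wrapper to a single delegating call means a threshold change made for S2 applies to S7 without further edits.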

Test plan

  • CI passes (test, typecheck, lint, format, build)
  • Dev-scale matrix run with S7: pnpm bench:matrix -- --project=chromium --adapters=pretable --scenarios=S7 --scripts=scroll,sort,filter-metadata,filter-text --scale=dev --repeats=3
  • Hypothesis-scale comparative: pnpm bench:matrix -- --project=chromium --adapters=pretable,gridalpha,gridbeta,gridgamma --scenarios=S7 --scripts=scroll --scale=hypothesis --repeats=3
  • Inspect *.hypotheses.json: H9-H12 present, 9 hypotheses total
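
The last check can be scripted. The sketch below assumes the parsed *.hypotheses.json has a top-level hypotheses array whose entries carry an id field; that shape is a guess, and checkHypothesisReport is a hypothetical helper, not something in the repo.

```typescript
// Assumed file shape: { "hypotheses": [{ "id": "H9", ... }, ...] }
interface HypothesisEntry {
  id: string;
}

interface HypothesisReport {
  hypotheses: HypothesisEntry[];
}

const EXPECTED_S7_IDS = ["H9", "H10", "H11", "H12"];
const EXPECTED_TOTAL = 9; // from the test plan: 9 hypotheses total

// Returns a list of human-readable problems; an empty list means the check passed.
function checkHypothesisReport(report: HypothesisReport): string[] {
  const problems: string[] = [];
  const ids = new Set(report.hypotheses.map((h) => h.id));
  for (const id of EXPECTED_S7_IDS) {
    if (!ids.has(id)) problems.push(`missing ${id}`);
  }
  if (report.hypotheses.length !== EXPECTED_TOTAL) {
    problems.push(
      `expected ${EXPECTED_TOTAL} hypotheses, found ${report.hypotheses.length}`,
    );
  }
  return problems;
}
```

In practice the argument would come from JSON.parse on the file contents.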

blove and others added 10 commits April 20, 2026 13:16
New benchmark scenario with 3 pinned columns on variable-height
inspection content, plus H9-H12 hypotheses mirroring S2's proof surface.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
6 tasks: scenario-data definition, bench app wiring, bench-runner
validation, hypothesis refactor (scenarioId param), H9-H12 tests,
full verification.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
40 cols, 3 pinned left, 3 wrapped, variable-height, multilingual.
Same row counts as S2. Exercises pinned-column layout overhead.
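
A minimal sketch of what such a column set could look like. The counts (40 columns, 3 pinned left, 3 wrapped) come from the commit message; the ColumnDef shape, the column ids, and the choice of which columns wrap are assumptions for illustration only.

```typescript
// Hypothetical column shape; the repo's scenario-data types are not shown here.
interface ColumnDef {
  id: string;
  pinned: "left" | null;
  wrap: boolean;
}

const TOTAL_COLUMNS = 40;
const PINNED_LEFT = 3;
const WRAPPED = 3;

function buildS7Columns(): ColumnDef[] {
  const columns: ColumnDef[] = [];
  for (let i = 0; i < TOTAL_COLUMNS; i++) {
    columns.push({
      id: `col-${i}`,
      // The first three columns are pinned to the left edge.
      pinned: i < PINNED_LEFT ? "left" : null,
      // Assumption: the three columns right after the pinned ones wrap,
      // which is what makes row heights variable in this sketch.
      wrap: i >= PINNED_LEFT && i < PINNED_LEFT + WRAPPED,
    });
  }
  return columns;
}
```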
…tion scripts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
evaluateH1, evaluateH6-H8 now accept explicit scenarioId.
Add H9-H12 wrappers for S7. Report grows from 5 to 9 hypotheses.
Add S7 to DEFAULT_SCENARIOS.
Tests cover satisfied, failing, and insufficient states for
composite scroll quality (H9) and interaction hypotheses (H10-H12)
on the pinned-inspection scenario.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove dead `compareValues` function from derived-rows.ts and replace
destructuring-based key removal with delete to avoid unused variable lint error.
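
The key-removal change follows a common pattern, sketched below. The Row type and the omitKey helper are hypothetical, and whether the real code copies the object before deleting is not stated in the commit message.

```typescript
type Row = Record<string, unknown>;

// Before (sketch): rest destructuring binds a throwaway variable,
// which an unused-variable lint rule may flag:
//   const { [key]: _removed, ...rest } = row;
//   return rest;

// After (sketch): copy the object, then delete the key from the copy.
function omitKey(row: Row, key: string): Row {
  const copy: Row = { ...row };
  delete copy[key];
  return copy;
}
```

Copying first keeps the caller's object untouched, at the cost of one shallow clone per call.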

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Suppress react-hooks/refs and react-hooks/set-state-in-effect false
positives for legitimate patterns (sync ref updates for callbacks,
DOM measurement in useLayoutEffect).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@blove blove merged commit e0bb24c into main Apr 20, 2026
5 checks passed
blove added a commit that referenced this pull request May 9, 2026
* chore(bench): high-repeat S2/scroll milestone for B2 follow-up perf diagnostic

* docs(research): pretable vs MUI scroll perf diagnostic memo

Phase C of B2 follow-up #1. Verdict: gap is noise. The high-repeat S2/hypothesis/scroll rerun shows no meaningful MUI advantage and recommends tightening H1-sensitive repeat protocol instead of scoping a perf-fix PR.

Spec: docs/superpowers/specs/2026-05-09-b2-followup-perf-diagnostic-design.md

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(bench): format perf diagnostic artifacts

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>