feat(capture): pipeline improvements — contact sheets, design styles, snapshot#988
feat(capture): pipeline improvements — contact sheets, design styles, snapshot#988ukimsanov wants to merge 1 commit into
Conversation
… snapshot
Capture pipeline work that came out of the 11-round website-to-video
eval branch. The wins that actually moved quality were the artifacts
agents read (contact sheets, design-styles) and the snapshot tool
visual-verification fixes; the rest are smaller follow-ons.
**Contact sheets (`contactSheet.ts`, new)**
- Replaces the embedded one-image-per-asset listing with paginated
labeled grids (3-col screenshots / 4-col raster / 5-col SVG). Each
page contains 9–15 cells with filename labels baked in via SVG
text overlay (`escapeXml` covers `&<>"'`).
- `fit: "contain"` keeps every asset visible at its real aspect
ratio; the old `fit: "cover"` cropped to the first image's box.
- Returns `string[]` (page paths) — single-page captures get one
file, multi-page produce `contact-sheet-1.jpg`, `contact-sheet-2.jpg`,
etc.
- `createSvgContactSheet` scans both `assets/svgs/` (inline-extracted
SVGs) and `assets/` root (external SVGs from `<img src="*.svg">`)
and de-dupes by filename. Sites with all-external SVGs (huly.io)
now get coverage they previously didn't.
**Design styles extractor (`designStyleExtractor.ts`, new)**
- Walks the live DOM and reads computed styles to produce
`extracted/design-styles.json`: typography hierarchy (every text
role with exact font-size / weight / line-height / letter-spacing),
button variants (background / padding / radius / shadow), card /
container / nav styles, spacing scale with base unit, border-radius
scale, box-shadow values with usage counts.
- Primary data source for DESIGN.md authoring at Step 1. Replaces
the prior "guess from screenshots" workflow.
**Snapshot tool (`snapshot.ts`)**
- HyperShader pre-rendering used to swallow the entire snapshot
capture window (every frame after the first showed the loading
overlay or final-opacity-zero exit fades). Wait signal is now
`window.__hf.shaderTransitions[].ready` (set after both warm and
cold cache paths complete); local-time seek for sub-comps means
exit fades read at their own t=0..duration, not global time.
- Gemini vision per-frame analysis runs by default (`descriptions.md`
next to the contact sheet). `--describe "custom Q"` overrides the
prompt; `--describe false` opts out.
- 3-column contact sheet generation for snapshot frames so reviewers
see all beats at a glance.
**Screenshot capture (`screenshotCapture.ts`)**
- Replaces `querySelectorAll('*') + getComputedStyle` overlay scan
with a TreeWalker that early-exits on cheap rect checks before
reaching the expensive style read. Caps at 5000 elements per page.
- Cookie/consent dismissal selectors are scoped under cookie /
consent / gdpr ancestors so we don't click "Accept invitation" or
similar unrelated buttons.
**Agent prompt (`agentPromptGenerator.ts`)**
- Auto-discovers contact-sheet page count (matches base name plus
paginated `-NNN` variants only, with regex escaping on the base
name and numeric sort for 10+ pages).
- `inferColorRole`: classifies extracted hex colors as bg-dark /
bg-light / accent / surface / neutral via luminance + saturation,
so the agent prompt shows `#533AFD (accent)` instead of bare hex.
- `design-styles.json` row is gated on `existsSync` — the upstream
write is wrapped in try/catch and may skip on failure, so the
prompt only points to files actually on disk.
**Other CLI ergonomics**
- `cli.ts`: auto-load `.env` from CWD on startup so subcommands like
`snapshot` don't need explicit `export GEMINI_API_KEY=…`. Handles
`export FOO=bar`, quoted values, inline `# comments`.
- `commands/transcribe.ts`: default output dir is the input file's
directory, not CWD. Stops the "wrote transcript.json somewhere
unexpected" footgun.
- `assetDownloader.ts`: improved asset naming uses catalog context;
de-duplicates inline SVG filenames.
- `contentExtractor.ts`: captions SVGs via Gemini (code-as-text) and
integrates them into asset descriptions.
- `tokenExtractor.ts` + `types.ts`: SVG bounding box dimensions and
new DesignStyles schema added.
|
Warning This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite. Learn more about stacking. |
miguel-heygen
left a comment
There was a problem hiding this comment.
Good capture pipeline work. Issues from the original #984 review addressed in the split:
- Cookie-dismiss selectors now scoped under cookie/consent/gdpr ancestors (not bare
button[id*="accept"]) - Regex interpolation uses
escapedBase— no longer fragile for special chars escapeXmlcovers all 5 entities including apostrophe--describerunning by default is now documented as intentional (opt-out with--describe false)
Contact sheet pagination, design-styles extractor, and TreeWalker overlay scan are all clean.
jrusso1020
left a comment
There was a problem hiding this comment.
Approve at 29baca1e. Magi covered the cookie-selector ancestor scope, regex escape, and escapeXml 5-entity coverage. Additive notes:
--describeresolution — PR body explicitly addresses my prior hf#984 concern: "runs by default,--describe \"custom Q\"overrides the prompt,--describe falseopts out." The three-valued semantic is well-thought-out; verify the boolean-parsing layer accepts"false"/"0"/"no"consistently with citty conventions elsewhere in the CLI (snapshot defaults to a numeric flag pattern; the docs and the parser need to agree)..envauto-load from CWD — same loader implementation I reviewed in hf#984. Safe shape (only sets if!(key in process.env), parses quoted values, silent on missing file). Standarddotenvconvention. Acceptable.- Contact sheet pagination —
agentPromptGenerator.tsauto-discovers page count by matching<base>-NNN.<ext>digits with regex metachars escaped + numeric sort. Solid; the 10+-pages numeric-sort fix is the kind of off-by-one-character edge case worth pinning by test. Recommend adding a unit test for the discovery function (12 → 9, 10, 11, 12; not lexicographic). - TreeWalker overlay scan + 5000-element cap — sensible defense-in-depth. The 5000 cap will silently truncate on extreme pages (e.g., asset-heavy CRM dashboards). Worth a one-line log when the cap is hit so we know the scan was incomplete.
- Snapshot wait signal via
window.__hf.shaderTransitions[].ready— cross-PR consistency: this assumes hyperframes core's runtime exposes that object. Verify the contract is documented in the runtime API surface (or pin it by test on the engine side).
Stack base correctly set to feat/capture-font-extractor. Direction is the bag of wins Ular identified as actually moving quality.
— Rames Jusso

What
Capture pipeline work that came out of the 11-round website-to-video eval branch. Five clusters of changes touch 12 files in
packages/cli/src/capture/andpackages/cli/src/commands/:contactSheet.ts, new)designStyleExtractor.ts, new).envauto-load from CWD, transcribe defaults output dir next to the input2 of 5 in the pipeline-quality stack. Stacked on #987.
Why
The wins that actually moved video quality across the evals were the artifacts agents read (contact sheets, design-styles JSON) and the snapshot-tool visual-verification fixes. Without the contact sheet pagination, sites with 30+ assets showed asset-1 at full size and nothing else. Without local-time seek, every beat after the first showed as black or final-opacity-zero in the snapshot. The Gemini frame-by-frame
descriptions.mdis the only objective check on what the video actually shows.How
Contact sheets (
contactSheet.ts, new) — paginated labeled grids (3-col screenshots / 4-col raster / 5-col SVG; 9–15 cells per page with filename labels via SVG text overlay).fit: "contain"keeps every asset visible at its real aspect ratio.createSvgContactSheetscans bothassets/svgs/andassets/root, de-dupes by filename — sites with all-external SVGs (huly.io) now get coverage.escapeXmlcovers&<>"'.Design styles (
designStyleExtractor.ts, new) — walks the live DOM, reads computed styles, writesextracted/design-styles.json: typography hierarchy with exact metrics, button variants, card/container/nav styles, spacing scale with base unit, border-radius scale, box-shadow values with usage counts. Primary data source for DESIGN.md at Step 1.Snapshot tool (
snapshot.ts) — wait signal is nowwindow.__hf.shaderTransitions[].ready(set after both warm and cold cache paths complete); local-time seek for sub-comps means exit fades read at their own t=0..duration.--describeresolution: runs by default,--describe "custom Q"overrides the prompt,--describe falseopts out.Screenshot capture (
screenshotCapture.ts) —querySelectorAll('*') + getComputedStyleoverlay scan replaced with a TreeWalker that early-exits on cheap rect checks before the expensive style read. Caps at 5000 elements per page. Cookie/consent dismissal selectors scoped under cookie/consent/gdpr ancestors so we don't click "Accept invitation" on unrelated buttons.Agent prompt (
agentPromptGenerator.ts) — auto-discovers contact-sheet page count (matches base name plus paginated-NNNdigits only; regex metachars escaped; numeric sort for 10+ pages).inferColorRoleclassifies extracted hex colors as bg-dark / bg-light / accent / surface / neutral so the agent prompt shows#533AFD (accent).design-styles.jsonrow gated onexistsSync.Other CLI ergonomics —
cli.tsauto-loads.envfrom CWD on startup (handlesexport FOO=bar, quoted values, inline# comments).commands/transcribe.tsdefault output dir is the input file's directory.Test plan