feat(skill): website-to-hyperframes — concept-first authoring + per-beat read protocol#990
ukimsanov wants to merge 8 commits
…eat read protocol

Rewrite of the website-to-hyperframes skill that came out of 11 evaluation rounds. The honest read of those evals: prose-only guidance had hit its ceiling. Sub-agents kept reporting "0 errors, looks good" without doing the work, producing slideshow-quality videos with mismatched brand colors, missing logos, and beats that didn't serve the storyboard. This restructure addresses the failure modes that real videos showed, not theoretical ones.

**Step structure (replaces 7-step layout with concept-first 6-step)**

Old: capture → design → script → storyboard → vo → build → validate
New: capture → design → brief → storyboard → vo → build → validate

The brief step (Step 2) is new: a conversation-shaped step that aligns message + audience + arc before any beat-writing happens. Concept-first throughout: message → arc → beats that serve the arc → which assets and techniques bring each beat to life.

**Step 0 (capture)**

- "View the contact sheets — carefully, every cell, not a glance" closes the failure mode where agents reported "viewed the contact sheet" after one scroll and later wrote beats referencing assets that didn't exist or missed the brand logo.
- Names the right artifacts to read in order (tokens.json → design-styles.json → asset-descriptions.md → fonts-manifest.json), with read-on-demand guidance for the rest.

**Step 1 (design)**

- DESIGN.md authoring guide. Restored component CSS sections (Component Stylings, Spacing & Layout, Depth & Elevation) that earlier batches over-collapsed.

**Step 2 (brief)**

- Strategy/messaging step. Clear instruction for "Surprise me" / minimal direction: state the minimum context (where the video runs, who it's for) and proceed bold.

**Step 3 (storyboard + script)**

- Concept gate at the top: answer "what makes this video distinct" before writing beat 1.
- Brand-floor MUST rules (logo in opener + closer; signature visual somewhere in the video).
- Captured assets (SVG logos, illustrations, hero art, gradients) are first-class beat content alongside composed UIs, and many of them carry beats outright. The constraint is only that you start from the message, not the asset inventory.

**Step 4 (vo)**

- TTS ranking: HeyGen first (auto word timestamps), ElevenLabs second, Kokoro (free). Audio timing reconciliation gate: if the actual audio duration differs from the storyboard's planned duration by more than ±15%, rescale beats or trim the script before Step 5.

**Step 5 (build) + beat-builder-guide.md**

- Sub-agent template now pastes brand values inline rather than telling the sub-agent to re-read DESIGN.md. Targeted file reads with specific sections + line ranges.
- "Patterns that ARE shots" affirmative list (captured logo draw-on, hero illustration push-in, captured screenshot with parallax layers, kinetic typography over captured asset).
- Webpage-mimicry patterns (full CSS browser chrome, parked-camera composition, ±2px breathing motion) marked ⚠ rather than ❌, which is fine when the storyboard genuinely calls for them as the subject.
- Required cinematography per beat: shot type, camera move, depth strategy, purpose.

**Step 6 (validate): per-beat read protocol**

This replaces the previous "spawn verify-beats CLI" gate. A grep of composition HTML can catch structural lies (missing hex codes, wrong asset paths) but it can't catch boring beats, off-screen logos, GSAP timelines that only cover the first 2 seconds, or camera moves that don't match the storyboard. Those failures only surface when somebody opens the file and reads it.

Per-beat verdict template names the brand hex codes used, captured asset paths referenced, headline `font-size`, GSAP timeline coverage, and storyboard alignment. Critic sub-agent scores a "Captured asset utilization" dimension specifically so the eval captures whether captured SVGs/illustrations carried beats or got recreated as divs.

**Asset bundle**

- 20 Pixabay-licensed SFX files with `CREDITS.md` documenting provenance. SFX assignment moved to Step 3 (creative decision) so Step 5 implements rather than improvises.
- Capabilities reference + html-in-canvas-patterns updated: Three.js 0.181.2 + ESM jsm imports, mulberry32 seeded PRNG for deterministic shatter, 24-effect text-animation catalog referenced (catalog itself lands in the hyperframes-skill PR).
- Visual vocabulary rewritten: replaces user-word lookup tables with brand-first derivation across 6 axes; user words land as modifiers, not replacements.
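The Step 4 audio timing reconciliation gate above can be read as a simple tolerance check followed by a proportional rescale. The sketch below is only an illustration of that arithmetic: the `Beat` type, the function names, and the sample durations are hypothetical and are not taken from the skill files, and proportional rescaling is just one of the two remedies the commit names (the other being trimming the script).

```python
from dataclasses import dataclass


@dataclass
class Beat:
    name: str
    planned_seconds: float  # duration the storyboard allotted to this beat


def within_tolerance(planned_total: float, actual_total: float,
                     tolerance: float = 0.15) -> bool:
    """True when the actual audio length is within +/- tolerance of the plan."""
    return abs(actual_total - planned_total) <= tolerance * planned_total


def rescale_beats(beats: list[Beat], actual_total: float) -> list[Beat]:
    """Scale every beat by the same factor so the beats sum to the audio length."""
    planned_total = sum(b.planned_seconds for b in beats)
    factor = actual_total / planned_total
    return [Beat(b.name, round(b.planned_seconds * factor, 2)) for b in beats]


# Hypothetical storyboard and a hypothetical measured TTS duration.
beats = [Beat("opener", 4.0), Beat("problem", 6.0), Beat("closer", 5.0)]
planned = sum(b.planned_seconds for b in beats)
actual = 18.0

if not within_tolerance(planned, actual):
    # 18s against a 15s plan is a 20% overrun, so the gate trips.
    beats = rescale_beats(beats, actual)
```

A uniform scale factor keeps the relative pacing of the storyboard intact; a real pipeline might instead stretch only the beats that carry narration.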
- Delete `references/visual-vocabulary.md` and scrub the four call sites that referenced it. The 6-axis lookup framing it introduced contradicted the rest of the skill's "design from the brand, not from a table" stance.
- Replace all `npx tsx packages/cli/src/cli.ts <cmd>` invocations with `npx hyperframes <cmd>` in step-0-capture.md, step-5-build.md, step-6-validate.md, and beat-builder-guide.md. The capture- and snapshot-pipeline improvements that previously required the local CLI now ship in the published CLI via the stack's PRs #987 and #988, so once the stack lands the published CLI is the right invocation for the skill prose.
- Remove the now-contradictory "ALWAYS use the local CLI — never npx hyperframes" warnings in step-0-capture.md and step-6-validate.md.
SKILL.md grew to 192 lines from a 124-line baseline. Most of the
bloat was content duplicated in the step reference files it points
to. Removed 6 sections that duplicated step content, composed 2
small additions into the step files where they actually belonged.
Removed from SKILL.md (already covered elsewhere):
- "Take your time" / "Quality matters more than speed" paragraph
— operational philosophy already implicit in step-6-validate's
cell-by-cell review prose.
- "Creative Tension Principle" section — step-3-storyboard.md:21
already has the exact "What makes this video different from a
generic [video type] for any [industry] brand?" single-sentence
test. Duplicate removed; storyboard is the right home.
- "Step -1: What we're actually making" (30 lines: anti-patterns,
video grammar, shot framing, camera moves) duplicates
step-3-storyboard.md:197+ (shot types), :229–232 (anti-patterns), and
beat-builder-guide.md:126+ (shot framing).
- "Sub-agent mode" + "No sub-agents" preamble: step-5-build.md:286–292
already handles both parallel and serial runtimes.
- "Image-viewing capability" warning — operationally implicit in
step-0 ("View the contact sheets") and step-6 ("View snapshots/
contact-sheet.jpg cell-by-cell").
- "User Interaction Points" table — redundant with the inline 💬
markers on Steps 3 and 4.
Composed into step files (content that wasn't there yet):
- step-1-design.md "Target length" paragraph: added the fast-pacing
/ billboard-per-beat exception (50-line DESIGN.md is enough when
beats are single hero elements on full-bleed backgrounds, not
full UIs).
- step-2-brief.md "Surprise me" section: added the global-propagation
rule — when the user signals autonomous mode at Step 2, every 💬
gate downstream (Step 3 storyboard approval, Step 4 TTS choice) is
also skipped.
Step 5 SKILL.md gate paragraph trimmed from a 6-clause description
of the per-beat read to one line that points at step-5-build.md
for the full checklist.
Updated the techniques.md reference counts from "20" to "13" in
SKILL.md, beat-builder-guide.md, and step-3-storyboard.md to match
the techniques.md trim in the upstream branch.
Net: SKILL.md 192 → 131 lines.
Step 0 had bloated to 91 lines that did the work of Steps 1–3: viewing contact sheets cell-by-cell, reading 8 data files, listing promising assets, inferring product purpose / audience / value prop / brand voice. That meant the agent did all the heavy lifting upfront, produced summaries that went stale before they were used, and the actual "run the capture" instruction was buried.

Step 0 now owns only what Step 0 is: run the capture command, sanity-check it succeeded, hand off. 91 → 55 lines.

Moved (composed into destination files, verified each was the right home before adding):

- Read tokens.json + design-styles.json → step-1-design.md replaces the passive "you read these in Step 0" line with an active "Read these now — primary data source for Sections 3–6."
- Contact-sheet "every cell, name 5 assets per page" anti-glance prose → step-3-storyboard.md asset-discovery bullet (which already covered contact-sheet viewing generally, now strengthened with the anti-glance rule).
- Strategic site summary (product / audience / voice / value prop) → step-2-brief.md absorbed this; the brief itself IS the summary. Replaced "After presenting the site summary (from Step 0)" with step-2 grounding itself by reading DESIGN.md + asset-descriptions + visible-text directly.

Step 0's new structure:

- Run the capture (CLI command + project-dir convention), unchanged
- Confirm it succeeded (1-line summary, error-out on bad capture)
- Reference table mapping each capture/ file to the step that first reads it (explicit "DO NOT read these here")
- Gate: capture exits 0 + counts non-zero
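As a rough illustration of the Step 0 gate (exit code 0 plus non-zero counts, with a one-line summary on success), the check could be as small as the function below. This is a sketch under assumptions: the count names (`screenshots`, `assets`, `fonts`) and the summary wording are hypothetical, since the commit does not say which fields the capture command reports.

```python
def capture_gate(exit_code: int, counts: dict[str, int]) -> str:
    """Return a one-line summary, or raise so the pipeline stops before Step 1."""
    if exit_code != 0:
        raise RuntimeError(f"capture exited with code {exit_code}")
    if not counts:
        raise RuntimeError("capture reported no counts at all")
    # Any category that came back empty counts as a bad capture.
    empty = sorted(name for name, n in counts.items() if n <= 0)
    if empty:
        raise RuntimeError("capture produced zero items for: " + ", ".join(empty))
    parts = [f"{n} {name}" for name, n in sorted(counts.items())]
    return "capture ok: " + ", ".join(parts)


# Hypothetical counts; the real field names come from the capture output.
print(capture_gate(0, {"screenshots": 12, "assets": 40, "fonts": 3}))
```

Raising on a bad capture, rather than returning a flag, matches the "error-out on bad capture" wording: nothing downstream should run against an empty capture directory.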
Cleans up two related overcorrections that crept across the skill
prose: (a) "compose UIs from divs/SVG/CSS" repeated 6+ times in
step-1, anchoring agents to website-shaped beats; (b) "every beat's
primary visual stays composed from divs / SVG / CSS / GSAP" and
"captured assets are accents — they decorate, they don't carry"
overstatements in step-3 and step-5 that contradicted the dial-back
done earlier in this stack.
The real framing: a beat composes from whatever primitives the scene
needs — HTML/CSS, SVG, captured assets, WebGL, Canvas, Three.js,
kinetic typography, Lottie — alone or in combination. They're inputs
to one output (the video frame). No rule maps intent → primitive.
The narrow no-go is one rule: never paste a product-UI screenshot as
load-bearing content (the slideshow pattern).
step-1-design.md (8 edits):
- L5 intro: drop "composed from divs/SVG/CSS at build time" detail.
- L7 length: drop "compose UIs from scratch (divs/SVG/CSS)" framing;
merge L290's "over-investing in prose" caveat in.
- L97: "composing UIs from divs in Step 5" → "building beats".
- L161: "compose the X UI" → "a beat featuring the X".
- L290: duplicate length bullet — deleted.
- L293: "sub-agents compose UIs at build time from divs/SVG/CSS..."
→ "No separate Components section — Quick Reference is where
components live."
step-3-storyboard.md (3 edits):
- L3 (intro): "alongside composed UIs" → "alongside composed beats".
- L276 ("Compose the load-bearing visuals yourself") paragraph
replaced with the primitive-toolkit framing — toolkit is open, the
only no-go is product-UI screenshots as load-bearing content.
- L381–383 ("The bar:") three bullets collapsed to one bullet:
primary visuals use whatever combination the scene needs; accents
are optional; brand-floor minimums are the minimum.
step-5-build.md (2 edits):
- L104 stacked-beats intro: "composed from divs, SVG, canvas, and
CSS. Never a full-bleed screenshot." → "composes from whatever
primitives the storyboard called for ... Narrow no-go: never a
full-bleed product-UI screenshot as load-bearing content."
- L147: "Build the UI element from divs and CSS" → "Build the
element from divs and CSS" — drops the UI bias since this rule
applies only when the asset IS a product-UI screenshot.
Net result: "compose from divs/SVG/CSS" mentions drop from 10+ to 0
as a generalized framing; the term survives only in concrete
examples (e.g. "cards-as-divs" when the beat is specifically a
kanban demo) where divs/CSS IS the right answer.
Three follow-ups caught by a post-restructure audit pass. All three were places where the earlier "compose primary, asset is accent" framing survived after the step-3 and step-5 paragraphs already got the primitive-toolkit rewrite. Cleans up the contradiction so the skill speaks with one voice: captured assets can be primary content; the narrow no-go is just pasting product-UI screenshots.

- step-2-brief.md:80: the "flip it" example said agents should reframe "the hero illustration centers the opener" into "kinetic typography ... hero illustration as ambient depth." That reverses the dial-back: captured illustrations CAN center an opener. The flip-it rule now applies narrowly to product-UI screenshots; for captured logos/illustrations/hero art, no flip is needed.
- step-2-brief.md:149: option-template guidance said "primary content is 'the screenshot of X'" was forbidden. Narrowed to "primary content is a pasted product-UI screenshot." Other captured assets (SVG logos, illustrations, hero art) are valid primaries when the concept calls for them.
- step-3-storyboard.md:314: the Common-accent-uses bullet implied accents are always layered on "composed UI." Reframed: list accent uses for when the primary is something else; when the captured asset IS the primary (logo opener, hero parallax), document it under Composition, not Accents.
… normal

Second-batch audit cleanup after Ular's "logo isn't a requirement, just a nice default" correction. Three related places still framed captured-asset-primary beats as rare exceptions and the brand-floor rules as hard MUSTs, both overstatements that contradict the rest of the dial-back. Plus a TOC-only callout on capabilities.md.

- step-3:300 "for the RARE beat where a captured asset is the primary visual ... defaulted to the slideshow pattern this workflow exists to break": rewritten. Captured-asset-primary beats are a normal valid choice. The narrow no-go is just pasting product-UI screenshots full-bleed.
- step-3:351 "Each one has a composed visual that carries it": rewritten to "Each one has a primary visual that carries it (composed UI, captured asset, kinetic typography, WebGL, etc.)".
- step-3:353 "assets decorate concept-defined beats; they do not seed them": kept "do not seed" (correct: don't write a beat because of a cool asset); dropped the "decorate" framing (overgeneralized: assets can be primary too).
- step-3 brand-inflection floor section: relabeled from "REQUIRED minimums" to "Brand defaults (nice-to-haves for most brand videos)". "MUST appear" softened to "for most brand videos, the logo lands in the opener and the closer" with explicit "skippable when the storyboard's concept calls for it" language.
- step-3:379 "The bar:" bullet: "brand-floor minimums ... the minimum, not the ceiling" → "brand-defaults section covers most brand videos but isn't a hard requirement."
- step-5:413 "Brand-floor check" section in the per-beat read protocol: relabeled "Brand-defaults check", reframed each item as a default not a fail-condition; agent checks against the storyboard's intent rather than enforcing a hard rule.
- capabilities.md top: added a "Scan the TOC; do NOT read this file linearly" callout. It's a 700+ line inventory; agents should jump to the section a beat needs, not read top-to-bottom.
miguel-heygen
left a comment
Skill restructure is well-motivated by the eval findings. Concept-first authoring order makes sense — message → arc → beats → assets. Per-beat read protocol replacing the grep-based verify-beats CLI is the right call (structural lies aren't the real quality problem — boring beats are).
SFX files: PR says Pixabay-licensed with CREDITS.md documenting provenance — good.
One note: this is a large prose change (32 files) that affects the w2h pipeline's behavior significantly. Worth a manual test run on 1-2 sites after merge to validate the new step ordering produces better output.
jrusso1020
left a comment
Approve at bb6a3d2b. Magi covered the concept-first direction + per-beat read protocol. Additive:
- SFX licensing: my prior hf#984 concern fully addressed. `assets/sfx/CREDITS.md` exists, cites the Pixabay Content License, and notes attribution is appreciated for transparency despite Pixabay not requiring it. 19 MP3 files + manifest.json. Per `reference_vendored_content_license_check.md`, this is the gold standard for vendored-content licensing. ✓ Bonus: documented despite the license not requiring it.
- Old-step-file deletion to prevent dual-pipeline confusion: the 7-step → 6-step transition deletes `step-1-capture.md` through `step-7-validate.md` while introducing `step-0-capture.md` through `step-5-build.md` + new `step-6-validate.md`. Clean migration; verify no internal cross-references in other skill docs still point to the old filenames (a one-shot grep for `step-7-validate` / `step-3-script` / etc. across the OTHER skills would catch any). Not a hard blocker since the deletions are explicit.
- Per-beat read protocol enforcement gap: the design is sound (main agent opens each `compositions/beat-N.html` and reads top-to-bottom), but the failure mode it replaces ("sub-agent reports 'looks good' without doing the work") can recur at the main-agent level. The Step 6 verdict template names specific evidence (hex codes, asset paths, headline px, timeline coverage) which structurally pushes for actual reads. Worth one-shot-validating in a future eval round that main agents actually produce these verdict artifacts (not just "I read them all, looks good").
- Step 2 (brief) as the new strategy/messaging step: concrete "Surprise me" / minimal-direction instruction is the right reframe for an open-ended user prompt. Worth comparing the resulting eval-arena video quality on the same brand with v1 (prescriptive) vs v2 (concept-first) skill. Ular cited heygenverse.com/a/c927789b-... which I'll trust for the manual eval evidence.
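The one-shot grep suggested in the second point could look roughly like the sketch below. It is a hedged illustration, not a script from the repository: the old filenames are the seven listed in the PR description, while the `skills` root directory, the choice to scan only `.md` files, and the function name are assumptions.

```python
import re
from pathlib import Path

# Stems of the old 7-step files that this PR deletes.
OLD_STEP_NAMES = [
    "step-1-capture", "step-2-design", "step-3-script", "step-4-storyboard",
    "step-5-vo", "step-6-build", "step-7-validate",
]
STALE = re.compile("|".join(re.escape(name) for name in OLD_STEP_NAMES))


def find_stale_references(skills_root: Path,
                          skip_dir: str = "website-to-hyperframes"):
    """List (relative path, line number, old name) hits in the OTHER skills."""
    hits = []
    for path in sorted(skills_root.rglob("*.md")):
        rel = path.relative_to(skills_root)
        if skip_dir in rel.parts:
            continue  # the rewritten skill itself is out of scope here
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in STALE.finditer(line):
                hits.append((rel.as_posix(), lineno, match.group(0)))
    return hits


if __name__ == "__main__":
    root = Path("skills")  # assumed repository layout
    if root.is_dir():
        for rel, lineno, name in find_stale_references(root):
            print(f"{rel}:{lineno}: {name}")
```

None of the old stems collide with the new filenames (for example `step-3-script` versus `step-3-storyboard`), so a plain substring match is enough for this check.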
Stack base correctly set to feat/lint-rules. The +3023/-863 is mostly new prose (the deletions are the old 7-step files); the gross delta overstates the cognitive load of review.
— Rames Jusso
…ugh-white regression

Five fixes from Ular's first-pass workflow run:

1. step-1-design.md Fonts section: sub-agents pointed @font-face for "ES Build Neutral" at the Inter .woff2 files because DESIGN.md only named families, never emitted exact src: paths. Now the Fonts section example shows per-family + per-weight file paths AND a copy-verbatim @font-face block sub-agents can paste, so there's no inference step. Adds an explicit narrative of the real failure mode and how to avoid it.
2. beat-builder-guide.md FONTS rule: was "brand fonts with capture/assets/fonts/ path need @font-face in <style>." Now: "copy the @font-face block VERBATIM from DESIGN.md. Do NOT guess which .woff2 file belongs to which family — capture filenames are content-hashed and there is no visible mapping. If DESIGN.md doesn't include exact src: paths per family, STOP and ask the main agent; never pair an arbitrary .woff2 with a family name from memory."
3. step-1-design.md Colors section: sub-agents reproduced brand colors faithfully and hit WCAG AA failures on dark surfaces (#68686A on #18191B = 3.16:1). Now the Colors section example computes per-pairing contrast ratios with ✅/⚠/❌ markers, documents the dark-surface substitute color when the brand's own palette fails, and points at the /hyperframes-contrast skill for ratio computation. Sub-agents pick text colors by surface context, not by "this is the brand's secondary text color."
4. capabilities.md flash-through-white entry: the "ideal as invisible bridge at duration: 0.01" framing caused agents to scatter white flashes through every composition as transition bridges. The fix was documented in the branch's HANDOFF but never landed. Now: "Fade through white midpoint — a visible white flash between scenes. Use only when the brand specifically calls for a white-flash beat boundary; this is NOT a neutral 'default' transition."
5. step-6-validate.md Warnings list: adds a paragraph on WCAG contrast false positives. The validator samples at fixed timestamps; elements at opacity:0 / mid-fade get measured as if fully visible, producing spurious failures. Tells the agent to verify visually before changing colors to clear a WCAG warning, since bumping a color to fix a sampling artifact changes brand identity for no real benefit.
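For reference, the 3.16:1 figure in fix 3 can be reproduced with the standard WCAG 2.x relative-luminance formula. The sketch below is not the `/hyperframes-contrast` skill's implementation, which this PR does not show; the function names are hypothetical, and the mapping of ✅/⚠/❌ onto the AA thresholds (4.5:1 for normal text, 3:1 for large text) is an assumption about how the markers might be assigned.

```python
def _linear(channel: int) -> float:
    """Convert one 8-bit sRGB channel to linear light."""
    c = channel / 255
    # 0.04045 is the sRGB breakpoint; for 8-bit inputs it behaves the same
    # as the 0.03928 value quoted in older WCAG text.
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG contrast ratio: (lighter + 0.05) / (darker + 0.05)."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def aa_marker(ratio: float) -> str:
    """Assumed marker mapping: pass for all text, large text only, or fail."""
    if ratio >= 4.5:
        return "✅"
    if ratio >= 3.0:
        return "⚠"
    return "❌"


ratio = contrast_ratio("#68686A", "#18191B")
print(f"{ratio:.2f}:1 {aa_marker(ratio)}")
```

A ratio like this only describes two fully opaque colors, which is why fix 5 matters: a value sampled mid-fade does not correspond to any pairing the viewer actually sees at rest.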

What
Rewrite of the `website-to-hyperframes` skill, the agent-driven pipeline that turns a captured website into a HyperFrames video. 11 evaluation rounds shaped this restructure; the changes here address failure modes that real videos actually exhibited, not theoretical concerns.
4 of 5 in the pipeline-quality stack. Stacked on #989.
Touches `skills/website-to-hyperframes/` only: 32 files / +3276/-49 lines. The standalone `skills/hyperframes/` rewrite is in the next PR (#5 in the stack).
Why
The honest read of 11 eval rounds: prose-only guidance had hit its ceiling. Sub-agents reported "0 errors, looks good" without doing the work, producing slideshow-quality videos with mismatched brand colors, missing logos, and beats that didn't serve the storyboard. Restructure focuses on:
- The main agent opens each `compositions/beat-N.html` and reads it top-to-bottom rather than running a grep-based CLI gate. A grep can catch structural lies (missing hex codes, wrong asset paths); it cannot catch boring beats, off-screen logos, or GSAP timelines that only cover the first 2 seconds.
How
Step structure: old 7-step layout (capture → design → script → storyboard → vo → build → validate) replaced with concept-first 6 steps (capture → design → brief → storyboard → vo → build → validate). Old step files (`step-1-capture.md`, `step-2-design.md`, `step-3-script.md`, `step-4-storyboard.md`, `step-5-vo.md`, `step-6-build.md`, `step-7-validate.md`) deleted to prevent dual-pipeline confusion.
Step 0 (capture): "View the contact sheets — carefully, every cell, not a glance" closes the failure mode where agents reported "viewed the contact sheet" after one scroll and then wrote beats referencing assets that didn't exist. Names the right artifacts to read in order (tokens.json → design-styles.json → asset-descriptions.md → fonts-manifest.json), with read-on-demand guidance for the rest.
Step 1 (design): DESIGN.md authoring with restored Component Stylings, Spacing & Layout, Depth & Elevation sections that earlier batches over-collapsed.
Step 2 (brief, new): strategy/messaging step. Concrete instruction for "Surprise me" / minimal direction.
Step 3 (storyboard): concept gate at the top, brand-floor MUST rules (logo in opener + closer; signature visual somewhere), captured assets as first-class beat content alongside composed UIs.
Step 4 (vo): TTS ranking (HeyGen → ElevenLabs → Kokoro). Audio timing reconciliation gate.
Step 5 (build) + beat-builder-guide.md: inline brand values in the sub-agent template instead of "re-read DESIGN.md". Targeted file reads with specific sections + line ranges. "Patterns that ARE shots" affirmative list. Webpage-mimicry patterns marked ⚠ rather than ❌, which is fine when the storyboard genuinely calls for them as the subject.
Step 6 (validate): per-beat verdict template names the brand hex codes used, captured asset paths referenced, headline `font-size`, GSAP timeline coverage, and storyboard alignment. Critic sub-agent scores a "Captured asset utilization" dimension specifically so the eval captures whether captured SVGs/illustrations carried beats or got recreated as divs.
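One way to picture the Step 6 verdict is as a record whose evidence fields cannot be filled in without opening the beat file. The sketch below is purely illustrative: the `BeatVerdict` fields mirror the items the description lists, but the field names, the sample asset path, and the 90% timeline-coverage threshold are all hypothetical and not taken from the actual template.

```python
from dataclasses import dataclass, field


@dataclass
class BeatVerdict:
    beat: str
    hex_codes: list[str] = field(default_factory=list)    # brand hex codes found in the file
    asset_paths: list[str] = field(default_factory=list)  # captured assets referenced (may be empty)
    headline_font_size: str = ""                           # e.g. "96px"
    timeline_end_seconds: float = 0.0                      # where the last GSAP tween ends
    beat_duration_seconds: float = 0.0                     # duration planned in the storyboard
    storyboard_alignment: str = ""                         # one sentence comparing beat to storyboard


def evidence_gaps(v: BeatVerdict, min_coverage: float = 0.9) -> list[str]:
    """List what is missing from a verdict; an empty list means it names evidence."""
    gaps = []
    if not v.hex_codes:
        gaps.append("no brand hex codes named")
    if not v.headline_font_size:
        gaps.append("no headline font-size recorded")
    if not v.storyboard_alignment:
        gaps.append("no storyboard alignment note")
    if v.beat_duration_seconds <= 0:
        gaps.append("no beat duration recorded")
    elif v.timeline_end_seconds / v.beat_duration_seconds < min_coverage:
        coverage = v.timeline_end_seconds / v.beat_duration_seconds
        gaps.append(f"timeline covers only {coverage:.0%} of the beat")
    return gaps


# A bare "looks good" verdict carries no evidence and fails every check.
print(evidence_gaps(BeatVerdict(beat="beat-1")))
```

An empty `asset_paths` list is deliberately not flagged, since a beat built from kinetic typography or WebGL may reference no captured asset at all.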
Assets: 20 Pixabay-licensed SFX files with `CREDITS.md` documenting provenance. SFX assignment moved to Step 3 (creative decision) so Step 5 implements rather than improvises.
Capabilities / html-in-canvas-patterns: Three.js 0.181.2 + ESM jsm imports, mulberry32 seeded PRNG for deterministic shatter, 24-effect text-animation catalog referenced (catalog itself lands in PR #5).
Visual vocabulary: rewritten to replace user-word lookup tables with brand-first derivation across 6 axes; user words land as modifiers, not replacements.
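The point of the mulberry32 seeded PRNG mentioned under Capabilities is that a shatter effect produces the same shard layout on every render. The sketch below ports the widely circulated mulberry32 routine to Python to show that property; the skill's own code is JavaScript and is not reproduced here, and `shard_offsets` is a hypothetical helper invented for the illustration.

```python
from typing import Callable

MASK32 = 0xFFFFFFFF


def mulberry32(seed: int) -> Callable[[], float]:
    """Return a generator of floats in [0, 1) driven by 32-bit state."""
    state = seed & MASK32

    def next_float() -> float:
        nonlocal state
        state = (state + 0x6D2B79F5) & MASK32
        t = state
        # Masking to 32 bits stands in for JavaScript's Math.imul.
        t = ((t ^ (t >> 15)) * (t | 1)) & MASK32
        t ^= (t + ((t ^ (t >> 7)) * (t | 61))) & MASK32
        return (t ^ (t >> 14)) / 4294967296

    return next_float


def shard_offsets(seed: int, count: int) -> list[tuple[float, float]]:
    """Hypothetical helper: per-shard (dx, dy) offsets in [-1, 1)."""
    rand = mulberry32(seed)
    return [(rand() * 2 - 1, rand() * 2 - 1) for _ in range(count)]


# Same seed, same layout: re-rendering a frame cannot reshuffle the shards.
first = shard_offsets(seed=42, count=5)
second = shard_offsets(seed=42, count=5)
print(first == second)
```

An unseeded source such as `Math.random()` would give each render pass a different layout, which is presumably why the reference pins a seeded generator.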
Test plan