feat(skill): website-to-hyperframes — concept-first authoring + per-beat read protocol #990

Open
ukimsanov wants to merge 8 commits into feat/lint-rules from feat/skill-website-to-hyperframes

Conversation

@ukimsanov
Collaborator

What

Rewrite of the website-to-hyperframes skill — the agent-driven pipeline that turns a captured website into a HyperFrames video. 11 evaluation rounds shaped this restructure; the changes here address failure modes that real videos actually exhibited, not theoretical concerns.

4 of 5 in the pipeline-quality stack. Stacked on #989.

Touches skills/website-to-hyperframes/ only: 32 files, +3276/-49 lines. The standalone skills/hyperframes/ rewrite is in the next PR (5 of 5 in the stack).

Why

The honest read of 11 eval rounds: prose-only guidance had hit its ceiling. Sub-agents reported "0 errors, looks good" without doing the work, producing slideshow-quality videos with mismatched brand colors, missing logos, and beats that didn't serve the storyboard. The restructure focuses on:

  • A concept-first authoring order (message → arc → beats → assets) instead of an asset-first one
  • A new Step 2 (brief) that aligns message + audience + arc before any beat is written
  • Required cinematography per beat (shot type, camera move, depth strategy, purpose) so a "beat" is a shot, not a centered card breathing 1px
  • A per-beat read protocol at Step 6 — main agent opens every compositions/beat-N.html and reads it top-to-bottom rather than running a grep-based CLI gate. A grep can catch structural lies (missing hex codes, wrong asset paths); it cannot catch boring beats, off-screen logos, or GSAP timelines that only cover the first 2 seconds.

How

Step structure: the old 7-step layout, numbered Steps 1–7 (capture → design → script → storyboard → vo → build → validate), is replaced with a concept-first layout of Step 0 (capture) followed by 6 steps (design → brief → storyboard → vo → build → validate). The old step files (step-1-capture.md, step-2-design.md, step-3-script.md, step-4-storyboard.md, step-5-vo.md, step-6-build.md, step-7-validate.md) are deleted to prevent dual-pipeline confusion.

Step 0 (capture) — "View the contact sheets — carefully, every cell, not a glance" closes the failure mode where agents reported "viewed the contact sheet" after one scroll and then wrote beats referencing assets that didn't exist. Names the right artifacts to read in order (tokens.json → design-styles.json → asset-descriptions.md → fonts-manifest.json), with read-on-demand guidance for the rest.

Step 1 (design) — DESIGN.md authoring with restored Component Stylings, Spacing & Layout, Depth & Elevation sections that earlier batches over-collapsed.

Step 2 (brief, new) — strategy/messaging step. Concrete instruction for "Surprise me" / minimal direction.

Step 3 (storyboard) — concept gate at the top, brand-floor MUST rules (logo in opener + closer; signature visual somewhere), captured assets as first-class beat content alongside composed UIs.

Step 4 (vo) — TTS ranking (HeyGen → ElevenLabs → Kokoro). Audio timing reconciliation gate.

Step 5 (build) + beat-builder-guide.md — inline brand values in the sub-agent template instead of "re-read DESIGN.md". Targeted file reads with specific sections + line ranges. "Patterns that ARE shots" affirmative list. Webpage-mimicry patterns marked ⚠ rather than ❌ — fine when the storyboard genuinely calls for them as the subject.

Step 6 (validate) — per-beat verdict template names the brand hex codes used, captured asset paths referenced, headline font-size, GSAP timeline coverage, and storyboard alignment. Critic sub-agent scores a "Captured asset utilization" dimension specifically so the eval captures whether captured SVGs/illustrations carried beats or got recreated as divs.

Assets — 20 Pixabay-licensed SFX files with CREDITS.md documenting provenance. SFX assignment moved to Step 3 (creative decision) so Step 5 implements rather than improvises.

Capabilities / html-in-canvas-patterns — Three.js 0.181.2 + ESM jsm imports, mulberry32 seeded PRNG for deterministic shatter, 24-effect text-animation catalog referenced (catalog itself lands in PR #5).
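
For readers unfamiliar with mulberry32, here is a minimal sketch of how a seeded PRNG can make a shatter layout repeat exactly between renders. The `mulberry32` function follows the widely circulated public-domain algorithm; `shardOffsets`, `ShardOffset`, and the `spreadPx` default are hypothetical names invented for this illustration and are not taken from the skill files.

```typescript
// mulberry32: a small 32-bit seeded PRNG. The same seed always yields the
// same sequence, so a "random" layout is reproducible across renders.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0; // coerce the seed to an unsigned 32-bit integer
  return (): number => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // float in [0, 1)
  };
}

// Hypothetical shape for one shard's displacement in a shatter effect.
interface ShardOffset {
  dx: number;
  dy: number;
  rotationDeg: number;
}

// Hypothetical helper: derive per-shard offsets from a seed instead of
// Math.random(), so every render of the beat scatters shards identically.
function shardOffsets(count: number, seed: number, spreadPx = 400): ShardOffset[] {
  const rand = mulberry32(seed);
  const shards: ShardOffset[] = [];
  for (let i = 0; i < count; i++) {
    shards.push({
      dx: (rand() - 0.5) * 2 * spreadPx,
      dy: (rand() - 0.5) * 2 * spreadPx,
      rotationDeg: (rand() - 0.5) * 360,
    });
  }
  return shards;
}
```

Calling `shardOffsets(24, 7)` twice returns identical arrays, which is the property a frame-by-frame renderer needs; swapping in `Math.random()` would scatter the shards differently on every render.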

Visual vocabulary — rewritten to replace user-word lookup tables with brand-first derivation across 6 axes; user words land as modifiers, not replacements.

Test plan

  • Unit tests added/updated — n/a (skill prose).
  • Manual testing performed — eval-arena videos at https://www.heygenverse.com/a/c927789b-7d96-4acb-b011-8b337e4cd5e3 across 7 brand sites (arc, daylight, framer, huly, mercury, raycast, workos). The per-beat read protocol and concept-first storyboard order produced visibly better brand fidelity than the v1 (prescriptive) skill.
  • Documentation updated (if applicable) — the skill IS documentation.

…eat read protocol

Rewrite of the website-to-hyperframes skill that came out of 11
evaluation rounds. The honest read of those evals: prose-only
guidance had hit its ceiling — sub-agents kept reporting "0 errors,
looks good" without doing the work, producing slideshow-quality
videos with mismatched brand colors, missing logos, and beats that
didn't serve the storyboard. This restructure addresses the
failure modes that real videos showed, not theoretical ones.

**Step structure (replaces 7-step layout with concept-first 6-step)**

Old: capture → design → script → storyboard → vo → build → validate
New: capture → design → brief → storyboard → vo → build → validate

The brief step (Step 2) is new: a conversation-shaped step that
aligns message + audience + arc before any beat-writing happens.
Concept-first throughout — message → arc → beats that serve the arc
→ which assets and techniques bring each beat to life.

**Step 0 (capture)**

- "View the contact sheets — carefully, every cell, not a glance"
  closes the failure mode where agents reported "viewed the contact
  sheet" after one scroll and later wrote beats referencing assets
  that didn't exist or missed the brand logo.
- Names the right artifacts to read in order (tokens.json →
  design-styles.json → asset-descriptions.md → fonts-manifest.json),
  with read-on-demand guidance for the rest.

**Step 1 (design)**

- DESIGN.md authoring guide. Restored component CSS sections
  (Component Stylings, Spacing & Layout, Depth & Elevation) that
  earlier batches over-collapsed.

**Step 2 (brief)**

- Strategy/messaging step. Clear instruction for "Surprise me" /
  minimal direction: state the minimum context (where the video
  runs, who it's for) and proceed bold.

**Step 3 (storyboard + script)**

- Concept gate at the top — answer "what makes this video distinct"
  before writing beat 1.
- Brand-floor MUST rules (logo in opener + closer; signature visual
  somewhere in the video).
- Captured assets (SVG logos, illustrations, hero art, gradients)
  are first-class beat content alongside composed UIs — many of
  them carry beats outright. The constraint is only that you start
  from the message, not the asset inventory.

**Step 4 (vo)**

- TTS ranking: HeyGen first (auto word timestamps), ElevenLabs
  second, Kokoro as the free option. Audio timing reconciliation
  gate: if the actual audio duration differs from the storyboard's
  planned duration by more than 15%, rescale beats or trim the
  script before Step 5.
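
The reconciliation gate above can be sketched as a small pure function. This is an illustration under assumptions, not the skill's implementation: `reconcileBeatTiming` and the `Reconciliation` shape are hypothetical, and proportional rescaling is only one of the two remedies the step names (the other, trimming the script, is a writing decision that code cannot make).

```typescript
// Hypothetical result shape for the Step 4 timing gate.
interface Reconciliation {
  withinTolerance: boolean;
  deviation: number;       // signed fraction, e.g. 0.25 = audio runs 25% long
  beatDurations: number[]; // seconds per beat after reconciliation
}

// Compare the actual voice-over length against the storyboard plan.
// Inside the tolerance band the plan is kept; outside it, every beat is
// rescaled proportionally so the beats sum to the actual audio length.
function reconcileBeatTiming(
  plannedBeats: number[],
  actualAudioSec: number,
  tolerance = 0.15, // the 15% band from the step description
): Reconciliation {
  const plannedTotal = plannedBeats.reduce((sum, d) => sum + d, 0);
  if (plannedTotal <= 0 || actualAudioSec <= 0) {
    throw new Error("durations must be positive");
  }
  const deviation = (actualAudioSec - plannedTotal) / plannedTotal;
  if (Math.abs(deviation) <= tolerance) {
    return { withinTolerance: true, deviation, beatDurations: plannedBeats.slice() };
  }
  const scale = actualAudioSec / plannedTotal;
  return {
    withinTolerance: false,
    deviation,
    beatDurations: plannedBeats.map((d) => d * scale),
  };
}
```

With planned beats of 4, 6, and 10 seconds, a 22-second recording passes the gate untouched, while a 25-second recording fails it and stretches each beat by the same factor.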

**Step 5 (build) + beat-builder-guide.md**

- Sub-agent template now pastes brand values inline rather than
  telling the sub-agent to re-read DESIGN.md. Targeted file reads
  with specific sections + line ranges.
- "Patterns that ARE shots" affirmative list (captured logo
  draw-on, hero illustration push-in, captured screenshot with
  parallax layers, kinetic typography over captured asset).
- Webpage-mimicry patterns (full CSS browser chrome, parked-camera
  composition, ±2px breathing motion) marked ⚠ rather than ❌ —
  fine when the storyboard genuinely calls for them as the subject.
- Required cinematography per beat: shot type, camera move, depth
  strategy, purpose.

**Step 6 (validate) — per-beat read protocol**

This replaces the previous "spawn verify-beats CLI" gate. A grep
of composition HTML can catch structural lies (missing hex codes,
wrong asset paths) but it can't catch boring beats, off-screen
logos, GSAP timelines that only cover the first 2 seconds, or
camera moves that don't match the storyboard. Those failures only
surface when somebody opens the file and reads it.

Per-beat verdict template names the brand hex codes used, captured
asset paths referenced, headline `font-size`, GSAP timeline
coverage, and storyboard alignment. Critic sub-agent scores a
"Captured asset utilization" dimension specifically so the eval
captures whether captured SVGs/illustrations carried beats or got
recreated as divs.
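
One way to picture the verdict template is as a typed record plus a check that rejects verdicts with no evidence in them. The sketch below is hypothetical throughout: the field names, the `verdictGaps` helper, and the 90% coverage and 20-character thresholds are assumptions made for illustration, not values taken from step-6-validate.md.

```typescript
// Hypothetical record mirroring the evidence the verdict template names.
interface BeatVerdict {
  beat: number;
  brandHexCodes: string[];      // hex codes actually found in the beat file
  capturedAssetPaths: string[]; // may be empty for a fully composed beat
  headlineFontSizePx: number | null;
  timelineEndSec: number;       // where the last GSAP tween ends
  beatDurationSec: number;      // duration planned in the storyboard
  storyboardAlignment: string;  // free-text note on how the beat matches
}

// Return a list of reasons the verdict does not count as evidence of a read.
// An empty list means the verdict is at least structurally complete.
function verdictGaps(v: BeatVerdict, minCoverage = 0.9): string[] {
  const gaps: string[] = [];
  const hexPattern = /^#[0-9a-fA-F]{6}$/;
  if (v.brandHexCodes.length === 0) gaps.push("no brand hex codes named");
  for (const hex of v.brandHexCodes) {
    if (!hexPattern.test(hex)) gaps.push(`malformed hex: ${hex}`);
  }
  if (v.headlineFontSizePx === null) gaps.push("headline font-size not recorded");
  const coverage = v.beatDurationSec > 0 ? v.timelineEndSec / v.beatDurationSec : 0;
  if (coverage < minCoverage) {
    gaps.push(`timeline covers ${Math.round(coverage * 100)}% of the beat`);
  }
  if (v.storyboardAlignment.trim().length < 20) {
    gaps.push("storyboard alignment note too short to be evidence");
  }
  return gaps;
}
```

A check like this cannot judge whether a beat is boring; it only makes a bare "looks good" verdict impossible to submit, which is the narrower failure the template targets.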

**Asset bundle**

- 20 Pixabay-licensed SFX files with `CREDITS.md` documenting
  provenance. SFX assignment moved to Step 3 (creative decision)
  so Step 5 implements rather than improvises.
- Capabilities reference + html-in-canvas-patterns updated:
  Three.js 0.181.2 + ESM jsm imports, mulberry32 seeded PRNG for
  deterministic shatter, 24-effect text-animation catalog
  referenced (catalog itself lands in the hyperframes-skill PR).
- Visual vocabulary rewritten: replaces user-word lookup tables
  with brand-first derivation across 6 axes; user words land as
  modifiers, not replacements.
Collaborator Author

ukimsanov commented May 20, 2026

ukimsanov added 6 commits May 20, 2026 14:55
- Delete `references/visual-vocabulary.md` and scrub the four call
  sites that referenced it. The 6-axis lookup framing it introduced
  contradicted the rest of the skill's "design from the brand, not
  from a table" stance.
- Replace all `npx tsx packages/cli/src/cli.ts <cmd>` invocations
  with `npx hyperframes <cmd>` in step-0-capture.md, step-5-build.md,
  step-6-validate.md, and beat-builder-guide.md. The capture- and
  snapshot-pipeline improvements that previously required the local
  CLI now ship in the published CLI via the stack's PRs #987 and
  #988, so once the stack lands the published CLI is the right
  invocation for the skill prose.
- Remove the now-contradictory "ALWAYS use the local CLI — never
  npx hyperframes" warnings in step-0-capture.md and step-6-validate.md.
SKILL.md grew to 192 lines from a 124-line baseline. Most of the
bloat was content duplicated in the step reference files it points
to. Removed 6 sections that duplicated step content, composed 2
small additions into the step files where they actually belonged.

Removed from SKILL.md (already covered elsewhere):

- "Take your time" / "Quality matters more than speed" paragraph
  — operational philosophy already implicit in step-6-validate's
  cell-by-cell review prose.
- "Creative Tension Principle" section — step-3-storyboard.md:21
  already has the exact "What makes this video different from a
  generic [video type] for any [industry] brand?" single-sentence
  test. Duplicate removed; storyboard is the right home.
- "Step -1: What we're actually making" (30 lines: anti-patterns,
  video grammar, shot framing, camera moves) — duplicates step-3-
  storyboard.md:197+ (shot types), :229–232 (anti-patterns), and
  beat-builder-guide.md:126+ (shot framing).
- "Sub-agent mode" + "No sub-agents" preamble — step-5-build.md:286
  –292 already handles both parallel and serial runtimes.
- "Image-viewing capability" warning — operationally implicit in
  step-0 ("View the contact sheets") and step-6 ("View snapshots/
  contact-sheet.jpg cell-by-cell").
- "User Interaction Points" table — redundant with the inline 💬
  markers on Steps 3 and 4.

Composed into step files (content that wasn't there yet):

- step-1-design.md "Target length" paragraph: added the fast-pacing
  / billboard-per-beat exception (50-line DESIGN.md is enough when
  beats are single hero elements on full-bleed backgrounds, not
  full UIs).
- step-2-brief.md "Surprise me" section: added the global-propagation
  rule — when the user signals autonomous mode at Step 2, every 💬
  gate downstream (Step 3 storyboard approval, Step 4 TTS choice) is
  also skipped.

Step 5 SKILL.md gate paragraph trimmed from a 6-clause description
of the per-beat read to one line that points at step-5-build.md
for the full checklist.

Updated the techniques.md reference counts from "20" to "13" in
SKILL.md, beat-builder-guide.md, and step-3-storyboard.md to match
the techniques.md trim in the upstream branch.

Net: SKILL.md 192 → 131 lines.
Step 0 had bloated to 91 lines that did the work of Steps 1–3:
viewing contact sheets cell-by-cell, reading 8 data files, listing
promising assets, inferring product purpose / audience / value prop
/ brand voice. That meant the agent did all the heavy lifting
upfront, produced summaries that went stale before they were used,
and the actual "run the capture" instruction was buried.

Step 0 now owns only what Step 0 is: run the capture command,
sanity-check it succeeded, hand off. 91 → 55 lines.

Moved (composed into destination files, verified each was the right
home before adding):

- Read tokens.json + design-styles.json → step-1-design.md replaces
  the passive "you read these in Step 0" line with an active
  "Read these now — primary data source for Sections 3–6."
- Contact-sheet "every cell, name 5 assets per page" anti-glance
  prose → step-3-storyboard.md asset-discovery bullet (which already
  covered contact-sheet viewing generally, now strengthened with
  the anti-glance rule).
- Strategic site summary (product / audience / voice / value prop)
  → step-2-brief.md absorbed this; the brief itself IS the summary.
  Replaced "After presenting the site summary (from Step 0)" with
  step-2 grounding itself by reading DESIGN.md + asset-descriptions
  + visible-text directly.

Step 0's new structure:
- Run the capture (CLI command + project-dir convention) — unchanged
- Confirm it succeeded (1-line summary, error-out on bad capture)
- Reference table mapping each capture/ file to the step that
  first reads it (explicit "DO NOT read these here")
- Gate: capture exits 0 + counts non-zero
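
The Step 0 gate described above (exit code 0 and non-zero counts) might be expressed as follows. This is a sketch only: the `CaptureResult` shape and the count names used in the usage note are hypothetical, since the commit message does not say which counts the capture summary reports.

```typescript
// Hypothetical summary of one capture run.
interface CaptureResult {
  exitCode: number;
  counts: Record<string, number>; // e.g. pages, assets, fonts (assumed names)
}

// Pass only when the capture exited cleanly and every reported count is
// positive; otherwise list each reason so the agent can error out early.
function captureGate(result: CaptureResult): { pass: boolean; reasons: string[] } {
  const reasons: string[] = [];
  if (result.exitCode !== 0) {
    reasons.push(`capture exited with code ${result.exitCode}`);
  }
  const names = Object.keys(result.counts);
  if (names.length === 0) reasons.push("no counts reported");
  for (const name of names) {
    const n = result.counts[name];
    if (!(n > 0)) reasons.push(`${name} count is ${n}`);
  }
  return { pass: reasons.length === 0, reasons };
}
```

For example, `captureGate({ exitCode: 0, counts: { pages: 4, assets: 0 } })` fails with the single reason `assets count is 0`, so the hand-off to Step 1 never happens on an empty capture.
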
Cleans up two related overcorrections that crept across the skill
prose: (a) "compose UIs from divs/SVG/CSS" repeated 6+ times in
step-1, anchoring agents to website-shaped beats; (b) "every beat's
primary visual stays composed from divs / SVG / CSS / GSAP" and
"captured assets are accents — they decorate, they don't carry"
overstatements in step-3 and step-5 that contradicted the dial-back
done earlier in this stack.

The real framing: a beat composes from whatever primitives the scene
needs — HTML/CSS, SVG, captured assets, WebGL, Canvas, Three.js,
kinetic typography, Lottie — alone or in combination. They're inputs
to one output (the video frame). No rule maps intent → primitive.
The narrow no-go is one rule: never paste a product-UI screenshot as
load-bearing content (the slideshow pattern).

step-1-design.md (8 edits):
- L5 intro: drop "composed from divs/SVG/CSS at build time" detail.
- L7 length: drop "compose UIs from scratch (divs/SVG/CSS)" framing;
  merge L290's "over-investing in prose" caveat in.
- L97: "composing UIs from divs in Step 5" → "building beats".
- L161: "compose the X UI" → "a beat featuring the X".
- L290: duplicate length bullet — deleted.
- L293: "sub-agents compose UIs at build time from divs/SVG/CSS..."
  → "No separate Components section — Quick Reference is where
  components live."

step-3-storyboard.md (3 edits):
- L3 (intro): "alongside composed UIs" → "alongside composed beats".
- L276 ("Compose the load-bearing visuals yourself") paragraph
  replaced with the primitive-toolkit framing — toolkit is open, the
  only no-go is product-UI screenshots as load-bearing content.
- L381–383 ("The bar:") three bullets collapsed to one bullet:
  primary visuals use whatever combination the scene needs; accents
  are optional; brand-floor minimums are the minimum.

step-5-build.md (2 edits):
- L104 stacked-beats intro: "composed from divs, SVG, canvas, and
  CSS. Never a full-bleed screenshot." → "composes from whatever
  primitives the storyboard called for ... Narrow no-go: never a
  full-bleed product-UI screenshot as load-bearing content."
- L147: "Build the UI element from divs and CSS" → "Build the
  element from divs and CSS" — drops the UI bias since this rule
  applies only when the asset IS a product-UI screenshot.

Net result: "compose from divs/SVG/CSS" mentions drop from 10+ to 0
as a generalized framing; the term survives only in concrete
examples (e.g. "cards-as-divs" when the beat is specifically a
kanban demo) where divs/CSS IS the right answer.
Three follow-ups caught by a post-restructure audit pass. All three
were places where the earlier "compose primary, asset is accent"
framing survived after the step-3 and step-5 paragraphs already got
the primitive-toolkit rewrite. Cleans up the contradiction so the
skill speaks with one voice: captured assets can be primary content;
the narrow no-go is just pasting product-UI screenshots.

- step-2-brief.md:80 — the "flip it" example said agents should
  reframe "the hero illustration centers the opener" into "kinetic
  typography ... hero illustration as ambient depth." That reverses
  the dial-back: captured illustrations CAN center an opener. The
  flip-it rule now applies narrowly to product-UI screenshots; for
  captured logos/illustrations/hero art, no flip is needed.
- step-2-brief.md:149 — option-template guidance said "primary
  content is 'the screenshot of X'" was forbidden. Narrowed to
  "primary content is a pasted product-UI screenshot." Other
  captured assets (SVG logos, illustrations, hero art) are valid
  primaries when the concept calls for them.
- step-3-storyboard.md:314 — Common-accent-uses bullet implied
  accents are always layered on "composed UI." Reframed: list
  accent uses for when the primary is something else; when the
  captured asset IS the primary (logo opener, hero parallax),
  document it under Composition, not Accents.
… normal

Second-batch audit cleanup after Ular's "logo isn't a requirement,
just a nice default" correction. Three related places still framed
captured-asset-primary beats as rare exceptions and the brand-floor
rules as hard MUSTs — both overstatements that contradict the rest
of the dial-back. Plus a TOC-only callout on capabilities.md.

- step-3:300 "for the RARE beat where a captured asset is the
  primary visual ... defaulted to the slideshow pattern this
  workflow exists to break" — rewritten. Captured-asset-primary
  beats are a normal valid choice. The narrow no-go is just pasting
  product-UI screenshots full-bleed.
- step-3:351 "Each one has a composed visual that carries it" —
  rewritten to "Each one has a primary visual that carries it
  (composed UI, captured asset, kinetic typography, WebGL, etc.)".
- step-3:353 "assets decorate concept-defined beats; they do not
  seed them" — kept "do not seed" (correct: don't write a beat
  because of a cool asset); dropped the "decorate" framing
  (overgeneralized — assets can be primary too).
- step-3 brand-inflection floor section: relabeled from "REQUIRED
  minimums" to "Brand defaults (nice-to-haves for most brand
  videos)". "MUST appear" softened to "for most brand videos,
  the logo lands in the opener and the closer" with explicit
  "skippable when the storyboard's concept calls for it" language.
- step-3:379 "The bar:" bullet: "brand-floor minimums ... the
  minimum, not the ceiling" → "brand-defaults section covers most
  brand videos but isn't a hard requirement."
- step-5:413 "Brand-floor check" section in the per-beat read
  protocol: relabeled "Brand-defaults check", reframed each item
  as a default not a fail-condition; agent checks against the
  storyboard's intent rather than enforcing a hard rule.
- capabilities.md top: added a "Scan the TOC; do NOT read this file
  linearly" callout — it's a 700+ line inventory; agents should
  jump to the section a beat needs, not read top-to-bottom.

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Collaborator

@miguel-heygen miguel-heygen left a comment

Skill restructure is well-motivated by the eval findings. Concept-first authoring order makes sense — message → arc → beats → assets. Per-beat read protocol replacing the grep-based verify-beats CLI is the right call (structural lies aren't the real quality problem — boring beats are).

SFX files: PR says Pixabay-licensed with CREDITS.md documenting provenance — good.

One note: this is a large prose change (32 files) that affects the w2h pipeline's behavior significantly. Worth a manual test run on 1-2 sites after merge to validate the new step ordering produces better output.

Collaborator

@jrusso1020 jrusso1020 left a comment

Approve at bb6a3d2b. Magi covered the concept-first direction + per-beat read protocol. Additive:

  • SFX licensing — my prior hf#984 concern fully addressed. assets/sfx/CREDITS.md exists, cites the Pixabay Content License, and notes attribution is appreciated for transparency despite Pixabay not requiring it. 19 MP3 files + manifest.json. Per reference_vendored_content_license_check.md — this is the gold standard for vendored-content licensing. ✓ Bonus: documented despite the license not requiring it.

  • Old-step-file deletion to prevent dual-pipeline confusion — the 7-step → 6-step transition deletes step-1-capture.md through step-7-validate.md while introducing step-0-capture.md through step-5-build.md + new step-6-validate.md. Clean migration; verify no internal cross-references in other skill docs still point to the old filenames (a one-shot grep for step-7-validate / step-3-script / etc. across the OTHER skills would catch any). Not a hard blocker since the deletions are explicit.

  • Per-beat read protocol enforcement gap — the design is sound (main agent opens each compositions/beat-N.html and reads top-to-bottom), but the failure mode it replaces ("sub-agent reports 'looks good' without doing the work") can recur at the main-agent level. The Step 6 verdict template names specific evidence (hex codes, asset paths, headline px, timeline coverage) which structurally pushes for actual reads. Worth one-shot-validating in a future eval round that main agents actually produce these verdict artifacts (not just "I read them all, looks good").

  • Step 2 (brief) as the new strategy/messaging step — concrete "Surprise me" / minimal-direction instruction is the right reframe for an open-ended user prompt. Worth comparing the resulting eval-arena video quality on the same brand with v1 (prescriptive) vs v2 (concept-first) skill — Ular cited heygenverse.com/a/c927789b-... which I'll trust for the manual eval evidence.
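
The one-shot check for stale step filenames suggested in the second bullet could look roughly like this. It is a sketch under assumptions: the old filenames are the ones the PR description lists as deleted, while `findStaleStepRefs`, the in-memory `docs` map, and any paths passed to it are hypothetical stand-ins for however the other skill docs would actually be loaded.

```typescript
// Filenames the PR description lists as deleted.
const OLD_STEP_FILES: string[] = [
  "step-1-capture.md",
  "step-2-design.md",
  "step-3-script.md",
  "step-4-storyboard.md",
  "step-5-vo.md",
  "step-6-build.md",
  "step-7-validate.md",
];

interface StaleRef {
  file: string;
  line: number; // 1-based line number of the reference
  match: string;
}

// Scan a map of path -> file contents for mentions of deleted step files.
function findStaleStepRefs(
  docs: Record<string, string>,
  oldNames: string[] = OLD_STEP_FILES,
): StaleRef[] {
  const hits: StaleRef[] = [];
  for (const file of Object.keys(docs)) {
    const lines = (docs[file] ?? "").split("\n");
    lines.forEach((text, i) => {
      for (const name of oldNames) {
        if (text.indexOf(name) !== -1) hits.push({ file, line: i + 1, match: name });
      }
    });
  }
  return hits;
}
```

None of the new filenames (step-0-capture.md through step-6-validate.md) contains an old filename as a substring, so a plain substring match does not flag valid references.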

Stack base correctly set to feat/lint-rules. The +3023/-863 is mostly new prose (the deletions are the old 7-step files); the gross delta overstates the cognitive load of review.

— Rames Jusso

…ugh-white regression

Five fixes from Ular's first-pass workflow run:

1. step-1-design.md Fonts section — sub-agents pointed @font-face for
   "ES Build Neutral" at the Inter .woff2 files because DESIGN.md
   only named families, never emitted exact src: paths. Now the
   Fonts section example shows per-family + per-weight file paths
   AND a copy-verbatim @font-face block sub-agents can paste, so
   there's no inference step. Adds an explicit narrative of the
   real failure mode and how to avoid it.

2. beat-builder-guide.md FONTS rule — was "brand fonts with
   capture/assets/fonts/ path need @font-face in <style>." Now:
   "copy the @font-face block VERBATIM from DESIGN.md. Do NOT guess
   which .woff2 file belongs to which family — capture filenames
   are content-hashed and there is no visible mapping. If DESIGN.md
   doesn't include exact src: paths per family, STOP and ask the
   main agent; never pair an arbitrary .woff2 with a family name
   from memory."

3. step-1-design.md Colors section — Sub-agents reproduced brand
   colors faithfully and hit WCAG AA failures on dark surfaces
   (#68686A on #18191B = 3.16:1). Now the Colors section example
   computes per-pairing contrast ratios with ✅/⚠/❌ markers,
   documents the dark-surface substitute color when the brand's own
   palette fails, and points at the /hyperframes-contrast skill for
   ratio computation. Sub-agents pick text colors by surface
   context, not by "this is the brand's secondary text color."

4. capabilities.md flash-through-white entry — the "ideal as
   invisible bridge at duration: 0.01" framing caused agents to
   scatter white flashes through every composition as transition
   bridges. The fix was documented in the branch's HANDOFF but
   never landed. Now: "Fade through white midpoint — a visible
   white flash between scenes. Use only when the brand specifically
   calls for a white-flash beat boundary; this is NOT a neutral
   'default' transition."

5. step-6-validate.md Warnings list — adds a paragraph on WCAG
   contrast false positives. The validator samples at fixed
   timestamps; elements at opacity:0 / mid-fade get measured as if
   fully visible, producing spurious failures. Tells the agent to
   verify visually before changing colors to clear a WCAG warning
   — bumping a color to fix a sampling artifact changes brand
   identity for no real benefit.
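
The 3.16:1 figure cited in fix 3 can be reproduced with the standard WCAG 2.x relative-luminance formula. The sketch below assumes opaque 6-digit hex colors; `contrastRatio` and `contrastMarker` are hypothetical helper names, and mapping ratios to the ✅/⚠/❌ markers at the AA thresholds (4.5:1 for normal text, 3:1 for large text) is an assumption about how DESIGN.md uses those markers, not something the commit message specifies.

```typescript
// Convert one 8-bit sRGB channel to linear light.
function channelToLinear(c8: number): number {
  const c = c8 / 255;
  return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// WCAG relative luminance of an opaque color such as "#68686A".
function relativeLuminance(hex: string): number {
  if (!/^#[0-9a-fA-F]{6}$/.test(hex)) {
    throw new Error(`expected a 6-digit hex color, got ${hex}`);
  }
  const r = channelToLinear(parseInt(hex.slice(1, 3), 16));
  const g = channelToLinear(parseInt(hex.slice(3, 5), 16));
  const b = channelToLinear(parseInt(hex.slice(5, 7), 16));
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio between two colors, always >= 1 regardless of order.
function contrastRatio(foreground: string, background: string): number {
  const l1 = relativeLuminance(foreground);
  const l2 = relativeLuminance(background);
  return (Math.max(l1, l2) + 0.05) / (Math.min(l1, l2) + 0.05);
}

// Assumed marker mapping: pass AA for normal text, pass only for large
// text, or fail both.
function contrastMarker(ratio: number): string {
  if (ratio >= 4.5) return "✅";
  if (ratio >= 3) return "⚠";
  return "❌";
}
```

`contrastRatio("#68686A", "#18191B")` evaluates to roughly 3.16, which clears the 3:1 large-text threshold but not the 4.5:1 threshold for body text. This function measures fully opaque colors only, so it does not account for the mid-fade sampling artifact described in fix 5.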
4 participants