feat(skills): remotion-to-hyperframes corpus T1+T2 (3/7) by jrusso1020 · Pull Request #508 · heygen-com/hyperframes

jrusso1020 · 2026-04-27T05:28:14Z

What

The first two test fixtures the remotion-to-hyperframes skill is graded against. Each fixture is a self-contained directory:

tier-N-name/
├── remotion-src/   full Remotion project (package.json, src/, remotion.config.ts, tsconfig.json)
├── hf-src/         hand-translated HyperFrames composition (index.html)
├── expected.json   tier metadata + SSIM threshold + translation notes + measured validation
├── README.md       human walk-through of the translation choices
└── setup.sh        (T2 only) generates binary assets via ffmpeg

T1 — title-card-fade

3 s @ 30 fps, 1280×720. Single AbsoluteFill, single useCurrentFrame-driven interpolate with multi-segment input [0, 15, 75, 90] → [0, 1, 1, 0] (fade in / hold / fade out). No audio, no media, no custom components.
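The multi-segment mapping above is a piecewise-linear function of the frame number. A minimal sketch follows, assuming the output is clamped outside the input range. The names `interpolate` and `title_opacity` are illustrative only; this is not Remotion's implementation and not code from the fixture.

```python
from bisect import bisect_right


def interpolate(frame, input_range, output_range):
    """Piecewise-linear map from input_range to output_range.

    Assumption for this sketch: values outside the input range are clamped
    to the first / last output value.
    """
    if frame <= input_range[0]:
        return float(output_range[0])
    if frame >= input_range[-1]:
        return float(output_range[-1])
    # Index of the segment whose left edge is <= frame.
    i = bisect_right(input_range, frame) - 1
    x0, x1 = input_range[i], input_range[i + 1]
    y0, y1 = output_range[i], output_range[i + 1]
    return y0 + (y1 - y0) * (frame - x0) / (x1 - x0)


def title_opacity(frame):
    """T1 opacity curve: fade in over frames 0-15, hold, fade out over 75-90."""
    return interpolate(frame, [0, 15, 75, 90], [0, 1, 1, 0])
```

At 30 fps the fade-in and fade-out each last half a second, and the hold covers the two seconds in between.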

Validated mean SSIM: 0.974 · threshold 0.95.

T2 — title-image-outro

6 s @ 30 fps, 1280×720, three <Sequence> scenes:

TitleScene (0–2 s) — spring({damping:12, stiffness:100, mass:1}) driving scale on text
ImageScene (2–4 s) — <Img src={staticFile("square.png")}> with linear fade-in + scale
OutroScene (4–6 s) — 1-second linear fade-in
<Audio src={staticFile("music.wav")} volume={0.5} /> throughout

setup.sh generates the 200×200 PNG and 6-second silent WAV via ffmpeg so binaries stay out of the repo.

Validated mean SSIM: 0.985 · threshold 0.95. The spring → back.out(1.4) translation came out cleaner than the original ~0.05 SSIM budget anticipated.
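To see why an overshooting ease is a natural stand-in for this spring, the textbook damped harmonic oscillator can be evaluated with the same parameters. This is a sketch under stated assumptions: it uses the closed-form underdamped solution for a 0 → 1 step with zero initial velocity, which is not necessarily how Remotion computes `spring()`, and `spring_value` / `sampled_peak` are hypothetical names.

```python
import math


def spring_value(t, damping=12.0, stiffness=100.0, mass=1.0):
    """Closed-form underdamped spring response from 0 to 1 at time t (seconds)."""
    omega0 = math.sqrt(stiffness / mass)                    # natural frequency
    zeta = damping / (2.0 * math.sqrt(stiffness * mass))    # damping ratio
    if zeta >= 1.0:
        raise ValueError("this sketch only covers the underdamped case")
    omega_d = omega0 * math.sqrt(1.0 - zeta * zeta)         # damped frequency
    decay = math.exp(-zeta * omega0 * t)
    return 1.0 - decay * (
        math.cos(omega_d * t) + (zeta * omega0 / omega_d) * math.sin(omega_d * t)
    )


def sampled_peak(fps=30, seconds=2.0):
    """Largest value seen when sampling the curve once per frame."""
    frames = int(fps * seconds)
    return max(spring_value(f / fps) for f in range(frames + 1))
```

With damping 12, stiffness 100, and mass 1 the damping ratio is 0.6, so the curve overshoots its target by a little under 10% before settling. That is the qualitative shape an ease such as back.out reproduces.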

End-to-end validation

Rendered Remotion baseline + HF translation locally, ran scripts/render_diff.sh. Both fixtures meet their thresholds with comfortable margin.

Tier	Measured mean	Measured p05	Threshold	Margin
T1	0.974	0.972	0.95	+0.022 from p05
T2	0.985	0.966	0.95	+0.016 from p05
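The Margin column is simply p05 minus the threshold. A small sketch of that arithmetic, using only the numbers in the table; the dictionary layout and function names are illustrative and are not the schema of expected.json.

```python
# Measured values copied from the table above.
MEASURED = {
    "T1": {"mean": 0.974, "p05": 0.972, "threshold": 0.95},
    "T2": {"mean": 0.985, "p05": 0.966, "threshold": 0.95},
}


def margin_from_p05(row):
    """Headroom between the 5th-percentile SSIM and the gate."""
    return round(row["p05"] - row["threshold"], 3)


def passes(row):
    """Gate as stated in the test plan: mean SSIM must be >= threshold."""
    return row["mean"] >= row["threshold"]
```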

The dominant source of non-translation noise is system font fallback divergence between Remotion's bundled Chromium and HF's chrome-headless-shell. The same font-weight: 800 renders perceptibly bolder on HF, which costs ~0.025 mean SSIM and caps how high the threshold can be set.

Critical Remotion config

Both fixtures' remotion.config.ts set setVideoImageFormat("png") + setColorSpace("bt709"). Remotion's default JPEG output writes yuvj420p (full-range) which costs ~0.05 SSIM vs HF's yuv420p (limited-range). Without this, T1 lands at 0.958 instead of 0.974 — render_diff would be measuring an encoder difference, not translation fidelity.
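The range difference can be made concrete with the standard 8-bit luma mapping: full range spans 0 to 255, while limited (studio) range spans 16 to 235. The helper below is a hypothetical illustration and not part of the harness. It only shows that the same picture carries different sample values in the two encodings, which is the kind of difference the config is meant to rule out.

```python
def full_to_limited_luma(y_full):
    """Map an 8-bit full-range luma value (0-255) to limited range (16-235)."""
    if not 0 <= y_full <= 255:
        raise ValueError("expected an 8-bit luma value")
    return round(16 + 219 * y_full / 255)
```

Black moves from 0 to 16 and white from 255 to 235, so two encodes of identical frames are not sample-identical unless both sides agree on the range.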

Why

#3 in the 7-PR stack.

T1 + T2 together exercise:

AbsoluteFill, Sequence
useCurrentFrame, useVideoConfig
interpolate (single-segment + multi-segment, with extrapolation)
spring
Audio, Img, staticFile

That's the bulk of what real Remotion compositions use day-to-day. T3 (PR 4) adds custom React subcomponents + Zod schemas; T4 (PR 5) covers escape-hatch cases.

Stack

#506 (1/7) — scaffold
#507 (2/7) — eval harness
this PR (3/7) — T1 + T2 fixtures
#509 (4/7) — T3 data-driven fixture
5/7 — T4 escape-hatch fixtures
6/7 — references/*.md (translation map)
7/7 — SKILL.md body + corpus orchestrator

Test plan

T1: lint_source.py reports 0 blockers / 0 warnings / 0 infos
T2: lint_source.py reports 0 blockers / 0 warnings / 2 infos (the two staticFile() references — correctly classified as translatable)
T2: setup.sh runs successfully, generates PNG + WAV
All fixture .tsx, .ts, .json, .md files pass oxfmt --check and oxlint
T1: end-to-end render + SSIM diff (mean 0.974, ≥ 0.95 threshold)
T2: end-to-end render + SSIM diff (mean 0.985, ≥ 0.95 threshold)

miguel-heygen

The fixture content is directionally useful, but the human-facing corpus docs disagree with the machine-readable thresholds. Since agents will read both, these should be made consistent before merging this layer.

miguel-heygen · 2026-04-27T20:05:52Z

+../../../scripts/render_diff.sh ./remotion-src/out/baseline.mp4 ./hf.mp4 ./diff
+```
+
+`expected.json` documents the SSIM threshold (0.97) for this fixture.


This says the fixture threshold is 0.97, but expected.json sets ssim_threshold to 0.95 and the rationale also says 0.95. Please make the README match the executable contract; otherwise the skill/reference reader gets two different gates for the same fixture.

miguel-heygen · 2026-04-27T20:05:52Z

+../../../scripts/render_diff.sh ./remotion-src/out/baseline.mp4 ./hf.mp4 ./diff
+```
+
+## Why threshold 0.92?


Same issue here: this section is written around a 0.92 threshold and compares against T1 at 0.97, but the checked-in expected.json uses 0.95 for T2 and T1 also uses 0.95. The corpus needs one source of truth, especially because the final orchestrator reads expected.json while humans/agents read this README.
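One way to keep the README and expected.json from drifting apart again is a small consistency check. The sketch below is hypothetical and not part of this PR: it assumes the README mentions the gate as the word "threshold" followed shortly by a decimal number, and it reads the `ssim_threshold` key named in the review comment above.

```python
import json
import re

# "threshold", then up to 20 non-digit characters, then a decimal like 0.95.
_THRESHOLD_MENTION = re.compile(r"threshold\D{0,20}?(\d\.\d+)", re.IGNORECASE)


def readme_thresholds(readme_text):
    """Every threshold-like number mentioned in the README text."""
    return [float(m) for m in _THRESHOLD_MENTION.findall(readme_text)]


def threshold_mismatches(expected_json_text, readme_text):
    """README threshold mentions that disagree with expected.json."""
    expected = json.loads(expected_json_text)["ssim_threshold"]
    return [v for v in readme_thresholds(readme_text) if v != expected]
```

An empty list means every threshold the README states agrees with the machine-readable gate.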

jrusso1020 · 2026-04-27T23:10:15Z

@miguel-heygen — addressed in the amended commit abaa7430:

T1 README: The trailing line "expected.json documents the SSIM threshold (0.97)" is now "0.95" (matching expected.json). Calibrated mean (0.974) is also called out so the reader knows where the value comes from.

T2 README: The "Why threshold 0.92?" header and surrounding paragraph now reference 0.95 (matching expected.json). The reasoning was rewritten to reflect what calibration actually showed: spring → back.out(1.4) came in cleaner than the original 0.05-SSIM budget anticipated, so 0.95 is the gate (validated mean 0.985).

grep -n "0\.97\|0\.92" tier-1-title-card/README.md tier-2-multi-scene/README.md now only matches the validated-mean number 0.974 (one occurrence in T1's note about the calibrated SSIM), no threshold mentions.

miguel-heygen

Re-reviewed the latest head. The T1/T2 threshold documentation now matches the executable expected.json values, and I do not have remaining blockers on this layer.

Adds the deterministic eval primitives the skill calls into:

- scripts/render_diff.sh: SSIM diff between two MP4s, JSON summary, configurable threshold
- scripts/frame_strip.sh: side-by-side comparison strip for visual debugging
- scripts/lint_source.py: pre-translation lint over Remotion source (blockers/warnings/infos)

The harness is decoupled from the render pipeline: it accepts paths to already-rendered MP4s. The skill orchestrator (PR 7) drives both renders and feeds the outputs in. This keeps the harness usable in CI, in sandboxes, and on any machine that has ffmpeg without needing the full Remotion + HyperFrames toolchain.

Lint catches the patterns from the skill's out-of-scope list:

- useState / useReducer (state-machine driven animation)
- useEffect with deps (side effects)
- async calculateMetadata (Promise-returning composition metadata)
- @remotion/lambda imports
- third-party React UI libraries (MUI, Chakra, Mantine, antd, shadcn, Radix, NextUI)
- delayRender / useCallback / useMemo (warnings)
- staticFile / interpolateColors (info: translatable but flagged)

Smoke test (scripts/tests/smoke.sh) exercises all three scripts against synthetic inputs: identical ffmpeg testsrc videos pass at threshold 0.99, different ffmpeg testsrc videos fail at 0.99, frame_strip produces a strip.png, lint produces 0 blockers on a clean fixture and >=3 blockers on a fixture that uses useState + useEffect + MUI + async metadata.

Validated locally: smoke.sh exits 0.
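The severity tiers described above can be pictured as a table of patterns, each tagged blocker, warning, or info. The sketch below is a hypothetical, regex-only illustration covering a subset of the listed patterns. It is not the actual scripts/lint_source.py, whose rules and output format are not shown on this page.

```python
import re

# (severity, pattern) pairs for a subset of the patterns listed above.
RULES = [
    ("blocker", re.compile(r"\buse(?:State|Reducer)\s*\(")),
    ("blocker", re.compile(r"@remotion/lambda")),
    ("warning", re.compile(r"\b(?:delayRender|useCallback|useMemo)\s*\(")),
    ("info", re.compile(r"\b(?:staticFile|interpolateColors)\s*\(")),
]


def lint(source):
    """Count findings per severity in a Remotion source string."""
    counts = {"blocker": 0, "warning": 0, "info": 0}
    for severity, pattern in RULES:
        counts[severity] += len(pattern.findall(source))
    return counts
```

Run over a source containing the two staticFile() calls from T2, this reports two info findings and no blockers. A real linter would more likely parse the TSX than pattern-match it; for example, these regexes miss generic calls such as useState<number>(0).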

Adds the first two test fixtures the skill is graded against. Each fixture ships:

- remotion-src/ full Remotion project (package.json, src/, remotion.config.ts, tsconfig.json)
- hf-src/ hand-translated HyperFrames composition (index.html)
- expected.json tier metadata + SSIM threshold + translation notes + measured validation
- README.md human walk-through of the translation choices
- setup.sh (T2 only) generates binary assets (PNG, WAV) via ffmpeg

T1 — title-card-fade

- 3 s @ 30 fps, 1280x720
- Single AbsoluteFill, single useCurrentFrame interpolate with multi-segment input [0,15,75,90] -> [0,1,1,0]
- Validated mean SSIM 0.974, threshold 0.95 (~0.025 gap from font-fallback divergence between Remotion's bundled Chromium and HF's chrome-headless-shell)

T2 — title-image-outro

- 6 s @ 30 fps, 1280x720, three Sequences (TitleScene, ImageScene, OutroScene)
- Exercises spring, interpolate, Audio, Img, staticFile
- Spring -> GSAP back.out(1.4) translation
- Validated mean SSIM 0.985, threshold 0.95 (translation came out cleaner than predicted; spring->back.out drift was smaller than the ~0.05 budget I'd expected)
- setup.sh generates a 200x200 blue PNG and a 6 s silent WAV via ffmpeg so binaries stay out of the repo

Calibration done end-to-end: rendered Remotion baseline + HF translation, ran scripts/render_diff.sh, set thresholds ~0.02 below measured p05.

Critical Remotion config: setVideoImageFormat("png") + setColorSpace("bt709"). The default JPEG output writes yuvj420p (full-range) which costs ~0.05 SSIM vs HF's yuv420p (limited-range). Both fixtures' remotion.config.ts encode this so render_diff.sh measures translation fidelity, not encoder differences.

Both fixtures lint clean (0 blockers via scripts/lint_source.py). T2 staticFile() references correctly flagged as info-level findings.

The fixtures are not yet wired into CI; that comes with PR 7's orchestrator. For now, render and eval are documented in each README and run by hand.

jrusso1020 mentioned this pull request Apr 27, 2026

feat(skills): remotion-to-hyperframes corpus T3 (4/7) #509

Merged

4 tasks

jrusso1020 force-pushed the skill/r2hf-corpus-t1-t2 branch from d8242cd to bb3e7d5 Compare April 27, 2026 17:05

jrusso1020 force-pushed the skill/r2hf-eval-harness branch from 51c058f to 00de7e4 Compare April 27, 2026 18:40

jrusso1020 force-pushed the skill/r2hf-corpus-t1-t2 branch from bb3e7d5 to 08fa028 Compare April 27, 2026 18:40

This was referenced Apr 27, 2026

feat(skills): remotion-to-hyperframes corpus T4 (5/7) #515

Merged

feat(skills): remotion-to-hyperframes references (6/7) #516

Merged

feat(skills): remotion-to-hyperframes SKILL.md + orchestrator (7/7) #517

Merged

miguel-heygen approved these changes Apr 27, 2026

miguel-heygen requested changes Apr 27, 2026

jrusso1020 force-pushed the skill/r2hf-eval-harness branch from 00de7e4 to 2a309f2 Compare April 27, 2026 23:03

jrusso1020 force-pushed the skill/r2hf-corpus-t1-t2 branch from 08fa028 to abaa743 Compare April 27, 2026 23:04

jrusso1020 requested a review from miguel-heygen April 27, 2026 23:11

miguel-heygen approved these changes Apr 27, 2026

jrusso1020 force-pushed the skill/r2hf-eval-harness branch from 2a309f2 to 90845dd Compare April 27, 2026 23:31

jrusso1020 force-pushed the skill/r2hf-corpus-t1-t2 branch from abaa743 to 2649d6d Compare April 27, 2026 23:31

jrusso1020 requested a review from miguel-heygen April 27, 2026 23:34

jrusso1020 force-pushed the skill/r2hf-eval-harness branch from 90845dd to 70e0b8b Compare April 27, 2026 23:54

jrusso1020 force-pushed the skill/r2hf-corpus-t1-t2 branch from 2649d6d to 9ff46d7 Compare April 27, 2026 23:54

jrusso1020 marked this pull request as ready for review April 28, 2026 00:29

jrusso1020 changed the base branch from skill/r2hf-eval-harness to main April 28, 2026 05:13

jrusso1020 merged commit 4ab4576 into main Apr 28, 2026
20 checks passed

jrusso1020 deleted the skill/r2hf-corpus-t1-t2 branch April 28, 2026 05:17

jrusso1020 merged 2 commits into main from skill/r2hf-corpus-t1-t2
