heygen-com · jrusso1020 · Apr 28, 2026 · Apr 27, 2026
diff --git a/skills/remotion-to-hyperframes/SKILL.md b/skills/remotion-to-hyperframes/SKILL.md
@@ -7,26 +7,95 @@ description: Translate a Remotion (React-based) video composition into a HyperFr
 
 ## Overview
 
-Translate Remotion (React-based) video compositions into HyperFrames (HTML + GSAP) compositions. Most Remotion idioms have direct HyperFrames equivalents — the translation is mechanical for ~80% of typical compositions. This skill encodes the mapping and guards against the lossy 20%.
+Translate Remotion (React-based) video compositions into HyperFrames (HF) compositions built from HTML + GSAP. Most Remotion idioms have direct HF equivalents, so the translation is mechanical for roughly 80% of typical compositions. This skill encodes the mapping and guards against the lossy 20%: it refuses to translate patterns that don't fit HF's seek-driven model and recommends the runtime interop pattern from [PR #214](https://github.com/heygen-com/hyperframes/pull/214) instead.
+
+The skill ships with a **tiered test corpus** (T1–T4, 4 fixtures total): T1–T3 grade translations against measured SSIM thresholds, and T4 checks the lint against expected results. Don't translate without running the eval. A translation that "looks right" but renders 0.05 SSIM below the validated baseline is silently wrong.
 
 ## Workflow
 
-1. **Lint the source.** Run the source-lint script against the Remotion project to surface any patterns that can't translate cleanly (React state hooks, async metadata, third-party React components). If the source uses any blocker pattern, recommend the runtime interop escape hatch (PR #214 pattern) instead of attempting a translation.
+### Step 1: Lint the source
+
+Run [`scripts/lint_source.py`](scripts/lint_source.py) over the Remotion source directory. The lint detects patterns that can't translate cleanly:
+
+- **Blockers** (refuse + recommend interop): `useState`, `useReducer`, `useEffect`/`useLayoutEffect` with non-empty deps, async `calculateMetadata`, third-party React UI libraries (MUI, Chakra, Mantine, antd, shadcn, Radix, NextUI).
+- **Warnings** (translate after dropping the construct): `@remotion/lambda` config, `delayRender`, `useCallback`, `useMemo`, custom hooks.
+- **Info** (translate with note): `staticFile`, `interpolateColors`.
+
+If any blocker fires, **stop**. Read [`references/escape-hatch.md`](references/escape-hatch.md) and surface the recommendation message. Warnings don't stop translation; drop the offending construct in Step 3 and note the gap in `TRANSLATION_NOTES.md`. `@remotion/lambda` config is the canonical warning case: the skill drops the import and the `renderMediaOnLambda(...)` calls but translates the rest of the composition.
+
+### Step 2: Plan the translation
+
+Read [`references/api-map.md`](references/api-map.md) — the index of every Remotion API and its HF equivalent or per-topic reference. Identify which topic references you'll need based on what the source uses:
+
+| Source contains                                                           | Load reference                                |
+| ------------------------------------------------------------------------- | --------------------------------------------- |
+| `Composition`, `defaultProps`, `schema`, `calculateMetadata`              | [`parameters.md`](references/parameters.md)   |
+| `Sequence`, `Series`, `Loop`, `AbsoluteFill`, `Freeze`                    | [`sequencing.md`](references/sequencing.md)   |
+| `useCurrentFrame`, `interpolate`, `spring`, `Easing`, `interpolateColors` | [`timing.md`](references/timing.md)           |
+| `Audio`, `Video`, `Img`, `IFrame`, `staticFile`, `delayRender`            | [`media.md`](references/media.md)             |
+| `TransitionSeries`, `@remotion/transitions`                               | [`transitions.md`](references/transitions.md) |
+| `@remotion/lottie`                                                        | [`lottie.md`](references/lottie.md)           |
+| `@remotion/google-fonts/<Family>`, `Font.loadFont`, `@font-face`          | [`fonts.md`](references/fonts.md)             |
+
+Don't load all of them — load only what the specific source needs.
+
+### Step 3: Generate the HF composition
+
+Emit `index.html` with:
+
+- Root `<div id="stage">` carrying the composition's `data-composition-id`, `data-start="0"`, `data-duration` (in seconds), `data-fps`, `data-width`, `data-height`, plus one `data-*` per scalar prop.
+- A flat list of scene divs with `data-start` / `data-duration` / `data-track-index`.
+- Inline `<style>` for layout; CSS sets the `from` state of every animated property.
+- A single `<script>` tag at the bottom containing one paused GSAP timeline, `gsap.timeline({paused: true})`. Every Remotion `useCurrentFrame()` derivation becomes a tween on this timeline, positioned at the source frame divided by the composition's fps (i.e. in seconds).
+- `window.__timelines["<composition-id>"] = tl;` registers the timeline with HF's runtime.
 
-2. **Scaffold the translation.** Generate a HyperFrames HTML skeleton from the Remotion source — `Composition` props become `data-*` attributes on the root `#stage` div, `<Sequence>` wrappers become elements with `data-start` / `data-duration` / `data-track-index`, `<AbsoluteFill>` becomes `<div style="position:absolute;inset:0">`. Leave timing-sensitive and easing-sensitive sections marked for refinement.
+Inline custom React subcomponents as repeated HTML, using the prop interface as the template (see [`parameters.md`](references/parameters.md) for the per-instance `data-*` pattern).
 
-3. **Refine timing and easing.** Convert each `useCurrentFrame`-driven `interpolate` / `spring` call into an equivalent paused GSAP tween on the composition timeline. This is the part where translation correctness matters most — easing curves and stagger timing are what readers notice.
+### Step 4: Validate
 
-4. **Validate by frame-diff.** Render both the original Remotion composition and the translated HyperFrames composition, then compute per-frame SSIM. Threshold-based pass/fail tells the user which scenes are visually correct and which need another pass.
+Run the eval harness; [`references/eval.md`](references/eval.md) is the full guide. Quick path:
 
-5. **Document the gaps.** Any Remotion features that didn't translate (custom React subcomponents requiring manual rewrite, library transitions without a HyperFrames equivalent, etc.) get listed in a `TRANSLATION_NOTES.md` next to the output so the user can finish them or decide to use the runtime interop instead.
+```bash
+# Run from the fixture directory, which contains remotion-src/ and hf-src/.
+# SKILL_DIR is the path to this skill (skills/remotion-to-hyperframes).
+
+# Render Remotion baseline (after npm install in remotion-src/)
+(cd remotion-src && npx remotion render <CompositionId> out/baseline.mp4)
+
+# Render HF translation
+npx hyperframes render hf-src/ --output hf.mp4
+
+# SSIM diff
+"$SKILL_DIR/scripts/render_diff.sh" remotion-src/out/baseline.mp4 hf.mp4 diff
+```
+
+Threshold: roughly 0.02 below `p05` (the 5th-percentile per-frame SSIM) for the source's complexity tier; see the validated thresholds table in `eval.md`. If the diff fails, run [`scripts/frame_strip.sh`](scripts/frame_strip.sh) to see _which_ frames diverged, then re-read the relevant timing, sequencing, or media reference.
+
+**Critical**: both renders must use matching image format and color space. Set `Config.setVideoImageFormat("png")` and `Config.setColorSpace("bt709")` in the Remotion source's `remotion.config.ts`; otherwise the diff measures encoder differences (~0.05 SSIM hit), not translation fidelity.
+
+### Step 5: Document gaps
+
+Anything that didn't translate cleanly (volume ramps dropped, custom presentations approximated, fonts substituted) gets a `TRANSLATION_NOTES.md` written next to the HF output. See [`references/limitations.md`](references/limitations.md) for the format.
 
 ## What this skill explicitly does NOT do
 
-- **Translate React state machines.** Remotion compositions that drive animation via `useState` + `useEffect` are not deterministic frame-capture targets in HyperFrames' model; recommend the runtime interop escape hatch.
-- **Translate `@remotion/lambda` configuration.** HyperFrames is single-machine today; Lambda-specific code drops with a note.
-- **Run Remotion's render pipeline alongside HyperFrames.** That's the runtime interop pattern from [PR #214](https://github.com/heygen-com/hyperframes/pull/214) — a separate problem with a separate (and existing) solution.
+- **Translate React state machines.** Compositions that drive animation via `useState` + `useEffect` are not deterministic frame-capture targets in HyperFrames' seek-driven model. Recommend the runtime interop pattern.
+- **Run Remotion's render pipeline alongside HyperFrames.** That's the runtime interop pattern from [PR #214](https://github.com/heygen-com/hyperframes/pull/214) — a separate solution for compositions that fail this skill's lint.
+
+(`@remotion/lambda` is _not_ a blocker — Lambda config is deployment, not animation. The skill drops it as a warning and translates the rest. See [`references/escape-hatch.md`](references/escape-hatch.md).)
+
+## How to grade your own translation
+
+Run the test corpus orchestrator:
+
+```bash
+./assets/test-corpus/run.sh
+```
+
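+It runs T1, T2, and T3 (render + diff) and T4 (lint validation), prints per-tier pass/fail lines plus a summary, and writes an aggregate JSON report to `assets/test-corpus/run-report.json`. A skipped tier (for example, when the HF CLI or ffmpeg is missing) makes the script exit non-zero, just like a failure. Use this to verify the skill works end-to-end on a clean checkout, and as a regression check after editing any reference.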
 
-## Status
+Validated baseline (as of 2026-04-27):
 
-Skill scaffold landed; eval harness, test corpus, and translation references are added in subsequent PRs in the stack. Until then, this skill should bow out and recommend the user hand-translate or use the runtime interop pattern.
+| Tier | Composition shape                           | Mean SSIM / lint | Threshold |
+| ---- | ------------------------------------------- | ---------------- | --------- |
+| T1   | single-element fade-in                      | 0.974            | 0.95      |
+| T2   | multi-scene + spring + audio + image        | 0.985            | 0.95      |
+| T3   | data-driven, custom subcomponents, count-up | 0.953            | 0.90      |
+| T4   | escape-hatch (8 lint cases)                 | 8/8 pass         | n/a       |
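Step 1's blocker/warning/info split can be illustrated with a small regex pass. This is a minimal sketch under stated assumptions, not the contents of `scripts/lint_source.py`: the rule table and the `lint_text` / `lint_dir` / `should_translate` helpers are hypothetical, they cover only a subset of the listed patterns (no effect-dependency or custom-hook detection; shadcn is omitted because its components are usually vendored into the project rather than imported from a package), and they match text instead of parsing TypeScript.

```python
from __future__ import annotations

import re
from pathlib import Path

# Hypothetical rule table: a regex approximation of a subset of Step 1's
# patterns. It is NOT the rule set used by scripts/lint_source.py.
RULES = {
    "blocker": {
        "useState": re.compile(r"\buseState\b"),
        "useReducer": re.compile(r"\buseReducer\b"),
        "async calculateMetadata": re.compile(r"calculateMetadata\s*[=:]\s*\{?\s*async\b"),
        "React UI library": re.compile(
            r"""from\s+['"](?:@mui/|@chakra-ui/|@mantine/|antd\b|@radix-ui/|@nextui-org/)"""
        ),
    },
    "warning": {
        "@remotion/lambda": re.compile(r"""['"]@remotion/lambda"""),
        "delayRender": re.compile(r"\bdelayRender\s*\("),
        "useCallback": re.compile(r"\buseCallback\b"),
        "useMemo": re.compile(r"\buseMemo\b"),
    },
    "info": {
        "staticFile": re.compile(r"\bstaticFile\s*\("),
        "interpolateColors": re.compile(r"\binterpolateColors\s*\("),
    },
}

SOURCE_SUFFIXES = {".ts", ".tsx", ".js", ".jsx"}


def lint_text(text: str) -> dict[str, list[str]]:
    """Return the names of matching rules, grouped by severity."""
    return {
        severity: [name for name, pattern in rules.items() if pattern.search(text)]
        for severity, rules in RULES.items()
    }


def lint_dir(src: Path) -> dict[str, list[str]]:
    """Merge findings across every JS/TS source file under src."""
    merged: dict[str, set[str]] = {severity: set() for severity in RULES}
    for path in sorted(src.rglob("*")):
        if path.is_file() and path.suffix in SOURCE_SUFFIXES:
            for severity, names in lint_text(path.read_text(encoding="utf-8")).items():
                merged[severity].update(names)
    return {severity: sorted(names) for severity, names in merged.items()}


def should_translate(findings: dict[str, list[str]]) -> bool:
    # Any blocker means: stop and recommend the runtime interop pattern.
    # Warnings and info findings do not stop translation.
    return not findings["blocker"]
```

A text-level pass like this errs toward false positives (a `useState` mention inside a comment still fires), which is the safer direction for a gate that decides whether to attempt a translation at all.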
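Step 3's root `#stage` attributes and scene divs mostly come down to converting Remotion frame counts into seconds. The sketch below is a hypothetical helper, not part of the skill: `seconds`, `stage_open_tag`, and `scene_div` are illustrative names; it assumes camelCase props map to kebab-case `data-*` attributes (the HTML `dataset` API reads `data-title-text` back as `el.dataset.titleText`); it skips non-scalar props; and it only handles top-level sequences whose `from` is already absolute. The same `frame / fps` conversion gives the GSAP position for each tween.

```python
from __future__ import annotations

import re
from html import escape


def seconds(frames: int, fps: int) -> str:
    """Remotion counts frames; HF data-start / data-duration are in seconds."""
    return f"{frames / fps:g}"


def prop_attr_name(prop: str) -> str:
    # Assumption: camelCase prop -> kebab-case data-* attribute, which the
    # HTML dataset API maps back (data-title-text <-> el.dataset.titleText).
    return "data-" + re.sub(r"(?<!^)(?=[A-Z])", "-", prop).lower()


def _attrs(attrs: dict[str, str]) -> str:
    # Escape values so quotes and ampersands in props can't break the markup.
    return " ".join(f'{name}="{escape(value, quote=True)}"' for name, value in attrs.items())


def stage_open_tag(
    composition_id: str,
    duration_in_frames: int,
    fps: int,
    width: int,
    height: int,
    props: dict[str, object],
) -> str:
    """Opening tag of the root #stage div for one <Composition>."""
    attrs = {
        "id": "stage",
        "data-composition-id": composition_id,
        "data-start": "0",
        "data-duration": seconds(duration_in_frames, fps),
        "data-fps": str(fps),
        "data-width": str(width),
        "data-height": str(height),
    }
    for name, value in props.items():
        if isinstance(value, bool):  # check bool first: bool is an int subclass
            attrs[prop_attr_name(name)] = "true" if value else "false"
        elif isinstance(value, (str, int, float)):
            attrs[prop_attr_name(name)] = str(value)
        # Non-scalar props (arrays, objects) are left out of this sketch.
    return f"<div {_attrs(attrs)}>"


def scene_div(scene_id: str, from_frame: int, duration_in_frames: int,
              fps: int, track_index: int) -> str:
    """A top-level <Sequence from durationInFrames> as a flat HF scene div."""
    attrs = {
        "id": scene_id,
        "data-start": seconds(from_frame, fps),
        "data-duration": seconds(duration_in_frames, fps),
        "data-track-index": str(track_index),
    }
    return f"<div {_attrs(attrs)}></div>"
```

For example, a 90-frame composition at 30 fps gets `data-duration="3"`, and a `<Sequence from={45} durationInFrames={15}>` becomes a scene div with `data-start="1.5"` and `data-duration="0.5"`.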
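Step 4's pass/fail rule (mean SSIM at or above a threshold placed roughly 0.02 below the tier's `p05`) can be sketched as follows. Assumptions: per-frame SSIM values from a validated run are available as a list, `p05` uses the nearest-rank percentile (`eval.md` may compute it differently), and `summary.json` has a top-level `mean` key, which is the key `run.sh` reads. The function names are hypothetical.

```python
from __future__ import annotations

import json
from pathlib import Path


def p05(per_frame_ssim: list[float]) -> float:
    """5th-percentile per-frame SSIM, nearest-rank method (an assumption)."""
    if not per_frame_ssim:
        raise ValueError("need at least one SSIM value")
    ordered = sorted(per_frame_ssim)
    # Integer ceil(0.05 * n) avoids float rounding in the rank.
    rank = max(1, (5 * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def derive_threshold(per_frame_ssim: list[float], margin: float = 0.02) -> float:
    """Place the pass threshold `margin` below the validated run's p05."""
    return p05(per_frame_ssim) - margin


def passes(summary_path: Path, threshold: float) -> bool:
    """Mirror run.sh: pass when summary.json's mean SSIM >= threshold."""
    summary = json.loads(summary_path.read_text(encoding="utf-8"))
    return float(summary["mean"]) >= threshold
```

Gating on the mean while deriving the threshold from a low percentile keeps a few hard frames (fast motion, transitions) from failing an otherwise faithful translation, while a systematic offset still drags the mean below the line.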
diff --git a/skills/remotion-to-hyperframes/assets/test-corpus/.gitignore b/skills/remotion-to-hyperframes/assets/test-corpus/.gitignore
@@ -0,0 +1 @@
+run-report.json
diff --git a/skills/remotion-to-hyperframes/assets/test-corpus/run.sh b/skills/remotion-to-hyperframes/assets/test-corpus/run.sh
@@ -0,0 +1,249 @@
+#!/usr/bin/env bash
+# run.sh — corpus orchestrator. Runs every tier and prints a pass/fail summary.
+#
+# Tiers 1-3: render Remotion baseline + HF translation, run SSIM diff,
+#            assert mean >= ssim_threshold from each fixture's expected.json.
+# Tier 4:    runs validate.sh in the tier-4 fixture, which lints each case and
+#            asserts against expected.json.
+#
+# Usage:
+#   ./run.sh                    run all tiers
+#   ./run.sh tier-1-title-card  run a single tier
+#
+# Requirements:
+#   - ffmpeg, ffprobe, python3 on PATH
+#   - node 22 (for the HF CLI)
+#   - npm (for Remotion installs)
+#   - HF CLI built at packages/cli/dist/cli.js (run `bun run --filter @hyperframes/cli build`
+#     in the repo root if missing)
+#
+# Output:
+#   <fixture>/diff/summary.json   per-fixture SSIM summary
+#   <fixture>/strip/strip.png     per-fixture comparison strip (only on fail)
+#   ./run-report.json             aggregate report
+
+set -euo pipefail
+
+THIS_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+SKILL_DIR="$(cd "$THIS_DIR/../.." && pwd)"
+REPO_ROOT="$(cd "$SKILL_DIR/../.." && pwd)"
+
+LINT="$SKILL_DIR/scripts/lint_source.py"
+DIFF="$SKILL_DIR/scripts/render_diff.sh"
+STRIP="$SKILL_DIR/scripts/frame_strip.sh"
+HF_CLI="$REPO_ROOT/packages/cli/dist/cli.js"
+REPORT="$THIS_DIR/run-report.json"
+
+# Per-fixture results land here as one JSON file each, then the aggregator
+# globs them. This is safer than building JSON via bash string concatenation
+# (a fixture name containing a quote would produce invalid JSON).
+RESULTS_DIR="$(mktemp -d)"
+trap 'rm -rf "$RESULTS_DIR"' EXIT
+
+# T4 is lint-only — no ffmpeg or HF CLI needed. Defer the render-tier
+# toolchain checks until run_render_tier() actually runs, so
+# `./run.sh tier-4-escape-hatch` works on a clean checkout.
+require_render_tier_tools() {
+  if [[ ! -f "$HF_CLI" ]]; then
+    echo "error: HF CLI not built at $HF_CLI" >&2
+    echo "       Run 'bun run --filter @hyperframes/cli build' in $REPO_ROOT" >&2
+    return 2
+  fi
+  if ! command -v ffmpeg >/dev/null 2>&1; then
+    echo "error: ffmpeg not on PATH" >&2
+    return 2
+  fi
+  return 0
+}
+
+# Write one fixture's result as a JSON file. Values are passed via argv so
+# bash string interpolation can't corrupt the JSON or inject Python source.
+write_result() {
+  local fixture_name="$1"
+  local status="$2"
+  shift 2
+  python3 - "$RESULTS_DIR/$fixture_name.json" "$fixture_name" "$status" "$@" <<'PY'
+import json
+import sys
+
+out_path, fixture_name, status, *kvs = sys.argv[1:]
+result = {"fixture": fixture_name, "status": status}
+for i in range(0, len(kvs), 2):
+    k, v = kvs[i], kvs[i + 1]
+    try:
+        result[k] = float(v) if "." in v or v.lstrip("-").isdigit() else v
+    except ValueError:
+        result[k] = v
+with open(out_path, "w") as f:
+    json.dump(result, f)
+PY
+}
+
+# Read a top-level scalar value from a JSON file. Falls back to $3 if the
+# key is missing (used to default composition_id for older fixtures).
+read_json_value() {
+  local file="$1"
+  local key="$2"
+  local default="${3:-}"
+  python3 - "$file" "$key" "$default" <<'PY'
+import json
+import sys
+
+path, key, default = sys.argv[1], sys.argv[2], sys.argv[3]
+with open(path) as f:
+    data = json.load(f)
+val = data.get(key, default)
+print(val if val is not None else "")
+PY
+}
+
+run_render_tier() {
+  local fixture_dir="$1"
+  local fixture_name
+  fixture_name=$(basename "$fixture_dir")
+  local expected="$fixture_dir/expected.json"
+
+  if ! require_render_tier_tools; then
+    echo "  ⚠ $fixture_name: render toolchain unavailable, skipping"
+    write_result "$fixture_name" "skipped" reason "render toolchain unavailable"
+    return 0
+  fi
+
+  local threshold composition_id
+  threshold=$(read_json_value "$expected" "ssim_threshold")
+  composition_id=$(read_json_value "$expected" "composition_id" "Composition")
+
+  echo "  ▶ $fixture_name (threshold $threshold, composition $composition_id)"
+
+  if [[ -x "$fixture_dir/setup.sh" ]]; then
+    "$fixture_dir/setup.sh" >/dev/null
+  fi
+
+  if ! python3 "$LINT" "$fixture_dir/remotion-src/src/" >/dev/null; then
+    echo "    ✗ lint failed (blockers in Remotion source)"
+    write_result "$fixture_name" "fail" stage "lint"
+    return 0
+  fi
+
+  if [[ ! -d "$fixture_dir/remotion-src/node_modules" ]]; then
+    echo "    ⏳ npm install (first run)"
+    (cd "$fixture_dir/remotion-src" && npm install --silent --no-progress >/dev/null 2>&1)
+  fi
+
+  echo "    ⏳ render Remotion baseline"
+  if ! (cd "$fixture_dir/remotion-src" && \
+        npx --no-install remotion render "$composition_id" out/baseline.mp4 >/dev/null 2>&1); then
+    echo "    ✗ Remotion render failed"
+    write_result "$fixture_name" "fail" stage "remotion-render"
+    return 0
+  fi
+
+  echo "    ⏳ render HF translation"
+  if ! (cd "$fixture_dir" && \
+        node "$HF_CLI" render hf-src/ --output hf.mp4 --quiet >/dev/null 2>&1); then
+    echo "    ✗ HF render failed"
+    write_result "$fixture_name" "fail" stage "hf-render"
+    return 0
+  fi
+
+  if R2HF_SSIM_THRESHOLD="$threshold" "$DIFF" \
+      "$fixture_dir/remotion-src/out/baseline.mp4" \
+      "$fixture_dir/hf.mp4" \
+      "$fixture_dir/diff" >/dev/null; then
+    local mean
+    mean=$(read_json_value "$fixture_dir/diff/summary.json" "mean")
+    echo "    ✓ pass (mean SSIM $mean, threshold $threshold)"
+    write_result "$fixture_name" "pass" mean_ssim "$mean" threshold "$threshold"
+  else
+    local mean
+    mean=$(read_json_value "$fixture_dir/diff/summary.json" "mean")
+    echo "    ✗ fail (mean SSIM $mean, threshold $threshold)"
+    "$STRIP" \
+      "$fixture_dir/remotion-src/out/baseline.mp4" \
+      "$fixture_dir/hf.mp4" \
+      "$fixture_dir/strip" 8 >/dev/null
+    write_result "$fixture_name" "fail" stage "ssim" mean_ssim "$mean" threshold "$threshold"
+  fi
+}
+
+run_lint_tier() {
+  local fixture_dir="$1"
+  local fixture_name
+  fixture_name=$(basename "$fixture_dir")
+
+  echo "  ▶ $fixture_name (lint-only)"
+  if "$fixture_dir/validate.sh" >/dev/null 2>&1; then
+    echo "    ✓ pass (all cases match expected.json)"
+    write_result "$fixture_name" "pass" mode "lint"
+  else
+    echo "    ✗ fail (some cases mismatched expected.json)"
+    write_result "$fixture_name" "fail" mode "lint"
+  fi
+}
+
+echo "remotion-to-hyperframes corpus run"
+echo "=================================="
+
+for tier in tier-1-title-card tier-2-multi-scene tier-3-data-driven; do
+  if [[ -n "${1:-}" && "$1" != "$tier" ]]; then
+    continue
+  fi
+  if [[ -d "$THIS_DIR/$tier" ]]; then
+    run_render_tier "$THIS_DIR/$tier"
+  fi
+done
+
+if [[ -z "${1:-}" || "$1" == "tier-4-escape-hatch" ]]; then
+  if [[ -d "$THIS_DIR/tier-4-escape-hatch" ]]; then
+    run_lint_tier "$THIS_DIR/tier-4-escape-hatch"
+  fi
+fi
+
+# Aggregate the per-fixture JSON files into one report.
+#
+# Skipped fixtures are *not* a pass — they mean a tier didn't run because
+# tooling or fixtures were unavailable. The orchestrator exits non-zero on
+# any skip so a clean checkout that lacks the HF CLI doesn't accidentally
+# report "passed 1/4" (T4 alone) and look like the corpus is healthy.
+#
+# Single-tier mode (`./run.sh tier-N`) only writes a result file for the
+# selected tier; tiers that weren't run aren't counted as skips.
+python3 - "$RESULTS_DIR" "$REPORT" <<'PY'
+import json
+import sys
+from pathlib import Path
+
+results_dir, out_path = Path(sys.argv[1]), Path(sys.argv[2])
+results = sorted(
+    (json.loads(p.read_text()) for p in results_dir.glob("*.json")),
+    key=lambda r: r["fixture"],
+)
+
+total = len(results)
+passed = sum(1 for r in results if r["status"] == "pass")
+failed = sum(1 for r in results if r["status"] == "fail")
+skipped = sum(1 for r in results if r["status"] == "skipped")
+report = {
+    "total": total,
+    "passed": passed,
+    "failed": failed,
+    "skipped": skipped,
+    "results": results,
+}
+out_path.write_text(json.dumps(report, indent=2))
+
+print()
+print("=" * 50)
+print(f"  passed {passed}/{total}, failed {failed}, skipped {skipped}")
+print(f"  report → {out_path}")
+if skipped > 0:
+    skipped_fixtures = [r["fixture"] for r in results if r["status"] == "skipped"]
+    skipped_reasons = sorted({r.get("reason", "unknown") for r in results if r["status"] == "skipped"})
+    print()
+    print(f"  ⚠ {skipped} skipped: {', '.join(skipped_fixtures)}")
+    for reason in skipped_reasons:
+        print(f"    reason: {reason}")
+    print("  Skipped fixtures make the run exit non-zero, like failures.")
+print("=" * 50)
+sys.exit(0 if failed == 0 and skipped == 0 else 1)
+PY
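`run.sh` leaves `run-report.json` behind with `total`, `passed`, `failed`, `skipped`, and a `results` list whose entries carry `fixture`, `status`, and optional keys such as `mean_ssim`, `threshold`, `stage`, `reason`, or `mode`. A minimal sketch for summarizing that report outside the script; the helper names and column layout are illustrative, not part of the skill, and `corpus_healthy` reproduces the script's exit-code rule.

```python
from __future__ import annotations

import json
import sys
from pathlib import Path


def load_report(path: Path) -> dict:
    """Read the aggregate report written by run.sh."""
    return json.loads(path.read_text(encoding="utf-8"))


def corpus_healthy(report: dict) -> bool:
    # Same rule as run.sh's exit code: any failure or skip is unhealthy.
    return report["failed"] == 0 and report["skipped"] == 0


def _fmt(value: object) -> str:
    # mean_ssim / threshold are floats when present; anything else prints "-".
    return f"{value:.3f}" if isinstance(value, (int, float)) else "-"


def format_report(report: dict) -> str:
    """One line per fixture: name, status, mean SSIM, threshold, and a note."""
    lines = [f"{'fixture':<22} {'status':<8} {'mean':>6} {'thresh':>6}  note"]
    for r in report["results"]:
        note = r.get("stage") or r.get("reason") or r.get("mode") or ""
        lines.append(
            f"{r['fixture']:<22} {r['status']:<8} "
            f"{_fmt(r.get('mean_ssim')):>6} {_fmt(r.get('threshold')):>6}  {note}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    report = load_report(Path(sys.argv[1] if len(sys.argv) > 1 else "run-report.json"))
    print(format_report(report))
    sys.exit(0 if corpus_healthy(report) else 1)
```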
diff --git a/skills/remotion-to-hyperframes/assets/test-corpus/tier-1-title-card/.gitignore b/skills/remotion-to-hyperframes/assets/test-corpus/tier-1-title-card/.gitignore
@@ -0,0 +1,10 @@
+# Render output
+remotion-src/out/
+hf-src/out/
+hf.mp4
+diff/
+strip/
+
+# Remotion / HF dependencies
+node_modules/
+package-lock.json