feat(lambda): warn when slowest chunk approaches the 15-min cap by jrusso1020 · Pull Request #1024 · heygen-com/hyperframes

jrusso1020 · 2026-05-21T23:23:32Z

What

Track the slowest RenderChunk Lambda invocation across a render. When that chunk's billed duration crosses 80% of Lambda's 15-min cap, surface a warning at the end of --wait with a --max-parallel-chunks bump suggestion sized to the actual fan-out the user ran with.

Why

The cost-analysis sweep hit Lambda's Sandbox.Timedout twice — inspector-launch at 1080p/60fps with default mpc=16 and 4K@anything — both with cryptic SFN errors. The next user to push fps or composition complexity on a heavy WebGL render will hit the same wall. Without this, the only signal they get is a generic SFN failure 15+ minutes into the render. With it, the previous successful render warned them ahead of time.

This is also the practical answer to "what's the longest render Lambda can handle?" The hard limit is chunkSize × maxParallelChunks (default 32 min @ 30fps, up to 8.5h cranked); the operational limit is per-chunk runtime, which depends on composition complexity.

How

RenderProgress.maxChunkDurationMs: new field on the SDK's progress snapshot. Tracks the max billed-duration across RenderChunk Lambda invocations only (Plan + Assemble are off-path — they have their own runtime profiles that aren't gated by the 15-min cap). Null until the first chunk reports back.
bumpMaxChunkDuration(current, state, billedMs) helper shared between the TaskSucceeded and LambdaFunctionSucceeded switch arms.
packages/aws-lambda/src/sdk/chunkRuntime.ts: new module exporting LAMBDA_TIMEOUT_MS = 900_000 and CHUNK_RUNTIME_WARN_MS = LAMBDA_TIMEOUT_MS × 0.8. Re-exported from the SDK barrel.
warnIfChunkRuntimeIsCloseToCap(progress, currentMaxParallelChunks) in the CLI: synchronous, uses the static SDK constants (no dynamic loadSDK() for a constant lookup).
suggestFanOut(current, slowestMs): scales the suggestion from the actual --max-parallel-chunks value the user ran with — doubles until projected per-chunk duration clears the threshold, rounds to the next power of two, caps at 256. So a user running mpc=64 who hits 800s gets --max-parallel-chunks 128, not the hardcoded 32 an earlier version would have suggested.

Test plan

Unit tests: tracks slowest RenderChunk duration · ignores Plan/Assemble durations · null before any chunk completes · doesn't break existing test fixtures.
All 112 aws-lambda tests pass (3 new).
Manual: render that exceeds the threshold prints the warning + the scaled fan-out suggestion at the end of --wait; renders under the threshold are silent.
Typecheck + format + lint clean across both packages.

jrusso1020 · 2026-05-21T23:23:39Z

This stack of pull requests is managed by Graphite. Learn more about stacking.

miguel-heygen

Clean feature — the plumbing through summarizeHistory is solid, tests cover the right cases, and suggestFanOut math checks out. One issue I'd fix before merge:

Static import defeats the lazy-loading pattern (packages/cli/src/commands/lambda/render.ts)

The file converts the existing import type { ... } from "@hyperframes/aws-lambda/sdk" into a runtime import to pull CHUNK_RUNTIME_WARN_MS and LAMBDA_TIMEOUT_MS. Type-only imports are erased at compile time, but value imports execute the barrel — which re-exports deploySite.js, renderToLambda.js, getRenderProgress.js, etc., all of which pull in @aws-sdk/client-sfn, @aws-sdk/client-s3, and friends. The whole point of loadSDK() in that file was to keep the SFN/S3 clients out of the static-import head of the CLI bundle.

Since chunkRuntime.ts is a pure-constants module with zero transitive deps, import the constants from the leaf module directly:

import { CHUNK_RUNTIME_WARN_MS, LAMBDA_TIMEOUT_MS } from "@hyperframes/aws-lambda/src/sdk/chunkRuntime.js";

or, if the package doesn't expose a subpath for that, inline the two constants (they're trivial: 900_000 and 720_000). Either way, keep the barrel import as import type so the lazy-load contract holds.

Everything else looks good — bumpMaxChunkDuration correctly gates on "RenderChunk", the null-before-any-chunk semantics are clean, and the three new tests cover the key scenarios (max tracking, Plan/Assemble filtering, null baseline).

jrusso1020 · 2026-05-22T18:02:35Z

You're right — I missed that the barrel pulls the AWS SDK clients in at static-import time. The dynamic loadSDK() was load-bearing; my "lift to static import" change defeated it.

Fixed in the latest push:

Restored import type { ... } from "@hyperframes/aws-lambda/sdk" so the barrel stays type-only at compile time.
Inlined the two constants in render.ts with a comment pointing at chunkRuntime.ts as the source of truth. chunkRuntime.ts is still a pure-constants leaf module in the SDK (so anyone consuming the SDK directly still gets them), but the CLI side reads its own copies — cheap duplication, big win on bundle behavior.

Confirmed with grep that @aws-sdk/client-sfn and @aws-sdk/client-s3 don't appear in the static-import head of render.ts anymore. All 38 CLI tests + 16 getRenderProgress tests still pass.

miguel-heygen

The import issue is fixed correctly. The constants are inlined in render.ts with a clear comment pointing at chunkRuntime.ts as the source of truth. The barrel import stays out of the static-import head, so loadSDK() still lazy-loads @aws-sdk/* as intended.

Rest of the PR is solid — bumpMaxChunkDuration, suggestFanOut, the maxChunkDurationMs plumbing through summarizeHistory, and the three new tests all look good.

vanceingalls

No blockers. Good end-to-end feature. maxChunkDurationMs on RenderProgress is the right carrier — observable state any CLI consumer can act on. bumpMaxChunkDuration correctly gates on currentLambdaState === "RenderChunk" so Plan/Assemble don't inflate the max. suggestFanOut power-of-2 rounding and 256 cap are reasonable guardrails.

Important

Constant duplication between chunkRuntime.ts and the CLI inline copy of LAMBDA_TIMEOUT_MS / CHUNK_RUNTIME_WARN_MS. The comment explains the reason accurately (importing from the SDK barrel pulls in @aws-sdk/client-sfn at static import time, defeating loadSDK()). Acceptable as a shipping pragmatism, but the inline copy is a silent orphan — if someone changes the SDK constant without knowing the CLI copy exists, the warning threshold silently drifts. At minimum the inline copy should have a comment pointing to the source of truth in chunkRuntime.ts. A cleaner long-term solution is a deep sub-path export (@hyperframes/aws-lambda/sdk/chunkRuntime) that the CLI can import without the barrel penalty.

Nits

suggestFanOut linear scaling assumption: for cold-start-dominated chunks (large asset prefetch, GPU init), doubling maxParallelChunks halves frame count but may not halve runtime. Consider softening "Mitigate with: --max-parallel-chunks N" to "may help."
DEFAULT_MAX_PARALLEL_CHUNKS = 16 is hardcoded inline. If the SDK default ever changes, the suggestion math silently uses the wrong baseline. Add a co-location comment.
warnIfChunkRuntimeIsCloseToCap only fires on the success path (Assemble completed). If the render times out before reaching SUCCEEDED, the warning never fires. Defensible, just noting for future reference.

Note: Stacked — Build/Test/Typecheck did not run on this PR head.

— Vai

Render-time, post-hoc warning: when the slowest RenderChunk Lambda invocation burned through more than 80% of the 900-second cap, surface a warning at the end of --wait mode pointing at --max-parallel-chunks. The cost-analysis sweep hit this twice — inspector-launch at 1080p/60 and 4K@anything blew past the cap with default 16-way fan-out, producing a Sandbox.Timedout retry storm. The next user to push fps or duration on a heavy composition will hit the same wall; this turns a cryptic SFN failure into a one-line hint they can act on before the next render. Plumbing: - getRenderProgress tracks max billed-duration across RenderChunk invocations (the only state whose runtime is gated by the 15-min cap; Plan + Assemble are off-path). - RenderProgress.maxChunkDurationMs is null before the first chunk reports back. - LAMBDA_TIMEOUT_MS / CHUNK_RUNTIME_WARN_RATIO / CHUNK_RUNTIME_WARN_MS live in chunkRuntime.ts and are exported from the SDK so external callers (custom CLIs, monitoring) can match the threshold. - CLI's render --wait path prints the warning with a suggested --max-parallel-chunks value scaled by the observed headroom ratio. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

miguel-heygen

Approved — no issues found in review.

jrusso1020 force-pushed the fix/lambda-chromium-executable-guard branch from 3e5b059 to d76baad Compare May 22, 2026 16:09

jrusso1020 force-pushed the feat/lambda-warn-on-heavy-chunk-budget branch from 23fb168 to c591745 Compare May 22, 2026 16:09

miguel-heygen requested changes May 22, 2026

View reviewed changes

jrusso1020 force-pushed the feat/lambda-warn-on-heavy-chunk-budget branch from c591745 to 91c8489 Compare May 22, 2026 18:02

jrusso1020 force-pushed the fix/lambda-chromium-executable-guard branch from d76baad to 19b97ad Compare May 22, 2026 18:02

miguel-heygen approved these changes May 22, 2026

View reviewed changes

vanceingalls approved these changes May 22, 2026

View reviewed changes

jrusso1020 force-pushed the feat/lambda-warn-on-heavy-chunk-budget branch from 91c8489 to 4140937 Compare May 22, 2026 18:47

jrusso1020 force-pushed the fix/lambda-chromium-executable-guard branch 2 times, most recently from 82aff43 to a62f062 Compare May 22, 2026 19:09

jrusso1020 force-pushed the feat/lambda-warn-on-heavy-chunk-budget branch from 4140937 to 0afa161 Compare May 22, 2026 19:10

miguel-heygen approved these changes May 22, 2026

View reviewed changes

jrusso1020 merged commit 41352e9 into fix/lambda-chromium-executable-guard May 22, 2026
14 of 15 checks passed

jrusso1020 deleted the feat/lambda-warn-on-heavy-chunk-budget branch May 22, 2026 20:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(lambda): warn when slowest chunk approaches the 15-min cap#1024

feat(lambda): warn when slowest chunk approaches the 15-min cap#1024
jrusso1020 merged 1 commit into
fix/lambda-chromium-executable-guardfrom
feat/lambda-warn-on-heavy-chunk-budget

jrusso1020 commented May 21, 2026 •

edited

Loading

Uh oh!

jrusso1020 commented May 21, 2026 •

edited

Loading

Uh oh!

miguel-heygen left a comment

Uh oh!

jrusso1020 commented May 22, 2026

Uh oh!

miguel-heygen left a comment

Uh oh!

vanceingalls left a comment

Uh oh!

miguel-heygen left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

jrusso1020 commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Why

How

Test plan

Uh oh!

jrusso1020 commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

miguel-heygen left a comment

Choose a reason for hiding this comment

Uh oh!

jrusso1020 commented May 22, 2026

Uh oh!

miguel-heygen left a comment

Choose a reason for hiding this comment

Uh oh!

vanceingalls left a comment

Choose a reason for hiding this comment

Uh oh!

miguel-heygen left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jrusso1020 commented May 21, 2026 •

edited

Loading

jrusso1020 commented May 21, 2026 •

edited

Loading