feat(lambda): warn when slowest chunk approaches the 15-min cap #1024

Merged
jrusso1020 merged 1 commit into fix/lambda-chromium-executable-guard from feat/lambda-warn-on-heavy-chunk-budget on May 22, 2026

Conversation

@jrusso1020 jrusso1020 commented May 21, 2026

What

Track the slowest RenderChunk Lambda invocation across a render. When that chunk's billed duration crosses 80% of Lambda's 15-min cap, surface a warning at the end of --wait with a --max-parallel-chunks bump suggestion sized to the actual fan-out the user ran with.

Why

The cost-analysis sweep hit Lambda's Sandbox.Timedout twice (inspector-launch at 1080p/60fps with the default mpc=16, where mpc is --max-parallel-chunks, and 4K@anything), both with cryptic SFN errors. The next user to push fps or composition complexity on a heavy WebGL render will hit the same wall. Without this, the only signal they get is a generic SFN failure 15+ minutes into the render. With it, their previous successful render will already have warned them.

This is also the practical answer to "what's the longest render Lambda can handle?" The hard limit is chunkSize × maxParallelChunks (default 32 min @ 30fps, up to 8.5h cranked); the operational limit is per-chunk runtime, which depends on composition complexity.
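
As a rough worked check of the hard limit quoted above, the sketch below multiplies frames per chunk by fan-out and divides by fps. The 3600-frame chunk size is an assumption inferred from the quoted figures (32 min at 30 fps across the default 16 chunks), not a value this PR states, and the 256 upper bound is simply the cap that suggestFanOut applies. maxRenderSeconds is a hypothetical helper, not part of the SDK.

```typescript
// Hypothetical helper, not part of the PR: the longest render (in seconds) that fits
// when every chunk holds `chunkSizeFrames` frames and `maxParallelChunks` chunks run.
function maxRenderSeconds(
  chunkSizeFrames: number,
  maxParallelChunks: number,
  fps: number,
): number {
  return (chunkSizeFrames * maxParallelChunks) / fps;
}

// ASSUMPTION: 3600 frames per chunk, inferred from "32 min @ 30fps" at a fan-out of 16.
const ASSUMED_CHUNK_SIZE_FRAMES = 3600;

// 3600 * 16 / 30 = 1920 s, i.e. 32 minutes.
const defaultLimitMinutes = maxRenderSeconds(ASSUMED_CHUNK_SIZE_FRAMES, 16, 30) / 60;

// 3600 * 256 / 30 = 30720 s, i.e. roughly 8.5 hours.
const maxLimitHours = maxRenderSeconds(ASSUMED_CHUNK_SIZE_FRAMES, 256, 30) / 3600;
```

Under those assumptions the numbers line up with the "32 min" and "8.5h" figures; the operational limit is still whatever keeps each chunk under the 15-min cap.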

How

  • RenderProgress.maxChunkDurationMs: new field on the SDK's progress snapshot. Tracks the max billed-duration across RenderChunk Lambda invocations only (Plan + Assemble are off-path — they have their own runtime profiles that aren't gated by the 15-min cap). Null until the first chunk reports back.
  • bumpMaxChunkDuration(current, state, billedMs) helper shared between the TaskSucceeded and LambdaFunctionSucceeded switch arms.
  • packages/aws-lambda/src/sdk/chunkRuntime.ts: new module exporting LAMBDA_TIMEOUT_MS = 900_000 and CHUNK_RUNTIME_WARN_MS = LAMBDA_TIMEOUT_MS × 0.8. Re-exported from the SDK barrel.
  • warnIfChunkRuntimeIsCloseToCap(progress, currentMaxParallelChunks) in the CLI: synchronous, uses the static SDK constants (no dynamic loadSDK() for a constant lookup).
  • suggestFanOut(current, slowestMs): scales the suggestion from the actual --max-parallel-chunks value the user ran with — doubles until projected per-chunk duration clears the threshold, rounds to the next power of two, caps at 256. So a user running mpc=64 who hits 800s gets --max-parallel-chunks 128, not the hardcoded 32 an earlier version would have suggested.
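
A minimal sketch of the fan-out suggestion described in the last bullet, assuming per-chunk runtime scales inversely with fan-out. The constants and the 256 cap come from this description; the function body, the `>=` threshold comparison, and the MAX_FAN_OUT name are guesses at the shape, not the merged code.

```typescript
// Values quoted in the PR description.
const LAMBDA_TIMEOUT_MS = 900_000; // Lambda's 15-minute cap
const CHUNK_RUNTIME_WARN_MS = LAMBDA_TIMEOUT_MS * 0.8; // warn at 80% of the cap
const MAX_FAN_OUT = 256; // hypothetical name for the cap mentioned above

// Sketch only: assumes doubling the fan-out halves the slowest chunk's runtime.
function suggestFanOut(current: number, slowestMs: number): number {
  let fanOut = Math.max(1, Math.floor(current));
  let projectedMs = slowestMs;

  // Double until the projected per-chunk duration drops below the warning threshold.
  while (projectedMs >= CHUNK_RUNTIME_WARN_MS && fanOut < MAX_FAN_OUT) {
    fanOut *= 2;
    projectedMs /= 2;
  }

  // Round up to the next power of two, then apply the cap.
  let powerOfTwo = 1;
  while (powerOfTwo < fanOut) powerOfTwo *= 2;
  return Math.min(powerOfTwo, MAX_FAN_OUT);
}
```

With this sketch, `suggestFanOut(64, 800_000)` returns 128, matching the example in the bullet, and a non-power-of-two input such as 24 rounds up to 64 after one doubling.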

Test plan

  • Unit tests: tracks slowest RenderChunk duration · ignores Plan/Assemble durations · null before any chunk completes · doesn't break existing test fixtures.
  • All 112 aws-lambda tests pass (3 new).
  • Manual: render that exceeds the threshold prints the warning + the scaled fan-out suggestion at the end of --wait; renders under the threshold are silent.
  • Typecheck + format + lint clean across both packages.

@miguel-heygen miguel-heygen left a comment

Clean feature — the plumbing through summarizeHistory is solid, tests cover the right cases, and suggestFanOut math checks out. One issue I'd fix before merge:

Static import defeats the lazy-loading pattern (packages/cli/src/commands/lambda/render.ts)

The file converts the existing import type { ... } from "@hyperframes/aws-lambda/sdk" into a runtime import to pull CHUNK_RUNTIME_WARN_MS and LAMBDA_TIMEOUT_MS. Type-only imports are erased at compile time, but value imports execute the barrel — which re-exports deploySite.js, renderToLambda.js, getRenderProgress.js, etc., all of which pull in @aws-sdk/client-sfn, @aws-sdk/client-s3, and friends. The whole point of loadSDK() in that file was to keep the SFN/S3 clients out of the static-import head of the CLI bundle.

Since chunkRuntime.ts is a pure-constants module with zero transitive deps, import the constants from the leaf module directly:

import { CHUNK_RUNTIME_WARN_MS, LAMBDA_TIMEOUT_MS } from "@hyperframes/aws-lambda/src/sdk/chunkRuntime.js";

or, if the package doesn't expose a subpath for that, inline the two constants (they're trivial: 900_000 and 720_000). Either way, keep the barrel import as import type so the lazy-load contract holds.

Everything else looks good — bumpMaxChunkDuration correctly gates on "RenderChunk", the null-before-any-chunk semantics are clean, and the three new tests cover the key scenarios (max tracking, Plan/Assemble filtering, null baseline).
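
For readers following along, the gating and null semantics praised above could look roughly like the sketch below. The parameter types, the LambdaState union, and the handling of a missing billed duration are assumptions; only the function name, its three arguments, and the "RenderChunk" gate come from the PR.

```typescript
// Assumed state names; the PR mentions Plan, RenderChunk, and Assemble.
type LambdaState = "Plan" | "RenderChunk" | "Assemble";

// Hypothetical shape: returns the new running max, or the old value unchanged.
function bumpMaxChunkDuration(
  current: number | null,
  state: LambdaState,
  billedMs: number | undefined,
): number | null {
  // Only RenderChunk invocations count toward the max; Plan and Assemble are off-path.
  if (state !== "RenderChunk") return current;
  // Assumption: events without a usable billed duration leave the max untouched.
  if (billedMs === undefined || !Number.isFinite(billedMs)) return current;
  // The first chunk replaces the null baseline; later chunks keep the larger value.
  return current === null ? billedMs : Math.max(current, billedMs);
}
```

Folding this over a history of events yields null until the first RenderChunk reports, then the slowest chunk's billed duration.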

@jrusso1020 jrusso1020 force-pushed the feat/lambda-warn-on-heavy-chunk-budget branch from c591745 to 91c8489 on May 22, 2026 18:02
@jrusso1020 jrusso1020 force-pushed the fix/lambda-chromium-executable-guard branch from d76baad to 19b97ad on May 22, 2026 18:02
@jrusso1020 (Collaborator, Author) commented

You're right — I missed that the barrel pulls the AWS SDK clients in at static-import time. The dynamic loadSDK() was load-bearing; my "lift to static import" change defeated it.

Fixed in the latest push:

  • Restored import type { ... } from "@hyperframes/aws-lambda/sdk" so the barrel stays type-only at compile time.
  • Inlined the two constants in render.ts with a comment pointing at chunkRuntime.ts as the source of truth. chunkRuntime.ts is still a pure-constants leaf module in the SDK (so anyone consuming the SDK directly still gets them), but the CLI side reads its own copies — cheap duplication, big win on bundle behavior.

Confirmed with grep that @aws-sdk/client-sfn and @aws-sdk/client-s3 don't appear in the static-import head of render.ts anymore. All 38 CLI tests + 16 getRenderProgress tests still pass.
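
A rough sketch of the pattern after the fix: the CLI keeps its own copies of the two constants and only touches the SDK barrel through a memoized loader. The `lazy` helper, `isCloseToCap`, `fractionOfCap`, and the stand-in loader are hypothetical names for illustration; the real loadSDK() in render.ts may be shaped differently.

```typescript
// Inlined copies for the CLI; chunkRuntime.ts in the SDK remains the source of truth.
const LAMBDA_TIMEOUT_MS = 900_000;
const CHUNK_RUNTIME_WARN_MS = 720_000; // 80% of LAMBDA_TIMEOUT_MS

// Synchronous checks that need no SDK import at all.
function isCloseToCap(maxChunkDurationMs: number | null): boolean {
  return maxChunkDurationMs !== null && maxChunkDurationMs >= CHUNK_RUNTIME_WARN_MS;
}

function fractionOfCap(maxChunkDurationMs: number): number {
  return maxChunkDurationMs / LAMBDA_TIMEOUT_MS;
}

// Hypothetical helper: defer an expensive module load until first use and reuse the result.
function lazy<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) cached = load();
    return cached;
  };
}

// In the CLI the loader would be a dynamic import of the SDK barrel, for example:
//   const loadSDK = lazy(() => import("@hyperframes/aws-lambda/sdk"));
// A stand-in loader keeps this sketch self-contained.
let loadCount = 0;
const loadFakeSDK = lazy(() => {
  loadCount += 1;
  return Promise.resolve({ renderToLambda: () => "started" });
});
```

Because the threshold check reads only local constants, the AWS clients are loaded only when a command actually calls the loader.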

@miguel-heygen miguel-heygen left a comment

The import issue is fixed correctly. The constants are inlined in render.ts with a clear comment pointing at chunkRuntime.ts as the source of truth. The barrel import stays out of the static-import head, so loadSDK() still lazy-loads @aws-sdk/* as intended.

Rest of the PR is solid — bumpMaxChunkDuration, suggestFanOut, the maxChunkDurationMs plumbing through summarizeHistory, and the three new tests all look good.

@vanceingalls vanceingalls left a comment

No blockers. Good end-to-end feature. maxChunkDurationMs on RenderProgress is the right carrier — observable state any CLI consumer can act on. bumpMaxChunkDuration correctly gates on currentLambdaState === "RenderChunk" so Plan/Assemble don't inflate the max. suggestFanOut power-of-2 rounding and 256 cap are reasonable guardrails.

Important

Constant duplication between chunkRuntime.ts and the CLI inline copy of LAMBDA_TIMEOUT_MS / CHUNK_RUNTIME_WARN_MS. The comment explains the reason accurately (importing from the SDK barrel pulls in @aws-sdk/client-sfn at static import time, defeating loadSDK()). Acceptable as a shipping pragmatism, but the inline copy is a silent orphan — if someone changes the SDK constant without knowing the CLI copy exists, the warning threshold silently drifts. At minimum the inline copy should have a comment pointing to the source of truth in chunkRuntime.ts. A cleaner long-term solution is a deep sub-path export (@hyperframes/aws-lambda/sdk/chunkRuntime) that the CLI can import without the barrel penalty.

Nits

  • suggestFanOut linear scaling assumption: for cold-start-dominated chunks (large asset prefetch, GPU init), doubling maxParallelChunks halves frame count but may not halve runtime. Consider softening "Mitigate with: --max-parallel-chunks N" to "may help."
  • DEFAULT_MAX_PARALLEL_CHUNKS = 16 is hardcoded inline. If the SDK default ever changes, the suggestion math silently uses the wrong baseline. Add a co-location comment.
  • warnIfChunkRuntimeIsCloseToCap only fires on the success path (Assemble completed). If the render times out before reaching SUCCEEDED, the warning never fires. Defensible, just noting for future reference.
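
To make the first nit concrete, here is a small illustrative model in which each chunk pays a fixed startup cost that does not shrink with fan-out. The model, the projectChunkMs helper, and the 300 s overhead figure are all hypothetical; they are not measurements from this PR.

```typescript
// Hypothetical model: a chunk's runtime is a fixed startup cost plus frame work,
// and only the frame work shrinks when the fan-out is multiplied.
function projectChunkMs(
  slowestMs: number,
  fixedOverheadMs: number,
  fanOutMultiplier: number,
): number {
  const frameWorkMs = Math.max(0, slowestMs - fixedOverheadMs);
  return fixedOverheadMs + frameWorkMs / fanOutMultiplier;
}

// Illustrative numbers only: an 800 s chunk with the fan-out doubled.
const linearMs = projectChunkMs(800_000, 0, 2); // 400_000 under pure linear scaling
const withOverheadMs = projectChunkMs(800_000, 300_000, 2); // 550_000 when 300 s is startup
```

In this toy case doubling still clears the 720 s threshold, but with less headroom than the linear projection implies, which is the argument for wording the suggestion as "may help".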

Note: Stacked — Build/Test/Typecheck did not run on this PR head.

— Vai

@jrusso1020 jrusso1020 force-pushed the feat/lambda-warn-on-heavy-chunk-budget branch from 91c8489 to 4140937 on May 22, 2026 18:47
@jrusso1020 jrusso1020 force-pushed the fix/lambda-chromium-executable-guard branch 2 times, most recently from 82aff43 to a62f062 on May 22, 2026 19:09
Render-time, post-hoc warning: when the slowest RenderChunk Lambda
invocation burned through more than 80% of the 900-second cap, surface
a warning at the end of --wait mode pointing at --max-parallel-chunks.
The cost-analysis sweep hit this twice — inspector-launch at 1080p/60
and 4K@anything blew past the cap with default 16-way fan-out, producing
a Sandbox.Timedout retry storm. The next user to push fps or duration
on a heavy composition will hit the same wall; this turns a cryptic
SFN failure into a one-line hint they can act on before the next
render.

Plumbing:
 - getRenderProgress tracks max billed-duration across RenderChunk
   invocations (the only state whose runtime is gated by the 15-min
   cap; Plan + Assemble are off-path).
 - RenderProgress.maxChunkDurationMs is null before the first chunk
   reports back.
 - LAMBDA_TIMEOUT_MS / CHUNK_RUNTIME_WARN_RATIO / CHUNK_RUNTIME_WARN_MS
   live in chunkRuntime.ts and are exported from the SDK so external
   callers (custom CLIs, monitoring) can match the threshold.
 - CLI's render --wait path prints the warning with a suggested
   --max-parallel-chunks value scaled by the observed headroom ratio.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jrusso1020 jrusso1020 force-pushed the feat/lambda-warn-on-heavy-chunk-budget branch from 4140937 to 0afa161 on May 22, 2026 19:10
@miguel-heygen miguel-heygen left a comment

Approved — no issues found in review.

@jrusso1020 jrusso1020 merged commit 41352e9 into fix/lambda-chromium-executable-guard May 22, 2026
14 of 15 checks passed
@jrusso1020 jrusso1020 deleted the feat/lambda-warn-on-heavy-chunk-budget branch May 22, 2026 20:54