feat(lambda): warn when slowest chunk approaches the 15-min cap#1024
Conversation
3e5b059 to
d76baad
Compare
23fb168 to
c591745
Compare
miguel-heygen
left a comment
There was a problem hiding this comment.
Clean feature — the plumbing through summarizeHistory is solid, tests cover the right cases, and suggestFanOut math checks out. One issue I'd fix before merge:
Static import defeats the lazy-loading pattern (packages/cli/src/commands/lambda/render.ts)
The file converts the existing import type { ... } from "@hyperframes/aws-lambda/sdk" into a runtime import to pull CHUNK_RUNTIME_WARN_MS and LAMBDA_TIMEOUT_MS. Type-only imports are erased at compile time, but value imports execute the barrel — which re-exports deploySite.js, renderToLambda.js, getRenderProgress.js, etc., all of which pull in @aws-sdk/client-sfn, @aws-sdk/client-s3, and friends. The whole point of loadSDK() in that file was to keep the SFN/S3 clients out of the static-import head of the CLI bundle.
Since chunkRuntime.ts is a pure-constants module with zero transitive deps, import the constants from the leaf module directly:
import { CHUNK_RUNTIME_WARN_MS, LAMBDA_TIMEOUT_MS } from "@hyperframes/aws-lambda/src/sdk/chunkRuntime.js";or, if the package doesn't expose a subpath for that, inline the two constants (they're trivial: 900_000 and 720_000). Either way, keep the barrel import as import type so the lazy-load contract holds.
Everything else looks good — bumpMaxChunkDuration correctly gates on "RenderChunk", the null-before-any-chunk semantics are clean, and the three new tests cover the key scenarios (max tracking, Plan/Assemble filtering, null baseline).
c591745 to
91c8489
Compare
d76baad to
19b97ad
Compare
|
You're right — I missed that the barrel pulls the AWS SDK clients in at static-import time. The dynamic Fixed in the latest push:
Confirmed with |
miguel-heygen
left a comment
There was a problem hiding this comment.
The import issue is fixed correctly. The constants are inlined in render.ts with a clear comment pointing at chunkRuntime.ts as the source of truth. The barrel import stays out of the static-import head, so loadSDK() still lazy-loads @aws-sdk/* as intended.
Rest of the PR is solid — bumpMaxChunkDuration, suggestFanOut, the maxChunkDurationMs plumbing through summarizeHistory, and the three new tests all look good.
vanceingalls
left a comment
There was a problem hiding this comment.
No blockers. Good end-to-end feature. maxChunkDurationMs on RenderProgress is the right carrier — observable state any CLI consumer can act on. bumpMaxChunkDuration correctly gates on currentLambdaState === "RenderChunk" so Plan/Assemble don't inflate the max. suggestFanOut power-of-2 rounding and 256 cap are reasonable guardrails.
Important
Constant duplication between chunkRuntime.ts and the CLI inline copy of LAMBDA_TIMEOUT_MS / CHUNK_RUNTIME_WARN_MS. The comment explains the reason accurately (importing from the SDK barrel pulls in @aws-sdk/client-sfn at static import time, defeating loadSDK()). Acceptable as a shipping pragmatism, but the inline copy is a silent orphan — if someone changes the SDK constant without knowing the CLI copy exists, the warning threshold silently drifts. At minimum the inline copy should have a comment pointing to the source of truth in chunkRuntime.ts. A cleaner long-term solution is a deep sub-path export (@hyperframes/aws-lambda/sdk/chunkRuntime) that the CLI can import without the barrel penalty.
Nits
suggestFanOutlinear scaling assumption: for cold-start-dominated chunks (large asset prefetch, GPU init), doublingmaxParallelChunkshalves frame count but may not halve runtime. Consider softening "Mitigate with:--max-parallel-chunks N" to "may help."DEFAULT_MAX_PARALLEL_CHUNKS = 16is hardcoded inline. If the SDK default ever changes, the suggestion math silently uses the wrong baseline. Add a co-location comment.warnIfChunkRuntimeIsCloseToCaponly fires on the success path (Assemblecompleted). If the render times out before reachingSUCCEEDED, the warning never fires. Defensible, just noting for future reference.
Note: Stacked — Build/Test/Typecheck did not run on this PR head.
— Vai
91c8489 to
4140937
Compare
82aff43 to
a62f062
Compare
Render-time, post-hoc warning: when the slowest RenderChunk Lambda invocation burned through more than 80% of the 900-second cap, surface a warning at the end of --wait mode pointing at --max-parallel-chunks. The cost-analysis sweep hit this twice — inspector-launch at 1080p/60 and 4K@anything blew past the cap with default 16-way fan-out, producing a Sandbox.Timedout retry storm. The next user to push fps or duration on a heavy composition will hit the same wall; this turns a cryptic SFN failure into a one-line hint they can act on before the next render. Plumbing: - getRenderProgress tracks max billed-duration across RenderChunk invocations (the only state whose runtime is gated by the 15-min cap; Plan + Assemble are off-path). - RenderProgress.maxChunkDurationMs is null before the first chunk reports back. - LAMBDA_TIMEOUT_MS / CHUNK_RUNTIME_WARN_RATIO / CHUNK_RUNTIME_WARN_MS live in chunkRuntime.ts and are exported from the SDK so external callers (custom CLIs, monitoring) can match the threshold. - CLI's render --wait path prints the warning with a suggested --max-parallel-chunks value scaled by the observed headroom ratio. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
4140937 to
0afa161
Compare
miguel-heygen
left a comment
There was a problem hiding this comment.
Approved — no issues found in review.
41352e9
into
fix/lambda-chromium-executable-guard

What
Track the slowest
RenderChunkLambda invocation across a render. When that chunk's billed duration crosses 80% of Lambda's 15-min cap, surface a warning at the end of--waitwith a--max-parallel-chunksbump suggestion sized to the actual fan-out the user ran with.Why
The cost-analysis sweep hit Lambda's
Sandbox.Timedouttwice — inspector-launch at 1080p/60fps with defaultmpc=16and 4K@anything — both with cryptic SFN errors. The next user to push fps or composition complexity on a heavy WebGL render will hit the same wall. Without this, the only signal they get is a generic SFN failure 15+ minutes into the render. With it, the previous successful render warned them ahead of time.This is also the practical answer to "what's the longest render Lambda can handle?" The hard limit is
chunkSize × maxParallelChunks(default 32 min @ 30fps, up to 8.5h cranked); the operational limit is per-chunk runtime, which depends on composition complexity.How
RenderProgress.maxChunkDurationMs: new field on the SDK's progress snapshot. Tracks the max billed-duration acrossRenderChunkLambda invocations only (Plan + Assemble are off-path — they have their own runtime profiles that aren't gated by the 15-min cap). Null until the first chunk reports back.bumpMaxChunkDuration(current, state, billedMs)helper shared between theTaskSucceededandLambdaFunctionSucceededswitch arms.packages/aws-lambda/src/sdk/chunkRuntime.ts: new module exportingLAMBDA_TIMEOUT_MS = 900_000andCHUNK_RUNTIME_WARN_MS = LAMBDA_TIMEOUT_MS × 0.8. Re-exported from the SDK barrel.warnIfChunkRuntimeIsCloseToCap(progress, currentMaxParallelChunks)in the CLI: synchronous, uses the static SDK constants (no dynamicloadSDK()for a constant lookup).suggestFanOut(current, slowestMs): scales the suggestion from the actual--max-parallel-chunksvalue the user ran with — doubles until projected per-chunk duration clears the threshold, rounds to the next power of two, caps at 256. So a user runningmpc=64who hits 800s gets--max-parallel-chunks 128, not the hardcoded32an earlier version would have suggested.Test plan
--wait; renders under the threshold are silent.