fix(chat/runs): abort headless run when platform-api skill is missing#724
fix(chat/runs): abort headless run when platform-api skill is missing#724sweetmantech wants to merge 1 commit into
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Warning Review limit reached
Next review available in: 43 minutes Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available. How can I continue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews. How do review limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please refer docs for additional details. Review details⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (1)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
1 issue found across 2 files
Confidence score: 3/5
- In
lib/chat/runs/provisionRunSession.ts, the abort/error path can leave a sandbox/session persisted as ACTIVE when required skill discovery fails, which risks orphaned resources and inconsistent run state after merge; add explicit cleanup/failed-session handling before throwing, or perform skill validation before writing ACTIVE state.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="lib/chat/runs/provisionRunSession.ts">
<violation number="1" location="lib/chat/runs/provisionRunSession.ts:114">
P2: Abort path leaks a provisioned active sandbox/session when required skill discovery fails. Add cleanup/failed-session handling before throwing, or move the validation before persisting ACTIVE state if possible.</violation>
</file>
Architecture diagram
sequenceDiagram
participant Client as Chat Client
participant Handler as handleStartChatRun
participant Provision as provisionRunSession
participant Installer as installSessionGlobalSkills
participant Discoverer as discoverSkills
participant SkillList as Discovered Skills
participant SessionDB as sessions table
Note over Client,SessionDB: HEADLESS RUN PROVISION FLOW
Client->>Handler: Start headless chat run
Handler->>Provision: provisionRunSession(accountId, title)
Provision->>Provision: Create session record
Provision->>Installer: Install global skills (best-effort)
alt Install fails (network/registry/timeout)
Installer-->>Provision: Error thrown
Note over Provision: Continues anyway (best-effort)
end
Provision->>Discoverer: discoverSkills(sandbox)
Discoverer-->>Provision: List of discovered skills
Note over Provision: REQUIRED_PLATFORM_API_SKILL = "recoup-platform-api-access"
alt Skill "recoup-platform-api-access" IS in discovered list
Provision-->>Handler: Return ProvisionedRunSession
Handler->>Handler: Proceed with run
Handler->>Client: 200 OK + run results
else Skill "recoup-platform-api-access" NOT in discovered list
Note over Provision: Fail closed - abort immediately
Provision-->>Handler: Throw Error
Handler->>Handler: Map to 5xx error
Handler->>Handler: Revoke ephemeral key
Handler-->>Client: 500 Internal Server Error
Note over Client: Run aborted - recoverable state
end
Note over Handler,Client: DECISION: Fail-closed chosen over<br/>degraded "no data" email because<br/>fabricated report is worst outcome
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
| // A missed run is recoverable; a fabricated report sent to a customer is not. | ||
| // The caller maps this throw to a 5xx and revokes any minted ephemeral key. | ||
| if (!skills.some(skill => skill.name === REQUIRED_PLATFORM_API_SKILL)) { | ||
| throw new Error( |
There was a problem hiding this comment.
P2: Abort path leaks a provisioned active sandbox/session when required skill discovery fails. Add cleanup/failed-session handling before throwing, or move the validation before persisting ACTIVE state if possible.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At lib/chat/runs/provisionRunSession.ts, line 114:
<comment>Abort path leaks a provisioned active sandbox/session when required skill discovery fails. Add cleanup/failed-session handling before throwing, or move the validation before persisting ACTIVE state if possible.</comment>
<file context>
@@ -101,6 +105,17 @@ export async function provisionRunSession({
+ // A missed run is recoverable; a fabricated report sent to a customer is not.
+ // The caller maps this throw to a 5xx and revokes any minted ephemeral key.
+ if (!skills.some(skill => skill.name === REQUIRED_PLATFORM_API_SKILL)) {
+ throw new Error(
+ `[provisionRunSession] required skill '${REQUIRED_PLATFORM_API_SKILL}' unavailable after install/discovery — aborting to avoid an ungrounded run`,
+ );
</file context>
Fail closed: after install + discovery, if recoup-platform-api-access isn't among the discovered skills, throw so the run aborts (caller maps to 5xx + revokes the ephemeral key) instead of running an agent that can't reach the Recoup API and fabricates ungrounded data (chat#1822). A missed run is recoverable; a fabricated report sent to a customer is not. Stacked on the headless skill-install change (same file). Refs recoupable/chat#1822 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
630d67e to
b3b40ee
Compare
|
Closing per YAGNI. recoupable/skills#65 + #722 are the actual fix (the skill points at a live host + headless runs now install it, so the agent gets the If we ever observe skill-less runs in practice, the lighter follow-up is observability (log/metric on a zero-skill provision), not a hard abort. Decision recorded on recoupable/chat#1822. |
What
After install + discovery in
provisionRunSession, fail closed: ifrecoup-platform-api-accessisn't among the discovered skills, throw. The caller (handleStartChatRun) already maps aprovisionRunSessionthrow to a 5xx and revokes any minted ephemeral key — so the run aborts cleanly instead of proceeding.Why
Installing skills in the headless path (the stacked PR below) is necessary but not sufficient: a best-effort install can still silently fail (network/registry/timeout), leaving the agent with no
skilltool. Today that degrades to the agent guessing API endpoints and fabricating a report it then emails to a customer (recoupable/chat#1822).This makes the failure observable and safe: a missed run is recoverable; a fabricated report sent to a label/manager is not.
Tests (TDD, RED→GREEN)
Extends
lib/chat/runs/__tests__/provisionRunSession.test.ts:discoverSkills → []⇒provisionRunSessionrejects with/recoup-platform-api-access/(was RED; now GREEN).lib/chat/runssuite green (20 passed), eslint clean, zero newtscerrors in changed files.Stacking
Stacked on #722 (
fix/headless-install-global-skills) — base is that branch so this PR's diff is only the abort check. Retarget totestafter #722 merges.Refs recoupable/chat#1822 · merge order: skills#65 → api#722 → this · base
fix/headless-install-global-skills🤖 Generated with Claude Code
Summary by cubic
Abort headless chat runs if the required
recoup-platform-api-accessskill isn’t present after install + discovery. This prevents ungrounded runs and avoids sending fabricated reports.provisionRunSession: throw whenrecoup-platform-api-accessisn’t discovered; caller maps to 5xx and revokes the ephemeral key.Written for commit b3b40ee. Summary will update on new commits.