Closed
2 changes: 1 addition & 1 deletion CLAUDE.md
@@ -62,7 +62,7 @@ Types that stay in THIS repo because they're runtime-shaped (coupled to a runnin
- `run-loop.ts` — `runLoop`, the round-synchronous leaf kernel. Per round: `driver.plan()`→N tasks→one sandbox/iteration (bounded by `maxConcurrency`, round-robin `agentRuns`)→`streamPrompt`→`output.parse`→`validator.validate`→`driver.decide`. Owns iteration accounting, concurrency, abort, cost+token aggregation, trace emission, box teardown. Exports `defaultSelectWinner` (best-valid-score, ties→earliest) — the single-sourced selection the personify combinators reuse.
- `supervise/` — the recursive execution atom (keystone): `Scope` + `Supervisor` over the open `Executor` port, spawn/settle on a **conserved budget pool** so equal-compute holds by construction; journal→replay/resume. `runtime.ts` also holds `createExecutor({backend})` — the ONE built-in executor (backend-as-data: `router`/`router-tools`/`bridge`/`cli`/`sandbox`; `router-tools` is the off-box tool-using agentic loop — chat→tool_calls→`executeToolCall`→repeat — over the router's tool-calling, no sandbox); the per-backend bodies are internal case-arms, BYO agents implement `Executor` directly.
- `personify/` — the content-free generic combinators (`fanout`/`loopUntil`/`widen`/`panel`/`verify`/`pipeline`) + `definePersona`/`runPersonified` + the cross-run `Corpus` + `createScopeAnalyst` (the analyst-on-scope steer firewall).
-- the **agent-driver** is the canonical "drive an agent" path: an `AgentProfile` driving another `AgentProfile` via the coordination toolbox (`createCoordinationTools`, `src/mcp/tools/coordination.ts`) over the `Scope`/`Supervisor`, plus `runAgentic`/`defineStrategy`/`runPersonified` (`strategy.ts`/`personify/persona.ts`) on the Supervisor. Child→parent messages ride ONE typed pipe — `createEventBus` (`supervise/event-bus.ts`): settled outputs, `ask_parent` questions, and analyst findings are all `CoordinationEvent` kinds, delivered pass-through (`subscribe`/`onEvent`, immediate) AND queued for the driver to pull (`await_event`, kind-filterable; `await_next` is the settled-only view). The pull queue is **priority-ordered** — a blocking question (urgency→priority: `blocks-run`=20/`blocks-step`=10) is bumped ahead of queued settles/findings; ties FIFO by `seq`. Observability is first-class: every event is stamped (`seq`/`at`/`priority`), the full `history()` is an audit/replay trail, `stats()` counts throughput (both surfaced on `CoordinationTools` and the MCP handle). `analyzeOnSettle` auto-fires trace analysts when a worker settles `done`, re-entering each result as a `finding` on the same bus (cost-governed opt-in; the firewall stays in the analyst registry). The in-process queue and a future cross-box durable mailbox share this one interface. `assertTraceDerivedFindings` (`personify/analyst.ts`) is the steer-firewall (selector≠judge). `types.ts` holds `Driver`/`AgentRunSpec`/`OutputAdapter`/`Validator`/`Iteration`/`LoopResult`/`SandboxClient` + the `LoopTraceEvent` union. `sandbox-run.ts` is `openSandboxRun` — the one run/stream/resume sandbox seam; `inline-sandbox-client.ts` is `inlineSandboxClient` — the one adapter presenting any non-box `Executor` as a `SandboxClient` for `runLoop`. `loop-dispatch.ts` adapts `runLoop`→agent-eval campaigns; `report-usage.ts` forwards token usage so the integrity guard sees a real backend.
+- the **agent-driver** is the canonical "drive an agent" path: an `AgentProfile` driving another `AgentProfile` via the coordination toolbox (`createCoordinationTools`, `src/mcp/tools/coordination.ts`) over the `Scope`/`Supervisor`, plus `runAgentic`/`defineStrategy`/`runPersonified` (`strategy.ts`/`personify/persona.ts`) on the Supervisor. Child→parent messages ride ONE typed pipe — `createEventBus` (`supervise/event-bus.ts`): settled outputs, `ask_parent` questions, and analyst findings are all `CoordinationEvent` kinds, delivered pass-through (`subscribe`/`onEvent`, immediate) AND queued for the driver to pull (`await_event({kinds?})` — the ONE wait verb; `kinds:['settled']` = next finished worker, omit = also questions/findings). The pull queue is **priority-ordered** — a blocking question (urgency→priority: `blocks-run`=20/`blocks-step`=10) is bumped ahead of queued settles/findings; ties FIFO by `seq`. The bus is **bidirectional**: UP (settled/question/finding) is queued+pullable; DOWN (`steer_worker` for any live worker — instruction/correction/continuation; `answer_question` routes an answer down) goes to the child inbox via `scope.send`→`deliver` AND records a `queue:false` event (history + subscribers, never pulled back). The receive end is `createInbox` (`supervise/inbox.ts`), which the owned tool-loop executor (`routerToolsInlineExecutor`) exposes as `Executor.deliver`: QUEUED messages flush at each step boundary AND before the worker may settle (it can't finish with an unread steer); a FORCEFUL `steer_worker({interrupt:true})` aborts the in-flight turn so the worker re-plans immediately. Black-box CLI harnesses can't be interrupted mid-step, so there the down-leg degrades to the next spawn. Observability is first-class: every event both ways is stamped (`seq`/`at`/`priority`), the full `history()` is an audit/replay trail, `stats()` counts throughput (both surfaced on `CoordinationTools` and the MCP handle). `analyzeOnSettle` auto-fires trace analysts when a worker settles `done`, re-entering each result as a `finding` on the same bus (cost-governed opt-in; the firewall stays in the analyst registry). Trace analysis is **substrate-agnostic** via `TraceSource` (`supervise/trace-source.ts`) — a worker's tool calls as agent-eval `ToolSpan`s from EITHER an owned loop (`createPushTraceSource`; `routerToolsInlineExecutor`'s `onToolStep` feeds `record`) OR a sandbox/fleet box (`sandboxSessionTraceSource(box, sessionId)` decodes `box.messages()` session parts; `decodeToolPart` is defensive across OpenAI + harness shapes). Two consumers ride a source: ONLINE `watchTrace` (`detector-monitor.ts`) folds live spans through agent-eval's published streaming kernel (`repeatedActionDetector`/`errorStreakDetector`, the SAME kernel `control-runtime` folds) → `onSignal` → a `finding`; SETTLE `analyzeTrace` (`trajectory-recorder.ts`) collects the spans and runs the published BATCH analyzers (`buildTrajectory`/`stuckLoopView`/`toolWasteView`). `ToolSpan` is the common currency; detection logic + the failure taxonomy live in agent-eval — never reimplement here. Production target = sandbox/fleet; the owned-loop push path is for local/router/cli-bridge. The in-process queue and a future cross-box durable mailbox share this one interface. `assertTraceDerivedFindings` (`personify/analyst.ts`) is the steer-firewall (selector≠judge). `types.ts` holds `Driver`/`AgentRunSpec`/`OutputAdapter`/`Validator`/`Iteration`/`LoopResult`/`SandboxClient` + the `LoopTraceEvent` union. `sandbox-run.ts` is `openSandboxRun` — the one run/stream/resume sandbox seam; `inline-sandbox-client.ts` is `inlineSandboxClient` — the one adapter presenting any non-box `Executor` as a `SandboxClient` for `runLoop`. `loop-dispatch.ts` adapts `runLoop`→agent-eval campaigns; `report-usage.ts` forwards token usage so the integrity guard sees a real backend.

Two substrates coexist for the same "recursive agent decision" atom: the round-synchronous `runLoop` kernel (the leaf, what most sandbox benches drive today) and the reactive `Scope`/`Supervisor`+combinators (the canonical core — the agent-driver, `runAgentic`/`defineStrategy`/`runPersonified`). Prefer the latter for new recursive/keystone work. Both run over the one `Executor` port.
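
The priority-ordered pull queue described in the agent-driver bullet above can be illustrated with a minimal sketch. This is not the repo's `createEventBus`; `SketchQueue`, `SketchEvent`, `push`, and `pull` are hypothetical names, and the default priority of 0 for settled outputs and findings is an assumption. Only the urgency values (`blocks-run`=20, `blocks-step`=10), the FIFO tie-break by `seq`, and the optional kind filter come from the text.

```typescript
// Hypothetical sketch of the ordering rule only; not the real event bus.
type Kind = 'settled' | 'question' | 'finding'
type Urgency = 'blocks-run' | 'blocks-step'

interface SketchEvent {
  seq: number
  kind: Kind
  priority: number
  payload: string
}

// Urgency→priority values taken from the text above.
const URGENCY_PRIORITY: Record<Urgency, number> = { 'blocks-run': 20, 'blocks-step': 10 }

class SketchQueue {
  private seq = 0
  private pending: SketchEvent[] = []

  // Stamp the event with a monotonically increasing seq and queue it.
  // Assumption: events without an urgency get priority 0.
  push(kind: Kind, payload: string, urgency?: Urgency): SketchEvent {
    const event: SketchEvent = {
      seq: this.seq++,
      kind,
      priority: urgency ? URGENCY_PRIORITY[urgency] : 0,
      payload,
    }
    this.pending.push(event)
    return event
  }

  // Remove and return the highest-priority pending event, optionally
  // restricted to the given kinds. `pending` stays in seq order and the
  // comparison is strict, so equal priorities resolve to the earliest seq.
  pull(kinds?: Kind[]): SketchEvent | undefined {
    let best: SketchEvent | undefined
    for (const e of this.pending) {
      if (kinds && kinds.indexOf(e.kind) === -1) continue
      if (best === undefined || e.priority > best.priority) best = e
    }
    if (best === undefined) return undefined
    const chosen = best
    this.pending = this.pending.filter((e) => e !== chosen)
    return chosen
  }
}
```

In this sketch, `pull(['settled'])` plays the role the text assigns to `kinds:['settled']` (next finished worker), while `pull()` with no filter also surfaces questions and findings, with blocking questions first.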

2 changes: 1 addition & 1 deletion bench/src/atom-humaneval.mts
@@ -143,7 +143,7 @@ function humanEvalWorker(task: HumanEvalTask, label: string): Agent<unknown, unk
}
}

-const driverSystem = `You are an orchestrator driving worker agents to solve a Python coding task. You do NOT write code yourself. Each worker independently attempts the task and is graded by a deterministic, hidden test suite. Tools: spawn_worker (dispatch one attempt; the "profile" argument may be {} and "task" a short note), await_next (collect the next settled worker — its result tells you valid:true if its tests PASSED, valid:false if they failed), and stopping (reply with NO tool call) once a worker has DELIVERED. Spawn one worker, await it; if it delivered, stop; if not, spawn another, up to ${K} workers total. You cannot declare success yourself — only a delivered (valid:true) worker counts.`
+const driverSystem = `You are an orchestrator driving worker agents to solve a Python coding task. You do NOT write code yourself. Each worker independently attempts the task and is graded by a deterministic, hidden test suite. Tools: spawn_worker (dispatch one attempt; the "profile" argument may be {} and "task" a short note), await_event (collect the next settled worker — its result tells you valid:true if its tests PASSED, valid:false if they failed), and stopping (reply with NO tool call) once a worker has DELIVERED. Spawn one worker, await it; if it delivered, stop; if not, spawn another, up to ${K} workers total. You cannot declare success yourself — only a delivered (valid:true) worker counts.`

interface TaskOutcome {
taskId: string
2 changes: 1 addition & 1 deletion bench/src/atom-mcp-e2e.mts
@@ -175,7 +175,7 @@ async function main(): Promise<void> {
messages: [
{
role: 'user',
-content: `${TASK}\n\nYou are a SUPERVISOR. You have the "supervise" skill and a "coordination" MCP with tools spawn_worker, await_next, stop. Do NOT write code yourself. Author a worker profile (a JSON object with name + a rich systemPrompt telling the worker exactly what to implement) and call spawn_worker with it, then await_next, and stop once a worker delivered (valid:true).`,
+content: `${TASK}\n\nYou are a SUPERVISOR. You have the "supervise" skill and a "coordination" MCP with tools spawn_worker, await_event, stop. Do NOT write code yourself. Author a worker profile (a JSON object with name + a rich systemPrompt telling the worker exactly what to implement) and call spawn_worker with it, then await_event, and stop once a worker delivered (valid:true).`,
},
],
cwd: supCwd,
8 changes: 4 additions & 4 deletions bench/src/mcp-mount-probe.mts
@@ -3,7 +3,7 @@
* actually MOUNT my coordination MCP and CALL spawn_worker — landing on a real Scope.spawn?
*
* Serves the coordination MCP over a live Scope, then asks the bridge's opencode (with that MCP in
-* its config) to call spawn_worker + await_next. If the Scope spawned+settled, the in-box driving
+* its config) to call spawn_worker + await_event. If the Scope spawned+settled, the in-box driving
* path is real. No mock.
*
* ROUTER_BASE=http://127.0.0.1:3355/v1 TANGLE_API_KEY=<bridge-bearer> \
@@ -86,9 +86,9 @@ async function main(): Promise<void> {
{
role: 'user',
content:
-'You have an MCP server named "coordination" with tools: spawn_worker, await_next, stop. ' +
-'Call spawn_worker with arguments {"profile":{},"task":"hello"}. Then call await_next. ' +
-'Then reply with exactly what await_next returned.',
+'You have an MCP server named "coordination" with tools: spawn_worker, await_event, stop. ' +
+'Call spawn_worker with arguments {"profile":{},"task":"hello"}. Then call await_event. ' +
+'Then reply with exactly what await_event returned.',
},
],
mcp.url,
4 changes: 2 additions & 2 deletions bench/src/profiles.ts
@@ -21,7 +21,7 @@ export const OPERATOR_TOOLS = [
'run_analyst', // run an analyst over a worker's trace → findings (selector≠judge: trace, not score)
'observe_worker', // a worker's in-flight trace, or its last finished episode/shot
'spawn_worker', // start a worker (or a sub-analyst) — drive many; parallelize when independent
-'steer_worker', // send a running/parked worker its next instruction / an interrupt
+'steer_worker', // send a live worker a message down: instruction, course-correction, or continuation (interrupt? for forceful)
'stop', // declare the task complete (verified) or abandon a line
] as const

@@ -95,7 +95,7 @@ export const driverProfile: RoleProfile = {
' analysts are cheap; make them when a worker’s failure mode needs a focused lens.',
'- observe_worker(worker): the worker’s IN-FLIGHT trace if it is still running, else its last',
' finished episode/shot.',
-'- spawn_worker(profile, task) / steer_worker(worker, instruction) / stop.',
+'- spawn_worker(profile, task) / steer_worker(worker, instruction, interrupt?) / stop.',
'- the artifact’s own tools (read/edit/run) — use them to inspect the workspace and to contribute',
' decisive work yourself.',
'',
2 changes: 1 addition & 1 deletion docs/architecture-visual.md
@@ -107,7 +107,7 @@ that keeps it honest.
Scope: spawn child agent(s) → run → settle → verdict on the artifact
-└──▶ await_next → terminal? → winner = argmax(valid score)
+└──▶ await_event → terminal? → winner = argmax(valid score)
```

The firewall is the load-bearing line: the **analyst reads the trace and may not cite the score**, so
6 changes: 3 additions & 3 deletions docs/execution-model.md
@@ -49,11 +49,11 @@ Before, each bench hand-rolled its own pseudo-box client. Now there is **one exe
│ each round it decides the TOPOLOGY MOVE ─────┐ this IS
│ refine │ fanout │ select │ stop │ │ "topology grown
│ then drives workers via the toolbox: │ │ by LLM decision"
-│ spawn_worker · await_next · steer_worker │ │ (driver.ts:52)
+│ spawn_worker · await_event · steer_worker │ │ (driver.ts:52)
└───────────────┬────────────────────────────┘ │
spawn_worker(profile,task) ──┤ reserves budget (fails │
steer_worker(id,msg) ────────┤ CLOSED if the pool is dry) │
-await_next ──────────────────┘ │
+await_event ──────────────────┘ │
┌───────────────┼───────────────┐ │
▼ ▼ ▼ │
┌───────────┐ ┌───────────┐ ┌───────────┐ │
@@ -113,7 +113,7 @@ Before, each bench hand-rolled its own pseudo-box client. Now there is **one exe
└─ 4. settle ──► pool.reconcile(ticket, actualSpend)
-await_next wakes the driver with this child's result
+await_event wakes the driver with this child's result
```

**Net:** the "unified thing" is the `Executor` port. Everything that runs work — a router call, a cli-bridge turn, a `claude -p` subprocess, a full sandbox rollout, or a BYO agent — is an `Executor`, chosen by data via `createExecutor`, metered by one budget pool. Drivers and workers are both `act`s over that port; the only structural difference is the driver carries the operator toolbox (so it can spawn/steer) and the worker does not.
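
The "metered by one budget pool" part of that summary can be sketched as follows. This is an illustrative model, not the repo's pool: `SketchPool`, `Ticket`, `reserve`, and `available` are hypothetical names, and capping the charged spend at the reserved amount is an assumption. The only call taken from the diagram above is `pool.reconcile(ticket, actualSpend)`, along with the rule that a spawn reserves budget and fails closed when the pool is dry.

```typescript
// Hypothetical sketch of a conserved budget pool; not the real implementation.
interface Ticket {
  id: number
  reserved: number
}

class SketchPool {
  private remaining: number
  private nextId = 0
  private open = new Map<number, number>()

  constructor(total: number) {
    this.remaining = total
  }

  // Spawn side: reserve budget up front. Fails CLOSED by throwing when the
  // pool cannot cover the request, so no work starts without budget.
  reserve(amount: number): Ticket {
    if (amount <= 0 || amount > this.remaining) {
      throw new Error('budget pool cannot cover this reservation')
    }
    this.remaining -= amount
    const ticket: Ticket = { id: this.nextId++, reserved: amount }
    this.open.set(ticket.id, amount)
    return ticket
  }

  // Settle side: return the unspent part of the reservation to the pool.
  // Assumption: spend is clamped to [0, reserved], so the total is conserved.
  reconcile(ticket: Ticket, actualSpend: number): void {
    const reserved = this.open.get(ticket.id)
    if (reserved === undefined) throw new Error('unknown or already settled ticket')
    this.open.delete(ticket.id)
    const charged = Math.min(Math.max(actualSpend, 0), reserved)
    this.remaining += reserved - charged
  }

  available(): number {
    return this.remaining
  }
}
```

Because every reservation is subtracted before work starts and only the unspent remainder is returned at settle, the sum of spend across all children can never exceed the initial total in this model, which is one way to read "equal-compute holds by construction".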