tangle-network · drewstone · Jun 16, 2026 · Jun 14, 2026 · Jun 15, 2026 · Jun 15, 2026
diff --git a/.gitignore b/.gitignore
@@ -12,3 +12,4 @@ bench/scripts/__pycache__/
 
 # local rollout-corpus scratch (raw jsonl, per work-line)
 corpus/
+test_repo/
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -33,7 +33,7 @@ The global style rule (lead with the answer, define every term, no stacked jargo
 
 This repo's bottleneck is agents paying a **re-discovery tax**: re-reading 15 files to rebuild a mental model that already exists. Before exploring, read, in order:
 
-0. **`docs/canonical-api.md`** — THE API reference + anti-reinvention decision table ("I want to ___ → use ___ → NOT ___"). The genome→run→optimize→gate spine, the recursive atom (persona=driver, `spawnChild`=worker|sub-driver, isolated|`Workspace` artifact, conserved sub-budgets, analyst dimensions+gaps), every signature `file:line`-verified. **Read before writing ANY orchestration/optimization/measurement code** — if you're about to write `runConversation`, a "skill optimizer", a "profile-seam", or a `new Sandbox(...)` loop, it already exists.
+0. **`docs/canonical-api.md`** — THE API reference + anti-reinvention decision table ("I want to ___ → use ___ → NOT ___"). The genome→run→optimize→gate spine, the recursive atom (persona=driver, `spawnChild`=worker|sub-driver, isolated|`Workspace` artifact, conserved sub-budgets, analyst dimensions+gaps), every signature `file:line`-verified. **Read before writing ANY orchestration/optimization/measurement code** — if you're about to write `runConversation`, a "skill optimizer", a "profile-seam", or a `new Sandbox(...)` loop, it already exists. **§1.5 is the AgentProfile law we keep forgetting:** an agent IS its full profile (prompt+skills+tools+mcp+subagents+hooks); you change behavior by AUTHORING the profile and letting the sandbox substrate materialize it into harness shapes — never write a verify-loop or harness-specific config (self-verification is a hook/process, not code; opencode is only the cli-bridge test target — generalize, never specialize).
 1. **`docs/architecture.md`** — the canonical spine (one recursive `Agent` atom; two timescales; benchmark-as-adapter; selector≠judge). Wins on any architecture conflict. `docs/README.md` indexes the rest; `docs/roadmap-rsi.md` is the dependency-ordered build plan; `docs/architecture-interpretations.md` defines **the decision gate**.
 2. **`bench/HARNESS.md`** — the experiment-harness map: commands, the `rollout → corpus → selector → CI → gate` data flow, the wired/needs-creds/scaffolded matrix, and run-the-gate-in-2-lines. Read it before touching `bench/`.
 3. **`.evolve/current.json`** — the single source of truth for the active goal + generation + the live science state. Then `.evolve/progress.md` and the newest `.evolve/pursuits/*.md`.
@@ -62,9 +62,9 @@ Types that stay in THIS repo because they're runtime-shaped (coupled to a runnin
 - `run-loop.ts` — `runLoop`, the round-synchronous leaf kernel. Per round: `driver.plan()`→N tasks→one sandbox/iteration (bounded by `maxConcurrency`, round-robin `agentRuns`)→`streamPrompt`→`output.parse`→`validator.validate`→`driver.decide`. Owns iteration accounting, concurrency, abort, cost+token aggregation, trace emission, box teardown. Exports `defaultSelectWinner` (best-valid-score, ties→earliest) — the single-sourced selection the personify combinators reuse.
 - `supervise/` — the recursive execution atom (keystone): `Scope` + `Supervisor` over the open `Executor` port, spawn/settle on a **conserved budget pool** so equal-compute holds by construction; journal→replay/resume. `runtime.ts` also holds `createExecutor({backend})` — the ONE built-in executor (backend-as-data: `router`/`router-tools`/`bridge`/`cli`/`sandbox`; `router-tools` is the off-box tool-using agentic loop — chat→tool_calls→`executeToolCall`→repeat — over the router's tool-calling, no sandbox); the per-backend bodies are internal case-arms, BYO agents implement `Executor` directly.
 - `personify/` — the content-free generic combinators (`fanout`/`loopUntil`/`widen`/`panel`/`verify`/`pipeline`) + `definePersona`/`runPersonified` + the cross-run `Corpus` + `createScopeAnalyst` (the analyst-on-scope steer firewall).
-- `driver.ts` — `createDriver` (agent authors topology via a `TopologyPlanner`); `PlannerContext.analyses` is the analyst→driver wire (built + tested, but **not yet fed live** by any bench); `assertTraceDerivedFindings` is the steer-firewall (selector≠judge). `types.ts` holds `Driver`/`AgentRunSpec`/`OutputAdapter`/`Validator`/`Iteration`/`LoopResult`/`SandboxClient` + the `LoopTraceEvent` union. `sandbox-run.ts` is `openSandboxRun` — the one run/stream/resume sandbox seam; `inline-sandbox-client.ts` is `inlineSandboxClient` — the one adapter presenting any non-box `Executor` as a `SandboxClient` for `runLoop`. `loop-dispatch.ts` adapts `runLoop`→agent-eval campaigns; `report-usage.ts` forwards token usage so the integrity guard sees a real backend.
+- the **agent-driver** is the canonical "drive an agent" path: an `AgentProfile` driving another `AgentProfile` via the coordination toolbox (`createCoordinationTools`, `src/mcp/tools/coordination.ts`) over the `Scope`/`Supervisor`, plus `runAgentic`/`defineStrategy`/`runPersonified` (`strategy.ts`/`personify/persona.ts`) on the Supervisor. `assertTraceDerivedFindings` (`personify/analyst.ts`) is the steer-firewall (selector≠judge). `types.ts` holds `Driver`/`AgentRunSpec`/`OutputAdapter`/`Validator`/`Iteration`/`LoopResult`/`SandboxClient` + the `LoopTraceEvent` union. `sandbox-run.ts` is `openSandboxRun` — the one run/stream/resume sandbox seam; `inline-sandbox-client.ts` is `inlineSandboxClient` — the one adapter presenting any non-box `Executor` as a `SandboxClient` for `runLoop`. `loop-dispatch.ts` adapts `runLoop`→agent-eval campaigns; `report-usage.ts` forwards token usage so the integrity guard sees a real backend.
 
-Two substrates coexist for the same "recursive agent decision" atom: the round-synchronous `runLoop`+`createDriver` (what most benches drive today) and the reactive `Scope`/`Supervisor`+combinators (the newer canonical core). Prefer the latter for new recursive/keystone work. Both run over the one `Executor` port.
+Two substrates coexist for the same "recursive agent decision" atom: the round-synchronous `runLoop` kernel (the leaf, what most sandbox benches drive today) and the reactive `Scope`/`Supervisor`+combinators (the canonical core — the agent-driver, `runAgentic`/`defineStrategy`/`runPersonified`). Prefer the latter for new recursive/keystone work. Both run over the one `Executor` port.
 
 Headline entrypoints: `runAgentTask`/`runAgentTaskStream` (`src/run.ts`), the multi-agent conversation engine (`src/conversation/`), `handleChatTurn` (`src/durable/`), the named delegated loops (`src/loop-runner.ts`).
 

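Review note on the `supervise/` bullet above: the phrase "spawn/settle on a conserved budget pool" can be read as a reserve-then-refund ledger. The sketch below illustrates that reading only. `BudgetPool`, `spawn`, `settle`, and `snapshot` are hypothetical names invented for this note and are not the `Scope`/`Supervisor` API. It assumes a child reserves a sub-budget up front and returns whatever it did not use when it settles.

```typescript
// Hypothetical sketch of a conserved budget pool; not the real Scope/Supervisor API.
// Invariant after every call: available + reserved + spent === total.
class BudgetPool {
  private readonly total: number
  private available: number
  private reserved = 0
  private spent = 0

  constructor(total: number) {
    this.total = total
    this.available = total
  }

  // Reserve a sub-budget for a child. Fails rather than minting new budget.
  spawn(amount: number): { settle(used: number): void } {
    if (amount <= 0 || amount > this.available) {
      throw new Error(`cannot reserve ${amount}; only ${this.available} available`)
    }
    this.available -= amount
    this.reserved += amount
    let settled = false
    return {
      // Record what the child actually used and refund the remainder to the pool.
      settle: (used: number): void => {
        if (settled) throw new Error('child already settled')
        if (used < 0 || used > amount) throw new Error('used must be within the reserved sub-budget')
        settled = true
        this.reserved -= amount
        this.spent += used
        this.available += amount - used
      },
    }
  }

  snapshot(): { total: number; available: number; reserved: number; spent: number } {
    return { total: this.total, available: this.available, reserved: this.reserved, spent: this.spent }
  }
}

// Usage: two children split the whole pool, so a third spawn is rejected.
const pool = new BudgetPool(100)
const childA = pool.spawn(60)
const childB = pool.spawn(40)
let thirdSpawnRejected = false
try {
  pool.spawn(1)
} catch {
  thirdSpawnRejected = true
}
childA.settle(25) // refunds the 35 units child A did not use
childB.settle(40) // child B used its full sub-budget
const finalState = pool.snapshot()
```

Under this reading, every arm of an experiment draws from a pool of the same size and a child cannot exceed its reservation, which is one way equal compute could hold by construction.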
diff --git a/README.md b/README.md
@@ -58,7 +58,8 @@ That is the common case. Everything below is for when one chat turn is not enoug
 | Run a one-shot task with verification and eval | `runAgentTask` | root |
 | Compare optimization strategies on YOUR domain (5 hooks) | `runBenchmark` + `defineStrategy` | `/loops` |
 | Let the system author + evolve its own strategies, gated | `runStrategyEvolution` · `authorStrategy` · `promotionGate` | `/loops` |
-| Run a multi-attempt loop with a custom driver | `runLoop` + `createDriver` | `/loops` |
+| Run a multi-attempt loop with a custom driver | `runLoop` + an inline `Driver` | `/loops` |
+| Drive one agent profile from another (the canonical driver) | `createCoordinationTools` over `Supervisor` (`/runtime`) | `/mcp` |
 | Delegate a disciplined loop by mode (code, research, ...) | `runDelegatedLoop` or `agent-runtime-loop` | root |
 | Build code reliably (reviewed, gated) | `createDefaultCoderDelegate` | `/mcp` |
 | Grow a knowledge base with only grounded facts | `createKbGate` | `/mcp` |
@@ -108,10 +109,15 @@ evidence ledger live in [`bench/HARNESS.md`](./bench/HARNESS.md).
 `runLoop` is a topology-agnostic kernel. Each iteration spawns a sandbox on an `AgentRunSpec`, decodes the output, validates it, and asks a driver what to do next. The driver owns topology. The validator owns scoring. The kernel owns iteration accounting, concurrency, cost and token aggregation, and trace emission.
 
 ```ts
-import { runLoop, createDriver } from '@tangle-network/agent-runtime/loops'
+import { runLoop, type Driver } from '@tangle-network/agent-runtime/loops'
+
+const driver: Driver<Task, Output, 'pick-winner' | 'fail'> = {
+  plan: async (task, history) => (history.length === 0 ? [task, task] : []), // fan out, then stop
+  decide: (history) => (history.some((i) => i.verdict?.valid) ? 'pick-winner' : 'fail'),
+}
 
 const result = await runLoop({
-  driver: createDriver({ planner }),           // the planner emits one TopologyMove per round
+  driver,                                       // the driver owns topology; the kernel owns accounting
   agentRuns: [claudeSpec, codexSpec, glmSpec], // heterogeneous: one harness per branch
   output,                                       // events to typed Output
   validator,                                    // Output to { valid, score }
@@ -121,13 +127,13 @@ const result = await runLoop({
 result.winner // highest-scoring valid attempt
 ```
 
-`createDriver` lets a planner author the topology at runtime: one `TopologyMove` per round
-(`refine`, `fanout`, `select`, or `stop`); a malformed move throws `PlannerError`, so the loop never
-runs a topology nobody chose. Topology is orthogonal to harness: the planner never names a backend,
-and the kernel's `agentRuns` decide which harness runs each branch. For fixed shapes, write a small
-inline `Driver` (see `examples/coder-loop`) or use the `personify` combinators (`fanout`, `loopUntil`,
-`panel`, `pipeline`) over the recursive `Scope`/`Supervisor` core — the newer canonical path for
-recursive work.
+A `Driver` is `plan` (emit the round's `Task[]` — `[]` ends the loop) plus `decide` (the terminal
+`Decision` over the history). Topology is orthogonal to harness: the driver never names a backend,
+and the kernel's `agentRuns` decide which harness runs each branch. See `examples/coder-loop` for a
+fixed-shape inline `Driver`. For recursive work prefer the **agent-driver** — an `AgentProfile`
+driving another via `createCoordinationTools` (`/mcp`) over the budget-conserving `Scope`/`Supervisor`
+core (`/runtime`) — plus the `personify` combinators (`fanout`, `loopUntil`, `panel`, `pipeline`) and
+`runPersonified` on that same core.
 
 ## Self-improvement
 
@@ -209,12 +215,17 @@ Delegation state is in-memory by default — a server restart drops pending dele
 ## The experiment harness (bench/)
 
 `bench/` is the internal harness; [`bench/HARNESS.md`](./bench/HARNESS.md) is its map — read that
-first. The canonical path is the optimization suite (`runBenchmark`/`flywheel-evolve` over real
-domains: the EnterpriseOps gym, commit0, answer-shaped math); the older selection-gate paths
-(`runExperiment`, corpus-replay) remain for the legacy evidence. The live evidence ledger is
+first. The canonical path is the optimization suite (`runBenchmark`/`runStrategyEvolution` over real
+domains: the EnterpriseOps gym, commit0, answer-shaped math). The live evidence ledger is
 `.evolve/current.json` — results never live in this README.
 
-One entrypoint, `runExperiment(adapter, { sandboxClient, agentRun, arms, ... })`: N instances times a set of arms, each arm a topology driven through `runLoop`, judged by the adapter, written to a durable canonical corpus. An arm is one steer function `f(rootPrompt, history) => nextPrompt`: `random` ignores history (the compute control), `refine` carries the prior answer plus a directive, `diverse` rotates a strategy lens. The cost dial is the backend type (`hermes` for a direct router call, `opencode` or `claude-code` or `codex` for agent CLIs). The deep statistics (paired bootstrap with Benjamini-Hochberg correction, selector replay) come from `corpus-report.mts` and `corpus-replay.mts` over the written corpus, computed once. See `bench/HARNESS.md` and `docs/learning-flywheel.md`.
+The recursive diverse-vs-blind gate runs through the keystone: `gate-cli.mts` →
+`runGate` composes a `Persona` + the generic `fanout` combinator over the budget-conserving
+`Supervisor`, with each child solved via the router and graded by the benchmark's own deployable
+`adapter.judge` (selector ≠ oracle). Each rollout is written to a durable canonical corpus; the deep
+statistics (paired bootstrap with Benjamini-Hochberg correction, selector replay) come from
+`corpus-report.mts` and `corpus-replay.mts` over that corpus, computed once and offline. See
+`bench/HARNESS.md` and `docs/learning-flywheel.md`.
 
 ## Defaults
 
@@ -225,7 +236,7 @@ One entrypoint, `runExperiment(adapter, { sandboxClient, agentRun, arms, ... })`
 | Router base URL | `https://router.tangle.tools/v1` | `TANGLE_ROUTER_BASE_URL` env |
 | Sandbox base URL | `https://sandbox.tangle.tools` | `SANDBOX_API_URL` env |
 | Loop iteration cap | 10 (`runLoop`) | `runLoop({ maxIterations })` |
-| Driver | none, required by `runLoop` | `createDriver` or an inline `Driver` |
+| Driver | none, required by `runLoop` | an inline `Driver` (`plan`/`decide`) |
 | Strategy budget (suite) | 3 rollouts/shots per strategy per task | `runBenchmark({ budget })` |
 | Winner selection (coder delegate) | `highest-score` | `winnerSelection` option |
 | KB gate min passage | 12 chars | `createKbGate({ minPassageChars })` |
@@ -257,7 +268,7 @@ sandbox         AgentProfile, Sandbox.create, streamPrompt, exportTraceBundle. T
 |---|---|
 | `@tangle-network/agent-runtime` | chat turns, delegated loop-runner, OTEL export, errors, model resolution |
 | `.../agent` | `defineAgent` plus surface and outcome adapters |
-| `.../loops` | **the optimization suite** (`Environment`, `defineStrategy`, `runBenchmark`, `runStrategyEvolution`, `authorStrategy`, `promotionGate`) + the `runLoop` kernel, `createDriver`, `loopDispatch` |
+| `.../loops` | **the optimization suite** (`Environment`, `defineStrategy`, `runBenchmark`, `runStrategyEvolution`, `authorStrategy`, `promotionGate`) + the `runLoop` kernel, the `Driver` type, `loopDispatch` |
 | `.../profiles` | `coderProfile`, `researcherProfile` presets |
 | `.../mcp` | `createMcpServer`, `createDefaultCoderDelegate`, `createKbGate`, the `agent-runtime-mcp` bin |
 | `.../improvement` | `improvementDriver` (code/worktree `CandidateGenerator`), `agenticGenerator`, `reflectiveGenerator` — the code-surface driver you pass to agent-eval's `selfImprove` |
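Review note on the new `Driver` wording in the README hunks above: the `plan`/`decide` contract can be illustrated with a toy kernel. This is a minimal sketch under stated assumptions, not the library's `runLoop`. The names `miniRunLoop`, `execute`, `Verdict`, and `LoopOutcome` are hypothetical, the types are simplified, and `execute` stands in for the sandbox run. The cap of 10 and the best-valid-score, ties-to-earliest selection follow the descriptions in the diff.

```typescript
// Hypothetical, simplified shapes; the real exports of .../loops may differ.
type Verdict = { valid: boolean; score: number }
type Iteration<Task, Output> = { task: Task; output: Output; verdict: Verdict }

interface Driver<Task, Output, Decision> {
  // Emit this round's tasks; an empty array ends the loop.
  plan(task: Task, history: Iteration<Task, Output>[]): Promise<Task[]>
  // Terminal decision over the full history.
  decide(history: Iteration<Task, Output>[]): Decision
}

type LoopOutcome<Task, Output, Decision> = {
  decision: Decision
  history: Iteration<Task, Output>[]
  winner?: Iteration<Task, Output>
}

// Toy kernel: runs rounds until plan() returns [] or the iteration cap is reached.
async function miniRunLoop<Task, Output, Decision>(opts: {
  task: Task
  driver: Driver<Task, Output, Decision>
  execute: (task: Task) => Promise<Output> // stand-in for one sandbox run
  validate: (output: Output) => Verdict    // stand-in for the validator
  maxIterations?: number
}): Promise<LoopOutcome<Task, Output, Decision>> {
  const cap = opts.maxIterations ?? 10
  const history: Iteration<Task, Output>[] = []
  while (history.length < cap) {
    const planned = await opts.driver.plan(opts.task, history)
    if (planned.length === 0) break
    // Never run more attempts than the cap allows.
    const tasks = planned.slice(0, cap - history.length)
    const round = await Promise.all(
      tasks.map(async (task) => {
        const output = await opts.execute(task)
        return { task, output, verdict: opts.validate(output) }
      }),
    )
    history.push(...round)
  }
  // Best valid score wins; strict ">" keeps the earliest attempt on ties.
  let winner: Iteration<Task, Output> | undefined
  for (const it of history) {
    if (it.verdict.valid && (winner === undefined || it.verdict.score > winner.verdict.score)) {
      winner = it
    }
  }
  return { decision: opts.driver.decide(history), history, winner }
}

// Usage: fan out two attempts in round one, then stop (mirrors the README driver).
async function demo() {
  let attempt = 0
  const driver: Driver<string, { attempt: number; score: number }, 'pick-winner' | 'fail'> = {
    plan: async (task, history) => (history.length === 0 ? [task, task] : []),
    decide: (history) => (history.some((i) => i.verdict.valid) ? 'pick-winner' : 'fail'),
  }
  return miniRunLoop({
    task: 'fix the failing test',
    driver,
    execute: async () => {
      attempt += 1
      return { attempt, score: attempt === 1 ? 0.4 : 0.9 }
    },
    validate: (o) => ({ valid: o.score >= 0.5, score: o.score }),
  })
}
```

The driver never names a backend here: it only shapes rounds and makes the terminal call, while the kernel does the accounting. That is the separation the README text describes.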