Merged
30 commits
f8d9137
feat(intelligence): capability-delivery manifest — composeCertifiedPr…
drewstone Jun 14, 2026
931aa15
docs(rsi): correct depth>breadth to the POWER-16 tie at n=48 (not the…
drewstone Jun 15, 2026
bdae618
chore(clean): remove dead mock loop + orphan re-exports/interface (43…
drewstone Jun 15, 2026
472904a
docs(research): atom-compression plan, harness-compat matrix, long-ho…
drewstone Jun 15, 2026
743525f
chore(deps): bump @types/node 25.9.3 + playwright 1.61.0 (dev)
drewstone Jun 15, 2026
e7397dc
docs(research): RSI atom masterplan + build tracker (single source of…
drewstone Jun 15, 2026
4c1d065
docs(research): collapse N driver prompts → one cached generator (sof…
drewstone Jun 15, 2026
3640c8c
docs(research): active push — RUN/DELETE/IMPROVE worklist (delete cre…
drewstone Jun 15, 2026
4ae15be
docs(research): createDriver delete BLOCKED (paradigm diff, evidenced…
drewstone Jun 15, 2026
2101f2d
refactor(runtime): full nuke of the createDriver/string-prompt measur…
drewstone Jun 15, 2026
f57c055
docs(research): full nuke DONE (-3492 LOC); doc/skill-rot follow-up t…
drewstone Jun 15, 2026
29e9659
docs(cleanup): retarget all docs+skills off the nuked createDriver/ru…
drewstone Jun 15, 2026
9d188e1
feat(supervise): recursive driver-executor — agents driving agents dr…
drewstone Jun 15, 2026
4d3e83d
refactor(bench)+docs: reclaim runKeystoneGate -> runGate; strip 4 doc…
drewstone Jun 15, 2026
68a582d
docs(research): keystone recursion ✅ (9d188e1); createDriver retire ✅…
drewstone Jun 15, 2026
7e14003
feat(supervise): coordinationDriverAgent — the cheap/offline driver (…
drewstone Jun 15, 2026
40ff000
docs(research): dual-purpose resolution (one substrate serves product…
drewstone Jun 15, 2026
bd58761
feat(supervise): completion-oracle — settled ⟺ delivered (Foreman 0/18)
drewstone Jun 15, 2026
9dc744d
docs(research): completion-oracle #3 ✅ (bd58761) — settled ⟺ delivere…
drewstone Jun 15, 2026
d5609d9
feat(bench): atom-humaneval — agents-driving-agents on a live deploya…
drewstone Jun 15, 2026
525dc5a
feat(topology): animated visual replay of a recursive agent run
drewstone Jun 15, 2026
1b114f7
fix(bench): atom-humaneval blind arm survives transient router errors…
drewstone Jun 15, 2026
8ed2982
feat(supervise): the supervisor AUTHORS worker profiles from a skill …
drewstone Jun 15, 2026
c053e37
feat(supervise): coordination MCP over a live Scope — the real keysto…
drewstone Jun 16, 2026
2f7212d
feat(bench): prove a coding harness drives the Scope via the coordina…
drewstone Jun 16, 2026
972707f
feat(bench): WHOLE real e2e — opencode supervisor drives opencode wor…
drewstone Jun 16, 2026
9853f6a
docs(canonical-api): the AgentProfile law — author the profile, the s…
drewstone Jun 16, 2026
daff034
docs(research): consolidate docs/research 28→14 — retire shipped/subs…
drewstone Jun 16, 2026
39d5b44
Merge origin/main into chore/atom-deep-clean — resolve 13 conflicts
drewstone Jun 16, 2026
a6eab4e
docs(canonical-api): substrate pin 0.89 → 0.92 (matches the merged pa…
drewstone Jun 16, 2026
1 change: 1 addition & 0 deletions .gitignore
@@ -12,3 +12,4 @@ bench/scripts/__pycache__/

# local rollout-corpus scratch (raw jsonl, per work-line)
corpus/
+test_repo/
6 changes: 3 additions & 3 deletions CLAUDE.md
@@ -33,7 +33,7 @@ The global style rule (lead with the answer, define every term, no stacked jargo

This repo's bottleneck is agents paying a **re-discovery tax**: re-reading 15 files to rebuild a mental model that already exists. Before exploring, read, in order:

-0. **`docs/canonical-api.md`** — THE API reference + anti-reinvention decision table ("I want to ___ → use ___ → NOT ___"). The genome→run→optimize→gate spine, the recursive atom (persona=driver, `spawnChild`=worker|sub-driver, isolated|`Workspace` artifact, conserved sub-budgets, analyst dimensions+gaps), every signature `file:line`-verified. **Read before writing ANY orchestration/optimization/measurement code** — if you're about to write `runConversation`, a "skill optimizer", a "profile-seam", or a `new Sandbox(...)` loop, it already exists.
+0. **`docs/canonical-api.md`** — THE API reference + anti-reinvention decision table ("I want to ___ → use ___ → NOT ___"). The genome→run→optimize→gate spine, the recursive atom (persona=driver, `spawnChild`=worker|sub-driver, isolated|`Workspace` artifact, conserved sub-budgets, analyst dimensions+gaps), every signature `file:line`-verified. **Read before writing ANY orchestration/optimization/measurement code** — if you're about to write `runConversation`, a "skill optimizer", a "profile-seam", or a `new Sandbox(...)` loop, it already exists. **§1.5 is the AgentProfile law we keep forgetting:** an agent IS its full profile (prompt+skills+tools+mcp+subagents+hooks); you change behavior by AUTHORING the profile and letting the sandbox substrate materialize it into harness shapes — never write a verify-loop or harness-specific config (self-verification is a hook/process, not code; opencode is only the cli-bridge test target — generalize, never specialize).
1. **`docs/architecture.md`** — the canonical spine (one recursive `Agent` atom; two timescales; benchmark-as-adapter; selector≠judge). Wins on any architecture conflict. `docs/README.md` indexes the rest; `docs/roadmap-rsi.md` is the dependency-ordered build plan; `docs/architecture-interpretations.md` defines **the decision gate**.
2. **`bench/HARNESS.md`** — the experiment-harness map: commands, the `rollout → corpus → selector → CI → gate` data flow, the wired/needs-creds/scaffolded matrix, and run-the-gate-in-2-lines. Read it before touching `bench/`.
3. **`.evolve/current.json`** — the single source of truth for the active goal + generation + the live science state. Then `.evolve/progress.md` and the newest `.evolve/pursuits/*.md`.
@@ -62,9 +62,9 @@ Types that stay in THIS repo because they're runtime-shaped (coupled to a runnin
- `run-loop.ts` — `runLoop`, the round-synchronous leaf kernel. Per round: `driver.plan()`→N tasks→one sandbox/iteration (bounded by `maxConcurrency`, round-robin `agentRuns`)→`streamPrompt`→`output.parse`→`validator.validate`→`driver.decide`. Owns iteration accounting, concurrency, abort, cost+token aggregation, trace emission, box teardown. Exports `defaultSelectWinner` (best-valid-score, ties→earliest) — the single-sourced selection the personify combinators reuse.
- `supervise/` — the recursive execution atom (keystone): `Scope` + `Supervisor` over the open `Executor` port, spawn/settle on a **conserved budget pool** so equal-compute holds by construction; journal→replay/resume. `runtime.ts` also holds `createExecutor({backend})` — the ONE built-in executor (backend-as-data: `router`/`router-tools`/`bridge`/`cli`/`sandbox`; `router-tools` is the off-box tool-using agentic loop — chat→tool_calls→`executeToolCall`→repeat — over the router's tool-calling, no sandbox); the per-backend bodies are internal case-arms, BYO agents implement `Executor` directly.
- `personify/` — the content-free generic combinators (`fanout`/`loopUntil`/`widen`/`panel`/`verify`/`pipeline`) + `definePersona`/`runPersonified` + the cross-run `Corpus` + `createScopeAnalyst` (the analyst-on-scope steer firewall).
-- `driver.ts` — `createDriver` (agent authors topology via a `TopologyPlanner`); `PlannerContext.analyses` is the analyst→driver wire (built + tested, but **not yet fed live** by any bench); `assertTraceDerivedFindings` is the steer-firewall (selector≠judge). `types.ts` holds `Driver`/`AgentRunSpec`/`OutputAdapter`/`Validator`/`Iteration`/`LoopResult`/`SandboxClient` + the `LoopTraceEvent` union. `sandbox-run.ts` is `openSandboxRun` — the one run/stream/resume sandbox seam; `inline-sandbox-client.ts` is `inlineSandboxClient` — the one adapter presenting any non-box `Executor` as a `SandboxClient` for `runLoop`. `loop-dispatch.ts` adapts `runLoop`→agent-eval campaigns; `report-usage.ts` forwards token usage so the integrity guard sees a real backend.
+- the **agent-driver** is the canonical "drive an agent" path: an `AgentProfile` driving another `AgentProfile` via the coordination toolbox (`createCoordinationTools`, `src/mcp/tools/coordination.ts`) over the `Scope`/`Supervisor`, plus `runAgentic`/`defineStrategy`/`runPersonified` (`strategy.ts`/`personify/persona.ts`) on the Supervisor. `assertTraceDerivedFindings` (`personify/analyst.ts`) is the steer-firewall (selector≠judge). `types.ts` holds `Driver`/`AgentRunSpec`/`OutputAdapter`/`Validator`/`Iteration`/`LoopResult`/`SandboxClient` + the `LoopTraceEvent` union. `sandbox-run.ts` is `openSandboxRun` — the one run/stream/resume sandbox seam; `inline-sandbox-client.ts` is `inlineSandboxClient` — the one adapter presenting any non-box `Executor` as a `SandboxClient` for `runLoop`. `loop-dispatch.ts` adapts `runLoop`→agent-eval campaigns; `report-usage.ts` forwards token usage so the integrity guard sees a real backend.

-Two substrates coexist for the same "recursive agent decision" atom: the round-synchronous `runLoop`+`createDriver` (what most benches drive today) and the reactive `Scope`/`Supervisor`+combinators (the newer canonical core). Prefer the latter for new recursive/keystone work. Both run over the one `Executor` port.
+Two substrates coexist for the same "recursive agent decision" atom: the round-synchronous `runLoop` kernel (the leaf, what most sandbox benches drive today) and the reactive `Scope`/`Supervisor`+combinators (the canonical core — the agent-driver, `runAgentic`/`defineStrategy`/`runPersonified`). Prefer the latter for new recursive/keystone work. Both run over the one `Executor` port.
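
The `supervise/` bullet says children spawn and settle on a conserved budget pool, so equal compute holds by construction. The sketch below illustrates that invariant. `BudgetPool`, `spawn`, `settle`, and the handle shape are hypothetical names chosen for illustration, not the `Scope`/`Supervisor` API.

```typescript
// Hypothetical sketch of a conserved budget pool; not the supervise/ API.
interface ChildHandle {
  budget: number // sub-budget reserved for this child at spawn time
  settled: boolean
}

class BudgetPool {
  private readonly total: number
  private reserved = 0 // units held by live children or already spent

  constructor(total: number) {
    this.total = total
  }

  get available(): number {
    return this.total - this.reserved
  }

  // A child can only be spawned with a sub-budget carved out of what remains.
  spawn(subBudget: number): ChildHandle {
    if (subBudget <= 0 || subBudget > this.available) {
      throw new Error(`cannot reserve ${subBudget}; only ${this.available} left`)
    }
    this.reserved += subBudget
    return { budget: subBudget, settled: false }
  }

  // Settling returns the unspent part to the pool; the spent part stays consumed.
  settle(child: ChildHandle, spent: number): void {
    if (child.settled) throw new Error('child already settled')
    if (spent < 0 || spent > child.budget) throw new Error('spend outside the sub-budget')
    child.settled = true
    this.reserved -= child.budget - spent
  }
}

const pool = new BudgetPool(100)
const a = pool.spawn(60)
pool.spawn(40) // the pool is now fully reserved
pool.settle(a, 25) // 35 unspent units flow back
console.log(pool.available) // 35
```

Every sub-budget is carved out of one pool. As a result, the total spent by all arms drawing on that pool can never exceed its size, whatever topology each arm chooses.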

Headline entrypoints: `runAgentTask`/`runAgentTaskStream` (`src/run.ts`), the multi-agent conversation engine (`src/conversation/`), `handleChatTurn` (`src/durable/`), the named delegated loops (`src/loop-runner.ts`).

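The `run-loop.ts` bullet describes `defaultSelectWinner` as best-valid-score, with ties going to the earliest attempt. Here is a minimal sketch of that rule, assuming a simplified attempt record with an optional `{ valid, score }` verdict (the real types live in `types.ts`). `selectWinnerSketch` is a hypothetical name, not the exported function.

```typescript
// Simplified stand-ins for the runtime's verdict/iteration shapes (assumed).
interface Verdict {
  valid: boolean
  score: number
}
interface Attempt<O> {
  index: number
  output: O
  verdict?: Verdict
}

// Best valid score wins; a strict `>` keeps the earlier attempt on a tie.
function selectWinnerSketch<O>(history: readonly Attempt<O>[]): Attempt<O> | undefined {
  let best: Attempt<O> | undefined
  for (const it of history) {
    if (!it.verdict?.valid) continue // invalid attempts can never win
    if (best === undefined || it.verdict.score > best.verdict!.score) best = it
  }
  return best
}

const attempts = [
  { index: 0, output: 'a', verdict: { valid: true, score: 0.8 } },
  { index: 1, output: 'b', verdict: { valid: false, score: 0.99 } },
  { index: 2, output: 'c', verdict: { valid: true, score: 0.8 } },
]
console.log(selectWinnerSketch(attempts)?.index) // 0
```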
43 changes: 27 additions & 16 deletions README.md
@@ -58,7 +58,8 @@ That is the common case. Everything below is for when one chat turn is not enoug
| Run a one-shot task with verification and eval | `runAgentTask` | root |
| Compare optimization strategies on YOUR domain (5 hooks) | `runBenchmark` + `defineStrategy` | `/loops` |
| Let the system author + evolve its own strategies, gated | `runStrategyEvolution` · `authorStrategy` · `promotionGate` | `/loops` |
-| Run a multi-attempt loop with a custom driver | `runLoop` + `createDriver` | `/loops` |
+| Run a multi-attempt loop with a custom driver | `runLoop` + an inline `Driver` | `/loops` |
+| Drive one agent profile from another (the canonical driver) | `createCoordinationTools` over `Supervisor` (`/runtime`) | `/mcp` |
| Delegate a disciplined loop by mode (code, research, ...) | `runDelegatedLoop` or `agent-runtime-loop` | root |
| Build code reliably (reviewed, gated) | `createDefaultCoderDelegate` | `/mcp` |
| Grow a knowledge base with only grounded facts | `createKbGate` | `/mcp` |
@@ -108,10 +109,15 @@ evidence ledger live in [`bench/HARNESS.md`](./bench/HARNESS.md).
`runLoop` is a topology-agnostic kernel. Each iteration spawns a sandbox on an `AgentRunSpec`, decodes the output, validates it, and asks a driver what to do next. The driver owns topology. The validator owns scoring. The kernel owns iteration accounting, concurrency, cost and token aggregation, and trace emission.

```ts
-import { runLoop, createDriver } from '@tangle-network/agent-runtime/loops'
+import { runLoop, type Driver } from '@tangle-network/agent-runtime/loops'

+const driver: Driver<Task, Output, 'pick-winner' | 'fail'> = {
+plan: async (task, history) => (history.length === 0 ? [task, task] : []), // fan out, then stop
+decide: (history) => (history.some((i) => i.verdict?.valid) ? 'pick-winner' : 'fail'),
+}
+
const result = await runLoop({
-driver: createDriver({ planner }), // the planner emits one TopologyMove per round
+driver, // the driver owns topology; the kernel owns accounting
agentRuns: [claudeSpec, codexSpec, glmSpec], // heterogeneous: one harness per branch
output, // events to typed Output
validator, // Output to { valid, score }
@@ -121,13 +127,13 @@ const result = await runLoop({
result.winner // highest-scoring valid attempt
```

-`createDriver` lets a planner author the topology at runtime: one `TopologyMove` per round
-(`refine`, `fanout`, `select`, or `stop`); a malformed move throws `PlannerError`, so the loop never
-runs a topology nobody chose. Topology is orthogonal to harness: the planner never names a backend,
-and the kernel's `agentRuns` decide which harness runs each branch. For fixed shapes, write a small
-inline `Driver` (see `examples/coder-loop`) or use the `personify` combinators (`fanout`, `loopUntil`,
-`panel`, `pipeline`) over the recursive `Scope`/`Supervisor` core — the newer canonical path for
-recursive work.
+A `Driver` is `plan` (emit the round's `Task[]` — `[]` ends the loop) plus `decide` (the terminal
+`Decision` over the history). Topology is orthogonal to harness: the driver never names a backend,
+and the kernel's `agentRuns` decide which harness runs each branch. See `examples/coder-loop` for a
+fixed-shape inline `Driver`. For recursive work prefer the **agent-driver** — an `AgentProfile`
+driving another via `createCoordinationTools` (`/mcp`) over the budget-conserving `Scope`/`Supervisor`
+core (`/runtime`) — plus the `personify` combinators (`fanout`, `loopUntil`, `panel`, `pipeline`) and
+`runPersonified` on that same core.
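
The `plan`/`decide` contract can be exercised without a sandbox. The sketch below is a round-synchronous kernel in the spirit of `runLoop`. `miniRunLoop`, the `run`/`validate` callbacks, and the local `Driver`/`Iteration` shapes are assumptions for illustration. The real kernel also owns concurrency, cost and token aggregation, traces, and teardown. The cap of 10 mirrors the documented `runLoop` default.

```typescript
// Local stand-ins for the README's types; these shapes are assumptions.
interface Verdict {
  valid: boolean
  score: number
}
interface Iteration<T, O> {
  task: T
  output: O
  verdict?: Verdict
}
interface Driver<T, O, D> {
  plan: (task: T, history: readonly Iteration<T, O>[]) => Promise<T[]>
  decide: (history: readonly Iteration<T, O>[]) => D
}

// Hypothetical mini-kernel: plan a round, run it, validate, repeat until the plan is empty.
async function miniRunLoop<T, O, D>(opts: {
  task: T
  driver: Driver<T, O, D>
  run: (task: T) => Promise<O> // stands in for sandbox + stream + output parsing
  validate: (output: O) => Verdict // stands in for the validator
  maxIterations?: number
}): Promise<{ decision: D; history: Iteration<T, O>[] }> {
  const history: Iteration<T, O>[] = []
  const cap = opts.maxIterations ?? 10
  while (history.length < cap) {
    const round = await opts.driver.plan(opts.task, history)
    if (round.length === 0) break // an empty round ends the loop
    const admitted = round.slice(0, cap - history.length) // never exceed the iteration cap
    const done = await Promise.all(
      admitted.map((task) =>
        opts.run(task).then((output) => ({ task, output, verdict: opts.validate(output) })),
      ),
    )
    history.push(...done)
  }
  return { decision: opts.driver.decide(history), history }
}

// The README's fan-out-then-stop driver, run against a toy agent.
const fanOutOnce: Driver<string, number, 'pick-winner' | 'fail'> = {
  plan: async (task, history) => (history.length === 0 ? [task, task] : []),
  decide: (history) => (history.some((i) => i.verdict?.valid) ? 'pick-winner' : 'fail'),
}

miniRunLoop({
  task: 'say 42',
  driver: fanOutOnce,
  run: async (prompt) => prompt.length, // toy "agent": answers with the prompt length
  validate: (n) => ({ valid: n > 0, score: n }),
}).then((r) => console.log(r.decision, r.history.length)) // pick-winner 2
```

The driver only returns tasks and a decision, so the same `fanOutOnce` runs unchanged whatever `run` backs it.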

## Self-improvement

@@ -209,12 +215,17 @@ Delegation state is in-memory by default — a server restart drops pending dele
## The experiment harness (bench/)

`bench/` is the internal harness; [`bench/HARNESS.md`](./bench/HARNESS.md) is its map — read that
-first. The canonical path is the optimization suite (`runBenchmark`/`flywheel-evolve` over real
-domains: the EnterpriseOps gym, commit0, answer-shaped math); the older selection-gate paths
-(`runExperiment`, corpus-replay) remain for the legacy evidence. The live evidence ledger is
+first. The canonical path is the optimization suite (`runBenchmark`/`runStrategyEvolution` over real
+domains: the EnterpriseOps gym, commit0, answer-shaped math). The live evidence ledger is
`.evolve/current.json` — results never live in this README.

-One entrypoint, `runExperiment(adapter, { sandboxClient, agentRun, arms, ... })`: N instances times a set of arms, each arm a topology driven through `runLoop`, judged by the adapter, written to a durable canonical corpus. An arm is one steer function `f(rootPrompt, history) => nextPrompt`: `random` ignores history (the compute control), `refine` carries the prior answer plus a directive, `diverse` rotates a strategy lens. The cost dial is the backend type (`hermes` for a direct router call, `opencode` or `claude-code` or `codex` for agent CLIs). The deep statistics (paired bootstrap with Benjamini-Hochberg correction, selector replay) come from `corpus-report.mts` and `corpus-replay.mts` over the written corpus, computed once. See `bench/HARNESS.md` and `docs/learning-flywheel.md`.
+The recursive diverse-vs-blind gate runs through the keystone: `gate-cli.mts` →
+`runGate` composes a `Persona` + the generic `fanout` combinator over the budget-conserving
+`Supervisor`, with each child solved via the router and graded by the benchmark's own deployable
+`adapter.judge` (selector ≠ oracle). Each rollout is written to a durable canonical corpus; the deep
+statistics (paired bootstrap with Benjamini-Hochberg correction, selector replay) come from
+`corpus-report.mts` and `corpus-replay.mts` over that corpus, computed once and offline. See
+`bench/HARNESS.md` and `docs/learning-flywheel.md`.
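
The corpus statistics apply a Benjamini-Hochberg correction across comparisons. The sketch below illustrates the correction step only. It is not the code in `corpus-report.mts`, and it omits the paired bootstrap. The procedure sorts the p-values, finds the largest rank k with p_(k) <= (k / m) * q, and rejects every hypothesis ranked at or below k.

```typescript
// Benjamini-Hochberg step-up procedure at false-discovery rate q.
// Returns one flag per input p-value: true = reject the null hypothesis.
function benjaminiHochberg(pValues: readonly number[], q = 0.05): boolean[] {
  const m = pValues.length
  // Rank p-values ascending while remembering each one's original position.
  const ranked = pValues.map((p, i) => ({ p, i })).sort((a, b) => a.p - b.p)
  // Largest 1-based rank k with p_(k) <= (k / m) * q; 0 means reject nothing.
  let cutoff = 0
  ranked.forEach(({ p }, r) => {
    if (p <= ((r + 1) / m) * q) cutoff = r + 1
  })
  // Step-up: every hypothesis ranked at or below the cutoff is rejected,
  // even if its own p-value missed its individual threshold.
  const rejected: boolean[] = new Array<boolean>(m).fill(false)
  ranked.slice(0, cutoff).forEach(({ i }) => {
    rejected[i] = true
  })
  return rejected
}

console.log(benjaminiHochberg([0.013, 0.02, 0.9, 0.9])) // [ true, true, false, false ]
```

In the example, 0.013 misses its own rank-1 threshold (0.0125). It is still rejected because 0.02 passes at rank 2. That step-up behavior distinguishes Benjamini-Hochberg from a per-test cutoff.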

## Defaults

@@ -225,7 +236,7 @@ One entrypoint, `runExperiment(adapter, { sandboxClient, agentRun, arms, ... })`
| Router base URL | `https://router.tangle.tools/v1` | `TANGLE_ROUTER_BASE_URL` env |
| Sandbox base URL | `https://sandbox.tangle.tools` | `SANDBOX_API_URL` env |
| Loop iteration cap | 10 (`runLoop`) | `runLoop({ maxIterations })` |
-| Driver | none, required by `runLoop` | `createDriver` or an inline `Driver` |
+| Driver | none, required by `runLoop` | an inline `Driver` (`plan`/`decide`) |
| Strategy budget (suite) | 3 rollouts/shots per strategy per task | `runBenchmark({ budget })` |
| Winner selection (coder delegate) | `highest-score` | `winnerSelection` option |
| KB gate min passage | 12 chars | `createKbGate({ minPassageChars })` |
@@ -257,7 +268,7 @@ sandbox AgentProfile, Sandbox.create, streamPrompt, exportTraceBundle. T
|---|---|
| `@tangle-network/agent-runtime` | chat turns, delegated loop-runner, OTEL export, errors, model resolution |
| `.../agent` | `defineAgent` plus surface and outcome adapters |
-| `.../loops` | **the optimization suite** (`Environment`, `defineStrategy`, `runBenchmark`, `runStrategyEvolution`, `authorStrategy`, `promotionGate`) + the `runLoop` kernel, `createDriver`, `loopDispatch` |
+| `.../loops` | **the optimization suite** (`Environment`, `defineStrategy`, `runBenchmark`, `runStrategyEvolution`, `authorStrategy`, `promotionGate`) + the `runLoop` kernel, the `Driver` type, `loopDispatch` |
| `.../profiles` | `coderProfile`, `researcherProfile` presets |
| `.../mcp` | `createMcpServer`, `createDefaultCoderDelegate`, `createKbGate`, the `agent-runtime-mcp` bin |
| `.../improvement` | `improvementDriver` (code/worktree `CandidateGenerator`), `agenticGenerator`, `reflectiveGenerator` — the code-surface driver you pass to agent-eval's `selfImprove` |