feat(atom): recursive agent atom keystone + deep-clean + docs consolidation#304
…ofile + resolver + ladder
Add the unified, future-proof delivery structure: one certified unit of agent
power = { interface, binding }. Interfaces are CLOSED (tool / mcp / context /
retrieval / hook / subagent); bindings are OPEN (inline / file / http /
sandbox-code / mcp-stdio / mcp-remote / process-on-infra / rag-index /
memory-store / wasm / a2a). A single resolver lowers any binding into one uniform
ResolvedSurface consumed identically by the host seam (RouterToolsSeam tools +
executeToolCall) and the sandbox seam (AgentProfile).
- src/intelligence/capability.ts: the manifest types + CapabilityNotAdmittedError
+ manifestFromProfile (lowers today's CertifiedProfile wire into capabilities[]
with best-effort binding inference, so the spine delivers value before the
plane changes).
- src/intelligence/resolver.ts: composeCertifiedProfile — the spine resolves
inline/file (byte-identical to composeCertifiedPrompt, the regression lock),
mcp-stdio/mcp-remote (strict union widens to the SDK's flat
AgentProfileMcpServer — an always-valid lowering), and http tools (the host
seam). Ladder rungs that need infra (sandbox-code, process-on-infra) are
injected ResolveCtx providers; rag-index/memory-store/wasm/a2a throw
CapabilityNotAdmittedError (memory gated on the E3 admission bar). Fail-closed:
null manifest -> base surface, per-capability failure -> drop (diagnostic via
onDrop), post-resolve drift drops any tool/mcp whose live names diverge.
- src/mcp/delegation-profile.ts: composeProductionAgentProfile now also merges
tools box-flags, hooks, subagents, and injects ResolvedSurface.mcpConnections
into AgentProfile.mcp (the sandbox-seam mapping).
- exports + export gate + the two spec corrections (mcp lowers via always-valid
widening; tools lower two ways since AgentProfile.tools is box flags).
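The fail-closed resolution described in this commit can be sketched as follows. This is a minimal illustration under simplified assumptions: `Capability`, `Surface`, `NotAdmitted`, `lower`, and `resolveAll` are hypothetical stand-ins, not the real types or functions in src/intelligence, and only three of the binding kinds are shown.

```typescript
// Hypothetical, simplified shapes. The real manifest types live in
// src/intelligence/capability.ts and are not reproduced here.
type Interface = "tool" | "mcp" | "context" | "retrieval" | "hook" | "subagent"; // closed set
type Binding =
  | { kind: "inline"; text: string }
  | { kind: "http"; url: string }
  | { kind: "wasm"; module: string }; // open set: only three kinds shown

interface Capability { name: string; iface: Interface; binding: Binding }
interface Surface { prompt: string[]; httpTools: string[] }

class NotAdmitted extends Error {}

// Lower one binding into the surface, or throw if the rung is not admitted.
function lower(cap: Capability, surface: Surface): void {
  switch (cap.binding.kind) {
    case "inline":
      surface.prompt.push(cap.binding.text);
      return;
    case "http":
      surface.httpTools.push(cap.name);
      return;
    case "wasm":
      throw new NotAdmitted(`${cap.name}: wasm binding not admitted`);
  }
}

// Fail-closed: a null manifest yields the base surface, and a failing
// capability is dropped and reported instead of aborting the whole resolve.
function resolveAll(
  manifest: Capability[] | null,
  onDrop: (name: string, reason: string) => void,
): Surface {
  const surface: Surface = { prompt: [], httpTools: [] };
  if (manifest === null) return surface;
  for (const cap of manifest) {
    try {
      lower(cap, surface);
    } catch (err) {
      onDrop(cap.name, err instanceof Error ? err.message : String(err));
    }
  }
  return surface;
}
```

The sketch omits the post-resolve drift check; it only shows the two fail-closed rules (null manifest, per-capability drop).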
… n=16 +16.4pp)
The +16.4pp CI[+5.3,+29.8] n=16 depth-steered-continuation result did not replicate when powered: depth-breadth = +4.7pp CI[-1.9,+11.4] at n=48 (a tie, since the interval spans zero; +4.1pp at n=72). architecture.md and roadmap-rsi.md advertised it as a cleared keystone; they now carry the retraction and point at .evolve/current.json.
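The reading of the two results above can be made explicit with a small check. This is an illustrative sketch only: `Effect` and `verdict` are hypothetical names, and the rule "an interval that spans zero is a tie" is the assumption being encoded.

```typescript
// Hypothetical helper: names and the zero-threshold rule are illustrative.
interface Effect { deltaPp: number; ciLow: number; ciHigh: number; n: number }

type Verdict = "win" | "loss" | "tie";

// A result only counts when its confidence interval excludes zero.
function verdict(e: Effect): Verdict {
  if (e.ciLow > 0) return "win";
  if (e.ciHigh < 0) return "loss";
  return "tie";
}

// The two figures quoted in the commit message above.
const pilot: Effect = { deltaPp: 16.4, ciLow: 5.3, ciHigh: 29.8, n: 16 };
const powered: Effect = { deltaPp: 4.7, ciLow: -1.9, ciHigh: 11.4, n: 48 };

console.log(verdict(pilot), verdict(powered));
```

Under this rule the n=16 pilot reads as a win and the n=48 run as a tie, which is why the small-sample result needed the powered replication before being advertised.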
…2 LOC)
- delete bench/src/observe-steer-workspace-loop.mts (the #194 mock anti-pattern; 0 inbound refs)
- drop orphan pass-through re-exports CaptureIntegrityError/ReplayError/VerificationError (src/errors.ts)
- drop orphan interface AgentTaskRunSummary (src/types.ts)
- fix doc-rot in loop-facade-postmortem.md; gitignore stray test_repo/
- deletion-ledger.md tracks deletions + the deferred migrations (driver.ts 12 callers, AgentProfile superset)
Gates verified by hand: typecheck 0, lint 0, 924 tests pass / 0 fail. Load-bearing fail-loud fences left intact (NOT dead code).
Safe minor/patch dev bumps, gates verified (typecheck 0, lint 0, 924 tests pass). Deferred (need own careful pass): biome 2.5 (13 new lint warnings), typescript 6 + vitest 4 (majors), agent-eval 0.92 (substrate — sync with the AgentProfile superset work).
…tware 3.0)
Replace the per-role hand-coded prompt builders with generateDriverSystemPrompt(spec):
a (fused) router call generates the driver prompt from {role,goal,target,harness,stance},
cached for semantic reuse via PromptRegistry + hashContent key (file/JSON or DB). The
hand-authored worker-driver prompt becomes the generator's seed + its tests the invariants.
Single optimizable surface; depends on a tangle-router fusion primitive (separate issue).
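The generate-then-cache idea can be sketched as below, assuming a content-hash key over the spec fields. `DriverSpec` mirrors the fields named above, but `PromptCache`, `canonicalize`, and `fnv1a` are hypothetical stand-ins for PromptRegistry and hashContent, whose real signatures are not shown here. The sketch covers exact-match reuse only, and the generator is synchronous for brevity.

```typescript
// Hypothetical sketch: PromptCache and fnv1a stand in for PromptRegistry and
// hashContent; the real APIs are not shown in this PR text.
interface DriverSpec { role: string; goal: string; target: string; harness: string; stance: string }

// FNV-1a over UTF-16 code units: a stand-in content hash, not cryptographic.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16);
}

// Fixed field order so equal specs always produce the same canonical string.
function canonicalize(spec: DriverSpec): string {
  return JSON.stringify([spec.role, spec.goal, spec.target, spec.harness, spec.stance]);
}

class PromptCache {
  private readonly store = new Map<string, { canonical: string; prompt: string }>();
  private readonly generate: (spec: DriverSpec) => string;

  constructor(generate: (spec: DriverSpec) => string) {
    this.generate = generate;
  }

  get(spec: DriverSpec): string {
    const canonical = canonicalize(spec);
    const key = fnv1a(canonical);
    const hit = this.store.get(key);
    // Compare the canonical form too, so a hash collision cannot return a wrong prompt.
    if (hit !== undefined && hit.canonical === canonical) return hit.prompt;
    const prompt = this.generate(spec); // the expensive router call in the real design
    this.store.set(key, { canonical, prompt });
    return prompt;
  }
}
```

Keying on the canonical field list rather than on object identity means two structurally equal specs share one generated prompt.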
…ateDriver, run commit0, deep-clean, dedup)
…); commit0 RAN; the delete fork
…ement+eval paradigm
DELETE the wrong abstraction (createDriver = a code TopologyPlanner driving runLoop over string-prompt→string-answer calls, judged by adapter.judge) and the entire old bench experiment + eval-gen apparatus built on it. The agent-driver (AgentProfile driving AgentProfile via coordination tools) replaces it; the runLoop KERNEL and the Scope/Supervisor are untouched.
Deleted (15): src/runtime/driver.ts; bench experiment.ts(+test)/steering-experiment(+test)/improve-prompt/research-loop/finsearch-loop/rsi/generate-eval/run-benchmarks/run.ts/skills-sandbox/profile-coord-sandbox; tests/loops/dynamic.test.ts.
Survivors (search-bench/cloud-loop/fleet/commit0-gate) re-homed onto a new pure helper bench/src/sandbox-run.ts (answerOutput/sandboxAgentRun/WorkerBackendType/AnalystFn/llmAnalyst; no experiment shell). runLoop kernel tests kept via a scriptedDriver stub in refine-driver.ts.
Gates (hand-verified): build 0, typecheck 0 (root+bench; also fixed a pre-existing bench BackendType red), lint 0, 905 tests pass. Zero dangling code refs.
ACCEPTED casualties of the full nuke (rebuild on the agent-driver/Supervisor path when wanted): the generate-eval data engine, the AgentProfile-coordinate optimizer (profile-coord), and run.ts's non-experiment subcommands (preflight/verify-judge/solve-one/ui-review).
…nExperiment to the agent-driver/Supervisor reality
…iving agents
A spawned child can now BE a driver. driverExecutorFactory mounts a NESTED Scope over the SAME conserved budget pool + shared journal (scope.ts's new NestedScopeSeam), one depth deeper, and runs the wrapped driver's act there. A child resolves to a LEAF (worker) OR (for a role:'driver' spec, via withDriverExecutor) this executor, recursively. So a driver spawns a driver spawns a worker on one budget-conserving tree.
The persona/strategy spawn fences now route a driver child to the recursive executor (compose) instead of throwing; act still fails loud only if a child is run directly.
Reuses the atom: builds NO new budget/journal/selection logic. Budget conserved across depth (reserve-on-spawn fails closed at any depth), spend bubbles to root, journal records each nested tree, maxDepth enforced across recursion.
Proven OFFLINE (no creds; scripted drivers+workers) in tests/loops/driver-recursion.test.ts: depth-2 chain root->mid->inner->worker (node id rec:s0:s0:s0, which a non-recursive build cannot produce), fail-closed budget conservation across depth, spend roll-up (spentTotal = the worker's exact spend), nested-journal sub-trees, depth-ceiling across recursion.
Gates hand-verified: build 0, typecheck 0, lint 0, 911 tests pass.
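The budget-conserving recursion can be sketched as a nested scope that shares one pool with its parent. This is a simplified illustration, not the real Scope or NestedScopeSeam: `BudgetPool`, `SketchScope`, `nest`, and `runLeaf` are hypothetical names, only leaves reserve budget, and the depth numbering (root at depth 0) is an assumption.

```typescript
// Hypothetical sketch: not the real Scope / NestedScopeSeam from
// src/runtime/supervise/scope.ts. One pool is shared by every depth.
class BudgetPool {
  readonly total: number;
  spentTotal = 0;
  private reserved = 0;

  constructor(total: number) {
    this.total = total;
  }

  // Reserve-on-spawn: refuse (fail closed) rather than over-commit.
  reserve(amount: number): boolean {
    if (amount <= 0 || this.reserved + amount > this.total) return false;
    this.reserved += amount;
    return true;
  }

  // Record actual spend and release the unused part of the reservation.
  settle(reservedAmount: number, spent: number): void {
    const used = Math.min(Math.max(spent, 0), reservedAmount);
    this.spentTotal += used;
    this.reserved -= reservedAmount - used;
  }
}

class SketchScope {
  readonly pool: BudgetPool;
  readonly id: string;
  readonly depth: number;
  readonly maxDepth: number;
  private children = 0;

  constructor(pool: BudgetPool, id: string, depth: number, maxDepth: number) {
    this.pool = pool;
    this.id = id;
    this.depth = depth;
    this.maxDepth = maxDepth;
  }

  // A driver child: a nested scope one level deeper over the SAME pool.
  nest(): SketchScope {
    if (this.depth + 1 > this.maxDepth) {
      throw new Error(`maxDepth ${this.maxDepth} exceeded at ${this.id}`);
    }
    return new SketchScope(this.pool, `${this.id}:s${this.children++}`, this.depth + 1, this.maxDepth);
  }

  // A leaf worker: reserve, run, settle. Returns null when the reservation is refused.
  runLeaf(reserve: number, work: () => number): string | null {
    if (!this.pool.reserve(reserve)) return null;
    const id = `${this.id}:s${this.children++}`;
    this.pool.settle(reserve, work());
    return id;
  }
}
```

With a root scope named `rec`, nesting twice and then running a leaf yields the id `rec:s0:s0:s0`, and the leaf's spend lands on the shared pool's `spentTotal` because every depth holds a reference to the same pool.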
…s to latest-only
Rename the opaque 'keystone' jargon: runKeystoneGate->runGate (+ RunGateOptions/GateArmResult/GateReport), bench/src/keystone-gate.ts->gate.ts (+ -cli, +test), all import paths and CLI banners. Strip the last historical createDriver/runExperiment 'was removed/nuked' breadcrumbs from architecture.md, architecture-interpretations.md, learning-flywheel.md, roadmap-rsi.md; upgrading agents now see only the current agent-driver/Supervisor reality (history lives in git). Gates green; 905->911 with the keystone test.
… via nuke; #2b brain next
…LLM tool-loop over the coordination verbs)
The CHEAP, in-process, no-creds variant of the recursive driver: act() mounts createCoordinationTools over its scope and runs an LLM tool-loop (injected chat seam) so the driver REASONS spawn/steer/await/stop; composes with 2a recursion (a driver agent spawns a driver agent, via makeWorkerAgent -> driverChild).
NOT the primary driver: the CAPABLE driver is a sandbox agent with the coordination verbs as an MCP. This one is the offline-testable + cheap-orchestration path. Prompt is INJECTED (decoupled from agent-eval).
Proven OFFLINE (no creds, scripted mock chat) in tests/loops/coordination-driver.test.ts: the tool-loop drives real Scope.spawn via the coordination verbs + folds results back; a driver AGENT spawns a driver AGENT (separate nested journal tree). typecheck 0, lint 0.
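A tool-loop with an injected chat seam can be sketched as follows. This is a minimal synchronous illustration: `ChatTurn`, `Chat`, `ToolFn`, and `toolLoop` are hypothetical names, and the single `spawn` tool stands in for the coordination verbs, whose real shapes are not shown here.

```typescript
// Hypothetical shapes: the real chat seam and coordination tools are not shown here.
type ChatTurn =
  | { kind: "tool"; name: string; args: string }
  | { kind: "final"; text: string };
type Chat = (history: readonly string[]) => ChatTurn; // injected, so tests can script it
type ToolFn = (args: string) => string;

function toolLoop(
  chat: Chat,
  tools: Map<string, ToolFn>,
  maxSteps: number,
): { final: string | undefined; history: string[] } {
  const history: string[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const turn = chat(history);
    if (turn.kind === "final") return { final: turn.text, history };
    const tool = tools.get(turn.name);
    // Unknown tools are reported back to the model instead of crashing the loop.
    const result = tool ? tool(turn.args) : `error: unknown tool ${turn.name}`;
    history.push(`${turn.name}(${turn.args}) -> ${result}`);
  }
  // Step budget exhausted: return no answer rather than a guessed one.
  return { final: undefined, history };
}

// Scripted mock chat: spawn once, then finish.
const spawned: string[] = [];
const tools = new Map<string, ToolFn>([
  ["spawn", (task) => { spawned.push(task); return `worker-${spawned.length}`; }],
]);
const scripted: Chat = (history) =>
  history.length === 0
    ? { kind: "tool", name: "spawn", args: "solve task A" }
    : { kind: "final", text: `done after ${history.length} tool call(s)` };

const run = toolLoop(scripted, tools, 5);
console.log(run.final, run.history);
```

Because the chat function is a parameter, the same loop runs against a scripted mock offline and against a real model call in production without changing the loop itself.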
… + proof); #2b cheap driver done, #2c capable sandbox driver + #3 completion-oracle next
The honest settle: a node counts as delivered only when a deployable check passes, never on self-report.
- completion-gate.ts: gateOnDeliverable wraps any Executor so its settlement valid reflects a DeliverableSpec check (both execute shapes; fail-closed).
- coordination-driver finalize: returns the best DELIVERED child; undefined when none delivered, so a driver cannot self-declare done via prose.
- driver-executor: derive the driver child's verdict from its direct settled events, so delivery composes UP the recursion (a sub-driver is valid only when it itself selected a delivered child).
- supervisor: a winner MUST carry a real Out; a successful act that produced nothing is a no-winner, never a winner wrapping undefined.
8 offline tests: leaf gate (both execute shapes, fail-closed), ran-but-didn't-deliver yields no winner, the gate dominates score, delivery propagates up the recursion.
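The gating rule above can be sketched as a wrapper that overrides an executor's self-reported validity with a check on its output. This is an illustration under assumed shapes: `Settlement`, `Executor`, `DeliverableCheck`, `gateSketch`, and `pickDelivered` are hypothetical, synchronous stand-ins for the real Executor, DeliverableSpec, and gateOnDeliverable.

```typescript
// Hypothetical shapes: the real Executor / DeliverableSpec are not shown in this PR text.
interface Settlement<Out> { valid: boolean; out?: Out }
type Executor<In, Out> = (input: In) => Settlement<Out>;
interface DeliverableCheck<Out> { check: (out: Out) => boolean }

// Wrap an executor so `valid` reflects the check, never the worker's own claim.
function gateSketch<In, Out>(
  inner: Executor<In, Out>,
  spec: DeliverableCheck<Out>,
): Executor<In, Out> {
  return (input) => {
    try {
      const settled = inner(input);
      if (settled.out === undefined) return { valid: false }; // nothing produced: not delivered
      return { valid: spec.check(settled.out) === true, out: settled.out };
    } catch {
      return { valid: false }; // a crashing worker or checker never counts as delivered
    }
  };
}

// Select the best DELIVERED child; undefined when none delivered.
function pickDelivered<Out>(
  children: ReadonlyArray<{ settlement: Settlement<Out>; score: number }>,
): Out | undefined {
  let best: { out: Out; score: number } | undefined;
  for (const c of children) {
    const out = c.settlement.out;
    if (!c.settlement.valid || out === undefined) continue;
    if (best === undefined || c.score > best.score) best = { out, score: c.score };
  }
  return best?.out;
}
```

In this sketch a high-scoring child that failed the check loses to a low-scoring child that passed it, which is the sense in which the gate dominates score.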
…d, composes up the recursion
…ble-checked domain
A coordinationDriverAgent (real router brain) drives gated workers on HumanEval: each worker is settled valid ONLY when the local Docker test suite passes (completion-oracle, not self-report), against a blind best-of-K baseline. Proven live: the driver spawns, the worker solves, the checker gates, the supervisor returns a winner only on real delivery. Also exports gateOnDeliverable/DeliverableSpec from the runtime barrel (the #3 primitive was added to supervise/ but not surfaced on the package).
Fold the one runtime-hooks stream into a timestamped ReplayEvent[] (createReplayRecorder) and render a self-contained, scrubbable HTML player (renderReplayHtml) — the recursive agent tree animated over wall-clock, each node colored by the completion-oracle: delivered (valid) green, ran-but-not-delivered amber, failed red, with live token/cost counters. Synthesizes the unspawned root driver so the whole recursion renders. No server/build/deps. Wired into atom-humaneval (every driver run emits a replay.html). 4 offline recorder tests; proven on a live HumanEval run (driver -> worker -> delivered).
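The recorder idea can be sketched as a fold from status events into a time-ordered list, with a colour per completion-oracle outcome. This is a simplified illustration: `ReplayEventSketch`, `createRecorderSketch`, `colorFor`, and the injected clock are hypothetical, and the real ReplayEvent shape and createReplayRecorder signature are not shown here.

```typescript
// Hypothetical sketch: the real ReplayEvent / createReplayRecorder shapes are not shown here.
type NodeStatus = "delivered" | "ran-not-delivered" | "failed";
interface ReplayEventSketch { atMs: number; nodeId: string; status: NodeStatus }

// Colour mapping described in the commit message above.
function colorFor(status: NodeStatus): string {
  switch (status) {
    case "delivered": return "green";
    case "ran-not-delivered": return "amber";
    case "failed": return "red";
  }
}

// The clock is injected so the recorder can be tested without real time.
function createRecorderSketch(now: () => number) {
  const start = now();
  const events: ReplayEventSketch[] = [];
  return {
    record(nodeId: string, status: NodeStatus): void {
      events.push({ atMs: now() - start, nodeId, status });
    },
    // Copy sorted by offset from the start, ready for a scrubber to step through.
    snapshot(): ReplayEventSketch[] {
      return [...events].sort((a, b) => a.atMs - b.atMs);
    },
  };
}
```

Storing offsets from the start rather than absolute timestamps keeps the event list independent of when the run happened, which suits a self-contained player.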
… (a 502 is a failed attempt, not a crash — matches the driver arm's down-typing)
…(the intelligence, not the plumbing)
The supervisor's job is to DESIGN the agents it spawns: read the task, decompose it, and author a tailored profile (instructions + model) per worker. supervisorSkill is the how-to it reads (its own system prompt), THE optimizable self-improvement surface; authoredWorker builds a worker from an authored profile; asAuthoredProfile catches empty/placeholder profiles (a skill violation). Proven offline (no creds, no plumbing): a skill-guided supervisor authors DISTINCT, tailored worker recipes per sub-task and they flow to the workers. 3 tests.
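The empty/placeholder check can be sketched as a validator over the three fields the review below attributes to AuthoredProfile (name, systemPrompt, model). Everything else here is an assumption: `validateAuthored`, `requireField`, and the placeholder pattern are hypothetical and do not reproduce asAuthoredProfile's actual rules.

```typescript
// Hypothetical sketch: the field list follows the review's description of
// AuthoredProfile; the placeholder rules are illustrative, not the real check.
interface AuthoredProfileSketch { name: string; systemPrompt: string; model: string }

const PLACEHOLDER = /^(todo|tbd|placeholder|<[^>]*>)$/i;

// Return the trimmed field, or throw when it is missing, empty, or a placeholder.
function requireField(rec: Record<string, unknown>, key: string): string {
  const value = rec[key];
  if (typeof value !== "string") throw new Error(`skill violation: '${key}' is missing`);
  const trimmed = value.trim();
  if (trimmed === "" || PLACEHOLDER.test(trimmed)) {
    throw new Error(`skill violation: '${key}' is empty or a placeholder`);
  }
  return trimmed;
}

function validateAuthored(raw: unknown): AuthoredProfileSketch {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("skill violation: profile must be an object");
  }
  const rec = raw as Record<string, unknown>;
  return {
    name: requireField(rec, "name"),
    systemPrompt: requireField(rec, "systemPrompt"),
    model: requireField(rec, "model"),
  };
}
```

Rejecting the profile at authoring time, before any worker is built from it, keeps a lazy or truncated model output from silently becoming a worker with no instructions.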
…ne for in-box driving
serveCoordinationMcp fronts a live Scope with an HTTP JSON-RPC MCP server: an in-box coding harness (opencode via cli-bridge) mounts mcp.mcpServers.coordination and calls spawn_worker as a native tool, landing on Scope.spawn: a real box driving real boxes, not emulated function-tools. Real test: HTTP tools/call spawn_worker -> Scope.spawn -> worker settles -> winner (no mock of the MCP path). Plus the standard supervise SKILL.md.
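The request path above (a JSON-RPC `tools/call` for `spawn_worker` landing on a spawn function) can be sketched without the HTTP layer. This is an illustration only: `handleRpc`, the `Spawn` callback, and the `task` argument name are hypothetical; the envelope fields and the -32601/-32602 codes follow JSON-RPC 2.0, and the `content`/`isError` result shape follows the MCP tool-result convention.

```typescript
// Hypothetical sketch of the dispatch step only; no HTTP server, no real Scope.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number | string;
  method: string;
  params?: { name?: string; arguments?: Record<string, unknown> };
}
interface ToolResult { content: Array<{ type: "text"; text: string }>; isError?: boolean }
type JsonRpcResponse =
  | { jsonrpc: "2.0"; id: number | string; result: ToolResult }
  | { jsonrpc: "2.0"; id: number | string; error: { code: number; message: string } };

// Stand-in for Scope.spawn: takes a task, returns a node id.
type Spawn = (task: string) => string;

function handleRpc(req: JsonRpcRequest, spawn: Spawn): JsonRpcResponse {
  const base = { jsonrpc: "2.0" as const, id: req.id };
  if (req.method !== "tools/call") {
    return { ...base, error: { code: -32601, message: `method not found: ${req.method}` } };
  }
  const params = req.params;
  if (params === undefined || params.name !== "spawn_worker") {
    return { ...base, error: { code: -32602, message: `unknown tool: ${String(params?.name)}` } };
  }
  // `task` is an assumed argument name for this sketch.
  const task = params.arguments?.["task"];
  if (typeof task !== "string" || task.length === 0) {
    const text = "spawn_worker requires a non-empty string 'task'";
    return { ...base, result: { content: [{ type: "text", text }], isError: true } };
  }
  const nodeId = spawn(task);
  return { ...base, result: { content: [{ type: "text", text: JSON.stringify({ nodeId }) }] } };
}
```

Keeping the dispatcher a pure function of the request and a spawn callback makes the MCP path testable on its own, with the HTTP transport and the live Scope wired in separately.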
…tion MCP (live)
opencode (glm-5-turbo via cli-bridge) mounts mcp.mcpServers.coordination (type:http → opencode remote) and calls spawn_worker itself → real Scope.spawn → worker settles, and reads back the await_next result. The in-box driving path is REAL: a coding agent drives recursion as a native tool, not emulated. (Bridge wants mcp type:'http', not 'remote'.)
…kers via the coordination MCP, real test gates delivery
Live, no mock: the opencode supervisor (glm-5-turbo via cli-bridge) mounts the coordination MCP, authors worker profiles, calls spawn_worker -> real Scope.spawn -> real opencode workers code in a cwd -> python3 test gates valid -> supervisor settles on the delivered worker -> winner. The completion-oracle (deployable check, not LLM judge) decided delivery over the supervisor's confusion that it couldn't see the workers' isolated cwds (→ shared Workspace next). Proof artifact for the in-box-driving path; the law-compliant productionization is a substrate backend (tmux/bridge/sandbox) that runs authored profiles, not this harness-specific script.
…ubstrate materializes it
§1.5 + decision-table rows + CLAUDE.md §0 pointer. The thing we keep forgetting: an agent IS its full AgentProfile (prompt+skills+tools/mcp+subagents+hooks+permissions+model), not a prompt; change behavior by AUTHORING the profile and letting the sandbox substrate materialize it into harness shapes; never write a verify-loop or harness-specific config (self-verification is a hook/process; opencode is only the cli-bridge test target; a missing lever is a substrate gap).
…umed design docs
Retired 14 design-research docs whose content is now shipped code, in .evolve/current.json, or self-declared subsumed/retracted (the recursion atom shipped; the optimization-space layer evidence landed; verdicts reached). Refreshed the research index, recorded the retirement + rationale in deletion-ledger.md (Pass 2), and fixed every inbound link (top index, the harvest-corpus.ts comment → current.json, optimization-space's suite links). Kept the SSOT masterplan, the canonical-referenced maps (optimization-space/leapfrog), the two gated belief specs, the postmortem guardrail, the build-lists, and the agent-lab tombstones. No broken links into the 14 remain from any canonical doc or src/.
- 8 bench files (finsearch-loop/improve-prompt/rsi/run/run-benchmarks/research-loop/skills-sandbox/profile-coord-sandbox): kept DELETED per the createDriver-paradigm nuke (2101f2d); main only modified files this branch had already removed.
- src/runtime/strategy.ts: kept BOTH imports (withDriverExecutor + routerToolLoop; the body uses both, 4+3 refs).
- package.json: biome ^2.4.15 (branch) + agent-eval ^0.92.0 (main's newer substrate); lockfile reconciled via pnpm install.
- docs/architecture.md + roadmap-rsi.md: kept the branch's agent-driver framing + main's clarifications.
Verified in an isolated worktree: build + typecheck (core + examples) clean; 944 tests pass.
tangletools
left a comment
✅ Auto-approved PR — a6eab4ec
Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.
tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-16T07:51:41Z
tangletools
left a comment
🟡 Value Audit — sound-with-nits
| Verdict | sound-with-nits |
| Concerns | 5 (2 low, 3 weak-concern) |
| Heuristic | 0.2s |
| Duplication | 0.0s |
| Interrogation | 825.4s (2 bridge agents) |
| Total | 825.6s |
💰 Value — sound-with-nits
Adds the missing recursive-driver execution layer to the existing Scope/Supervisor substrate, plus an LLM-driven coordination driver, an HTTP MCP server, a deliverable gate, and authoring helpers; it is coherent and fills a real gap, though the authoring helper is thinner than the docs' full-profile
- What it does: The change turns the existing Scope/Supervisor atom into a genuinely recursive one: a spawned child can itself be a driver that mounts a nested Scope and spawns more children, all sharing one conserved budget pool, journal, and blob store. Concretely it adds: (1) `driver-executor.ts` (`driverChild`, `withDriverExecutor`, `driverExecutorFactory`) which runs a driver child inside a nested Scope one…
- Goals it achieves: The change achieves three goals evident from the code: (1) make agents-driving-agents real: the recursive driver-executor closes the gap between the project's 'recursive atom' architecture and actual recursive execution; (2) let external coding harnesses act as supervisors by mounting the coordination verbs as an MCP server, not just in-process code; (3) enforce that 'done' means a deliverable ch…
- Assessment: This is a good change on its merits. It is built in the grain of the codebase: it extends the existing `Scope`/`Supervisor`/`createCoordinationTools` substrate rather than replacing it, preserves the conserved-budget/equal-k invariant by construction, adds a depth ceiling, and bubbles spend and delivery verdicts up the recursion tree. The new primitives are well-layered (tools → in-process driver…
- Better / existing approach: none; this is the right approach. I checked `main` directly and the recursive capability the PR adds is not present: `main:src/runtime/personify/persona.ts:102` explicitly warns that a spawned child run as a driver 'drives a leaf', and `main:src/runtime/supervise/scope.ts` has no `nestedScopeSeamKey`/`NestedScopeSeam` (the PR adds these at `src/runtime/supervise/scope.ts:117-160`). `main` has `cr…
🎯 Usefulness — sound-with-nits
A coherent, grain-following change: it lands a real recursive agent atom (Scope/Supervisor + LLM-driven coordination driver + MCP server), layers strategy/persona on top, removes verified dead code, and consolidates docs; only minor export and doc-link cleanup misses.
- Integration: The new behavior is reachable and wired in. `coordinationDriverAgent` and the supervisor primitives are exported from `src/runtime/index.ts` (`src/runtime/index.ts:296-303`, `:333-336`) and consumed by tests (`tests/loops/coordination-driver.test.ts`, `tests/loops/coordination-mcp.test.ts`), benchmarks (`bench/src/atom-humaneval.mts:173`, `bench/src/atom-mcp-e2e.mts:136`), and higher-level runtime…
- Fit with existing patterns: It fits the codebase's architecture rather than competing with it. The new code sits on top of the existing `Scope`/`Supervisor`/`driver-executor` recursion primitives (`src/runtime/supervise/supervisor.ts:64`, `driver-executor.ts:125`) and reuses `createCoordinationTools` from `src/mcp/tools/coordination.ts:91`. It directly implements the workflow described in `skills/supervise/SKILL.md` (decompo…
- Real-world viability: The primitives are exercised beyond the happy path: tests cover budget conservation, abort cascade, recursion, join barriers, intensity breaker, MCP over HTTP, completion-oracle selection, and worker-profile authoring. The benchmark harnesses use real router models and real deterministic checks (HumanEval Docker checker, pytest file). Production deployment would still need to swap the in-memory bl…
🔎 Heuristic Signals
🟡 Cruft: console debug added bench/src/atom-humaneval.mts
- console.log(`atom-humaneval: N=${N} K=${K} offset=${OFFSET} worker=${cfg.model} driver=${driverCfg.model}`)
🟡 Cruft: magic number added src/topology/replay.ts
+function setMs(ms){ ms=Math.max(0,Math.min(span,ms)); scrub.value=(ms/span*1000)|0; applyTo(ms); }
🎯 Usefulness Audit
🟡 serveCoordinationMcp is not re-exported from the runtime barrel [integration]
`serveCoordinationMcp` is called in tests and benchmarks (`tests/loops/coordination-mcp.test.ts:64`, `bench/src/atom-mcp-e2e.mts:136`, `bench/src/mcp-mount-probe.mts:76`) and is the "capable/primary" keystone path per `docs/research/rsi-atom-masterplan.md:20`, but a grep of `src/runtime/index.ts` finds no re-export. `coordinationDriverAgent` is re-exported at `src/runtime/index.ts:298`. Either add `serveCoordinationMcp` to the barrel or document that it is intentionally deep-import-only.
🟡 Stale docstring link to deleted research doc [integration]
`bench/src/commit0-env.ts:2` still links to `docs/research/long-horizon-benchmark-survey.md`, which was retired in Pass 2 of the doc consolidation (`docs/research/deletion-ledger.md:42`). Update the comment to point to `.evolve/current.json` or `docs/research/long-horizon-agent-map.md`.
💰 Value Audit
🟡 AuthoredProfile is thinner than the AgentProfile law it advertises [against-grain]
The PR's docs establish that 'the supervisor's only intelligence is AUTHORING full profiles' and that an agent IS 'prompt + skills + tools/mcp + subagents + hooks' (`docs/canonical-api.md:15-25`). But `AuthoredProfile` only carries `name`, `systemPrompt`, and `model` (`src/runtime/supervise/authoring.ts:23-29`), and `authoredWorker` materializes it as a single router call with a bare `{ name }` `AgentProfile` (`src/runtime/supervise/authoring.ts:65-117`). The supervisor skill tells the superviso…
What this audit checks
It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.
| Pass | What it asks |
|---|---|
| Heuristic | Vague title? Whitespace-only or cruft-bearing diff? (content signals only) |
| Duplication | Do added function/class names already exist elsewhere in the repo? |
| Value Audit | What does it do? What goal does it achieve? Is it good? Better architecture or already-exists? |
| Usefulness Audit | Does it integrate and fit? Will it hold up in real use and actually get used? |
Findings are concerns, not blocks — the human reviewer decides what to do with them.
…#307)
* chore(release): 0.54.0: expose the recursive agent atom + workspace seam
Publishes coordinationDriverAgent, serveCoordinationMcp, driverChild (the recursive driver atom, #304) and runInWorkspace (the shared-workspace seam, #305), which merged into main without a version bump and so were absent from the published 0.53.0. Additive (new exports on the /runtime barrel): no breaking changes, no fleet bump required. Verified: dist/runtime.js exposes all four; tests 945 pass.
* fix(lint): clear 2 biome errors blocking the release (assignment-in-expr, unused import, import order)
---------
Co-authored-by: Drew Stone <hello@webb.tools>
Draft: the `chore/atom-deep-clean` work line (28 commits). Tracks the recursive-agent-atom keystone, the deep-clean, and the docs consolidation.
⚠️ Needs a rebase/conflict-resolve vs `main` before it's mergeable (~13 conflict points; `main` moved while this branch carried large deletions); resolving that is its own reviewed step.
What's in here
The recursive agent atom: agents driving agents, live (the keystone)
- c053e37: coordination MCP over a live `Scope`: the real keystone for in-box driving.
- 2f7212d: proof a coding harness drives the `Scope` via the coordination MCP (live).
- 972707f: whole real e2e: an opencode supervisor drives opencode workers via the coordination MCP, with the real test suite gating delivery (no mock).
- 8ed2982: the supervisor authors worker profiles from a skill (the intelligence is the authoring, not the plumbing).
- 525dc5a: animated visual replay of a recursive agent run (topology).
Atom deep-clean
- …`docs/research/deletion-ledger.md`.
Docs (this session)
- 9853f6a: the AgentProfile law (`docs/canonical-api.md` §1.5 + decision rows + `CLAUDE.md` §0): an agent IS its full profile (prompt+skills+tools/mcp+subagents+hooks); change behavior by authoring the profile and letting the sandbox substrate materialize it; never write a verify-loop or harness-specific config. The thing we kept forgetting, now in the first-read doc.
- daff034: research-dir consolidation 28→14: retired design docs that shipped/were subsumed (evidence now in `.evolve/current.json`); refreshed the index, recorded rationale in the deletion ledger (Pass 2), fixed every inbound link.
- `docs/` 51 → 37.
State / how to land
- …`main` is resolved (and the unrelated in-flight bridge experiment on the branch is either committed or dropped).