
feat(atom): recursive agent atom keystone + deep-clean + docs consolidation #304

Merged
drewstone merged 30 commits into main from chore/atom-deep-clean on Jun 16, 2026

Conversation

@drewstone


Draft — the chore/atom-deep-clean work line (28 commits). Tracks the recursive-agent-atom keystone, the deep-clean, and the docs consolidation. ⚠️ Needs a rebase/conflict-resolve vs main before it's mergeable (~13 conflict points — main moved while this branch carried large deletions); resolving that is its own reviewed step.

What's in here

The recursive agent atom — agents driving agents, live (the keystone)

  • c053e37 — coordination MCP over a live Scope: the real keystone for in-box driving.
  • 2f7212d — proof a coding harness drives the Scope via the coordination MCP (live).
  • 972707f — whole real e2e: an opencode supervisor drives opencode workers via the coordination MCP, with the real test suite gating delivery (no mock).
  • 8ed2982 — the supervisor authors worker profiles from a skill (the intelligence is the authoring, not the plumbing).
  • 525dc5a — animated visual replay of a recursive agent run (topology).

Atom deep-clean

  • Dead-code + orphan removal across passes, each gates-verified; tracked precisely in docs/research/deletion-ledger.md.

Docs (this session)

  • 9853f6a — the AgentProfile law (docs/canonical-api.md §1.5 + decision rows + CLAUDE.md §0): an agent IS its full profile (prompt+skills+tools/mcp+subagents+hooks); change behavior by authoring the profile and letting the sandbox substrate materialize it — never write a verify-loop or harness-specific config. The thing we kept forgetting, now in the first-read doc.
  • daff034 — research-dir consolidation 28→14: retired design docs that shipped/were subsumed (evidence now in .evolve/current.json); refreshed the index, recorded rationale in the deletion ledger (Pass 2), fixed every inbound link. docs/ 51 → 37.

State / how to land

  • Draft until the rebase vs main is resolved (and the unrelated in-flight bridge experiment on the branch is either committed or dropped).
  • Substrate-layering and no-broken-links verified for the docs portion; the keystone commits carry their own live-e2e proofs.

drewstone added 30 commits June 14, 2026 09:05
…ofile + resolver + ladder

Add the unified, future-proof delivery structure: one certified unit of agent
power = { interface, binding }. Interfaces are CLOSED (tool / mcp / context /
retrieval / hook / subagent); bindings are OPEN (inline / file / http /
sandbox-code / mcp-stdio / mcp-remote / process-on-infra / rag-index /
memory-store / wasm / a2a). A single resolver lowers any binding into one uniform
ResolvedSurface consumed identically by the host seam (RouterToolsSeam tools +
executeToolCall) and the sandbox seam (AgentProfile).

- src/intelligence/capability.ts: the manifest types + CapabilityNotAdmittedError
  + manifestFromProfile (lowers today's CertifiedProfile wire into capabilities[]
  with best-effort binding inference, so the spine delivers value before the
  plane changes).
- src/intelligence/resolver.ts: composeCertifiedProfile — the spine resolves
  inline/file (byte-identical to composeCertifiedPrompt, the regression lock),
  mcp-stdio/mcp-remote (strict union widens to the SDK's flat
  AgentProfileMcpServer — an always-valid lowering), and http tools (the host
  seam). Ladder rungs that need infra (sandbox-code, process-on-infra) are
  injected ResolveCtx providers; rag-index/memory-store/wasm/a2a throw
  CapabilityNotAdmittedError (memory gated on the E3 admission bar). Fail-closed:
  null manifest -> base surface, per-capability failure -> drop (diagnostic via
  onDrop), post-resolve drift drops any tool/mcp whose live names diverge.
- src/mcp/delegation-profile.ts: composeProductionAgentProfile now also merges
  tools box-flags, hooks, subagents, and injects ResolvedSurface.mcpConnections
  into AgentProfile.mcp (the sandbox-seam mapping).
- exports + export gate + the two spec corrections (mcp lowers via always-valid
  widening; tools lower two ways since AgentProfile.tools is box flags).
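
A minimal sketch of the fail-closed resolve described in this commit, under stated assumptions: the type and function names below (`Capability`, `lowerOne`, `resolveManifest`, the reduced `ResolvedSurface`) are hypothetical simplifications, only three of the listed binding kinds are shown, and the real shapes in src/intelligence/capability.ts and resolver.ts may differ.

```typescript
// Hypothetical, simplified shapes; not the real manifest types.
type Interface = "tool" | "mcp" | "context" | "retrieval" | "hook" | "subagent"; // closed set

type Binding =
  | { kind: "inline"; text: string }
  | { kind: "mcp-remote"; url: string }
  | { kind: "wasm"; module: string }; // open set: three of the listed kinds only

interface Capability {
  name: string;
  interface: Interface;
  binding: Binding;
}

class CapabilityNotAdmittedError extends Error {
  constructor(kind: string) {
    super(`binding kind not admitted: ${kind}`);
    this.name = "CapabilityNotAdmittedError";
  }
}

interface ResolvedSurface {
  prompt: string[];
  mcpConnections: { name: string; url: string }[];
}

// Lower one capability into a surface; kinds that are not admitted yet throw.
function lowerOne(cap: Capability, surface: ResolvedSurface): void {
  switch (cap.binding.kind) {
    case "inline":
      surface.prompt.push(cap.binding.text);
      return;
    case "mcp-remote":
      surface.mcpConnections.push({ name: cap.name, url: cap.binding.url });
      return;
    case "wasm":
      throw new CapabilityNotAdmittedError(cap.binding.kind);
  }
}

// Fail-closed: a null manifest yields the base surface; a failing capability is
// dropped and reported through onDrop instead of aborting the whole resolve.
function resolveManifest(
  manifest: Capability[] | null,
  onDrop: (cap: Capability, err: Error) => void,
): ResolvedSurface {
  const surface: ResolvedSurface = { prompt: [], mcpConnections: [] };
  if (manifest === null) return surface;
  for (const cap of manifest) {
    // Resolve into a scratch surface so a failure leaves no partial state behind.
    const scratch: ResolvedSurface = { prompt: [], mcpConnections: [] };
    try {
      lowerOne(cap, scratch);
      surface.prompt.push(...scratch.prompt);
      surface.mcpConnections.push(...scratch.mcpConnections);
    } catch (err) {
      onDrop(cap, err as Error);
    }
  }
  return surface;
}
```

The scratch surface is one way to keep a partially lowered capability from leaking into the result; the post-resolve drift check mentioned above is not modeled here.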
… n=16 +16.4pp)

The +16.4pp CI[+5.3,+29.8] n=16 depth-steered-continuation result did not
replicate when powered: depth-breadth = +4.7pp CI[-1.9,+11.4] at n=48 (a tie;
+4.1pp at n=72). architecture.md and roadmap-rsi.md advertised it as a cleared
keystone; they now carry the retraction and point at .evolve/current.json.
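
One way to read the retraction, as a small sketch: a gain is treated as replicated only when its confidence interval excludes zero. The `Effect` shape and `intervalExcludesZero` are hypothetical names for illustration; the numbers are the ones quoted above.

```typescript
interface Effect {
  deltaPp: number; // point estimate, percentage points
  ciLow: number;   // lower bound of the confidence interval
  ciHigh: number;  // upper bound of the confidence interval
  n: number;       // sample size
}

// True when the whole interval sits on one side of zero.
function intervalExcludesZero(e: Effect): boolean {
  return e.ciLow > 0 || e.ciHigh < 0;
}

const pilot: Effect = { deltaPp: 16.4, ciLow: 5.3, ciHigh: 29.8, n: 16 };
const powered: Effect = { deltaPp: 4.7, ciLow: -1.9, ciHigh: 11.4, n: 48 };

// The n=16 interval excludes zero, but the powered n=48 interval straddles it,
// which is why the result is reported as a tie.
const pilotCleared = intervalExcludesZero(pilot);
const poweredCleared = intervalExcludesZero(powered);
```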
…2 LOC)

- delete bench/src/observe-steer-workspace-loop.mts (the #194 mock anti-pattern; 0 inbound refs)
- drop orphan pass-through re-exports CaptureIntegrityError/ReplayError/VerificationError (src/errors.ts)
- drop orphan interface AgentTaskRunSummary (src/types.ts)
- fix doc-rot in loop-facade-postmortem.md; gitignore stray test_repo/
- deletion-ledger.md tracks deletions + the deferred migrations (driver.ts 12 callers, AgentProfile superset)

Gates verified by hand: typecheck 0, lint 0, 924 tests pass / 0 fail. Load-bearing fail-loud fences left intact (NOT dead code).
Safe minor/patch dev bumps, gates verified (typecheck 0, lint 0, 924 tests pass).
Deferred (need own careful pass): biome 2.5 (13 new lint warnings), typescript 6 + vitest 4 (majors), agent-eval 0.92 (substrate — sync with the AgentProfile superset work).
…tware 3.0)

Replace the per-role hand-coded prompt builders with generateDriverSystemPrompt(spec):
a (fused) router call generates the driver prompt from {role,goal,target,harness,stance},
cached for semantic reuse via PromptRegistry + hashContent key (file/JSON or DB). The
hand-authored worker-driver prompt becomes the generator's seed + its tests the invariants.
Single optimizable surface; depends on a tangle-router fusion primitive (separate issue).
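
A minimal sketch of the generate-once, reuse-by-content-hash idea, assuming an in-memory map in place of PromptRegistry and an FNV-1a hash as a stand-in for hashContent. `DriverSpec`, `hashSpec`, and `PromptCache` are hypothetical names, and the injected `generate` function stands in for the router call.

```typescript
// Hypothetical spec shape built from the fields named in the commit message.
interface DriverSpec {
  role: string;
  goal: string;
  target: string;
  harness: string;
  stance: string;
}

// Stand-in for hashContent (an assumption): FNV-1a over a canonical encoding.
function hashSpec(spec: DriverSpec): string {
  const canonical = JSON.stringify([spec.role, spec.goal, spec.target, spec.harness, spec.stance]);
  let h = 0x811c9dc5;
  for (let i = 0; i < canonical.length; i++) {
    h ^= canonical.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

// Minimal in-memory registry; the commit describes file/JSON or DB backing.
class PromptCache {
  private readonly entries = new Map<string, string>();
  private readonly generate: (spec: DriverSpec) => string;

  constructor(generate: (spec: DriverSpec) => string) {
    this.generate = generate;
  }

  get(spec: DriverSpec): string {
    const key = hashSpec(spec);
    const hit = this.entries.get(key);
    if (hit !== undefined) return hit;
    const prompt = this.generate(spec); // the router call would happen here
    this.entries.set(key, prompt);
    return prompt;
  }
}
```

This sketch only reuses prompts for byte-identical specs; whatever "semantic reuse" means in the real registry is not modeled.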
…ement+eval paradigm

DELETE the wrong abstraction (createDriver = a code TopologyPlanner driving runLoop over
string-prompt→string-answer calls, judged by adapter.judge) and the entire old bench
experiment + eval-gen apparatus built on it. The agent-driver (AgentProfile driving
AgentProfile via coordination tools) replaces it; the runLoop KERNEL and the Scope/
Supervisor are untouched.

Deleted (15): src/runtime/driver.ts; bench experiment.ts(+test)/steering-experiment(+test)/
improve-prompt/research-loop/finsearch-loop/rsi/generate-eval/run-benchmarks/run.ts/
skills-sandbox/profile-coord-sandbox; tests/loops/dynamic.test.ts.
Survivors (search-bench/cloud-loop/fleet/commit0-gate) re-homed onto a new pure helper
bench/src/sandbox-run.ts (answerOutput/sandboxAgentRun/WorkerBackendType/AnalystFn/llmAnalyst —
no experiment shell). runLoop kernel tests kept via a scriptedDriver stub in refine-driver.ts.

Gates (hand-verified): build 0, typecheck 0 (root+bench; also fixed a pre-existing bench
BackendType red), lint 0, 905 tests pass. Zero dangling code refs.

ACCEPTED casualties of the full nuke (rebuild on the agent-driver/Supervisor path when wanted):
the generate-eval data engine, the AgentProfile-coordinate optimizer (profile-coord), and
run.ts's non-experiment subcommands (preflight/verify-judge/solve-one/ui-review).
…nExperiment to the agent-driver/Supervisor reality
…iving agents

A spawned child can now BE a driver. driverExecutorFactory mounts a NESTED Scope over
the SAME conserved budget pool + shared journal (scope.ts's new NestedScopeSeam), one
depth deeper, and runs the wrapped driver's act there. A child resolves to a LEAF
(worker) OR — for a role:'driver' spec, via withDriverExecutor — this executor,
recursively. So a driver spawns a driver spawns a worker on one budget-conserving tree.
The persona/strategy spawn fences now route a driver child to the recursive executor
(compose) instead of throwing; act still fails loud only if a child is run directly.

Reuses the atom — builds NO new budget/journal/selection logic. Budget conserved across
depth (reserve-on-spawn fails closed at any depth), spend bubbles to root, journal records
each nested tree, maxDepth enforced across recursion.

Proven OFFLINE (no creds; scripted drivers+workers) in tests/loops/driver-recursion.test.ts:
depth-2 chain root->mid->inner->worker (node id rec:s0:s0:s0 — a non-recursive build cannot
produce it), fail-closed budget conservation across depth, spend roll-up (spentTotal = the
worker's exact spend), nested-journal sub-trees, depth-ceiling across recursion.
Gates hand-verified: build 0, typecheck 0, lint 0, 911 tests pass.
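
The budget-conserving tree can be sketched in miniature as below. This is an illustration, not the real Scope or NestedScopeSeam API: `BudgetPool` and `MiniScope` are hypothetical, depth is counted from 0 at the root (which may not match the test's own numbering), and only reserve-on-spawn, the shared pool, the depth ceiling, and path-style node ids are modeled.

```typescript
// One pool shared by every scope in the tree, whatever its depth.
class BudgetPool {
  readonly total: number;
  reserved = 0;
  spentTotal = 0;

  constructor(total: number) {
    this.total = total;
  }

  // Fail-closed: refuse the reservation instead of overdrawing.
  reserve(amount: number): boolean {
    if (this.reserved + amount > this.total) return false;
    this.reserved += amount;
    return true;
  }

  spend(amount: number): void {
    this.spentTotal += amount;
  }
}

class MiniScope {
  readonly id: string;
  readonly depth: number;
  private readonly pool: BudgetPool;
  private readonly maxDepth: number;
  private nextChild = 0;

  constructor(id: string, depth: number, pool: BudgetPool, maxDepth: number) {
    this.id = id;
    this.depth = depth;
    this.pool = pool;
    this.maxDepth = maxDepth;
  }

  // Spawn a child one depth deeper over the SAME pool.
  spawn(budget: number): MiniScope {
    if (this.depth + 1 > this.maxDepth) throw new Error("maxDepth exceeded");
    if (!this.pool.reserve(budget)) throw new Error("budget exhausted");
    const childId = `${this.id}:s${this.nextChild++}`;
    return new MiniScope(childId, this.depth + 1, this.pool, this.maxDepth);
  }

  // Spend lands on the shared pool, so it is visible at the root.
  spend(amount: number): void {
    this.pool.spend(amount);
  }
}
```

With a root id of `rec`, three chained spawns produce the id `rec:s0:s0:s0`, and a spend by that innermost node shows up directly in the pool's `spentTotal`.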
…s to latest-only

Rename the opaque 'keystone' jargon: runKeystoneGate->runGate (+ RunGateOptions/GateArmResult/
GateReport), bench/src/keystone-gate.ts->gate.ts (+ -cli, +test), all import paths and CLI
banners. Strip the last historical createDriver/runExperiment 'was removed/nuked' breadcrumbs
from architecture.md, architecture-interpretations.md, learning-flywheel.md, roadmap-rsi.md —
upgrading agents now see only the current agent-driver/Supervisor reality (history lives in git).
Gates green; 905->911 with the keystone test.
…LLM tool-loop over the coordination verbs)

The CHEAP, in-process, no-creds variant of the recursive driver: act() mounts
createCoordinationTools over its scope and runs an LLM tool-loop (injected chat seam) so the
driver REASONS spawn/steer/await/stop; composes with 2a recursion (a driver agent spawns a
driver agent, via makeWorkerAgent -> driverChild). NOT the primary driver — the CAPABLE
driver is a sandbox agent with the coordination verbs as an MCP. This one is the
offline-testable + cheap-orchestration path. Prompt is INJECTED (decoupled from agent-eval).

Proven OFFLINE (no creds, scripted mock chat) tests/loops/coordination-driver.test.ts:
the tool-loop drives real Scope.spawn via the coordination verbs + folds results back; a
driver AGENT spawns a driver AGENT (separate nested journal tree). typecheck 0, lint 0.
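
A rough sketch of an offline tool-loop driven by a scripted chat seam. The verb names come from the commit message; `ToolCall`, `ChatSeam`, `Tools`, and `runToolLoop` are hypothetical, and the seam is synchronous here purely to keep the sketch short (a real chat seam would presumably be async).

```typescript
type Verb = "spawn" | "steer" | "await" | "stop";

interface ToolCall {
  verb: Verb;
  arg: string;
}

// Injected seam: given the transcript so far, decide the next tool call.
type ChatSeam = (transcript: readonly string[]) => ToolCall;

// One handler per verb except "stop", which only ends the loop.
type Tools = Record<Exclude<Verb, "stop">, (arg: string) => string>;

function runToolLoop(chat: ChatSeam, tools: Tools, maxTurns: number): string[] {
  const transcript: string[] = [];
  for (let turn = 0; turn < maxTurns; turn++) {
    const call = chat(transcript);
    const verb = call.verb;
    if (verb === "stop") break;
    // Fold the tool result back into the transcript the chat seam sees next.
    const result = tools[verb](call.arg);
    transcript.push(`${verb}(${call.arg}) -> ${result}`);
  }
  return transcript;
}

// Scripted mock chat: replays a fixed plan, then stops.
function scriptedChat(plan: ToolCall[]): ChatSeam {
  return (transcript) => plan[transcript.length] ?? { verb: "stop", arg: "" };
}
```

The `maxTurns` bound is an assumption added so a misbehaving seam cannot loop forever.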
… + proof); #2b cheap driver done, #2c capable sandbox driver + #3 completion-oracle next
The honest settle: a node counts as delivered only when a deployable check
passes, never on self-report.

- completion-gate.ts: gateOnDeliverable wraps any Executor so its settlement
  valid reflects a DeliverableSpec check (both execute shapes; fail-closed).
- coordination-driver finalize: returns the best DELIVERED child; undefined
  when none delivered — a driver cannot self-declare done via prose.
- driver-executor: derive the driver child's verdict from its direct settled
  events, so delivery composes UP the recursion (a sub-driver is valid only
  when it itself selected a delivered child).
- supervisor: a winner MUST carry a real Out; a successful act that produced
  nothing is a no-winner, never a winner wrapping undefined.

8 offline tests: leaf gate (both execute shapes, fail-closed), ran-but-didn't-
deliver yields no winner, the gate dominates score, delivery propagates up the
recursion.
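
The gating idea can be sketched as a wrapper that overrides an executor's self-reported verdict with a deliverable check. The shapes below (`Settlement`, `Executor`, `DeliverableSpec` with a single `check`) are simplified assumptions, the function is named `gateSketch` to avoid implying it matches the real gateOnDeliverable signature, and `pickWinner` takes the first delivered child rather than the best one.

```typescript
interface Settlement<Out> {
  valid: boolean;
  out?: Out;
}

type Executor<Out> = (task: string) => Settlement<Out>;

interface DeliverableSpec<Out> {
  check: (out: Out) => boolean;
}

// Wrap an executor so `valid` reflects the check, never the inner self-report.
function gateSketch<Out>(inner: Executor<Out>, spec: DeliverableSpec<Out>): Executor<Out> {
  return (task) => {
    const settled = inner(task);
    // Ran but produced nothing: not delivered.
    if (settled.out === undefined) return { valid: false };
    let passed = false;
    try {
      passed = spec.check(settled.out);
    } catch {
      passed = false; // fail-closed: a crashing check counts as not delivered
    }
    return { valid: passed, out: settled.out };
  };
}

// A winner must carry a real output; no delivered child means no winner.
function pickWinner<Out>(settlements: Settlement<Out>[]): Out | undefined {
  for (const s of settlements) {
    if (s.valid && s.out !== undefined) return s.out;
  }
  return undefined;
}
```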
…ble-checked domain

A coordinationDriverAgent (real router brain) drives gated workers on HumanEval:
each worker is settled valid ONLY when the local Docker test suite passes
(completion-oracle, not self-report), against a blind best-of-K baseline. Proven
live: the driver spawns, the worker solves, the checker gates, the supervisor
returns a winner only on real delivery.

Also exports gateOnDeliverable/DeliverableSpec from the runtime barrel (the #3
primitive was added to supervise/ but not surfaced on the package).
Fold the one runtime-hooks stream into a timestamped ReplayEvent[] (createReplayRecorder)
and render a self-contained, scrubbable HTML player (renderReplayHtml) — the recursive
agent tree animated over wall-clock, each node colored by the completion-oracle: delivered
(valid) green, ran-but-not-delivered amber, failed red, with live token/cost counters.
Synthesizes the unspawned root driver so the whole recursion renders. No server/build/deps.

Wired into atom-humaneval (every driver run emits a replay.html). 4 offline recorder tests;
proven on a live HumanEval run (driver -> worker -> delivered).
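
A minimal sketch of folding status updates into timestamped events and reading node colors back at a playback time. It is not the real createReplayRecorder or renderReplayHtml: `createRecorder`, `colorsAt`, and the reduced `ReplayEvent` shape are hypothetical, and the clock is injected so the sketch is deterministic.

```typescript
type NodeStatus = "delivered" | "ran" | "failed";

interface ReplayEvent {
  atMs: number; // wall-clock offset from the start of the recording
  nodeId: string;
  status: NodeStatus;
}

// Color scheme from the commit message: delivered green, ran-but-not-delivered amber, failed red.
const STATUS_COLOR: Record<NodeStatus, string> = {
  delivered: "green",
  ran: "amber",
  failed: "red",
};

function createRecorder(now: () => number) {
  const start = now();
  const events: ReplayEvent[] = [];
  return {
    record(nodeId: string, status: NodeStatus): void {
      events.push({ atMs: now() - start, nodeId, status });
    },
    events: (): readonly ReplayEvent[] => events,
  };
}

// Color of each node at playback time `ms`: the last event at or before `ms` wins.
// Assumes events were recorded in time order.
function colorsAt(events: readonly ReplayEvent[], ms: number): Map<string, string> {
  const colors = new Map<string, string>();
  for (const e of events) {
    if (e.atMs <= ms) colors.set(e.nodeId, STATUS_COLOR[e.status]);
  }
  return colors;
}
```

A scrubber would call `colorsAt` with the slider's current time to repaint the tree; token and cost counters are left out of this sketch.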
… (a 502 is a failed attempt, not a crash — matches the driver arm's down-typing)
…(the intelligence, not the plumbing)

The supervisor's job is to DESIGN the agents it spawns — read the task, decompose it,
and author a tailored profile (instructions + model) per worker. supervisorSkill is the
how-to it reads (its own system prompt) — THE optimizable self-improvement surface;
authoredWorker builds a worker from an authored profile; asAuthoredProfile catches
empty/placeholder profiles (a skill violation).

Proven offline (no creds, no plumbing): a skill-guided supervisor authors DISTINCT,
tailored worker recipes per sub-task and they flow to the workers. 3 tests.
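
A sketch of the empty/placeholder check described above, assuming an authored profile carries `name`, `systemPrompt`, and `model`. `checkAuthoredProfile` and the placeholder list are hypothetical; the real asAuthoredProfile may apply different rules.

```typescript
interface AuthoredProfile {
  name: string;
  systemPrompt: string;
  model: string;
}

// Assumed placeholder markers; the real list, if any, is not shown in the source.
const PLACEHOLDERS = ["todo", "tbd", "placeholder", "..."];

// Reject profiles that are empty or obviously unauthored.
function checkAuthoredProfile(raw: Partial<AuthoredProfile>): AuthoredProfile {
  const name = (raw.name ?? "").trim();
  const systemPrompt = (raw.systemPrompt ?? "").trim();
  const model = (raw.model ?? "").trim();
  if (name === "" || systemPrompt === "" || model === "") {
    throw new Error("skill violation: empty profile field");
  }
  if (PLACEHOLDERS.includes(systemPrompt.toLowerCase())) {
    throw new Error("skill violation: placeholder prompt");
  }
  return { name, systemPrompt, model };
}
```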
…ne for in-box driving

serveCoordinationMcp fronts a live Scope with an HTTP JSON-RPC MCP server: an in-box
coding harness (opencode via cli-bridge) mounts mcp.mcpServers.coordination and calls
spawn_worker as a native tool, landing on Scope.spawn — a real box driving real boxes,
not emulated function-tools. Real test: HTTP tools/call spawn_worker -> Scope.spawn ->
worker settles -> winner (no mock of the MCP path). Plus the standard supervise SKILL.md.
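
The tools/call to Scope.spawn path can be sketched as a pure JSON-RPC dispatch function. The request and response envelopes follow JSON-RPC 2.0 and the MCP `tools/call` convention; `handleRpc`, the injected `spawn` callback, and the `task` argument name are assumptions for illustration, and the HTTP transport is left out.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: { name?: string; arguments?: Record<string, unknown> };
}

type JsonRpcResponse =
  | { jsonrpc: "2.0"; id: number; result: { content: { type: "text"; text: string }[] } }
  | { jsonrpc: "2.0"; id: number; error: { code: number; message: string } };

// Dispatch a single request; `spawn` stands in for Scope.spawn and returns a child id.
function handleRpc(req: JsonRpcRequest, spawn: (task: string) => string): JsonRpcResponse {
  if (req.method !== "tools/call") {
    return { jsonrpc: "2.0", id: req.id, error: { code: -32601, message: "Method not found" } };
  }
  if (req.params?.name !== "spawn_worker") {
    return { jsonrpc: "2.0", id: req.id, error: { code: -32602, message: "Unknown tool" } };
  }
  const task = req.params?.arguments?.["task"];
  if (typeof task !== "string") {
    return { jsonrpc: "2.0", id: req.id, error: { code: -32602, message: "Invalid params" } };
  }
  const childId = spawn(task);
  return {
    jsonrpc: "2.0",
    id: req.id,
    result: { content: [{ type: "text", text: childId }] },
  };
}
```

Keeping the dispatch separate from the HTTP listener is one way to test the MCP path without sockets; the real test described above goes over HTTP.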
…tion MCP (live)

opencode (glm-5-turbo via cli-bridge) mounts mcp.mcpServers.coordination (type:http →
opencode remote) and calls spawn_worker itself → real Scope.spawn → worker settles, and
reads back the await_next result. The in-box driving path is REAL — a coding agent drives
recursion as a native tool, not emulated. (Bridge wants mcp type:'http', not 'remote'.)
…kers via the coordination MCP, real test gates delivery

Live, no mock: the opencode supervisor (glm-5-turbo via cli-bridge) mounts the coordination
MCP, authors worker profiles, calls spawn_worker -> real Scope.spawn -> real opencode workers
code in a cwd -> python3 test gates valid -> supervisor settles on the delivered worker ->
winner. The completion-oracle (deployable check, not LLM judge) decided delivery over the
supervisor's confusion that it couldn't see the workers' isolated cwds (→ shared Workspace next).

Proof artifact for the in-box-driving path; the law-compliant productionization is a substrate
backend (tmux/bridge/sandbox) that runs authored profiles — not this harness-specific script.
…ubstrate materializes it

§1.5 + decision-table rows + CLAUDE.md §0 pointer. The thing we keep forgetting: an agent IS
its full AgentProfile (prompt+skills+tools/mcp+subagents+hooks+permissions+model), not a prompt;
change behavior by AUTHORING the profile and letting the sandbox substrate materialize it into
harness shapes — never write a verify-loop or harness-specific config (self-verification is a
hook/process; opencode is only the cli-bridge test target; a missing lever is a substrate gap).
…umed design docs

Retired 14 design-research docs whose content is now shipped code, in .evolve/current.json, or
self-declared subsumed/retracted (the recursion atom shipped; the optimization-space layer
evidence landed; verdicts reached). Refreshed the research index, recorded the retirement +
rationale in deletion-ledger.md (Pass 2), and fixed every inbound link (top index, the
harvest-corpus.ts comment → current.json, optimization-space's suite links). Kept the SSOT
masterplan, the canonical-referenced maps (optimization-space/leapfrog), the two gated belief
specs, the postmortem guardrail, the build-lists, and the agent-lab tombstones. No broken links
into the 14 remain from any canonical doc or src/.
- 8 bench files (finsearch-loop/improve-prompt/rsi/run/run-benchmarks/research-loop/
  skills-sandbox/profile-coord-sandbox): kept DELETED — the createDriver-paradigm nuke
  (2101f2d); main only modified files this branch had already removed.
- src/runtime/strategy.ts: kept BOTH imports (withDriverExecutor + routerToolLoop — the
  body uses both, 4+3 refs).
- package.json: biome ^2.4.15 (branch) + agent-eval ^0.92.0 (main's newer substrate);
  lockfile reconciled via pnpm install.
- docs/architecture.md + roadmap-rsi.md: kept the branch's agent-driver framing + main's
  clarifications.

Verified in an isolated worktree: build + typecheck (core + examples) clean; 944 tests pass.
@drewstone drewstone marked this pull request as ready for review June 16, 2026 07:51

@tangletools tangletools left a comment


✅ Auto-approved PR — a6eab4ec

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-16T07:51:41Z

@tangletools tangletools left a comment


🟡 Value Audit — sound-with-nits

Verdict: sound-with-nits
Concerns: 5 (2 low, 3 weak-concern)
Heuristic: 0.2s
Duplication: 0.0s
Interrogation: 825.4s (2 bridge agents)
Total: 825.6s

💰 Value — sound-with-nits

Adds the missing recursive-driver execution layer to the existing Scope/Supervisor substrate, plus an LLM-driven coordination driver, an HTTP MCP server, a deliverable gate, and authoring helpers; it is coherent and fills a real gap, though the authoring helper is thinner than the docs' full-profile

  • What it does: The change turns the existing Scope/Supervisor atom into a genuinely recursive one: a spawned child can itself be a driver that mounts a nested Scope and spawns more children, all sharing one conserved budget pool, journal, and blob store. Concretely it adds: (1) driver-executor.ts (driverChild, withDriverExecutor, driverExecutorFactory) which runs a driver child inside a nested Scope one
  • Goals it achieves: The change achieves three goals evident from the code: (1) make agents-driving-agents real — the recursive driver-executor closes the gap between the project's 'recursive atom' architecture and actual recursive execution; (2) let external coding harnesses act as supervisors by mounting the coordination verbs as an MCP server, not just in-process code; (3) enforce that 'done' means a deliverable ch
  • Assessment: This is a good change on its merits. It is built in the grain of the codebase: it extends the existing Scope/Supervisor/createCoordinationTools substrate rather than replacing it, preserves the conserved-budget/equal-k invariant by construction, adds a depth ceiling, and bubbles spend and delivery verdicts up the recursion tree. The new primitives are well-layered (tools → in-process driver
  • Better / existing approach: none — this is the right approach. I checked main directly and the recursive capability the PR adds is not present: main:src/runtime/personify/persona.ts:102 explicitly warns that a spawned child run as a driver 'drives a leaf', and main:src/runtime/supervise/scope.ts has no nestedScopeSeamKey/NestedScopeSeam (the PR adds these at src/runtime/supervise/scope.ts:117-160). main has `cr

🎯 Usefulness — sound-with-nits

A coherent, grain-following change: it lands a real recursive agent atom (Scope/Supervisor + LLM-driven coordination driver + MCP server), layers strategy/persona on top, removes verified dead code, and consolidates docs; only minor export and doc-link cleanup misses.

  • Integration: The new behavior is reachable and wired in. coordinationDriverAgent and the supervisor primitives are exported from src/runtime/index.ts (src/runtime/index.ts:296-303, :333-336) and consumed by tests (tests/loops/coordination-driver.test.ts, tests/loops/coordination-mcp.test.ts), benchmarks (bench/src/atom-humaneval.mts:173, bench/src/atom-mcp-e2e.mts:136), and higher-level runtime
  • Fit with existing patterns: It fits the codebase's architecture rather than competing with it. The new code sits on top of the existing Scope/Supervisor/driver-executor recursion primitives (src/runtime/supervise/supervisor.ts:64, driver-executor.ts:125) and reuses createCoordinationTools from src/mcp/tools/coordination.ts:91. It directly implements the workflow described in skills/supervise/SKILL.md (decompo
  • Real-world viability: The primitives are exercised beyond the happy path: tests cover budget conservation, abort cascade, recursion, join barriers, intensity breaker, MCP over HTTP, completion-oracle selection, and worker-profile authoring. The benchmark harnesses use real router models and real deterministic checks (HumanEval Docker checker, pytest file). Production deployment would still need to swap the in-memory bl

🔎 Heuristic Signals

🟡 Cruft: console debug added bench/src/atom-humaneval.mts

  • console.log(`atom-humaneval: N=${N} K=${K} offset=${OFFSET} worker=${cfg.model} driver=${driverCfg.model}`)

🟡 Cruft: magic number added src/topology/replay.ts

+function setMs(ms){ ms=Math.max(0,Math.min(span,ms)); scrub.value=(ms/span*1000)|0; applyTo(ms); }
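
One possible way to address the magic-number signal, sketched in TypeScript: name the slider resolution and split the clamp from the position math. `SCRUB_STEPS`, `clampMs`, and `scrubPosition` are hypothetical names, and the assumption that 1000 is the range input's maximum is not confirmed by the source.

```typescript
// Assumed to match the scrubber's range maximum (the literal 1000 in the flagged line).
const SCRUB_STEPS = 1000;

// Clamp a playback time into [0, span].
function clampMs(ms: number, span: number): number {
  return Math.max(0, Math.min(span, ms));
}

// Map a playback time onto an integer slider position in [0, SCRUB_STEPS].
// Assumes span > 0; a zero span would divide by zero.
function scrubPosition(ms: number, span: number): number {
  return Math.trunc((clampMs(ms, span) / span) * SCRUB_STEPS);
}
```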

🎯 Usefulness Audit

🟡 serveCoordinationMcp is not re-exported from the runtime barrel [integration]

serveCoordinationMcp is called in tests and benchmarks (tests/loops/coordination-mcp.test.ts:64, bench/src/atom-mcp-e2e.mts:136, bench/src/mcp-mount-probe.mts:76) and is the "capable/primary" keystone path per docs/research/rsi-atom-masterplan.md:20, but a grep of src/runtime/index.ts finds no re-export. coordinationDriverAgent is re-exported at src/runtime/index.ts:298. Either add serveCoordinationMcp to the barrel or document that it is intentionally deep-import-only.

🟡 Stale docstring link to deleted research doc [integration]

bench/src/commit0-env.ts:2 still links to docs/research/long-horizon-benchmark-survey.md, which was retired in Pass 2 of the doc consolidation (docs/research/deletion-ledger.md:42). Update the comment to point to .evolve/current.json or docs/research/long-horizon-agent-map.md.

💰 Value Audit

🟡 AuthoredProfile is thinner than the AgentProfile law it advertises [against-grain]

The PR's docs establish that 'the supervisor's only intelligence is AUTHORING full profiles' and that an agent IS 'prompt + skills + tools/mcp + subagents + hooks' (docs/canonical-api.md:15-25). But AuthoredProfile only carries name, systemPrompt, and model (src/runtime/supervise/authoring.ts:23-29), and authoredWorker materializes it as a single router call with a bare { name } AgentProfile (src/runtime/supervise/authoring.ts:65-117). The supervisor skill tells the superviso


What this audit checks

It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.

| Pass | What it asks |
| --- | --- |
| Heuristic | Vague title? Whitespace-only or cruft-bearing diff? (content signals only) |
| Duplication | Do added function/class names already exist elsewhere in the repo? |
| Value Audit | What does it do? What goal does it achieve? Is it good? Better architecture or already-exists? |
| Usefulness Audit | Does it integrate and fit? Will it hold up in real use and actually get used? |

Findings are concerns, not blocks — the human reviewer decides what to do with them.

value-audit · 20260616T080709Z

@drewstone drewstone merged commit ce95d21 into main Jun 16, 2026
1 check failed
drewstone added a commit that referenced this pull request Jun 16, 2026
…#307)

* chore(release): 0.54.0 — expose the recursive agent atom + workspace seam

Publishes coordinationDriverAgent, serveCoordinationMcp, driverChild (the recursive driver
atom, #304) and runInWorkspace (the shared-workspace seam, #305), which merged into main
without a version bump and so were absent from the published 0.53.0. Additive (new exports on
the /runtime barrel) — no breaking changes, no fleet bump required.

Verified: dist/runtime.js exposes all four; tests 945 pass.

* fix(lint): clear 2 biome errors blocking the release (assignment-in-expr, unused import, import order)

---------

Co-authored-by: Drew Stone <hello@webb.tools>

2 participants