feat(runtime): durable run loop — wire supervisor resume + journal the kernel loop #346
…e kernel loop [WIP] Build-stage durability work captured as a draft for review/resume. See PR comment for spec, checklist, completion criteria, ranked alternatives, decisions, and resume steps.
tangletools
left a comment
✅ Auto-approved PR — 3e0c0f49
Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.
tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-20T14:05:37Z
tangletools
left a comment
🟠 Value Audit — better-approach-exists
| | |
|---|---|
| Verdict | better-approach-exists |
| Concerns | 2 (1 medium-concern, 1 low) |
| Heuristic | 0.0s |
| Duplication | 0.0s |
| Interrogation | 417.1s (2 bridge agents) |
| Total | 417.1s |
💰 Value — better-approach-exists
Adds coherent crash-resume durability to runLoop and supervised runs, but LoopJournal duplicates the existing ConversationJournal adapter layer instead of reusing a shared substrate.
- What it does: Wires durable crash-resume into two runtime paths. (1) runLoop gains a LoopJournal interface (InMemoryLoopJournal + FileLoopJournal) that loads prior iterations on start, appends each committed round before planning the next, and records end on finalization; resumed runs skip committed iterations and ended runs short-circuit to the recorded result. (2) Supervisor.run now loads the spawn journal tr
- Goals it achieves: Make agent runs survive a worker or driver process crash/restart without redoing already-committed work; give real (non-test) supervised runs a durable-by-directory context; align the kernel loop with the existing ConversationJournal durability pattern and the supervisor with its own already-built replay primitives.
- Assessment: Good change that achieves its stated goal coherently. The commit boundary is well-placed (after workers drain and outputs fold), file appends fsync, ended runs are idempotent, and the tests in tests/loops/run-loop-durability.test.ts prove no-redo resume for both kernel and supervisor paths. The Supervisor wiring correctly reuses existing tested primitives rather than inventing new replay logic.
- Better / existing approach: LoopJournal's InMemoryLoopJournal and FileLoopJournal are near-identical structural clones of InMemoryConversationJournal and FileConversationJournal in src/conversation/journal.ts (compare journal.ts:58-194 with loop-journal.ts:58-193). Both use the same Map-with-defensive-copy, JSONL line-per-record, fsync-on-append, begin/append/end record shapes, isNoEntError helper, and startedAt-mismatch gua
- Model: kimi-code/kimi-for-coding
- Bridge attempts: 3
- Bridge warning: opencode/deepseek/deepseek-v4-pro: bridge stream ended without value-audit content; opencode/zai-coding-plan/glm-5.1: bridge stream ended without value-audit content
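
The load / append / end behaviour summarized above can be sketched as follows. This is a minimal illustration, not the PR's code: `LoopJournalSketch`, `InMemoryJournalSketch`, `runLoopSketch`, and the record shapes are hypothetical names chosen for this example, and the real `LoopJournal` interface may differ.

```typescript
// Hypothetical record shapes; the PR's real types are not shown in this thread.
interface IterationRecord<R> { index: number; output: R }
interface LoopState<R> { iterations: IterationRecord<R>[]; ended?: { result: R[] } }

interface LoopJournalSketch<R> {
  load(runId: string): LoopState<R>;
  append(runId: string, record: IterationRecord<R>): void;
  end(runId: string, result: R[]): void;
}

class InMemoryJournalSketch<R> implements LoopJournalSketch<R> {
  private readonly runs = new Map<string, LoopState<R>>();

  load(runId: string): LoopState<R> {
    const state = this.runs.get(runId);
    if (!state) return { iterations: [] };
    // Defensive copy so callers cannot mutate journal state.
    return {
      iterations: [...state.iterations],
      ended: state.ended ? { result: [...state.ended.result] } : undefined,
    };
  }

  append(runId: string, record: IterationRecord<R>): void {
    const state = this.runs.get(runId) ?? { iterations: [] };
    if (state.ended) throw new Error(`run ${runId} has already ended`);
    state.iterations.push(record);
    this.runs.set(runId, state);
  }

  end(runId: string, result: R[]): void {
    const state = this.runs.get(runId) ?? { iterations: [] };
    state.ended = { result: [...result] };
    this.runs.set(runId, state);
  }
}

function runLoopSketch<R>(
  runId: string,
  journal: LoopJournalSketch<R>,
  maxIterations: number,
  step: (index: number) => R,
): R[] {
  const prior = journal.load(runId);
  // An ended run short-circuits to its recorded result.
  if (prior.ended) return prior.ended.result;
  // Committed iterations are reused, not re-executed.
  const outputs = prior.iterations.map((r) => r.output);
  for (let i = outputs.length; i < maxIterations; i++) {
    const output = step(i);
    // Commit the round before planning the next one.
    journal.append(runId, { index: i, output });
    outputs.push(output);
  }
  journal.end(runId, outputs);
  return outputs;
}

// Simulate a crash in round 2, then resume against the same journal.
const journal = new InMemoryJournalSketch<string>();
let calls = 0;
let crashed = false;
const step = (i: number): string => {
  if (i === 2 && !crashed) {
    crashed = true;
    throw new Error("simulated crash");
  }
  calls += 1;
  return `round-${i}`;
};

try {
  runLoopSketch("run-1", journal, 3, step);
} catch {
  // The "process" dies here; rounds 0 and 1 are already committed.
}
const resumed = runLoopSketch("run-1", journal, 3, step);
```

In this sketch the resumed call executes only round 2, and a further call with the same run id returns the recorded result without invoking `step` at all.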
🎯 Usefulness — sound
Wires crash-durability into both the supervised run and kernel loop using the codebase's existing journal patterns — reload recovers committed work without redoing it.
- Integration: Reachable through two paths. (1) Supervisor: createFileRunContext(dir) or createInMemoryRunContext({dir}) → spread into SupervisorOpts → supervisor.ts:102-116 loads prior tree, rehydrates via replaySpawnTree/materializeTreeView, passes resumeFrom to scope → scope exposes scope.resume (scope.ts:433-438, types.ts:332) consumable by any Agent.act. The examples/supervisor-loop/loop.ts:208 caller alrea
- Fit with existing patterns: Mirrors the codebase's two existing journal patterns precisely: ConversationJournal (src/conversation/journal.ts — begin/append/load shape, in-memory+JSONL+SQL impls) and SpawnJournal (src/durable/spawn-journal.ts:132-250 — same shape, same fsync-per-append). The LoopJournal file adapter (loop-journal.ts:127-217) is a structural clone of FileSpawnJournal (same JSONL format, same isNoEntError guard
- Real-world viability: 339-line test (tests/loops/run-loop-durability.test.ts) proves end-to-end: file journal round-trips across instances (L160-197), in-memory resume skips committed rounds and only executes un-committed ones (L81-124, creates=2 not 3), ended run short-circuits to recorded result with zero boxes (L126-158), supervisor file-context resume rehydrates children from disk and does not re-execute (L274-338,
- Model: opencode/deepseek/deepseek-v4-pro
- Bridge attempts: 1
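
The JSONL-with-fsync pattern the audit refers to might look roughly like the sketch below, using Node's synchronous `fs` API. The helper name `isNoEntError` appears in the audit text, but its body here, along with `appendLineDurably`, `loadLines`, and the `JournalLine` shape, is an assumption made for illustration and is not taken from the repository.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical record shape: one JSON object per line.
type JournalLine =
  | { kind: "begin"; startedAt: string }
  | { kind: "append"; index: number; output: string }
  | { kind: "end" };

// Assumed implementation: treat a missing file as "no prior run".
function isNoEntError(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { code?: unknown }).code === "ENOENT"
  );
}

// Append one record and fsync before returning, so a committed round
// has been flushed to the device before the caller moves on.
function appendLineDurably(file: string, record: JournalLine): void {
  const fd = fs.openSync(file, "a");
  try {
    fs.writeSync(fd, JSON.stringify(record) + "\n");
    fs.fsyncSync(fd);
  } finally {
    fs.closeSync(fd);
  }
}

// Read every record back; a missing file yields an empty journal.
function loadLines(file: string): JournalLine[] {
  try {
    const text = fs.readFileSync(file, "utf8");
    return text
      .split("\n")
      .filter((line) => line.length > 0)
      .map((line) => JSON.parse(line) as JournalLine);
  } catch (err) {
    if (isNoEntError(err)) return [];
    throw err;
  }
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), "journal-sketch-"));
const file = path.join(dir, "run.jsonl");
const before = loadLines(file);
appendLineDurably(file, { kind: "begin", startedAt: new Date().toISOString() });
appendLineDurably(file, { kind: "append", index: 0, output: "round-0" });
const after = loadLines(file);
```

Opening, syncing, and closing on every append is the simplest way to get a durable commit point per record; it trades throughput for a clear guarantee about what survives a crash.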
🔎 Heuristic Signals
🟡 Cruft: magic number added tests/loops/run-loop-durability.test.ts
budget: { maxIterations: 1, maxTokens: 1000 },
💰 Value Audit
🟠 LoopJournal adapters duplicate ConversationJournal adapters [duplication]
src/runtime/loop-journal.ts:58-193 mirrors src/conversation/journal.ts:58-194 almost line-for-line: Map storage, defensive copy, JSONL file adapter with fsync, record kind shapes, corruption guards, and isNoEntError. Better approach: extract a generic append-only journal substrate in this repo, or move directly to the shared durability package the PR says will replace this. This also closes the SQL-adapter gap: ConversationJournal has SqlConversationJournal (src/conversation/journal-sql.ts) w
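
One possible shape for the "generic append-only journal substrate" suggested above is sketched here. Everything in it is hypothetical (`JournalStore`, `MemoryStore`, `AppendOnlyJournal`, and both record types are invented for illustration); it only shows how the storage mechanics could live in one place while each journal contributes its own record type.

```typescript
// Hypothetical storage seam: raw serialized lines per key, oldest first.
interface JournalStore {
  read(key: string): string[];
  write(key: string, line: string): void;
}

// In-memory store; a file or SQL store would implement the same two methods.
class MemoryStore implements JournalStore {
  private readonly lines = new Map<string, string[]>();

  read(key: string): string[] {
    // Defensive copy so callers cannot mutate stored state.
    return [...(this.lines.get(key) ?? [])];
  }

  write(key: string, line: string): void {
    const existing = this.lines.get(key) ?? [];
    existing.push(line);
    this.lines.set(key, existing);
  }
}

// The shared substrate: serialization and append/load live here once.
class AppendOnlyJournal<Rec> {
  private readonly store: JournalStore;

  constructor(store: JournalStore) {
    this.store = store;
  }

  append(key: string, record: Rec): void {
    this.store.write(key, JSON.stringify(record));
  }

  load(key: string): Rec[] {
    return this.store.read(key).map((line) => JSON.parse(line) as Rec);
  }
}

// Each consumer only declares its record type (illustrative shapes).
type LoopRecord = { kind: "iteration"; index: number } | { kind: "end" };
type ConversationRecord =
  | { kind: "turn"; speaker: string; text: string }
  | { kind: "end" };

const sharedStore = new MemoryStore();
const loopJournal = new AppendOnlyJournal<LoopRecord>(sharedStore);
const conversationJournal = new AppendOnlyJournal<ConversationRecord>(sharedStore);

loopJournal.append("loop:run-1", { kind: "iteration", index: 0 });
loopJournal.append("loop:run-1", { kind: "end" });
conversationJournal.append("conv:c-1", { kind: "turn", speaker: "user", text: "hi" });

const loopRecords = loopJournal.load("loop:run-1");
const conversationRecords = conversationJournal.load("conv:c-1");
```

Under this arrangement a new backend is written once against `JournalStore` and becomes available to every journal built on the substrate, which is the gap-closing effect the finding describes.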
What this audit checks
It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.
| Pass | What it asks |
|---|---|
| Heuristic | Vague title? Whitespace-only or cruft-bearing diff? (content signals only) |
| Duplication | Do added function/class names already exist elsewhere in the repo? |
| Value Audit | What does it do? What goal does it achieve? Is it good? Better architecture or already-exists? |
| Usefulness Audit | Does it integrate and fit? Will it hold up in real use and actually get used? |
Findings are concerns, not blocks — the human reviewer decides what to do with them.
…cs that can't lie, supervise() one-call (#347)
* Revert "feat(runtime): durable run loop — wire supervisor resume + journal the kernel loop (#346)"
  This reverts commit edc1d54.
* docs(simplification): master tracker — converged design, scratch list, full doc/module/example inventory + completion criteria
* docs(simplification): red-team corrections — 4 verbs (run/improve/certify/refuse), steer-in-run, milestone-oracle gap, 8 skills to vendor
* docs(simplification): improve is ONE verb with a PLUGGABLE CandidateGenerator (GEPA/skillOpt/autoresearch) + surface param — not 'one engine'
* refactor(runtime): extract the canonical runToolLoop; routerToolLoop becomes a thin adapter (keystone 1/4)
* refactor(runtime): unify the supervisor brain on the canonical ToolLoopChat seam (keystone)
  Delete DriverChat + routerDriverChat; the coordination-driver brain is now the canonical ToolLoopChat and its loop runs through runToolLoop (routerBrain = 4 lines, was 60). The equal-k driver-inference metering is preserved exactly. Three tool-loop copies collapse to one.
* docs(simplification): keystone WS1 is two phases — 1a (seam unified) done, 1b (brain-from-profile/harness-as-data, sandbox supervisor) next
* refactor(runtime): internalize leaked recursion/seam/journal/trace plumbing from the public barrel
* refactor(runtime): internalize durable spawn-journal + spawn-tree types from the public barrel
* refactor(api): collapse public export subpaths 13→6 (fold audit into profiles; drop unused duplicates)
* docs: fix 3 stale/fabricated symbol references (DriverChat, runSteeringExperiment, refineGepa label)
* docs: consolidate 26→19 + archive (shrink canonical-api 984→76, merge 4 architecture docs→1, merge PLAIN→README, archive 5 niche notes)
* feat(docs-gate): CLASS 6 prose-symbol check — every backticked symbol in curated docs must resolve
  Scans canonical-api/concepts/architecture for backticked symbols outside code fences; reddens on any call-shaped or PascalCase symbol that resolves to no src/bench/substrate export or concept-whitelist entry. Walks every substrate dist/**/*.d.ts (not just index barrels). Closes the gap that let gepaDriver/refineGepa live in the docs unchecked.
* chore(profiles): sort barrel exports after the audit fold (biome)
* docs(simplification): mark WS1a/WS3/WS5 shipped
* feat(runtime): supervisorAgent resolves the brain from profile.harness (WS1b)
  A supervisor is now an AgentProfile: harness null -> the in-process router tool-loop (coordinationDriverAgent; routerBrain becomes an internal detail), a coding-CLI harness (claude-code/opencode/codex) -> a sandboxed harness driving the coordination verbs via serveCoordinationMcp. Both arms share makeWorkerAgent + the keep-best-delivered oracle. Closes the critique's A2 (driver brain was router-only). Proven offline both arms.
* docs(simplification): mark WS1b shipped (supervisorAgent — brain from profile.harness)
* feat(runtime): supervise() one-call convenience + workerFromBackend
  supervise(profile, task, { backend|makeWorkerAgent, budget }) defaults blobs/perWorker/journal/executors/maxDepth so 'just invoke the supervisor' is a one-liner. workerFromBackend derives the worker seam from a backend config + an optional completion oracle (settled⟺delivered). The raw seams (supervisorAgent + createSupervisor().run) stay for power use.
* docs(simplification): table the supervisor/driver/worker multi-round design (round vs turn, prompt-policy retry, real-time trace self-correction)
* docs(examples): canonical supervise() one-call example (the DX payoff — profile + goal, scaffolding defaulted)
* refactor(mcp): rename spawn_worker → spawn_agent (the verb spawns ANY agent, incl. a sub-supervisor)
  The coordination verb always took a worker OR a driver profile and resolves a sub-supervisor via the role marker — the name lied. Renamed across the tool def, the LLM-facing descriptions, the scripted-brain tests, the examples, and the hand docs. WS4 (naming taxonomy).
* refactor(mcp): rename observe_worker/steer_worker → observe_agent/steer_agent (consistent verb family)
  The coordination verbs operate on any spawned agent (a leaf worker OR a sub-supervisor), so the family is now spawn_agent / observe_agent / steer_agent. WS4 (naming taxonomy).
* refactor(runtime): depthDriver/breadthDriver→depthStrategy/breadthStrategy, supervisorSkill→supervisorInstructions (WS4)
  They are strategy combinators and a prompt-instruction builder, not 'drivers'/'skills' — reserving 'Driver' for the agent-orchestration layer (coordinationDriverAgent/driverChild).
* refactor(mcp): rename createDriveTurnResumeDriver → createDetachedTurnResumeDriver (WS4)
* feat(improvement): improve() — the one pluggable RSI verb (facade over selfImprove; generator defaulted from surface)
* test(improvement): offline improve() facade test (scripted generator, no creds)
* refactor(examples): delete run-router.ts + loop.ts, rewrite sandbox/bridge runners onto supervise()
  run-router.ts duplicated examples/supervise/supervise.ts (router brain + router-tools backend). loop.ts's runSupervisorLoop/makeWorkerAgent duplicated supervise()/workerFromBackend. The sandbox + bridge runners now call supervise() with only their load-bearing per-backend seam; the shared demo task + scripted brain move to shared.ts.
* refactor(examples): rewrite run-supervisor-mcp onto workerFromBackend() (single-sourced worker seam)
  Replaces the bespoke makeWorker (executor construction + per-worker file plumbing) with workerFromBackend(backend, deliverable); the deployable check now reads the worker's real output for ANSWER=42 (completion oracle, not a self-report). Keeps the cli-bridge harness supervisor arm that drives spawn_agent natively over the coordination MCP.
* style(improvement,mcp): biome import ordering after WS4 rename + improve() exports
* docs(examples): point READMEs at the pruned set + add an offline supervise() example test
* docs(simplification): record WS4/WS2 closed decisions (AgentRunSpec deferred, improvementDriver/runLoop kept, improve surface boundary)
* fix(scripts): align verify-package-exports with the 6-subpath surface (drop stale ./workflow)
  The 13→6 export collapse (e6ff2a2) removed the ./workflow subpath but left verify-package-exports.mjs asserting it (requiredExports + a runtime import), so the gate failed on a subpath the package intentionally no longer exposes. Verify the real subpaths (., ./agent, ./intelligence, ./loops, ./profiles, ./mcp) instead.
* refactor: post-audit cleanups — rename internal runToolLoop→runBrainLoop, fix improve() default model
  - runToolLoop name collided with the public streaming runToolLoop; the internal brain-loop seam is now runBrainLoop (one grep = one concept).
  - improve()'s zero-config default reflection model was the dead anthropic/claude-sonnet-4.6 → deepseek-v4-flash (router-served).
* docs: front-door supervise()/improve() — the #1 audit fix
  The two flagship verbs were invisible in every gated doc, so a reader was routed back onto the verbose legacy path the PR replaced. README now leads with the 3 entry points (chat turn / supervise / improve); canonical-api §2 makes supervise() the 'just run a supervisor' START-HERE row and routes self-improvement to improve().
* refactor: delete the standalone workflow-script engine (src/workflow, 2775 LOC + tests)
  A third orchestration substrate (a workflow-as-a-script DSL runner with its own checkpoints/budget/delegates) that does NOT use the supervisor and is NOT self-improving — redundant with the Scope/Supervisor + supervise() path (the architecture's 'two substrates, do not invent a third'). Zero in-repo or fleet consumers; its ./workflow subpath was already dropped in WS3.
* feat(mcp): spawn_agent accepts an optional per-spawn budget (supervisor can vary budget per worker)
* feat(runtime): allowedModels guard on supervise()/improve() (fail-loud model-subset restriction)
* docs(examples): strategy-evolution — the policy-search research journey (runStrategyEvolution + promotionGate)
* feat(conversation): evalPersona() facade + examples/product-eval (user-sim product evals, one-call)
* docs(examples): improve() — the RSI verb, offline scripted example
* docs(examples): intelligence-recommend — connect traces→findings→improve() (the intelligence loop)
* docs(examples): list the 4 new examples in the index README
* docs(api): regenerate API reference for allowedModels guard + evalPersona facade
* fix: address PR #347 review — restore contentAddress export, guard improve() JSON.parse, test flagged fns
  - HIGH: contentAddress was dropped from the runtime barrel by WS3 → bench/atom-humaneval + atom-mcp-e2e fail to compile (a content-addressing helper bench legitimately uses). Re-exported from the barrel.
  - MEDIUM: applyWinnerToProfile's JSON.parse threw a raw SyntaxError after a ship verdict on a malformed winner → parseWinnerJson guards it with a typed ConfigError + a test.
  - MEDIUM: finalizeBestDelivered + runBrainLoop had no direct tests → added focused unit tests (the blob store's content-address invariant is exercised).
  - LOW: supervise() decision-table/README rows implied backend is required (it's optional) → { budget, backend? }.
Durable run loop
Makes an agent run survive a worker crash or restart by wiring the run loop onto durable journals, reusing the runtime's existing resume primitives.
- createSupervisor().run() loads an existing tree and rehydrates settled children (replaySpawnTree/materializeTreeView) instead of always starting fresh. These primitives were already built and tested; this wires them in.
- … SpawnJournal instead of the in-memory one, so a run's progress is durable without the caller opting in.

Verification
A run killed mid-flight and reloaded resumes from the last committed step and does not redo committed work.
Relationship to the shared package
The kernel journal and tool-call repair are the agent-runtime consumers of the shared durability primitives (@tangle-network/sdk-core/durability); once that package publishes, this re-points onto the shared contract so the runtime and the sandbox/cli-bridge paths share one recovery story.