feat(supervise): bidirectional coordination bus — down-leg + resume_worker#318
Conversation
… resume_worker
Close the bus to 100% bidirectional. The parent→child down-leg routes to the child
inbox (scope.send→deliver) AND records a queue:false event on the same bus: it lands
in history() + reaches subscribers for the audit trail, but is never pulled back by
the parent. New: resume_worker (continue a parked worker — the protocol had {resume}
but no verb); answer_question now routes the answer DOWN to the asking worker, unparking
it. EventBus gains PublishOptions.queue for record-only events.
down-leg + bidirectional history tests; full suite 1000 pass; typecheck/build/lint clean.
tangletools
left a comment
There was a problem hiding this comment.
✅ Auto-approved PR — b6eaf6de
Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.
tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-16T20:36:55Z
tangletools
left a comment
There was a problem hiding this comment.
🟡 Value Audit — sound-with-nits
| Verdict | sound-with-nits |
| Concerns | 2 (2 weak-concern) |
| Heuristic | 0.0s |
| Duplication | 0.0s |
| Interrogation | 301.5s (2 bridge agents) |
| Total | 301.5s |
💰 Value — sound
Adds the parent→child down-leg to the existing event bus — steer/answer/resume route to child inbox AND record on the bus as queue:false (observable, never pulled back) — completing the bidirectional comms picture with zero new primitives.
- What it does: Makes the supervise event bus bidirectional. Three MCP tools (steer_worker, answer_question, new resume_worker) now both deliver to the child inbox via scope.send→deliver AND publish a
queue:falseevent on the same bus. Down-leg events appear inhistory(), reachsubscribe/onEventobservers, and are counted instats(), but the parent never pulls its own outbound message back. The `Publish - Goals it achieves: Complete the bidirectional communication picture on ONE observability spine. Before this, the bus was child→parent only (UP: settled/question/finding); the down-leg existed as scope.send but was silent — no audit trail, no pass-through, invisible to observers. Now both directions are stamped, ordered, replayable in one
history(). Access patterns stay distinct by design: UP is fan-in pull-queue, - Assessment: Coherent and well-factored. Builds directly on the bus core from #316/#317 without reinventing anything. The
queue: falseoption on publish is the minimal, principled seam — it reuses the same typed union, same subscribe/history/stats machinery, same code paths.resume_workeris a genuine new verb (the protocol already had{resume}on the inbox,coordination.ts:389— it just wasn't exposed - Better / existing approach: none — this is the right approach. The only pre-existing 'down' mechanism was
scope.send(scope.ts), which was a silent one-way delivery. Adding observability by publishing to the existing bus (rather than building a second audit channel) is the simplest design. No codebase entity does 'record without queuing' —queue: falseis a clean 1-line extension topublish()atevent-bus.ts:107and - Model: opencode/deepseek/deepseek-v4-pro
- Bridge attempts: 1
🎯 Usefulness — sound-with-nits
The down-leg infrastructure is coherent, fits the ONE typed pipe pattern, and is wired into the primary call path — normal ahead-of-first-caller gaps (no executor implements deliver yet, resume_worker not in the operator prompt) do not gate shipping.
- Integration: Wired into the primary consumer:
coordinationDriverAgent(coordination-driver.ts:132–143) maps all tools includingsteer_worker/resume_workerinto the LLM's turn loop viabyName. The coordination MCP server (coordination-mcp.ts:68) also serves them over HTTP. The event bus recording path (sendDownat coordination.ts:206–208 →bus.publish(…, { queue: false })at event-bus.ts:107) correc - Fit with existing patterns: Fits the existing event bus pattern exactly — the down-leg rides the same
createEventBus<CoordinationEvent>()instance (coordination.ts:137),queue: falsecleanly separates record-only down-leg events from the pull queue, andsendDownmirrors the existingbus.publishcall pattern used by the up-leg. No competing pattern exists;scope.sendis the established child inbox mechanism (scope.t - Real-world viability: Will hold up under realistic use:
scope.sendcheckschild.deliveredsynchronously (scope.ts:357) — no async race on already-settled children. Thequeue: falseguard correctly keeps down-leg events out of the pull queue (event-bus.ts:107) soawait_eventnever returns them (test at coordination.test.ts:304). Theanswer_questionhandler correctly decides the question FIRST (coordination.ts: - Model: opencode/deepseek/deepseek-v4-pro
- Bridge attempts: 1
🎯 Usefulness Audit
🟡 resume_worker tool exists but is absent from the OPERATOR_TOOLS list and driver profile system prompt [integration] ``
The driver LLM won't be prompted to use
resume_workerbecause it's missing frombench/src/profiles.tsOPERATOR_TOOLS (line 18–26) and the driver profile's systemPrompt (lines 86–117) only mentionssteer_worker, notresume_worker. The tool IS in the MCP tool specs (coordination.ts:373–391) and the LLM could discover it via tool descriptions, but the prompt doesn't advertise it. Addresume_workerto the OPERATOR_TOOLS array and reference it in the driver system prompt.
🟡 No built-in executor implements deliver — the down-leg messages reach the inbox but no executor drains it [problem-fit] ``
All 11 built-in executor factories (routerInlineExecutor, routerToolsInlineExecutor, sandboxExecutor, cliExecutor, bridgeExecutor, cliWorktreeExecutor, driverExecutorFactory, createWorktreeCliExecutor, shotExecutor, analystExecutor, authoredWorker's inner) return objects without
deliver— verified in runtime.ts:173–226, 267–387, 403–467, 557–600, 693–776, 785–801; driver-executor.ts:125–182; strategy.ts:346–425, 427–457; authoring.ts:76–104. Thecompletion-gate.ts:57passthrough only passes
What this audit checks
It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.
| Pass | What it asks |
|---|---|
| Heuristic | Vague title? Whitespace-only or cruft-bearing diff? (content signals only) |
| Duplication | Do added function/class names already exist elsewhere in the repo? |
| Value Audit | What does it do? What goal does it achieve? Is it good? Better architecture or already-exists? |
| Usefulness Audit | Does it integrate and fit? Will it hold up in real use and actually get used? |
Findings are concerns, not blocks — the human reviewer decides what to do with them.
❌ Needs Work —
|
| deepseek | glm | aggregate | |
|---|---|---|---|
| Readiness | 54 | 83 | 54 |
| Confidence | 80 | 80 | 80 |
| Correctness | 54 | 83 | 54 |
| Security | 54 | 83 | 54 |
| Testing | 54 | 83 | 54 |
| Architecture | 54 | 83 | 54 |
Full multi-shot audit completed 4/4 planned shots over 4 changed files. Global verifier still owns final merge decision. | Full multi-shot audit completed 4/4 planned shots over 4 changed files. Global verifier still owns final merge decision.
Blocking
🔴 HIGH answer_question hides delivery failure from caller — src/mcp/tools/coordination.ts
At line 479,
const delivered = opts.scope.send(question.from, { answer, questionId })is computed but never included in the return value at line 488 (return { question }). steer_worker (line 369) and resume_worker (line 390) both return{ delivered }. Callers of ans
Other
🟠 MEDIUM No test for successful answer delivery to a live worker — tests/loops/coordination.test.ts
The test at line 159-200 (
blocks stop under failClosed...) usesfrom: 'driver-1'which does not match any worker in the mock scope (sendonly returns true for'w0'). This meansdelivered: falseis the only path tested for answer_question delivery. The happy path where a question originates from'w0'andscope.sendreturnstrueis untested, leaving a coverage gap for the new answer-routing behavior.
🟡 LOW single monolithic paragraph reduces scanability — CLAUDE.md
Line 65 is an extremely long paragraph (one bullet spanning ~1.5KB). The bidirectional bus addition makes it longer still. This is pre-existing style and matches the rest of the code map section, but breaking the agent-driver bullet into sub-bullets (one for the bus, one for observability, one for analysts, one for core types) would improve readability without changing any substance. Not blocking.
🟡 LOW answer_question now attempts down-delivery — behavior change, currently a no-op — src/mcp/tools/coordination.ts
Lines 478-487: answer_question previously only recorded the decision locally (decideQuestion). It now also calls
opts.scope.send(question.from, { answer, questionId })and publishes ananswerdown-leg event. This is a behavior change for any consumer whose question.from resolves to a live child with a deliver callback. Today this is a no-op (no executor implements deliver), but when one does, the answer will be routed to the worker inbox with shape{ answer, questionId }— a shape no executor handler currently recognizes (strategy.ts:387 only handlest.steer). The comment at [line 478](https://github.com/tangle-network/agent-runtime/blob/b6eaf6de2
🟡 LOW decideQuestion mutates question ledger before delivery attempt — src/mcp/tools/coordination.ts
At line 473,
decideQuestionmarks the question as 'answered' by mutating thequestionsarray beforescope.sendis called at line 479. Ifbus.publishsubsequently throws (e.g., a subscriber error), the handler throws but the question remains mutated. Pre-existing pattern (drainSettlement at line 192 has the same recordSettled-before-publish ordering), but more visible now that the answer path d
🟡 LOW delivered flag is always false in production — no executor implements deliver — src/mcp/tools/coordination.ts
Lines 367, 388, 479: scope.send() captures
deliveredfrom scope.ts:357 which returns false when!child.deliver. Exhaustive grep of src/ confirms NO shipped executor (router, inline, sandbox, cli, driver, worktree-cli) implements the optionaldeliver?(msg)on the Executor interface (types.ts:94). Only completion-gate.ts:57 passes it through, and only if the inner executor defines it — which none do. Sodelivered: truein the steer/resume tests (coordination.test.ts:294-295) is mock-only behavior that cannot occur in production. The PR's code is correct (it faithfully mirrors scope.send's return), but the observability story is misleading: the down-leg a
🟡 LOW No direct unit test for queue:false at the event-bus primitive level — src/runtime/supervise/event-bus.ts
The new queue:false option is the core behavior change but event-bus.test.ts (96 lines, 6 tests) has zero coverage for it. Coverage comes only from coordination.test.ts's integration test ('steer_worker and resume_worker route down + record in history but are never pulled back', line 275) which asserts await_event returns {idle:true} — proving the queue stays empty — but does not directly assert pending()===0, that pull() returns undefined, or that history()/subscribe still see the record. A regression that broke queue:false but left scope.send intact would only be caught at the integration layer. Fix: add a focused test to event-bus.test.ts asserting tha
🟡 LOW resume_worker delivered:false path not explicitly tested — tests/loops/coordination.test.ts
The new test (lines 275-305) only covers the delivered:true path for resume_worker. steer_worker has an explicit failure-case test at line 118 (workerId: 'gone' → delivered: false), but resume_worker does not. The logic is symmetric (both call scope.send and return {delivered}), so the risk is low, but a one-line assertion mirroring the steer_worker 'gone' test would close the gap.
tangletools · 2026-06-16T20:48:46Z · trace
tangletools
left a comment
There was a problem hiding this comment.
❌ 1 Blocking Finding — b6eaf6de
Full multi-shot audit completed 4/4 planned shots over 4 changed files. Global verifier still owns final merge decision. | Full multi-shot audit completed 4/4 planned shots over 4 changed files. Global verifier still owns final merge decision.
Full immutable report for this review: trace
Summary comment for this run: full summary
tangletools · 2026-06-16T20:48:46Z · immutable trace
…iew gaps Address PR #318 review: - BLOCKING: answer_question computed `delivered` but returned only { question } — now returns { question, delivered }, consistent with steer_worker/resume_worker (no longer hides whether the answer reached a live worker). - tests: answer routed down to a LIVE worker (delivered:true happy path); resume_worker delivered:false path; a focused event-bus queue:false unit test (history+subscribers see it, pull queue never does). - resume_worker added to OPERATOR_TOOLS + the driver system prompt so the driver is actually prompted to use it.
tangletools
left a comment
There was a problem hiding this comment.
✅ Auto-approved PR — 8b509e56
Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.
tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-16T21:41:53Z
|
Addressed the review in 🔴 Blocking — 🟠 No happy-path answer-delivery test → added: a question from a live worker ( 🟡 🟡 No 🟡 No executor implements Full suite 1003 pass; typecheck/build/lint clean. |
tangletools
left a comment
There was a problem hiding this comment.
🟡 Value Audit — sound-with-nits
| Verdict | sound-with-nits |
| Concerns | 1 (1 weak-concern) |
| Heuristic | 0.0s |
| Duplication | 0.0s |
| Interrogation | 222.6s (2 bridge agents) |
| Total | 222.6s |
💰 Value — sound
Adds a parent→child down-leg to the coordination event bus via a queue:false publish option — making the bus truly bidirectional on one observable spine with minimal surface area.
- What it does: Adds the parent→child direction (down-leg) to the existing child→parent coordination event bus. Three changes: (1)
PublishOptions.queue(default true) — set false for record-only events that hit history/subscribers but skip the pull queue, preventing the parent from pulling its own outbound; (2)steer_workerand newresume_workerMCP tools now route messages to the child inbox AND record a ` - Goals it achieves: True bidirectional communication on a single observable spine: both directions are stamped/ordered/replayable in one
history(). Theresume_workertool fills a gap where the executor inbox protocol already accepted{resume}but no coordination MCP surface exposed it.answer_questionnow routes answers back to workers, closing the loop — previously answers were recorded locally but the block - Assessment: The design is coherent and proportionate. The
queue:falseflag onPublishOptionsis the minimal change needed to make down-leg events observable without being pullable — a one-line guard inevent-bus.ts:107. The new event types (steer,answer,resume) extend the existingCoordinationEventunion additively in the codebase's established pattern (coordination.ts:85-91). The tool imple - Better / existing approach: none — this is the right approach. Searched for existing bidirectional bus mechanisms, answer routing, or resume tooling across
src/runtime/supervise/,src/mcp/tools/, andsrc/runtime/supervise/types.ts. Thescope.sendprimitive already existed but was unobservable (no record of what was sent). The{ kind: 'resume' }protocol existed inRootSignal(types.ts:452) but had no coordinati - Model: opencode/deepseek/deepseek-v4-pro
- Bridge attempts: 1
🎯 Usefulness — sound-with-nits
Bidirectional bus is a clean architectural extension — one typed pipe for both directions with distinct access patterns — and the down-leg observability + answer-routing are genuinely useful improvements; the only nit is that no built-in executor implements deliver yet, so the delivery verbs retur
- Integration: All three new down-leg paths are reachable.
coordinationDriverAgent.act()(coordination-driver.ts:132) passes every tool fromcreateCoordinationToolsto the driver LLM, andserveCoordinationMcp(coordination-mcp.ts:58) does the same for sandbox MCP.resume_workeris listed inbench/src/profiles.ts:25OPERATOR_TOOLS andbench/src/profiles.ts:99driver prompt.answer_questionis in the - Fit with existing patterns: Fits the existing one-typed-pipe architecture precisely.
CoordinationEventadds three additive discriminated-union members — no existing consumer breaks.queue: falseonEventBusis a minimal generic seam (event-bus.ts:107) that doesn't distort the pull queue or subscription model.projectEventhandles down-leg kinds defensively (coordination.ts:227-229). No competing pattern exists — this - Real-world viability: Bus recording (
queue: false) works correctly: events enter history + hit subscribers but never enter the pull queue (verified by event-bus.test.ts:82-102).answer_questionrouting the answer down is a structural improvement over the prior state where answers were only decided, not delivered. The real-world gap: no built-in executor (router/inline/sandbox/cli/bridge/driver) implements the optio - Model: opencode/deepseek/deepseek-v4-pro
- Bridge attempts: 1
🎯 Usefulness Audit
🟡 No built-in executor implements deliver — delivery verbs return false with current runtimes [ergonomics] ``
scope.send()returnsfalsewhen the child executor lacks adelivermethod (scope.ts:357). All five built-in executor factories (runtime.ts:173,267,403,557,693 and driver-executor.ts:137-181) omitdeliver. This meanssteer_worker(already pre-existing),resume_worker(new), and the answer-routing leg ofanswer_question(new) all returndelivered: falsein production with built-in executors. The protocol is correctly plumbed and will activate when a streaming/BYO executor adds `del
What this audit checks
It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.
| Pass | What it asks |
|---|---|
| Heuristic | Vague title? Whitespace-only or cruft-bearing diff? (content signals only) |
| Duplication | Do added function/class names already exist elsewhere in the repo? |
| Value Audit | What does it do? What goal does it achieve? Is it good? Better architecture or already-exists? |
| Usefulness Audit | Does it integrate and fit? Will it hold up in real use and actually get used? |
Findings are concerns, not blocks — the human reviewer decides what to do with them.
✅ No Blockers —
|
| deepseek | glm | aggregate | |
|---|---|---|---|
| Readiness | 73 | 77 | 73 |
| Confidence | 85 | 85 | 85 |
| Correctness | 73 | 77 | 73 |
| Security | 73 | 77 | 73 |
| Testing | 73 | 77 | 73 |
| Architecture | 73 | 77 | 73 |
Full multi-shot audit completed 5/5 planned shots over 6 changed files. Global verifier still owns final merge decision. | Full multi-shot audit completed 5/5 planned shots over 6 changed files. Global verifier still owns final merge decision.
🟠 MEDIUM answer_question return type branches are inconsistent — src/mcp/tools/coordination.ts
The answer path (lines 471-490) returns
{ question, delivered }withdelivered: boolean. The defer path (lines 492-498) and escalate path (lines 500-511) return{ question }withoutdelivered. TypeScript unifies these as{ question: QuestionRecord; delivered?: boolean }which is safe, but callers that destructuredeliveredwithout checking will getundefinedfor defer/esca
🟡 LOW DownMessageEvent.instruction naming is loose for resume — bench/src/profiles.ts
The resume_worker tool at coordination.ts:388 sends { resume: message } via scope.send but records the down-leg using DownMessageEvent.instruction to store the message (coordination.ts:389:
sendDown('resume', { toWorker: workerId, instruction: message, delivered })). For steer_worker 'instruction' is the right term; for resume_worker the semantic is 'message'. This is cosmetic only — the field name is an internal shape and no caller depends on the distinction. The system-prompt wording at profiles.ts:99 already correctly uses 'message'. No behavior change needed.
🟡 LOW resume_worker lacks a dedicated prompt bullet; steer-vs-resume distinction not surfaced to the agent — bench/src/profiles.ts
Line 99 lists resume_worker only in the compressed one-liner: 'spawn_worker(...) / steer_worker(worker, instruction) / resume_worker(worker, message) / stop.' There is no dedicated bullet (like observe_worker at L97-98 or define_analyst at L95-96) telling the driver WHEN to use resume_worker (parked/idle worker) versus steer_worker (running worker). The code comment at L24-25 carries the distinction ('send a running worker...' vs 'continue a parked/idle worker...') but the agent never sees comments — it sees only systemPrompt. A driver that can't distinguish the two may call steer_worker on a parked worker (no live inbox → delivered:false) and waste a turn. Suggest add
🟡 LOW DownMessageEvent.toWorker field name is wrong when routing answers to non-worker askers — src/mcp/tools/coordination.ts
Lines 76-80 + 479-485: answer_question routes the answer to question.from via scope.send(question.from, ...). ask_parent accepts arbitrary from strings (line 532: str(a.from,'from')) — tests pass from:'driver-1' and from:'user' is the default-by fallback. The DownMessageEvent.toWorker field is therefore semantically incorrect for driver/user askers: it is labeled toWorker but carries a driver or 'user' id. No functional bug (delivered:false is surfaced correctly for non-node ids), but the field name lies about its contents. Fix
🟡 LOW No idempotency guard against double-answering a question — src/mcp/tools/coordination.ts
decideQuestion(line 290) does not check if the question is already answered. Callinganswer_questiontwice on the same questionId causes: (1)decideQuestionoverwrites the decision silently, (2)scope.senddelivers a duplicate answer to the worker inbox, (3) the bus records a secondanswerevent. Pre-existing indecideQuestion(not a regression), but the new routing+recording amplifies the effect — the worker now receives double-delivery instead of just an overwritten ledger record. Fix: checkprior.status !== 'open'and either no-op or throw.
🟡 LOW history() docstring stale — doesn't mention down-leg event types — src/mcp/tools/coordination.ts
The
history()method at CoordinationTools line 117-118 has a jsdoc:(settled / question / finding). Since this PR, the bus also carriessteer,answer, andresumeevents (though never queued for pull, they appear in history). The docstring should reflect the full event surface.
🟡 LOW projectEvent defensive branch for down-leg types is unreachable — src/mcp/tools/coordination.ts
Lines 227-229: the fallback
return { type: ev.type, ...ev.down }handles steer/answer/resume in projectEvent. But down-leg events are published with queue:false (lines 207, 486), so they never enter the pull queue, and projectEvent is only ever called on pulled events (await_event handler line 443). The branch is dead under current usage. Defensible as future-proofing, but add a test or a comme
🟡 LOW resume_worker description overpromises vs. scope.send semantics — src/mcp/tools/coordination.ts
Line 374-375: description says 'Resume a parked/idle worker with a follow-up message, continuing its session'. But scope.send (scope.ts:357) returns false for any child where child.delivered===true (already drained by next()). A worker that settled and was observed via await_next/await_event CANNOT be resumed — it is gone from the live set. 'Parked/idle' implies a pause/resume lifecycle that the scope model does not have: a child is either live (running, inbox-reachable) or delivered (drained, gone). The behavior is safe (delivered:false), but the description misleads callers into expecting they can resume a settled worker. Fix: clarify description to 'De
🟡 LOW sendDown helper does not cover the 'answer' down-leg (inline duplication) — src/mcp/tools/coordination.ts
Lines 206-208 define sendDown(type:'steer'|'resume', down) which wraps bus.publish({type,down},{queue:false}). The 'answer' down-leg at lines 480-487 duplicates the same queue:false publish inline because the answer event also carries questionId. Two code paths for the same invariant (down-leg = record-only). Not a bug, but a maintenance hazard: if the queue:false contract changes, the answer path can drift. Fix: generalize sendDown to accept the full CoordinationEvent down-leg variant, or extract a publishDown(event, op
🟡 LOW sendDown helper excludes 'answer' type, creating minor pattern drift — src/mcp/tools/coordination.ts
sendDown(line 206) accepts'steer' | 'resume'but not'answer'. Theanswer_questionhandler publishes its answer event directly viabus.publish(line 479) because the answer event carries an extraquestionIdfield thatDownMessageEventdoesn't have. This is deliberate but creates a pattern inconsistency: three down-leg publish sites, one via helper, one raw. If more down-leg types are added, consider wideningsendDownor adding a generic down-leg publisher.
🟡 LOW stats().published invariant change is undocumented — src/runtime/supervise/event-bus.ts
Before this change,
published === pulled + pendingalways held (every published event entered the queue). Now record-only events (queue:false) incrementpublished(seq) but never enter the queue, sopublished > pulled + pendingis possible. This is correct by design — the test at line 96-100 validatespublished: 2withpending: 1. However, no doc/comment notes this invariant shift. A future consumer that derives expected queue depth frompublished - pulledwould be wrong. Suggest a one-line comment in the stats() JSDoc or PublishOptions doc noting thatpublishedincludes record-only events not in the queue.
tangletools · 2026-06-16T21:52:23Z · trace
Make the down-leg actually move a live worker (was observable-only). New createInbox (supervise/inbox.ts) is the receive end an executor exposes as Executor.deliver; the owned tool-loop (routerToolsInlineExecutor) drains it two ways: - QUEUED (default): flush at each step boundary AND before the worker may settle — it can't finish while a steer/answer it never read is pending. - FORCEFUL (steer_worker interrupt:true): aborts the in-flight turn so the worker re-plans immediately, breaking it off a wrong path mid-task. Black-box CLI harnesses can't be interrupted mid-step → down-leg degrades to next spawn. inbox 4 + executor-drains-inbox integration test (flush-before-settle proven end to end through the real executor); full suite 1008 pass; typecheck/build/lint clean.
tangletools
left a comment
There was a problem hiding this comment.
✅ Auto-approved PR — cea04b56
Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.
tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-16T22:45:26Z
Down-leg is now FUNCTIONAL, not just observable (
|
tangletools
left a comment
There was a problem hiding this comment.
🟡 Value Audit — sound-with-nits
| Verdict | sound-with-nits |
| Concerns | 2 (1 medium-concern, 1 weak-concern) |
| Heuristic | 0.0s |
| Duplication | 0.0s |
| Interrogation | 92.9s (2 bridge agents) |
| Total | 92.9s |
💰 Value — sound
Makes the supervise coordination bus bidirectional: adds a per-worker inbox that receives parent-to-child steer/answer/resume messages (down-leg), records them on the same event bus for unified observability, and drains them into the worker's conversation loop with queued-or-forceful delivery. Fills
- What it does: Makes the supervise coordination bus truly bidirectional. Adds (1)
createInbox()— a reusable per-worker inbox (src/runtime/supervise/inbox.ts) that accepts steer/answer/resume messages from the parent viaScope.send, queues them, and drains them into the worker's conversation at step boundaries (queued flush) or immediately aborts the in-flight turn (forceful interrupt). (2) Wires it into ` - Goals it achieves: True bidirectional comms over ONE observability spine. Both up-leg (child→parent questions/settlements/findings) and down-leg (parent→child steer/answer/resume) are stamped, ordered, replayable, and visible in a single
history()trace. Access patterns stay distinct by design — UP is a fan-in pull-queue the driver drains; DOWN is addressed routing to a specific child's inbox. Before this PR, `ste - Assessment: Sound. The
Executor.deliver?interface (src/runtime/supervise/types.ts:94),Scope.sendrouting (src/runtime/supervise/scope.ts:353-360), andcompletion-gate.tsforwarding (completion-gate.ts:57) were already wired up and waiting for an implementation. This PR fills that gap without restructuring anything. TheInboxabstraction is clean, minimal (85 lines), and reusable — other execut - Better / existing approach: none — this is the right approach. Searched for existing
deliverimplementations across all executors (runtime.ts, sandbox executor, worktree-cli-executor, driver-executor), forinboxor down-leg patterns, and forresumehandling. Found: theExecutor.deliver?interface was defined (types.ts:94),Scope.sendwas routing to it (scope.ts:353-360), andcompletion-gate.tswas forwarding - Model: opencode/deepseek/deepseek-v4-pro
- Bridge attempts: 1
🎯 Usefulness — sound-with-nits
Bidirectional coordination bus with down-leg routing is well-designed, correctly wired end-to-end through coordination tools → Scope.send → inbox → executor loop, with only the sandbox executor not yet adopting deliver — progressive, not broken.
- Integration: Fully reachable. steer_worker/resume_worker are MCP tools exposed to the driver LLM via coordination.ts:352-398, routed down through Scope.send (scope.ts:353-360) → executor.deliver → inbox.deliver. The answer_question down-leg (coordination.ts:475-497) similarly delivers answers to asking workers. The bench profiles (bench/src/profiles.ts:24-25) document both verbs. The inbox is wired in routerTo
- Fit with existing patterns: Fits the codebase's grain precisely. The EventBus already had publish/subscribe/pull — adding
queue?: booleanto PublishOptions (event-bus.ts:40-46) is a minimal, coherent extension, not a new abstraction. The Inbox module (inbox.ts) follows the same factory pattern as createEventBus. The Executor.deliver? method existed in the interface since types.ts:94 — this PR is its first real implementati - Real-world viability: Holds up. deliver() never throws — malformed messages are silently ignored (inbox.ts:40-53). Interrupt vs external abort is correctly discriminated (runtime.ts:337-341: if interruptSig.aborted && !signal.aborted && !controller.signal.aborted → continue, else rethrow). Flush-before-settle prevents workers finishing with unread messages (runtime.ts:357-360). maxTurns=200 bounds any looping. Scope.se
- Model: opencode/deepseek/deepseek-v4-pro
- Bridge attempts: 1
🎯 Usefulness Audit
🟠 Only routerToolsInlineExecutor implements deliver — sandbox workers cannot be steered mid-flight [integration] ``
The entire down-leg infrastructure (tools → scope.send → executor.deliver → inbox → loop) is correct and wired for routerToolsInlineExecutor (runtime.ts:297). However sandboxExecutor (runtime.ts:431-493) — the primary production executor that runs agents in actual boxes — does not implement deliver. A driver calling steer_worker/resume_worker on a sandbox worker always gets delivered:false (scope.ts:357: returns false when child.deliver is absent). This is by design (progressive adoption) and th
🟡 AbortSignal.any() requires Node.js ≥20.1.0 — verify target runtime [robustness] ``
runtime.ts:319 uses AbortSignal.any([signal, controller.signal, interruptSig]) which landed in Node.js 20.1.0 (and is still under consideration for broader adoption). If the project targets earlier Node.js or environments without this API, this will fail at runtime. Check the project's engines field and deployment targets; if below 20.1, polyfill or use a manual signal-linking helper like the existing linkSignals() at runtime.ts:980-991.
What this audit checks
It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.
| Pass | What it asks |
|---|---|
| Heuristic | Vague title? Whitespace-only or cruft-bearing diff? (content signals only) |
| Duplication | Do added function/class names already exist elsewhere in the repo? |
| Value Audit | What does it do? What goal does it achieve? Is it good? Better architecture or already-exists? |
| Usefulness Audit | Does it integrate and fit? Will it hold up in real use and actually get used? |
Findings are concerns, not blocks — the human reviewer decides what to do with them.
…sendDown covers answer PR #318 audit follow-ups (non-blocking): - resume_worker description no longer implies a park/resume lifecycle the scope model lacks — a settled (drained) worker is gone; says so and points to spawning fresh. - sendDown now covers the 'answer' down-leg too (removes the inline bus.publish duplication; one helper for all three down kinds). - history() docstring lists the down-leg event kinds. full suite 1008 pass; typecheck/lint clean.
tangletools
left a comment
There was a problem hiding this comment.
✅ Auto-approved PR — 83da8dfa
Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.
tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-16T22:51:04Z
tangletools
left a comment
There was a problem hiding this comment.
🟡 Value Audit — sound-with-nits
| Verdict | sound-with-nits |
| Concerns | 2 (2 weak-concern) |
| Heuristic | 0.0s |
| Duplication | 0.0s |
| Interrogation | 268.8s (2 bridge agents) |
| Total | 268.8s |
💰 Value — sound
Makes the coordination bus bidirectional by recording parent→child messages (steer/answer/resume) on the same EventBus as record-only events, adds a per-worker Inbox for receive-side delivery with queued+forceful modes, and wires the router-tools executor to drain it — a coherent extension of existi
- What it does: Extends the child→parent EventBus (#316/#317) to carry parent→child messages too, making it bidirectional. Three down-leg event types (steer/answer/resume) are published with
queue:falseso they appear inhistory()and reachsubscribeobservers but never enter the pull queue the parent reads from. A newInboxprimitive on the worker side (src/runtime/supervise/inbox.ts:56) parses incoming - Goals it achieves: 1. Single observability spine: both directions of parent↔child communication are stamped, ordered, and replayable in one
history()— no separate audit trail needed. 2. Parent→child messages are observable but never pollute the parent's pull queue (the parent must never pull its own outbound message back). 3. Workers can be steered mid-flight with two modes: queued (folded at next step boundary) - Assessment: Well-constructed change that extends existing primitives rather than reinventing them. The
Executor.delivermethod andScope.sendpath already existed (types.ts:94, scope.ts:353-360); the PR is wiring observability onto them and adding the worker-side receive abstraction. Thequeue:falseflag on EventBus is a 2-line extension that cleanly separates record-only from queueable events. The `Inb - Better / existing approach: none — this is the right approach. The codebase already had the
Executor.delivercontract (types.ts:94), theScope.sendrouting (scope.ts:353-360), and the EventBus with publish/subscribe/history (event-bus.ts). The PR adds the missing pieces: the worker-side inbox primitive, thequeue:falsepublish option, and the down-leg event recording. No existing mechanism duplicates or should substitu - Model: opencode/deepseek/deepseek-v4-pro
- Bridge attempts: 1
🎯 Usefulness — sound-with-nits
The down-leg is coherently wired — it rides the existing event bus with a clean queue:false seam, the inbox is well-integrated into the only multi-turn executor, and the graceful degradation (Scope.send returns false for non-deliver executors) is by design.
- Integration: Fully wired: steer_worker/resume_worker/answer_question all call scope.send → LiveChild.deliver (set at spawn, src/runtime/supervise/scope.ts:262) → Inbox.deliver (src/runtime/supervise/inbox.ts:60). The routerToolsInlineExecutor exposes deliver at src/runtime/supervise/runtime.ts:297. One-shot executors (routerInline, bridge) and sandbox/CLI executors deliberately omit deliver — Scope.send return
- Fit with existing patterns: The down-leg fits naturally: it uses the SAME EventBus as the up-leg (settled/question/finding), with a single PublishOptions.queue:false option (src/runtime/supervise/event-bus.ts:107) that keeps down-leg events in history() and subscriber streams but out of the parent's pull queue. This gives one observability spine for both directions — no competing pattern. The inbox folds down-messages into t
- Real-world viability: Handles edge cases well: malformed messages silently ignored (inbox.ts:41), settled/unknown workers return delivered:false (scope.ts:357), interrupt signals are per-turn with correct re-throw logic for external aborts (runtime.ts:337-341), settle-before-reading guard prevents a worker finishing with unread steers (runtime.ts:357-361). The worker inbox is single-producer single-consumer — no concur
- Model: opencode/deepseek/deepseek-v4-pro
- Bridge attempts: 1
💰 Value Audit
🟡 Only routerToolsInlineExecutor implements deliver — sandbox/CLI/bridge workers cannot receive steers/resumes mid-flight [against-grain] ``
The
Inboxis wired only intorouterToolsInlineExecutor(runtime.ts:297). The sandbox executor (runtime.ts:431), bridge executor (runtime.ts:721), CLI executor (runtime.ts:585), and cli-worktree executor (runtime.ts:813) do not implementdeliver. The architecture already handles this correctly —Scope.sendreturnsfalsefor children withoutdeliver(scope.ts:357) — and theExecutor.deliveris documented as optional (types.ts:88-94). This is likely deliberate: sandbox workers run in-
🎯 Usefulness Audit
🟡 RouterToolsInlineExecutor is the only built-in with deliver; steering degrades silently for all other built-in executor backends [ergonomics] ``
The default executor registry (src/runtime/supervise/runtime.ts:886) registers router, inline, sandbox, cli — none have deliver. The routerToolsInlineExecutor (the only multi-turn built-in) isn't in the default registry; it's only reachable via createExecutor({ backend: 'router-tools', ...}) as a BYO executor. Most bench workers use one-shot gateOnDeliverable wrappers (bench/atom-humaneval.mts:136, authoring.ts:105) which also omit deliver. This is by design, not a bug, but a consumer wiring up
What this audit checks
It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.
| Pass | What it asks |
|---|---|
| Heuristic | Vague title? Whitespace-only or cruft-bearing diff? (content signals only) |
| Duplication | Do added function/class names already exist elsewhere in the repo? |
| Value Audit | What does it do? What goal does it achieve? Is it good? Better architecture or already-exists? |
| Usefulness Audit | Does it integrate and fit? Will it hold up in real use and actually get used? |
Findings are concerns, not blocks — the human reviewer decides what to do with them.
❌ Needs Work —
|
| deepseek | glm | aggregate | |
|---|---|---|---|
| Readiness | 35 | 58 | 35 |
| Confidence | 85 | 85 | 85 |
| Correctness | 35 | 58 | 35 |
| Security | 35 | 58 | 35 |
| Testing | 35 | 58 | 35 |
| Architecture | 35 | 58 | 35 |
Full multi-shot audit completed 5/5 planned shots over 10 changed files. Global verifier still owns final merge decision. | Full multi-shot audit completed 5/5 planned shots over 10 changed files. Global verifier still owns final merge decision.
Blocking
🔴 HIGH No MCP-tool-level test for steer_worker interrupt:true — src/mcp/tools/coordination.ts
steer_worker handler (line 377-384) now reads
a.interrupt === true(line 381) and passes it to scope.send. The inbox and executor layers have interrupt:true tests (tests/loops/inbox.test.ts:33-40), but NO test calls the MCP tool with{ workerId: 'w0', instruction: 'x', interrupt: true }. The existing test at tests/loops/coordination.test.ts:115 only exercises the defaultinterrupt: falsepath. Without a test, an MCP input-schema parsing error on the interrupt boolean or a type coercion bug in the handler could go undete
Other
🟠 MEDIUM Asymmetric return shape in answer_question handler — src/mcp/tools/coordination.ts
The answer_question handler returns
{ question, delivered }when the answer branch is taken (line 504) but returns only{ question }for the defer branch (lines 507-511) and escalate branch (lines 519-524). The test at tests/loops/coordination.test.ts:331 confirms the answer path returns delivered:true, confirming this asymmetry is intentional. However, callers parsing the result must ei
🟠 MEDIUM Stale comment in onEvent subscriber setup — src/mcp/tools/coordination.ts
Lines 135-137 say "The one child→parent pipe" and enumerate "question, settled, finding" — the comment describes only UP events. After this PR, the bus carries ALL event types (steer/answer/resume down-leg events too), and
onEventsubscribers (line 141) receive every event including down-leg types. The code is correct — the bus.subscribe fires for all publishes — but the comment misrepresents the pipe now. Also: coordination-mcp.ts:55 JSDoc says "(settled / question / finding)" with the same stale enumeration.
🟠 MEDIUM maxTurns exhaustion drops pending inbox messages — violates the 'flush before settle' invariant — src/runtime/supervise/runtime.ts
The for-loop (lines 313-394) flushes pending messages only at the top of each iteration (line 316) and in the toolCalls==0 settle branch (line 360). There is NO flush after the loop exits via the
t < maxTurnscondition. Two concrete loss paths: (1) A forceful interrupt on the LAST iteration: deliver() pushes the msg topendingthen aborts (inbox.ts:63-65); the catch at runtime.ts:340
🟡 LOW Driver prompt tool-summary line omits observe_worker and analyst verbs — bench/src/profiles.ts
Line 99's compact summary is 'spawn_worker(profile, task) / steer_worker(worker, instruction) / resume_worker(worker, message) / stop.' — it lists spawn/steer/resume/stop but not observe_worker, list_analysts, run_analyst, or define_analyst (those are detailed on lines 93-98). This asymmetry is pre-existing (the prior line also omitted observe_worker) and intentional as a summary, but a driver reading only line 99 could und
🟡 LOW System prompt EACH CYCLE omits resume_worker guidance — bench/src/profiles.ts
Line 113: 'Otherwise STEER the worker with a specific, verifiable instruction.' The TOOLS section (line 99) documents both steer_worker and resume_worker, but the EACH CYCLE workflow only tells the driver to STEER. A driver may never use resume_worker on parked/idle workers since the step-by-step instructions only route through steer. Impact: underutilization of the new verb until the RSI optimizer (gepaDriver) tunes the prompt. Fix: add a parallel clause — e.g. 'Otherwise STEER or RESUME the worker with a specific, verifiable instruction' — or ad
🟡 LOW Throwing onEvent subscriber causes down-leg tool to error after worker already received the message — src/mcp/tools/coordination.ts
In steer_worker (line 382-383), resume_worker (line 405-406), and answer_question (line 496-501),
opts.scope.senddelivers to the worker BEFOREawait sendDown(...). If a subscriber throws insidebus.publish(event-bus.ts:112 — no try/catch around subscriber loop), the tool handler rejects after the worker already received the message. The MCP client sees an error and may retry, ca
🟡 LOW answer_question return shape is asymmetric across branches — src/mcp/tools/coordination.ts
The answer branch (line 504) returns
{ question, delivered }, but the defer branch (line 507) and escalate branch (line 519) return only{ question }. An MCP client consuming the response cannot rely ondeliveredbeing present without branching on the input. This is intentional (only answers route down), but the schema doesn't document the conditional field. Impact: minor — a client checking `re
🟡 LOW resume_worker does not expose interrupt despite inbox support — src/mcp/tools/coordination.ts
steer_worker exposes an
interruptboolean (lines 368-373), but resume_worker does not (lines 393-399). The inbox parseDown function (src/runtime/supervise/inbox.ts:40-53) supportsinterrupton ALL three message kinds (steer, answer, resume). This may be intentional — resumes are inherently non-forceful — but the inconsistency is undocumented. If a caller ever needs to force-resume a parked worker with an interrupt, resume_worker cannot deliver that.
🟡 LOW steer_worker interrupt:true path untested at coordination layer — src/mcp/tools/coordination.ts
The new
interruptflag on steer_worker (line 381-382) is verified only for the defaultfalsecase at coordination.test.ts:117 ({ steer: 'do X next', interrupt: false }). No coordination-level test passesinterrupt: trueand asserts it reachesscope.sendas{ steer, interrupt: true }. The inbox-level interrupt signal IS tested (inbox.test.ts:33-41), but the one-line forwarding through the MCP tool handler is unverified. A regression that drops the flag (e.g., hardcodinginterrupt: false) would pass all current tests. Fix: add a test case callingsteer_workerwith{ workerId: 'w0', instruction: 'stop', interrupt: true }and assert `sent[0]
🟡 LOW AbortSignal.any not available on Node 20.0-20.2 — src/runtime/supervise/runtime.ts
Line 319:
AbortSignal.any([signal, controller.signal, interruptSig])was added to Node.js in 20.3.0 (June 2023). The project'senginesfield saysnode >= 20(package.json:128), which includes 20.0-20.2 where this method would throw TypeError. Most users are on 20.x LTS (20.3+) or 22+, but the engine range technically doesn't guarantee the API. Consider tighteningengines.nodeto >=20.3.0 or adding a runtime guard.
🟡 LOW AbortSignal.any() newly introduced; raises effective Node floor above declared engines — src/runtime/supervise/runtime.ts
Line 319 uses
AbortSignal.any([signal, controller.signal, interruptSig]), replacing the priorlinkSignals(signal, controller.signal)(still used by routerInlineExecutor:200 and bridgeExecutor:742).AbortSignal.anyrequires Node >=20.3 (stable) while package.json declaresengines.node: >=20. The rest of the file uses the listener-basedlinkSignalshelper for exactly this portability reason. Either bump engines to>=20.3or compose linkSignals with the interrupt signal for consistency.
🟡 LOW Catch handler discriminates by signal state, not error type — src/runtime/supervise/runtime.ts
Lines 337-341: the try/catch around fetch checks
interruptSig.aborted && !signal.aborted && !controller.signal.abortedto decide whether tocontinue(interrupt) orthrow(external abort/network error). This is signal-state-based discrimination rather than error-type-based (e.g. checkinge instanceof DOMException && e.name === 'AbortError'). In a tight race where fetch rejects with a non-abort error at the same instant a forceful interrupt fires, the network error is silently swallowed. In practice, the window is vanishingly small: the interrupt must land between fetch dispatch and error rejection. Not a practical concern, but aNameError-styl
🟡 LOW Forceful-interrupt path through the executor is untested — src/runtime/supervise/runtime.ts
inbox.test.ts tests the inbox-level signal abort (lines 33-51) and the queued settle-flush through the executor (lines 66-103), but there is NO executor-level test for the forceful interrupt flow: fetch throws → catch at runtime.ts:337-342 →
continue→ next iteration flush folds the message. This is the riskiest control-flow in the diff (distinguishing inbox-interrupt from fatal external abort) and it is exercised only by reasoning. Add a test that stubs fetch to reject with an AbortError after an interrupt deliver,
🟡 LOW Interrupted turn inflates iterations in spend (accounting nit) — src/runtime/supervise/runtime.ts
turns += 1runs at line 314 before the fetch. When a forceful interrupt aborts the fetch, the catch doescontinue(line 340) without parsing usage, so that turn contributes +1 tospent.iterations(line 397) but +0 tokens. The equal-k / budget assertions key offiterations, so a chatty interrupter can make a worker look like it used more compute than it did. Consider decrementing `turn
🟡 LOW Settle-point flush continues consume maxTurns slots — src/runtime/supervise/runtime.ts
Lines 357-361: when toolCalls.length === 0 and flush() returns true (pending steer/answer messages), the code
continues the for-loop. Botht(loop counter) andturns(spend counter) increment, consuming a maxTurns slot for what is a read-and-fold, not an inference call. The maxTurns default is 200 (line 281), so a worker trapped in 'finish → steer arrives → flush → continue' without making tool calls could exhaust maxTurns before producing terminal output. A trivial fix: don't incrementturnsfor the settle-flus
🟡 LOW Executor forceful-interrupt path not covered by integration test — tests/loops/inbox.test.ts
The executor integration test (line 66-103) validates the QUEUED delivery mode (flush-before-settle at runtime.ts:360) but not the FORCEFUL interrupt mode (interrupt:true → AbortSignal aborts fetch → catch at runtime.ts:340 → continue → re-plan). The interrupt primitive is unit-tested at inbox.test.ts:33-51, but the executor-level round-trip (deliver interrupt mid-fetch → fetch throws → loop continues with folded message) has no coverage. This is the more complex async path and the one most likely to break under changes to AbortSignal.any or the catch logic.
🟡 LOW Resume fold format untested in fold unit test — tests/loops/inbox.test.ts
The 'folds queued messages into one operator turn' test (line 23-31) delivers steer + answer and checks their fold substrings, but never delivers a resume message. The resume fold path ('- Continue: ') at inbox.ts:79 is exercised indirectly by the executor integration test but has no direct unit assertion on its fold output. Add
inbox.deliver({ resume: 'keep going' })to the fold test and assertexpect(folded).toContain('Continue: keep going').
tangletools · 2026-06-16T23:03:46Z · trace
tangletools
left a comment
There was a problem hiding this comment.
❌ 1 Blocking Finding — 83da8dfa
Full multi-shot audit completed 5/5 planned shots over 10 changed files. Global verifier still owns final merge decision. | Full multi-shot audit completed 5/5 planned shots over 10 changed files. Global verifier still owns final merge decision.
Full immutable report for this review: trace
Summary comment for this run: full summary
tangletools · 2026-06-16T23:03:46Z · immutable trace
Simplify without losing capability:
- MERGE steer_worker + resume_worker → one steer_worker (any live worker; the only
real axis was interrupt forceful-vs-queued, already a param). 'Resume' = a non-
interrupt steer. Removes a redundant verb + dissolves the resume-vs-steer prompt nits.
- REMOVE await_next — it was a strict subset of await_event({kinds:['settled']}).
One wait-verb now; callers/prompts pass kinds:['settled'] for the next finished worker.
- DROP bus.peek() — speculative, only its own test used it (YAGNI).
Down-leg event union + inbox shed the dead 'resume' kind. Full suite 1007 pass;
typecheck/build/lint clean.
tangletools
left a comment
There was a problem hiding this comment.
✅ Auto-approved PR — 809e53ce
Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.
tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-16T23:17:38Z
tangletools
left a comment
There was a problem hiding this comment.
🟡 Value Audit — sound-with-nits
| Verdict | sound-with-nits |
| Concerns | 3 (3 weak-concern) |
| Heuristic | 0.0s |
| Duplication | 0.0s |
| Interrogation | 102.9s (2 bridge agents) |
| Total | 102.9s |
💰 Value — sound
Completes the Scope.send→deliver pipe that was designed but inert — adds per-worker inbox receiving, records down-leg on the bus, and consolidates 3 tools into 1 — a clean, in-grain completion of the existing architecture.
- What it does: Wires the down-leg (parent→child communication) into the existing event bus/Scope.send infrastructure: (1) adds
createInbox— a per-worker message buffer the executor loop drains at step boundaries + before settling, supporting queued and forceful (interrupt) delivery; (2) records down-leg events on the bus withqueue:falseso they're visible in history()/subscribers but never pulled by the pa - Goals it achieves: Achieves true bidirectional communication with a single observability spine — both upward (child→parent, queued+pullable) and downward (parent→child, record-only+delivered) events are stamped, ordered, and replayable in one history(). The worker now actually receives and processes steering/answers mid-run, with the guarantee it cannot settle while unread messages are pending. Tool surface shrinks
- Assessment: The change completes the bus architecture coherently. The Scope.send→deliver pipe was already designed into the codebase (src/runtime/supervise/scope.ts:353-360 at the merge base) but no standard executor implemented
deliver— so the down-leg was inert.inbox.tsis the minimal implementation needed to close that loop (83 lines, well-tested). Thequeue:falsepublish option is a 3-line additio - Better / existing approach: none — this is the right approach. The Scope.send→deliver→inbox chain was the designed path from the start (see docs/canonical-api.md:242 documenting
sendas the steer primitive, and scope.ts:108-109 storingdeliveron spawn). The PR does not introduce a new architecture; it implements the executor-side of a pipe that was already laid out. Thequeue:falseoption is additive to the existing E - Model: opencode/deepseek/deepseek-v4-pro
- Bridge attempts: 1
🎯 Usefulness — sound-with-nits
Bidirectional coordination bus completes the pre-existing design: the down-leg is correctly wired Scope.send→inbox→executor-loop, observed via queue:false on the unified event bus, with only the router-tools executor receiving it today (by design).
- Integration: Reachable through two MCP tools: steer_worker (coordination.ts:360-388) and answer_question (coordination.ts:449-468), both routing through Scope.send (scope.ts:353-360) → child.deliver → Inbox.deliver (inbox.ts:59-64). The worker side drains the inbox at step boundaries and before settling in routerToolsInlineExecutor.execute() (runtime.ts:307-360). Bus recording via sendDown (coordination.ts:206
- Fit with existing patterns: Fits the codebase grain exactly: the EventBus was always designed to be bidirectional (event-bus.ts docstring lines 15-17 describe the pass-through lane), Executor.deliver was already an optional interface member in supervise/types.ts:94, and Scope.send was already defined in scope.ts:353. This change connects seams that were deliberately held open. The queue:false PublishOption is a minimal, non-
- Real-world viability: Concurrency: single-threaded event loop, Inbox is a plain array push/splice, safe by construction. Error paths: parseDown ignores malformed messages (inbox.ts:40-53), Scope.send returns false for unknown/settled/deliver-less children (scope.ts:356-358), the caller receives delivered:false — not a crash. The forceful interrupt path in the executor catches inbox-originated aborts and re-plans (runti
- Model: opencode/deepseek/deepseek-v4-pro
- Bridge attempts: 1
💰 Value Audit
🟡 history() comment still references 'resume' after removal [maintenance] ``
src/mcp/tools/coordination.ts line ~117: the JSDoc for
history()says(steer / answer / resume)but theresumeevent kind was removed in the final refactor (merged intosteer). The actualCoordinationEventunion only hassteerandanswer. This is a stale comment, not a functional bug —history()shows actual events, not what the comment says.
🎯 Usefulness Audit
🟡 resume_worker tool claimed but not present [ergonomics] ``
PR title says resume_worker is a new tool, but no such tool exists in coordination.ts. The InboxMessage kind is steer|answer (inbox.ts:18-24), never resume. A parked-worker resume IS achievable via steer_worker with a continuation instruction, so the capability exists — but a driver LLM expecting a dedicated resume_worker tool will not find one. Either add the tool (it would be a thin wrapper over steer_worker or a dedicated inbox kind) or correct the title. File: src/mcp/tools/coordination.ts t
🟡 subscriber delivery is sequential, so one slow/throwing subscriber blocks all others [robustness] ``
EventBus.publish iterates subscribers sequentially with await (event-bus.ts:110). The comment says this is by design for ordering (line 108-109). A throwing subscriber will prevent subsequent subscribers from receiving events. In the down-leg path, sendDown calls bus.publish with queue:false — if an onEvent subscriber throws, the MCP handler returning {delivered} could have already sent the message to the child inbox but fail to record it. This is pre-existing event-bus design, not new in this P
What this audit checks
It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.
| Pass | What it asks |
|---|---|
| Heuristic | Vague title? Whitespace-only or cruft-bearing diff? (content signals only) |
| Duplication | Do added function/class names already exist elsewhere in the repo? |
| Value Audit | What does it do? What goal does it achieve? Is it good? Better architecture or already-exists? |
| Usefulness Audit | Does it integrate and fit? Will it hold up in real use and actually get used? |
Findings are concerns, not blocks — the human reviewer decides what to do with them.
❌ Needs Work —
|
| deepseek | glm | aggregate | |
|---|---|---|---|
| Readiness | 29 | 48 | 29 |
| Confidence | 85 | 85 | 85 |
| Correctness | 29 | 48 | 29 |
| Security | 29 | 48 | 29 |
| Testing | 29 | 48 | 29 |
| Architecture | 29 | 48 | 29 |
Full multi-shot audit completed 5/5 planned shots over 17 changed files. Global verifier still owns final merge decision. | Full multi-shot audit completed 5/5 planned shots over 17 changed files. Global verifier still owns final merge decision.
Blocking
🔴 HIGH AbortSignal.any requires Node >=20.3.0 but engines says >=20 — src/runtime/supervise/runtime.ts
Line 319 uses
AbortSignal.any([signal, controller.signal, interruptSig]). This static method was added in Node.js 20.3.0. The package.jsonenginesfield says">=20", which includes 20.0.0–20.2.x whereAbortSignal.anydoes not exist. On those Node versions the executor will throwTypeError: AbortSignal.any is not a functionon the first turn. The tsconfiglib: ["ES2022"]also doesn't include this API natively, though@types/node ^25.9.3may cover it via global augmentation. Fix: tighten engines to">=20.3.0"or use the existinglinkSignalshelper (still alive in the file at [line 980](https://github.com/tangle-network/agent-runtime/blob/809
Other
🟠 MEDIUM await_next MCP tool removed — breaking API change — src/mcp/tools/coordination.ts
The await_next tool is fully removed from the tools array (old L337-355). await_event({ kinds: ['settled'] }) provides equivalent behavior, but any driver hard-coded to call await_next will break. Grep confirms 17 references remain in bench/src/mcp-mount-probe.mts:89-91, bench/src/atom-humaneval.mts:146, bench/src/atom-mcp-e2e.mts:178, docs/glossary.md:31,36,64,68, docs/execution-model.md:52,56,116, docs/architecture-visual.md:110, skills/supervise/SKILL.md:18, skills/loop-writer/SKILL.md:113, and CLAUDE.md:65. These stale references are outside this shot but document the breaking surface.
🟠 MEDIUM Forceful interrupt only aborts the inference fetch, not tool execution — docstring overpromises — src/runtime/supervise/runtime.ts
runtime.ts:317-336: turnSignal (which includes interruptSig) is passed ONLY to the fetch call at line 335. Tool execution at lines 375-393 calls
await seam.executeToolCall(name, args, task)with NO signal — the RouterToolsSeam.executeToolCall interface (runtime.ts:246) doesn't accept one. So a forceful interrupt arriving during tool execution: (1) fires live.abort() which sets interruptSig.aborted=true, (2) but the fetch already completed so turnSignal's abort has no consumer, (3) the tool runs to completion, (4) the
🟠 MEDIUM deliver inbox only wired for routerToolsInlineExecutor — src/runtime/supervise/runtime.ts
Only
routerToolsInlineExecutor(line 297) implementsdeliver. ThesandboxExecutor,bridgeExecutor,cliExecutor, andcliWorktreeExecutoromit it. This means steer_worker/answer_question can only reach tool-using router workers. Scope.send correctly returnsfalsewhendeliveris absent (scope.ts:357), so non-inbox workers silently ignore the down-leg. This appears deliberate (only multi-turn executors are steerable) but the asymmetry should be documented — a supervisor calling steer_worker on a sandbox worker will seedelivered: falsewith no explanation that the backend doesn't support it.
🟠 MEDIUM Forceful-interrupt executor path (fetch-abort→catch→continue) is untested end-to-end — tests/loops/inbox.test.ts
The 'a worker may not settle while a steer is pending' test (line 57) exercises only the QUEUED path: a non-interrupt steer delivered mid-turn triggers flush-before-settle. The forceful path — deliver({steer,interrupt:true}) mid-fetch → AbortSignal.any fires → fetch throws → catch at runtime.ts:337 checks interruptSig.aborted&&!signal.aborted&&!controller.signal.aborted → continue → next turn flushes the message — has NO integration test. The unit test at line 33 verifies freshInterrupt().aborted becomes true, but nothing verifies the execu
🟡 LOW Documented positional arg name diverges from tool schema — bench/src/profiles.ts
Line 98 documents steer_worker(worker, instruction, interrupt?) but the actual MCP schema in src/mcp/tools/coordination.ts:369 names the first property 'workerId', not 'worker'. The prompt shorthand is consistent with sibling entries (spawn_worker(profile, task), observe_worker(worker)) and was already this way before the PR, so it is not a regression — purely a cosmetic prompt/schema naming mismatch. No action required for this PR; flagging only for awareness if prompt/schema parity is ever asserted.
🟡 LOW Integration regression: await_next removed but bench/* prompts still reference it — src/mcp/tools/coordination.ts
This file removed the await_next tool (folded into await_event). However bench/src/mcp-mount-probe.mts:89-91, bench/src/atom-humaneval.mts:146, and bench/src/atom-mcp-e2e.mts:178 still inject LLM system/user prompts instructing the model to call 'await_next' — a tool that no longer exists in the served MCP surface. Those benchmarks will fail or produce no-ops because the model calls a nonexistent tool. The bench files are unchanged by this PR (confirmed via git diff --stat), so this is a regression introduced by coordination.ts. Fix is in bench/* (outside this shot) but the root cause is here. Flag for the global verifier to assign to the bench shot.
🟡 LOW Stale 'resume' in history() docstring — no resume event kind exists — src/mcp/tools/coordination.ts
Line 116: the history() JSDoc says 'DOWN (steer / answer / resume)' but the CoordinationEvent union (lines 85-90) defines only 'steer' and 'answer' as down-leg kinds — there is no 'resume' member. This is a leftover from the earlier resume_worker iteration (commit b6eaf6d) that was removed in the 12→10 tool unification (commit 809e53c). A reader inspecting history() output would expect a 'resume' type that never appears. Fix: drop '/ resume' from [line 116](https://github.com/tangle-network/agent-runtime/blob/809e53ceb08d4a008c
🟡 LOW answer_question return shape is inconsistent across answer/defer/escalate paths — src/mcp/tools/coordination.ts
The answer path (line 468) returns {question, delivered} but the defer path (line 471) and escalate path (line 483) return {question} with no
deliveredfield. A consumer destructuringdeliveredgets undefined for defer/escalate. This is an experimental API so not breaking, but the inconsistency is a latent footgun — defer/escalate don't deliver to a worker, so returning delivered:false (or omittin
🟡 LOW sendDown answer path has dead questionId fallback that silently masks a bug — src/mcp/tools/coordination.ts
Line 212:
questionId: questionId ?? ''. The sole caller (answer_question handler, line 461-465) always passes questionId explicitly, so the?? ''fallback is dead. If a future caller forgot questionId, the recorded 'answer' event would carry questionId:'' silently rather than failing — corrupting the audit trail with an untraceable answer. Prefer making questionId required for type==='answer' (e.g., a typed overload or a runtime assert) instead of defaulting to empty string.
🟡 LOW peek removal from EventBus is a breaking API change — src/runtime/supervise/event-bus.ts
The
peek()method was removed from theEventBus<E>interface (was at line 61-62 in the old commit). No internal callers exist (verified via grep of src/ and tests/), and the module is tagged@experimental, so the blast radius is small. However, any external consumer calling.peek()on a bus instance will get a runtime TypeError. Consider keepingpeekwith a@deprecatedtag for one release cycle, or documenting the removal in the PR/release notes.
🟡 LOW answer_question down-message always queued (no interrupt path) — may stall a blocked worker — src/runtime/supervise/inbox.ts
inbox.ts parseDown (lines 40-53) supports interrupt:true on both steer and answer kinds. But coordination.ts answer_question handler calls
scope.send(question.from, { answer, questionId })WITHOUT setting interrupt:true (verified in the coordination.ts diff, not in this shot). A worker that asked a blocking question may be parked in a long inference call waiting for the answer; without interrupt, the answer only flushes at the next step boundary, which may not arrive until the current (potentially wrong-direction) inference completes. For a truly blocking question the answer should arguably be forceful (interrupt:true) to unpark the worker immediately. I
🟡 LOW test coverage: no test for concurrent inbox drain + deliver during settle flush — src/runtime/supervise/inbox.ts
The inbox tests (tests/loops/inbox.test.ts) cover parsing, folding, interrupt signaling, and the router-tools executor drain loop. However, no test exercises a deliver arriving AFTER drain() is called but before the model response resolves — the exact race that the pre-settle flush (runtime.ts:360) is designed to prevent. While JS's single-threaded event loop makes this safe in practice, a test with a controlled deliver-during-settle window would harden the invariant that a worker never settles with unread messages.
🟡 LOW Interrupted turns inflate the iteration count against maxTurns — src/runtime/supervise/runtime.ts
runtime.ts:313-314:
turns += 1executes before the fetch at line 322. When a forceful interrupt aborts the fetch (catch at line 337, continue at line 340), the turn counter has already incremented but no inference completed. Over many interrupts, a supervisor can burn through the worker's maxTurns budget (default 200) without the worker completing useful work — the worker hits the maxTurns
🟡 LOW No test for answer_question delivered:false when worker settles after asking — tests/loops/coordination.test.ts
The 'answer_question routes the answer down to a LIVE worker' test (line 306) covers delivered:true (live w0). The failClosed test covers delivered:false for a non-existent 'driver-1'. But the race — worker w0 asks a question, then settles before the driver answers — is untested. scope.send returns false when child.delivered is true (scope.ts:357). A regression where answer_question reported delivered:true for a settled worker (e.g., checking the wrong flag) would not be caught. This is a minor gap because the scope.send contract is the guard, but the coordination layer's handling of this race (answer recorded in trail but delivered:false) deserves a direct
🟡 LOW answer_question return value unasserted in failClosed test — tests/loops/coordination.test.ts
Line 183:
answer_questionnow returns{ question, delivered }but the failClosed test only awaits it without asserting the return shape. Thedelivered: falsepath (mock scope rejects 'driver-1') is observable in the emitted trail but the tool return is unchecked. The new dedicated test at line 306 covers thedelivered: truepath. Minor gap — could addexpect(result.delivered).toBe(false)at [line 183](https://github.com/tangle-network/agent-runtime/blob/809e53ceb08d4a008c8372ae7b1688dee0996181/tests/loops/coordinat
🟡 LOW await_event idle semantics with mismatched kinds filter not tested — tests/loops/coordination.test.ts
Line 392-417: The 'await_event with kinds filter' test verifies
{ idle: true }when the matched-kind queue is empty AND no settlements drain. It does NOT test the case where a settlement IS drained but the kinds filter excludes 'settled' — the handler at coordination.ts:422 returns{ idle: !drained }which would be{ idle: false }despite nothing pullable of the requested kind. This could drive a busy loop in a caller with a non-settled filter. A test asserting this edge case would improve coverage.
🟡 LOW Fold-format assertions use toContain, not exact match — tests/loops/inbox.test.ts
Lines 31-33 assert the folded operator turn with three separate toContain checks ('[SUPERVISOR]', 'New instruction from your supervisor: switch to recursion', 'Answer to your question (q7): v2'). A malformed fold that interleaves garbage between these substrings would still pass. Minor, but since fold() is the worker's only view of out-of-band messages, an exact toEqual on the full string would be more robust against format regressions.
tangletools · 2026-06-16T23:32:37Z · trace
tangletools
left a comment
There was a problem hiding this comment.
❌ 1 Blocking Finding — 809e53ce
Full multi-shot audit completed 5/5 planned shots over 17 changed files. Global verifier still owns final merge decision. | Full multi-shot audit completed 5/5 planned shots over 17 changed files. Global verifier still owns final merge decision.
Full immutable report for this review: trace
Summary comment for this run: full summary
tangletools · 2026-06-16T23:32:37Z · immutable trace
…gent-eval kernel) createDetectorMonitor (supervise/detector-monitor.ts) — the online analyst on the live worker pipe. Folds each tool step through agent-eval 0.93.0's published streaming kernel (repeatedActionDetector/errorStreakDetector — the SAME kernel control-runtime folds; no detection logic reimplemented) and fires onSignal → a finding on the bus the moment a worker loops or error-storms. routerToolsInlineExecutor feeds it via a new onToolStep seam. Bumps agent-eval ^0.93.0. monitor tests (4); full suite 1011 pass; typecheck/build/lint clean.
tangletools
left a comment
There was a problem hiding this comment.
✅ Auto-approved PR — 6f8b82d0
Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.
tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-17T00:25:42Z
Last mile: createCoordinationTools.raiseFinding (exposed on the MCP handle) — the seam an ONLINE detector uses to publish a finding on the live bus mid-run. Proven end-to-end: a stuck-loop on the worker pipe → monitor → raiseFinding → await_event surfaces it. Review fixes (audit on the earlier commit): - HIGH: AbortSignal.any (needs Node 20.3, floor is 20) → portable mergeAbortSignals. - forceful interrupt: docstring no longer overpromises (aborts in-flight inference, a tool mid-exec finishes first); interrupted turns no longer count toward maxTurns; added the e2e test (forceful steer aborts the turn, re-plans, aborted turn is free). - answer to a BLOCKING question is now delivered forcefully (interrupt) to unpark the worker immediately, not at its next boundary. - sendDown 'answer' now REQUIRES questionId (overload; no silent ?? '' mask). - tool-step status captured (error vs ok) for the error-streak detector. - stale await_next purged from bench prompts + docs; history() docstring drops 'resume'. - added tests: answer delivered:false + return asserted; await_event idle-on-mismatch. full suite 1014 pass; typecheck/build/lint clean.
tangletools
left a comment
There was a problem hiding this comment.
✅ Auto-approved PR — d92fe1a6
Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.
tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-17T00:36:33Z
|
Final-review pass ( Last mile: Resolved:
Intentional (this Full suite 1014 pass; typecheck/build/lint clean; merges cleanly into main; depends on agent-eval 0.93.0 (published). |
…es agent-eval) createTrajectoryRecorder (supervise/trajectory-recorder.ts) — the post-hoc half of the analyst pipe. Replays a worker's captured tool steps as agent-eval spans (InMemoryTraceStore) and runs its PUBLISHED batch analyzers — buildTrajectory (structured run summary), stuckLoopView (full-run repeated-call view, complementing the online consecutive detector), toolWasteView. No analysis reimplemented; the thin bridge from live tool steps to the substrate trace model. Feeds from the same onToolStep seam as the online monitor. 3 recorder tests (real spans → real agent-eval findings); full suite 1017 pass; typecheck/build/lint clean. Closes both legs: online (mid-run) + settle (post-hoc).
tangletools
left a comment
There was a problem hiding this comment.
✅ Auto-approved PR — 27dd2cec
Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.
tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-17T00:44:16Z
|
Last mile #2 done (
So the pipe is now complete:
Both reuse agent-eval 0.93.0's published analyzers; both feed from |
tangletools
left a comment
There was a problem hiding this comment.
🟢 Value Audit — sound
| Verdict | sound |
| Concerns | 1 (1 weak-concern) |
| Heuristic | 0.0s |
| Duplication | 0.0s |
| Interrogation | 328.9s (2 bridge agents) |
| Total | 328.9s |
💰 Value — sound
Makes the coordination bus truly bidirectional by adding the down-leg (parent→child steer/answer/resume) on the same EventBus via a queue:false record-only flag, plus two analysis adapters that reuse agent-eval — clean architecture, no duplication, no better alternative found.
- What it does: The PR completes the bidirectional coordination surface for the supervise system. Previously (#316/#317) the EventBus was one-directional: child→parent events (settled/question/finding) queued for the driver to pull. This PR adds: (1)
PublishOptions.queueon EventBus — defaulttrue(queued/pullable),falsemeans record-only (lands inhistory()+ reachessubscribeobservers but never ente - Goals it achieves: ONE observability spine for both directions. Before this PR, UP messages (child→parent) were stamped, ordered, and replayable in
history(), but DOWN messages (parent→child) were fire-and-forget — delivered viascope.sendwith no audit trail. Now both directions share the sameEventBus.history()record, so a singlehistory()call shows the full ordered conversation in both directions. Secon - Assessment: Coherent and well-architected, built firmly in the grain of the codebase. The
queue:falsedesign is elegant: a single boolean onPublishOptionsdistinguishes the two access patterns (UP = pull-queue, DOWN = addressed routing) without splitting into two buses, preserving unifiedhistory()ordering. ThesendDownoverload that REQUIRESquestionIdforanswerevents (no silent?? ''fallb - Better / existing approach: none — this is the right approach. The alternative (two separate buses, one per direction) would lose the unified
history()ordering trail and the single subscriber model. The single-bus-with-queue-flag design is simpler and gives both directions one stamp/seq/timestamp spine. I searched for existing parent→child delivery mechanisms (scope.send,Executor.deliver,RootSignalresume/cancel - Model: opencode/zai-coding-plan/glm-5.1
- Bridge attempts: 2
- Bridge warning: opencode/deepseek/deepseek-v4-pro: Bridge returned 503: {"error":{"message":"cli-bridge admission timed out after 30000ms","type":"admission_rejected","reason":"queue_timeout","admission":{"active":24,"queued":0,"maxActive":24,"maxQueue":32}}}
🎯 Usefulness — error
usefulness agent produced no parseable value-audit JSON.
- Model: kimi-code/kimi-for-coding
- Bridge attempts: 3
- Bridge error: opencode/deepseek/deepseek-v4-pro: Bridge returned 503: {"error":{"message":"cli-bridge admission timed out after 30000ms","type":"admission_rejected","reason":"queue_timeout","admission":{"active":24,"queued":1,"maxActive":24,"maxQueue":32}}}; opencode/zai-coding-plan/glm-5.1: Bridge returned 503: {"error":{"message":"cli-bridge admission timed out after 30000ms","type":"admission_rejected","reas
💰 Value Audit
🟡 Analysis modules exported but not wired into any default path [maintenance] ``
createDetectorMonitorandcreateTrajectoryRecorderare exported fromruntime/index.tsand have full test coverage, but neither is constructed or connected insideruntime.tsorcoordination.ts. TheonToolStepseam exists (runtime.ts:249) andraiseFindingis exposed onCoordinationMcpHandle, so a consumer CAN wire them — but there is no default wiring, meaning these capabilities are inert unless someone explicitly assembles them. This follows the codebase's established 'export pr
What this audit checks
It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.
| Pass | What it asks |
|---|---|
| Heuristic | Vague title? Whitespace-only or cruft-bearing diff? (content signals only) |
| Duplication | Do added function/class names already exist elsewhere in the repo? |
| Value Audit | What does it do? What goal does it achieve? Is it good? Better architecture or already-exists? |
| Usefulness Audit | Does it integrate and fit? Will it hold up in real use and actually get used? |
Findings are concerns, not blocks — the human reviewer decides what to do with them.
✅ No Blockers —
|
| deepseek | glm | aggregate | |
|---|---|---|---|
| Readiness | 63 | 55 | 55 |
| Confidence | 95 | 95 | 95 |
| Correctness | 63 | 55 | 55 |
| Security | 63 | 55 | 55 |
| Testing | 63 | 55 | 55 |
| Architecture | 63 | 55 | 55 |
Full multi-shot audit completed 8/8 planned shots over 29 changed files. Global verifier still owns final merge decision. | Full multi-shot audit completed 8/8 planned shots over 29 changed files. Global verifier still owns final merge decision.
🟠 MEDIUM Comment claims interrupted turns don't count toward maxTurns, but for-loop counter still increments — src/runtime/supervise/runtime.ts
Lines 344-347: The
continueafter an interrupt skipsturns += 1(spend tracking), but the for-loopfor (let t = 0; t < maxTurns; t += 1)unconditionally incrementst. So each interrupted turn DOES consume amaxTurnsbudget slot. An attacker who can repeatedly send interrupt messages could DOS the worker by consuming all maxTurns slots without the worker making progress. Either decrementtbeforecontinue, switch to a while loop, or update the comment to reflect the actual behavior.
🟠 MEDIUM Per-turn mergeAbortSignals leaks listeners on long-lived signals — src/runtime/supervise/runtime.ts
runtime.ts:325 calls
mergeAbortSignals(signal, controller.signal, interruptSig)INSIDE the per-turn for-loop, whereas the old code calledlinkSignals(signal, controller.signal)once before the loop (runtime.ts:1015-1026). EachmergeAbortSignalscall registers{ once: true }abort listeners onsignalandcontroller.signalvias.addEventListener('abort', onAbort, { once: true }). In normal operation (no interrupt, no teardown), these listeners never fire, so they are never removed. Over N turns, N closures accumulate on each long-lived signal. For default maxTurns=200 that's 200 leaked closures oncontroller.signalalone; for a long workflow with maxTurns=10000 it's 10000. They DO self-clean whencontroller.abort()fires on teardown (each fires once and self-removes), but t
🟠 MEDIUM mergeAbortSignals called per-turn accumulates AbortSignal listeners on parent signals — src/runtime/supervise/runtime.ts
Line 325:
mergeAbortSignals(signal, controller.signal, interruptSig)is called inside every for-loop iteration. Each call registers new{ once: true }listeners onsignalandcontroller.signal. The pre-PR code calledlinkSignals(signal, controller.signal)ONCE before the loop (line 200 in old code). With default maxTurns=200, this adds up to 400 cumulative listeners on the parent signals, only cleaned up at teardown when the signals fire. Fix: pre-linksignalandcontroller.signalonce before the loop, then per
🟡 LOW Paragraph density — single ~1000-word line — CLAUDE.md
The agent-driver bullet is now one continuous line covering bidirectional bus, inbox, executor deliver, interrupt, online detector, settle recorder, and the existing substrate notes. Pre-existing style, but this change nearly doubles its length. Impact: future maintainers will have trouble locating a specific claim. Fix: split into 2-3 logical lines (bus/down-leg | online+settle analysts | substrate pointers) — optional, cosmetic.
🟡 LOW Redundant restatement of the single-wait-verb description — CLAUDE.md
The phrase describing 'await_event({kinds?}) — the ONE wait verb; kinds:[settled] = next finished worker, omit = also questions/findings' appears twice within the same paragraph (once near the top in the bus description, once again mid-paragraph after the interrupt sentence). The second occurrence ('The driver waits on ONE verb — await_event({kinds?})...') adds nothing new. Impact: readability only. Fix: drop the second occurrence or merge the interrupt context into the first.
🟡 LOW Tool-name/prompt coupling has no compile-time or test guard — bench/src/atom-humaneval.mts
The driverSystem prompt at line 146 hardcodes the tool name
await_eventas a string. The coordination driver (coordination-driver.ts:165-168) dispatches bybyName.get(tc.name)— a mismatch yields a silentunknown toolerror at runtime, not a type error at build. This PR fixed the mismatch (await_next→await_event) but nothing prevents it from recurring if the canonical tool name changes again. Consider deriving tool names from a single const (e.g.COORDINATION_TOOL.AWAIT_EVENT) shared between coordination.ts and the bench prompts, or adding a smoke test that asserts every tool name mentioned in a prompt exists increateCoordinationTools(...).tools. Low
🟡 LOW DownMessageEvent.instruction is semantically misnamed for the answer kind — src/mcp/tools/coordination.ts
coordination.ts:76-80 + :476 —
DownMessageEvent.instructioncarries the raw answer text foranswerevents (instruction: answer). A reader auditing history() sees{toWorker, instruction, delivered}with no signal that the payload is an answer to a specific question rather than a steer. ThequestionIdlives only on the outeranswerevent wrapper (:90), not on the DownMessageEvent. Cosmetic/readability only; behavior is correct. Optional: rename topayloador add a discriminating field on DownMessageEvent.
🟡 LOW projectEvent defensive branch drops questionId for answer events — src/mcp/tools/coordination.ts
coordination.ts:245 — the fallback
return { type: ev.type, ...ev.down }projects asteer/answerevent without itsquestionId. The branch is effectively unreachable in normal flow because sendDown publishes with{queue:false}(event-bus.ts:105), so steer/answer never enter the pull queue andpull()can never return them; the comment at :243-244 acknowledges this. Impact: none in practice. If a future caller ever routes a down-event through the queue (e.g. a replayer), the audit projection would lose the questionId linkage. Fix when touching this:if (ev.type === 'answer') return { type: 'answer', ...ev.down, questionId: ev.questionId }before the fallback.
🟡 LOW Adversarial interrupt spam can exhaust maxTurns with zero useful work — src/runtime/supervise/runtime.ts
runtime.ts:347: when an interrupt aborts the fetch, the catch does
continue, which consumes a for-loop iteration (t += 1) but does NOT incrementturns. A driver (or bug) that sends forceful interrupts in a tight loop causes every fetch to abort-and-continue, burning allmaxTurnsiterations withturns === 0and producing no output. The loop IS bounded (terminates after maxTurns) and billing is not inflated, so this is acceptable degradation rather than a hang. But the worker silently settles empty with no error signal. Consider a dedicated interrupt-count backstop (e.g., if consecutive interrupts exceed a threshold, surface afindingor throw) so the failure mode is observable.
🟡 LOW Non-AbortError failures coinciding with an interrupt signal could be silently swallowed — src/runtime/supervise/runtime.ts
Line 347: The catch block only checks
interruptSig.aborted && !signal.aborted && !controller.signal.aborted. If a non-abort fetch error (e.g., network timeout, DNS failure) occurs at the same moment an interrupt signal fires, the error is silently swallowed viacontinuebecauseinterruptSig.abortedis true. The worker re-plans and retries, which is correct for the interrupt case, but a real network fault would be lost and retried verbatim. Consider also checkinge instanceof DOMException && e.name === 'AbortError'or inspecting the error type to distinguish genuine aborts from coincident network failures.
🟡 LOW onToolStep exceptions crash the worker loop — src/runtime/supervise/runtime.ts
runtime.ts:412 calls
seam.onToolStep?.({ toolName, args, status })synchronously with no try/catch inside the tool-call inner loop (lines 384-413). If the handler throws — e.g., acreateDetectorMonitor().observeToolStepwhoseargHash(step.args)chokes on a circular or non-serializable args object — the exception propagates through the for-loop, aborting the entire worker execute(). A monitoring/observability side-channel should not be able to crash the production inference loop. The detector-monitor'sobserveToolStep(detector-monitor.ts:53-61) also lacks an internal try/catch aroundobserveAll/argHash. Fix: wrap the `seam.onToolStep?.(...)
🟡 LOW Trajectory recorder records zero-duration spans (startedAt === endedAt) — src/runtime/supervise/trajectory-recorder.ts
trajectory-recorder.ts:68-69 sets both
startedAt: s.atandendedAt: s.atto the same timestamp (the push time, captured AFTER the tool call resolved at runtime.ts:412). Individual span durations are always 0. The docstring (line 31) promisesbuildTrajectoryyields 'total duration' as a meaningful metric, but per-span duration is lost. This does not affect loop/waste detection (those key on call patterns and cross-span windows, verified by the test assertingwindowMs > 0using the fake clock's per-step increment). Fix: capture start/end timestamps at the call site and pass them throughRecordedToolStep, or have the recorder accept an opt
🟡 LOW Stale await_next references in skill docs (outside shot scope, FYI only) — tests/loops/coordination.test.ts
Two skill markdown files (skills/supervise/SKILL.md:18, skills/loop-writer/SKILL.md:113) still reference the old tool name 'await_next' after the rename to 'await_event'. These are NOT in this shot's changed files so are not blocking, but they will mislead any agent that reads them. Flag for a follow-up doc-sync pass. No code impact — all test and source files are consistently renamed.
🟡 LOW inbox executor test relies on synchronous fetch mock timing for interrupt delivery — tests/loops/inbox.test.ts
The 'FORCEFUL steer aborts' test (L101-146) calls deliver() synchronously inside the fetch mock before throwing AbortError. This works because the mock is synchronous up to the throw, but a real fetch abort is asynchronous — the abort signal fires, then fetch rejects on a later microtask. The test correctly validates the catch-block logic (runtime.ts:347: interruptSig.aborted && !signal.aborted && !controller.signal.aborted → continue), but the synchronous delivery masks any race between signal-abortion and fetch-rejection timing. Not a defect in the test's correctness for the current implementation, just a fragility note: if the executor's abort handling becomes more timing-sensitive, this test may need async delivery.
tangletools · 2026-06-17T01:12:05Z · trace
…, comment accuracy)
- mergeAbortSignals listener leak: pre-link external signals ONCE; per-turn add+remove the
listener (no accumulation on long-lived signals over maxTurns).
- interrupt catch now requires a real AbortError (DOMException) — a network fault coincident
with an interrupt is no longer swallowed; rethrown.
- corrected the comment: an interrupted+re-planned turn DOES consume a maxTurns slot (bounded
backstop, not a hang) — it just doesn't bill a turn.
- onToolStep is an observability side-channel: wrapped so a throwing monitor can't crash the
worker loop; detector-monitor.observeToolStep also defends argHash on circular/unhashable args.
- projectEvent preserves questionId on the answer branch.
- stale await_next purged from skills/{supervise,loop-writer}; trimmed CLAUDE.md redundancy;
softened the recorder's per-span-duration claim.
full suite 1018 pass; typecheck/build/lint clean.
tangletools
left a comment
There was a problem hiding this comment.
✅ Auto-approved PR — 13cecb5f
Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.
tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-17T09:25:21Z
|
Addressed the audit ( Fixed (real):
Opted out (cosmetic, with reason): Full suite 1018 pass; typecheck/build/lint clean; merges cleanly. |
…ent) — main was red (#322) #318 renamed the coordination wait verb await_next → await_event; #319's metering test (merged around the same time) still scripted await_next. Since the tool no longer exists, the scripted driver's 'collect the worker' turn returned {error:'unknown tool: await_next'} and the worker was never drained → result 'no-winner' → the 2 winner-expecting tests failed on main. Updated all 6 refs to await_event (the driver already fails loud on the unknown tool — only the fixed script couldn't recover, unlike a real LLM). Full suite 1030 pass (was 2 failing on main); no stale await_next remains anywhere.
Builds on the bus core (#316/#317) to make it bidirectional — the down-leg (parent→child) now rides the same observable primitive.
What changed
steer_worker/answer_question/resume_workerroute to the child inbox (scope.send→deliver) AND record aqueue:falseevent: it lands inhistory()and reachessubscribeobservers (full audit trail), but the parent never pulls its own outbound message back.resume_workerverb. The protocol already had{resume}on the inbox but no tool exposed it — a driver can now continue a parked worker.answer_questionroutes down. An answer is now delivered to the asking worker (unparking it), not just recorded.EventBusgainsPublishOptions.queue(default true) for record-only events — the generic seam that makes the down-leg observable without being pullable.Why
True bidirectional comms with ONE observability spine: both directions are stamped, ordered, replayable. Access patterns stay distinct on purpose — UP is a fan-in pull-queue, DOWN is addressed routing — but both are visible in one
history().Verification
🤖 Generated with Claude Code