Skip to content

feat(supervise): bidirectional coordination bus — down-leg + resume_worker#318

Merged
drewstone merged 9 commits into
mainfrom
feat/bus-bidirectional
Jun 17, 2026
Merged

feat(supervise): bidirectional coordination bus — down-leg + resume_worker#318
drewstone merged 9 commits into
mainfrom
feat/bus-bidirectional

Conversation

@drewstone

Copy link
Copy Markdown
Contributor

Builds on the bus core (#316/#317) to make it bidirectional — the down-leg (parent→child) now rides the same observable primitive.

What changed

  • Down-leg on the bus. steer_worker / answer_question / resume_worker route to the child inbox (scope.senddeliver) AND record a queue:false event: it lands in history() and reaches subscribe observers (full audit trail), but the parent never pulls its own outbound message back.
  • resume_worker verb. The protocol already had {resume} on the inbox but no tool exposed it — a driver can now continue a parked worker.
  • answer_question routes down. An answer is now delivered to the asking worker (unparking it), not just recorded.
  • EventBus gains PublishOptions.queue (default true) for record-only events — the generic seam that makes the down-leg observable without being pullable.

Why

True bidirectional comms with ONE observability spine: both directions are stamped, ordered, replayable. Access patterns stay distinct on purpose — UP is a fan-in pull-queue, DOWN is addressed routing — but both are visible in one history().

Verification

  • new tests: down-leg routes + records but is never pulled back; answer routes down with a recorded leg
  • coordination 11 + bus 6 tests; full suite 1000 pass; typecheck / build / lint clean; merges cleanly into main

🤖 Generated with Claude Code

… resume_worker

Close the bus to 100% bidirectional. The parent→child down-leg routes to the child
inbox (scope.send→deliver) AND records a queue:false event on the same bus: it lands
in history() + reaches subscribers for the audit trail, but is never pulled back by
the parent. New: resume_worker (continue a parked worker — the protocol had {resume}
but no verb); answer_question now routes the answer DOWN to the asking worker, unparking
it. EventBus gains PublishOptions.queue for record-only events.

down-leg + bidirectional history tests; full suite 1000 pass; typecheck/build/lint clean.

@tangletools tangletools left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Auto-approved PR — b6eaf6de

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-16T20:36:55Z

@tangletools tangletools left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Value Audit — sound-with-nits

Verdict sound-with-nits
Concerns 2 (2 weak-concern)
Heuristic 0.0s
Duplication 0.0s
Interrogation 301.5s (2 bridge agents)
Total 301.5s

💰 Value — sound

Adds the parent→child down-leg to the existing event bus — steer/answer/resume route to child inbox AND record on the bus as queue:false (observable, never pulled back) — completing the bidirectional comms picture with zero new primitives.

  • What it does: Makes the supervise event bus bidirectional. Three MCP tools (steer_worker, answer_question, new resume_worker) now both deliver to the child inbox via scope.send→deliver AND publish a queue:false event on the same bus. Down-leg events appear in history(), reach subscribe/onEvent observers, and are counted in stats(), but the parent never pulls its own outbound message back. The `Publish
  • Goals it achieves: Complete the bidirectional communication picture on ONE observability spine. Before this, the bus was child→parent only (UP: settled/question/finding); the down-leg existed as scope.send but was silent — no audit trail, no pass-through, invisible to observers. Now both directions are stamped, ordered, replayable in one history(). Access patterns stay distinct by design: UP is fan-in pull-queue,
  • Assessment: Coherent and well-factored. Builds directly on the bus core from #316/#317 without reinventing anything. The queue: false option on publish is the minimal, principled seam — it reuses the same typed union, same subscribe/history/stats machinery, same code paths. resume_worker is a genuine new verb (the protocol already had {resume} on the inbox, coordination.ts:389 — it just wasn't exposed
  • Better / existing approach: none — this is the right approach. The only pre-existing 'down' mechanism was scope.send (scope.ts), which was a silent one-way delivery. Adding observability by publishing to the existing bus (rather than building a second audit channel) is the simplest design. No codebase entity does 'record without queuing' — queue: false is a clean 1-line extension to publish() at event-bus.ts:107 and
  • Model: opencode/deepseek/deepseek-v4-pro
  • Bridge attempts: 1

🎯 Usefulness — sound-with-nits

The down-leg infrastructure is coherent, fits the ONE typed pipe pattern, and is wired into the primary call path — normal ahead-of-first-caller gaps (no executor implements deliver yet, resume_worker not in the operator prompt) do not gate shipping.

  • Integration: Wired into the primary consumer: coordinationDriverAgent (coordination-driver.ts:132–143) maps all tools including steer_worker/resume_worker into the LLM's turn loop via byName. The coordination MCP server (coordination-mcp.ts:68) also serves them over HTTP. The event bus recording path (sendDown at coordination.ts:206–208 → bus.publish(…, { queue: false }) at event-bus.ts:107) correc
  • Fit with existing patterns: Fits the existing event bus pattern exactly — the down-leg rides the same createEventBus<CoordinationEvent>() instance (coordination.ts:137), queue: false cleanly separates record-only down-leg events from the pull queue, and sendDown mirrors the existing bus.publish call pattern used by the up-leg. No competing pattern exists; scope.send is the established child inbox mechanism (scope.t
  • Real-world viability: Will hold up under realistic use: scope.send checks child.delivered synchronously (scope.ts:357) — no async race on already-settled children. The queue: false guard correctly keeps down-leg events out of the pull queue (event-bus.ts:107) so await_event never returns them (test at coordination.test.ts:304). The answer_question handler correctly decides the question FIRST (coordination.ts:
  • Model: opencode/deepseek/deepseek-v4-pro
  • Bridge attempts: 1

🎯 Usefulness Audit

🟡 resume_worker tool exists but is absent from the OPERATOR_TOOLS list and driver profile system prompt [integration] ``

The driver LLM won't be prompted to use resume_worker because it's missing from bench/src/profiles.ts OPERATOR_TOOLS (line 18–26) and the driver profile's systemPrompt (lines 86–117) only mentions steer_worker, not resume_worker. The tool IS in the MCP tool specs (coordination.ts:373–391) and the LLM could discover it via tool descriptions, but the prompt doesn't advertise it. Add resume_worker to the OPERATOR_TOOLS array and reference it in the driver system prompt.

🟡 No built-in executor implements deliver — the down-leg messages reach the inbox but no executor drains it [problem-fit] ``

All 11 built-in executor factories (routerInlineExecutor, routerToolsInlineExecutor, sandboxExecutor, cliExecutor, bridgeExecutor, cliWorktreeExecutor, driverExecutorFactory, createWorktreeCliExecutor, shotExecutor, analystExecutor, authoredWorker's inner) return objects without deliver — verified in runtime.ts:173–226, 267–387, 403–467, 557–600, 693–776, 785–801; driver-executor.ts:125–182; strategy.ts:346–425, 427–457; authoring.ts:76–104. The completion-gate.ts:57 passthrough only passes


What this audit checks

It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.

Pass What it asks
Heuristic Vague title? Whitespace-only or cruft-bearing diff? (content signals only)
Duplication Do added function/class names already exist elsewhere in the repo?
Value Audit What does it do? What goal does it achieve? Is it good? Better architecture or already-exists?
Usefulness Audit Does it integrate and fit? Will it hold up in real use and actually get used?

Findings are concerns, not blocks — the human reviewer decides what to do with them.

value-audit · 20260616T204349Z

@tangletools

Copy link
Copy Markdown
Contributor

❌ Needs Work — b6eaf6de

Readiness 54/100 · Confidence 80/100 · 8 findings (1 high, 1 medium, 6 low)

deepseek glm aggregate
Readiness 54 83 54
Confidence 80 80 80
Correctness 54 83 54
Security 54 83 54
Testing 54 83 54
Architecture 54 83 54

Full multi-shot audit completed 4/4 planned shots over 4 changed files. Global verifier still owns final merge decision. | Full multi-shot audit completed 4/4 planned shots over 4 changed files. Global verifier still owns final merge decision.

Blocking

🔴 HIGH answer_question hides delivery failure from caller — src/mcp/tools/coordination.ts

At line 479, const delivered = opts.scope.send(question.from, { answer, questionId }) is computed but never included in the return value at line 488 (return { question }). steer_worker (line 369) and resume_worker (line 390) both return { delivered }. Callers of ans

Other

🟠 MEDIUM No test for successful answer delivery to a live worker — tests/loops/coordination.test.ts

The test at line 159-200 (blocks stop under failClosed...) uses from: 'driver-1' which does not match any worker in the mock scope (send only returns true for 'w0'). This means delivered: false is the only path tested for answer_question delivery. The happy path where a question originates from 'w0' and scope.send returns true is untested, leaving a coverage gap for the new answer-routing behavior.

🟡 LOW single monolithic paragraph reduces scanability — CLAUDE.md

Line 65 is an extremely long paragraph (one bullet spanning ~1.5KB). The bidirectional bus addition makes it longer still. This is pre-existing style and matches the rest of the code map section, but breaking the agent-driver bullet into sub-bullets (one for the bus, one for observability, one for analysts, one for core types) would improve readability without changing any substance. Not blocking.

🟡 LOW answer_question now attempts down-delivery — behavior change, currently a no-op — src/mcp/tools/coordination.ts

Lines 478-487: answer_question previously only recorded the decision locally (decideQuestion). It now also calls opts.scope.send(question.from, { answer, questionId }) and publishes an answer down-leg event. This is a behavior change for any consumer whose question.from resolves to a live child with a deliver callback. Today this is a no-op (no executor implements deliver), but when one does, the answer will be routed to the worker inbox with shape { answer, questionId } — a shape no executor handler currently recognizes (strategy.ts:387 only handles t.steer). The comment at [line 478](https://github.com/tangle-network/agent-runtime/blob/b6eaf6de2

🟡 LOW decideQuestion mutates question ledger before delivery attempt — src/mcp/tools/coordination.ts

At line 473, decideQuestion marks the question as 'answered' by mutating the questions array before scope.send is called at line 479. If bus.publish subsequently throws (e.g., a subscriber error), the handler throws but the question remains mutated. Pre-existing pattern (drainSettlement at line 192 has the same recordSettled-before-publish ordering), but more visible now that the answer path d

🟡 LOW delivered flag is always false in production — no executor implements deliver — src/mcp/tools/coordination.ts

Lines 367, 388, 479: scope.send() captures delivered from scope.ts:357 which returns false when !child.deliver. Exhaustive grep of src/ confirms NO shipped executor (router, inline, sandbox, cli, driver, worktree-cli) implements the optional deliver?(msg) on the Executor interface (types.ts:94). Only completion-gate.ts:57 passes it through, and only if the inner executor defines it — which none do. So delivered: true in the steer/resume tests (coordination.test.ts:294-295) is mock-only behavior that cannot occur in production. The PR's code is correct (it faithfully mirrors scope.send's return), but the observability story is misleading: the down-leg a

🟡 LOW No direct unit test for queue:false at the event-bus primitive level — src/runtime/supervise/event-bus.ts

The new queue:false option is the core behavior change but event-bus.test.ts (96 lines, 6 tests) has zero coverage for it. Coverage comes only from coordination.test.ts's integration test ('steer_worker and resume_worker route down + record in history but are never pulled back', line 275) which asserts await_event returns {idle:true} — proving the queue stays empty — but does not directly assert pending()===0, that pull() returns undefined, or that history()/subscribe still see the record. A regression that broke queue:false but left scope.send intact would only be caught at the integration layer. Fix: add a focused test to event-bus.test.ts asserting tha

🟡 LOW resume_worker delivered:false path not explicitly tested — tests/loops/coordination.test.ts

The new test (lines 275-305) only covers the delivered:true path for resume_worker. steer_worker has an explicit failure-case test at line 118 (workerId: 'gone' → delivered: false), but resume_worker does not. The logic is symmetric (both call scope.send and return {delivered}), so the risk is low, but a one-line assertion mirroring the steer_worker 'gone' test would close the gap.


tangletools · 2026-06-16T20:48:46Z · trace

@tangletools tangletools left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ 1 Blocking Finding — b6eaf6de

Full multi-shot audit completed 4/4 planned shots over 4 changed files. Global verifier still owns final merge decision. | Full multi-shot audit completed 4/4 planned shots over 4 changed files. Global verifier still owns final merge decision.

Full immutable report for this review: trace

Summary comment for this run: full summary


tangletools · 2026-06-16T20:48:46Z · immutable trace

…iew gaps

Address PR #318 review:
- BLOCKING: answer_question computed `delivered` but returned only { question } —
  now returns { question, delivered }, consistent with steer_worker/resume_worker
  (no longer hides whether the answer reached a live worker).
- tests: answer routed down to a LIVE worker (delivered:true happy path); resume_worker
  delivered:false path; a focused event-bus queue:false unit test (history+subscribers
  see it, pull queue never does).
- resume_worker added to OPERATOR_TOOLS + the driver system prompt so the driver is
  actually prompted to use it.
tangletools
tangletools previously approved these changes Jun 16, 2026

@tangletools tangletools left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Auto-approved PR — 8b509e56

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-16T21:41:53Z

@drewstone

Copy link
Copy Markdown
Contributor Author

Addressed the review in 8b509e5:

🔴 Blocking — answer_question hid delivery failure → fixed. Now returns { question, delivered }, consistent with steer_worker/resume_worker.

🟠 No happy-path answer-delivery test → added: a question from a live worker (w0) gets the answer routed to its inbox ({ answer, questionId }), delivered:true, both legs on the trail.

🟡 resume_worker absent from operator prompt → added to OPERATOR_TOOLS + the driver system prompt.

🟡 No resume_worker failure-path test / no direct queue:false unit test → both added (resume to a dead worker → delivered:false; a focused event-bus test proving queue:false records to history+subscribers but never enters the pull queue).

🟡 No executor implements deliver — down-leg delivery is inert in production → acknowledged, intentionally not in this PR. This is a pre-existing gap (steer_worker + scope.send/deliver already had it); this PR makes the down-leg observable, which is real and complete on its own. Making delivery functional needs an executor whose run-loop drains the deliver inbox between turns AND handles the {steer}/{answer}/{resume} shapes (today strategy.ts only handles {steer}). That's the next, separately-scoped piece — tracking it as the follow-up rather than ballooning this PR.

Full suite 1003 pass; typecheck/build/lint clean.

@tangletools tangletools left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Value Audit — sound-with-nits

Verdict sound-with-nits
Concerns 1 (1 weak-concern)
Heuristic 0.0s
Duplication 0.0s
Interrogation 222.6s (2 bridge agents)
Total 222.6s

💰 Value — sound

Adds a parent→child down-leg to the coordination event bus via a queue:false publish option — making the bus truly bidirectional on one observable spine with minimal surface area.

  • What it does: Adds the parent→child direction (down-leg) to the existing child→parent coordination event bus. Three changes: (1) PublishOptions.queue (default true) — set false for record-only events that hit history/subscribers but skip the pull queue, preventing the parent from pulling its own outbound; (2) steer_worker and new resume_worker MCP tools now route messages to the child inbox AND record a `
  • Goals it achieves: True bidirectional communication on a single observable spine: both directions are stamped/ordered/replayable in one history(). The resume_worker tool fills a gap where the executor inbox protocol already accepted {resume} but no coordination MCP surface exposed it. answer_question now routes answers back to workers, closing the loop — previously answers were recorded locally but the block
  • Assessment: The design is coherent and proportionate. The queue:false flag on PublishOptions is the minimal change needed to make down-leg events observable without being pullable — a one-line guard in event-bus.ts:107. The new event types (steer, answer, resume) extend the existing CoordinationEvent union additively in the codebase's established pattern (coordination.ts:85-91). The tool imple
  • Better / existing approach: none — this is the right approach. Searched for existing bidirectional bus mechanisms, answer routing, or resume tooling across src/runtime/supervise/, src/mcp/tools/, and src/runtime/supervise/types.ts. The scope.send primitive already existed but was unobservable (no record of what was sent). The { kind: 'resume' } protocol existed in RootSignal (types.ts:452) but had no coordinati
  • Model: opencode/deepseek/deepseek-v4-pro
  • Bridge attempts: 1

🎯 Usefulness — sound-with-nits

Bidirectional bus is a clean architectural extension — one typed pipe for both directions with distinct access patterns — and the down-leg observability + answer-routing are genuinely useful improvements; the only nit is that no built-in executor implements deliver yet, so the delivery verbs retur

  • Integration: All three new down-leg paths are reachable. coordinationDriverAgent.act() (coordination-driver.ts:132) passes every tool from createCoordinationTools to the driver LLM, and serveCoordinationMcp (coordination-mcp.ts:58) does the same for sandbox MCP. resume_worker is listed in bench/src/profiles.ts:25 OPERATOR_TOOLS and bench/src/profiles.ts:99 driver prompt. answer_question is in the
  • Fit with existing patterns: Fits the existing one-typed-pipe architecture precisely. CoordinationEvent adds three additive discriminated-union members — no existing consumer breaks. queue: false on EventBus is a minimal generic seam (event-bus.ts:107) that doesn't distort the pull queue or subscription model. projectEvent handles down-leg kinds defensively (coordination.ts:227-229). No competing pattern exists — this
  • Real-world viability: Bus recording (queue: false) works correctly: events enter history + hit subscribers but never enter the pull queue (verified by event-bus.test.ts:82-102). answer_question routing the answer down is a structural improvement over the prior state where answers were only decided, not delivered. The real-world gap: no built-in executor (router/inline/sandbox/cli/bridge/driver) implements the optio
  • Model: opencode/deepseek/deepseek-v4-pro
  • Bridge attempts: 1

🎯 Usefulness Audit

🟡 No built-in executor implements deliver — delivery verbs return false with current runtimes [ergonomics] ``

scope.send() returns false when the child executor lacks a deliver method (scope.ts:357). All five built-in executor factories (runtime.ts:173,267,403,557,693 and driver-executor.ts:137-181) omit deliver. This means steer_worker (already pre-existing), resume_worker (new), and the answer-routing leg of answer_question (new) all return delivered: false in production with built-in executors. The protocol is correctly plumbed and will activate when a streaming/BYO executor adds `del


What this audit checks

It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.

Pass What it asks
Heuristic Vague title? Whitespace-only or cruft-bearing diff? (content signals only)
Duplication Do added function/class names already exist elsewhere in the repo?
Value Audit What does it do? What goal does it achieve? Is it good? Better architecture or already-exists?
Usefulness Audit Does it integrate and fit? Will it hold up in real use and actually get used?

Findings are concerns, not blocks — the human reviewer decides what to do with them.

value-audit · 20260616T214726Z

@tangletools

Copy link
Copy Markdown
Contributor

✅ No Blockers — 8b509e56

Readiness 73/100 · Confidence 85/100 · 11 findings (1 medium, 10 low)

deepseek glm aggregate
Readiness 73 77 73
Confidence 85 85 85
Correctness 73 77 73
Security 73 77 73
Testing 73 77 73
Architecture 73 77 73

Full multi-shot audit completed 5/5 planned shots over 6 changed files. Global verifier still owns final merge decision. | Full multi-shot audit completed 5/5 planned shots over 6 changed files. Global verifier still owns final merge decision.

🟠 MEDIUM answer_question return type branches are inconsistent — src/mcp/tools/coordination.ts

The answer path (lines 471-490) returns { question, delivered } with delivered: boolean. The defer path (lines 492-498) and escalate path (lines 500-511) return { question } without delivered. TypeScript unifies these as { question: QuestionRecord; delivered?: boolean } which is safe, but callers that destructure delivered without checking will get undefined for defer/esca

🟡 LOW DownMessageEvent.instruction naming is loose for resume — bench/src/profiles.ts

The resume_worker tool at coordination.ts:388 sends { resume: message } via scope.send but records the down-leg using DownMessageEvent.instruction to store the message (coordination.ts:389: sendDown('resume', { toWorker: workerId, instruction: message, delivered })). For steer_worker 'instruction' is the right term; for resume_worker the semantic is 'message'. This is cosmetic only — the field name is an internal shape and no caller depends on the distinction. The system-prompt wording at profiles.ts:99 already correctly uses 'message'. No behavior change needed.

🟡 LOW resume_worker lacks a dedicated prompt bullet; steer-vs-resume distinction not surfaced to the agent — bench/src/profiles.ts

Line 99 lists resume_worker only in the compressed one-liner: 'spawn_worker(...) / steer_worker(worker, instruction) / resume_worker(worker, message) / stop.' There is no dedicated bullet (like observe_worker at L97-98 or define_analyst at L95-96) telling the driver WHEN to use resume_worker (parked/idle worker) versus steer_worker (running worker). The code comment at L24-25 carries the distinction ('send a running worker...' vs 'continue a parked/idle worker...') but the agent never sees comments — it sees only systemPrompt. A driver that can't distinguish the two may call steer_worker on a parked worker (no live inbox → delivered:false) and waste a turn. Suggest add

🟡 LOW DownMessageEvent.toWorker field name is wrong when routing answers to non-worker askers — src/mcp/tools/coordination.ts

Lines 76-80 + 479-485: answer_question routes the answer to question.from via scope.send(question.from, ...). ask_parent accepts arbitrary from strings (line 532: str(a.from,'from')) — tests pass from:'driver-1' and from:'user' is the default-by fallback. The DownMessageEvent.toWorker field is therefore semantically incorrect for driver/user askers: it is labeled toWorker but carries a driver or 'user' id. No functional bug (delivered:false is surfaced correctly for non-node ids), but the field name lies about its contents. Fix

🟡 LOW No idempotency guard against double-answering a question — src/mcp/tools/coordination.ts

decideQuestion (line 290) does not check if the question is already answered. Calling answer_question twice on the same questionId causes: (1) decideQuestion overwrites the decision silently, (2) scope.send delivers a duplicate answer to the worker inbox, (3) the bus records a second answer event. Pre-existing in decideQuestion (not a regression), but the new routing+recording amplifies the effect — the worker now receives double-delivery instead of just an overwritten ledger record. Fix: check prior.status !== 'open' and either no-op or throw.

🟡 LOW history() docstring stale — doesn't mention down-leg event types — src/mcp/tools/coordination.ts

The history() method at CoordinationTools line 117-118 has a jsdoc: (settled / question / finding). Since this PR, the bus also carries steer, answer, and resume events (though never queued for pull, they appear in history). The docstring should reflect the full event surface.

🟡 LOW projectEvent defensive branch for down-leg types is unreachable — src/mcp/tools/coordination.ts

Lines 227-229: the fallback return { type: ev.type, ...ev.down } handles steer/answer/resume in projectEvent. But down-leg events are published with queue:false (lines 207, 486), so they never enter the pull queue, and projectEvent is only ever called on pulled events (await_event handler line 443). The branch is dead under current usage. Defensible as future-proofing, but add a test or a comme

🟡 LOW resume_worker description overpromises vs. scope.send semantics — src/mcp/tools/coordination.ts

Line 374-375: description says 'Resume a parked/idle worker with a follow-up message, continuing its session'. But scope.send (scope.ts:357) returns false for any child where child.delivered===true (already drained by next()). A worker that settled and was observed via await_next/await_event CANNOT be resumed — it is gone from the live set. 'Parked/idle' implies a pause/resume lifecycle that the scope model does not have: a child is either live (running, inbox-reachable) or delivered (drained, gone). The behavior is safe (delivered:false), but the description misleads callers into expecting they can resume a settled worker. Fix: clarify description to 'De

🟡 LOW sendDown helper does not cover the 'answer' down-leg (inline duplication) — src/mcp/tools/coordination.ts

Lines 206-208 define sendDown(type:'steer'|'resume', down) which wraps bus.publish({type,down},{queue:false}). The 'answer' down-leg at lines 480-487 duplicates the same queue:false publish inline because the answer event also carries questionId. Two code paths for the same invariant (down-leg = record-only). Not a bug, but a maintenance hazard: if the queue:false contract changes, the answer path can drift. Fix: generalize sendDown to accept the full CoordinationEvent down-leg variant, or extract a publishDown(event, op

🟡 LOW sendDown helper excludes 'answer' type, creating minor pattern drift — src/mcp/tools/coordination.ts

sendDown (line 206) accepts 'steer' | 'resume' but not 'answer'. The answer_question handler publishes its answer event directly via bus.publish (line 479) because the answer event carries an extra questionId field that DownMessageEvent doesn't have. This is deliberate but creates a pattern inconsistency: three down-leg publish sites, one via helper, one raw. If more down-leg types are added, consider widening sendDown or adding a generic down-leg publisher.

🟡 LOW stats().published invariant change is undocumented — src/runtime/supervise/event-bus.ts

Before this change, published === pulled + pending always held (every published event entered the queue). Now record-only events (queue:false) increment published (seq) but never enter the queue, so published > pulled + pending is possible. This is correct by design — the test at line 96-100 validates published: 2 with pending: 1. However, no doc/comment notes this invariant shift. A future consumer that derives expected queue depth from published - pulled would be wrong. Suggest a one-line comment in the stats() JSDoc or PublishOptions doc noting that published includes record-only events not in the queue.


tangletools · 2026-06-16T21:52:23Z · trace

Make the down-leg actually move a live worker (was observable-only). New createInbox
(supervise/inbox.ts) is the receive end an executor exposes as Executor.deliver; the
owned tool-loop (routerToolsInlineExecutor) drains it two ways:
- QUEUED (default): flush at each step boundary AND before the worker may settle — it
  can't finish while a steer/answer it never read is pending.
- FORCEFUL (steer_worker interrupt:true): aborts the in-flight turn so the worker
  re-plans immediately, breaking it off a wrong path mid-task.
Black-box CLI harnesses can't be interrupted mid-step → down-leg degrades to next spawn.

inbox 4 + executor-drains-inbox integration test (flush-before-settle proven end to end
through the real executor); full suite 1008 pass; typecheck/build/lint clean.
tangletools
tangletools previously approved these changes Jun 16, 2026

@tangletools tangletools left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Auto-approved PR — cea04b56

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-16T22:45:26Z

@drewstone

Copy link
Copy Markdown
Contributor Author

Down-leg is now FUNCTIONAL, not just observable (cea04b5)

The prior commits made the down-leg observable; this makes it move a live worker — closing the reviewer's "no executor implements deliver" gap with the reference implementation.

New: createInbox (supervise/inbox.ts) — the receive end an executor exposes as Executor.deliver. The owned tool-loop (routerToolsInlineExecutor) drains it two ways (Drew's two delivery modes):

  • QUEUED (default): flushes at each step boundary and before the worker may settle — a worker can't declare itself done while a steer/answer it never read is still pending.
  • FORCEFUL (steer_worker({interrupt:true})): aborts the in-flight turn so the worker re-plans immediately, breaking it off a wrong path mid-task.

Black-box CLI harnesses can't be interrupted mid-step, so there the down-leg degrades to the next spawn (honest boundary, documented).

Tests: inbox primitive (parse/fold/interrupt/independent-signal) + an executor-level integration test proving flush-before-settle end-to-end through the real executor (a steer delivered mid-turn forces a second turn carrying the folded instruction). Full suite 1008 pass; typecheck/build/lint clean.

This resolves the last open finding from the review — the down-leg is now wired end to end: driver sends → recorded on the bus → worker drains it and acts.

@tangletools tangletools left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Value Audit — sound-with-nits

Verdict sound-with-nits
Concerns 2 (1 medium-concern, 1 weak-concern)
Heuristic 0.0s
Duplication 0.0s
Interrogation 92.9s (2 bridge agents)
Total 92.9s

💰 Value — sound

Makes the supervise coordination bus bidirectional: adds a per-worker inbox that receives parent-to-child steer/answer/resume messages (down-leg), records them on the same event bus for unified observability, and drains them into the worker's conversation loop with queued-or-forceful delivery. Fills

  • What it does: Makes the supervise coordination bus truly bidirectional. Adds (1) createInbox() — a reusable per-worker inbox (src/runtime/supervise/inbox.ts) that accepts steer/answer/resume messages from the parent via Scope.send, queues them, and drains them into the worker's conversation at step boundaries (queued flush) or immediately aborts the in-flight turn (forceful interrupt). (2) Wires it into `
  • Goals it achieves: True bidirectional comms over ONE observability spine. Both up-leg (child→parent questions/settlements/findings) and down-leg (parent→child steer/answer/resume) are stamped, ordered, replayable, and visible in a single history() trace. Access patterns stay distinct by design — UP is a fan-in pull-queue the driver drains; DOWN is addressed routing to a specific child's inbox. Before this PR, `ste
  • Assessment: Sound. The Executor.deliver? interface (src/runtime/supervise/types.ts:94), Scope.send routing (src/runtime/supervise/scope.ts:353-360), and completion-gate.ts forwarding (completion-gate.ts:57) were already wired up and waiting for an implementation. This PR fills that gap without restructuring anything. The Inbox abstraction is clean, minimal (85 lines), and reusable — other execut
  • Better / existing approach: none — this is the right approach. Searched for existing deliver implementations across all executors (runtime.ts, sandbox executor, worktree-cli-executor, driver-executor), for inbox or down-leg patterns, and for resume handling. Found: the Executor.deliver? interface was defined (types.ts:94), Scope.send was routing to it (scope.ts:353-360), and completion-gate.ts was forwarding
  • Model: opencode/deepseek/deepseek-v4-pro
  • Bridge attempts: 1

🎯 Usefulness — sound-with-nits

Bidirectional coordination bus with down-leg routing is well-designed, correctly wired end-to-end through coordination tools → Scope.send → inbox → executor loop, with only the sandbox executor not yet adopting deliver — progressive, not broken.

  • Integration: Fully reachable. steer_worker/resume_worker are MCP tools exposed to the driver LLM via coordination.ts:352-398, routed down through Scope.send (scope.ts:353-360) → executor.deliver → inbox.deliver. The answer_question down-leg (coordination.ts:475-497) similarly delivers answers to asking workers. The bench profiles (bench/src/profiles.ts:24-25) document both verbs. The inbox is wired in routerTo
  • Fit with existing patterns: Fits the codebase's grain precisely. The EventBus already had publish/subscribe/pull — adding queue?: boolean to PublishOptions (event-bus.ts:40-46) is a minimal, coherent extension, not a new abstraction. The Inbox module (inbox.ts) follows the same factory pattern as createEventBus. The Executor.deliver? method existed in the interface since types.ts:94 — this PR is its first real implementati
  • Real-world viability: Holds up. deliver() never throws — malformed messages are silently ignored (inbox.ts:40-53). Interrupt vs external abort is correctly discriminated (runtime.ts:337-341: if interruptSig.aborted && !signal.aborted && !controller.signal.aborted → continue, else rethrow). Flush-before-settle prevents workers finishing with unread messages (runtime.ts:357-360). maxTurns=200 bounds any looping. Scope.se
  • Model: opencode/deepseek/deepseek-v4-pro
  • Bridge attempts: 1

🎯 Usefulness Audit

🟠 Only routerToolsInlineExecutor implements deliver — sandbox workers cannot be steered mid-flight [integration] ``

The entire down-leg infrastructure (tools → scope.send → executor.deliver → inbox → loop) is correct and wired for routerToolsInlineExecutor (runtime.ts:297). However sandboxExecutor (runtime.ts:431-493) — the primary production executor that runs agents in actual boxes — does not implement deliver. A driver calling steer_worker/resume_worker on a sandbox worker always gets delivered:false (scope.ts:357: returns false when child.deliver is absent). This is by design (progressive adoption) and th

🟡 AbortSignal.any() requires Node.js ≥20.1.0 — verify target runtime [robustness] ``

runtime.ts:319 uses AbortSignal.any([signal, controller.signal, interruptSig]) which landed in Node.js 20.1.0 (and is still under consideration for broader adoption). If the project targets earlier Node.js or environments without this API, this will fail at runtime. Check the project's engines field and deployment targets; if below 20.1, polyfill or use a manual signal-linking helper like the existing linkSignals() at runtime.ts:980-991.


What this audit checks

It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.

Pass What it asks
Heuristic Vague title? Whitespace-only or cruft-bearing diff? (content signals only)
Duplication Do added function/class names already exist elsewhere in the repo?
Value Audit What does it do? What goal does it achieve? Is it good? Better architecture or already-exists?
Usefulness Audit Does it integrate and fit? Will it hold up in real use and actually get used?

Findings are concerns, not blocks — the human reviewer decides what to do with them.

value-audit · 20260616T224846Z

…sendDown covers answer

PR #318 audit follow-ups (non-blocking):
- resume_worker description no longer implies a park/resume lifecycle the scope model
  lacks — a settled (drained) worker is gone; says so and points to spawning fresh.
- sendDown now covers the 'answer' down-leg too (removes the inline bus.publish
  duplication; one helper for all three down kinds).
- history() docstring lists the down-leg event kinds.

full suite 1008 pass; typecheck/lint clean.

@tangletools tangletools left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Auto-approved PR — 83da8dfa

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-16T22:51:04Z

@tangletools tangletools left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Value Audit — sound-with-nits

Verdict sound-with-nits
Concerns 2 (2 weak-concern)
Heuristic 0.0s
Duplication 0.0s
Interrogation 268.8s (2 bridge agents)
Total 268.8s

💰 Value — sound

Makes the coordination bus bidirectional by recording parent→child messages (steer/answer/resume) on the same EventBus as record-only events, adds a per-worker Inbox for receive-side delivery with queued+forceful modes, and wires the router-tools executor to drain it — a coherent extension of existi

  • What it does: Extends the child→parent EventBus (#316/#317) to carry parent→child messages too, making it bidirectional. Three down-leg event types (steer/answer/resume) are published with queue:false so they appear in history() and reach subscribe observers but never enter the pull queue the parent reads from. A new Inbox primitive on the worker side (src/runtime/supervise/inbox.ts:56) parses incoming
  • Goals it achieves: 1. Single observability spine: both directions of parent↔child communication are stamped, ordered, and replayable in one history() — no separate audit trail needed. 2. Parent→child messages are observable but never pollute the parent's pull queue (the parent must never pull its own outbound message back). 3. Workers can be steered mid-flight with two modes: queued (folded at next step boundary)
  • Assessment: Well-constructed change that extends existing primitives rather than reinventing them. The Executor.deliver method and Scope.send path already existed (types.ts:94, scope.ts:353-360); the PR is wiring observability onto them and adding the worker-side receive abstraction. The queue:false flag on EventBus is a 2-line extension that cleanly separates record-only from queueable events. The `Inb
  • Better / existing approach: none — this is the right approach. The codebase already had the Executor.deliver contract (types.ts:94), the Scope.send routing (scope.ts:353-360), and the EventBus with publish/subscribe/history (event-bus.ts). The PR adds the missing pieces: the worker-side inbox primitive, the queue:false publish option, and the down-leg event recording. No existing mechanism duplicates or should substitu
  • Model: opencode/deepseek/deepseek-v4-pro
  • Bridge attempts: 1

🎯 Usefulness — sound-with-nits

The down-leg is coherently wired — it rides the existing event bus with a clean queue:false seam, the inbox is well-integrated into the only multi-turn executor, and the graceful degradation (Scope.send returns false for non-deliver executors) is by design.

  • Integration: Fully wired: steer_worker/resume_worker/answer_question all call scope.send → LiveChild.deliver (set at spawn, src/runtime/supervise/scope.ts:262) → Inbox.deliver (src/runtime/supervise/inbox.ts:60). The routerToolsInlineExecutor exposes deliver at src/runtime/supervise/runtime.ts:297. One-shot executors (routerInline, bridge) and sandbox/CLI executors deliberately omit deliver — Scope.send return
  • Fit with existing patterns: The down-leg fits naturally: it uses the SAME EventBus as the up-leg (settled/question/finding), with a single PublishOptions.queue:false option (src/runtime/supervise/event-bus.ts:107) that keeps down-leg events in history() and subscriber streams but out of the parent's pull queue. This gives one observability spine for both directions — no competing pattern. The inbox folds down-messages into t
  • Real-world viability: Handles edge cases well: malformed messages silently ignored (inbox.ts:41), settled/unknown workers return delivered:false (scope.ts:357), interrupt signals are per-turn with correct re-throw logic for external aborts (runtime.ts:337-341), settle-before-reading guard prevents a worker finishing with unread steers (runtime.ts:357-361). The worker inbox is single-producer single-consumer — no concur
  • Model: opencode/deepseek/deepseek-v4-pro
  • Bridge attempts: 1

💰 Value Audit

🟡 Only routerToolsInlineExecutor implements deliver — sandbox/CLI/bridge workers cannot receive steers/resumes mid-flight [against-grain] ``

The Inbox is wired only into routerToolsInlineExecutor (runtime.ts:297). The sandbox executor (runtime.ts:431), bridge executor (runtime.ts:721), CLI executor (runtime.ts:585), and cli-worktree executor (runtime.ts:813) do not implement deliver. The architecture already handles this correctly — Scope.send returns false for children without deliver (scope.ts:357) — and the Executor.deliver is documented as optional (types.ts:88-94). This is likely deliberate: sandbox workers run in-

🎯 Usefulness Audit

🟡 RouterToolsInlineExecutor is the only built-in with deliver; steering degrades silently for all other built-in executor backends [ergonomics] ``

The default executor registry (src/runtime/supervise/runtime.ts:886) registers router, inline, sandbox, cli — none have deliver. The routerToolsInlineExecutor (the only multi-turn built-in) isn't in the default registry; it's only reachable via createExecutor({ backend: 'router-tools', ...}) as a BYO executor. Most bench workers use one-shot gateOnDeliverable wrappers (bench/atom-humaneval.mts:136, authoring.ts:105) which also omit deliver. This is by design, not a bug, but a consumer wiring up


What this audit checks

It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.

Pass What it asks
Heuristic Vague title? Whitespace-only or cruft-bearing diff? (content signals only)
Duplication Do added function/class names already exist elsewhere in the repo?
Value Audit What does it do? What goal does it achieve? Is it good? Better architecture or already-exists?
Usefulness Audit Does it integrate and fit? Will it hold up in real use and actually get used?

Findings are concerns, not blocks — the human reviewer decides what to do with them.

value-audit · 20260616T225720Z

@tangletools

Copy link
Copy Markdown
Contributor

❌ Needs Work — 83da8dfa

Readiness 35/100 · Confidence 85/100 · 18 findings (1 high, 3 medium, 14 low)

deepseek glm aggregate
Readiness 35 58 35
Confidence 85 85 85
Correctness 35 58 35
Security 35 58 35
Testing 35 58 35
Architecture 35 58 35

Full multi-shot audit completed 5/5 planned shots over 10 changed files. Global verifier still owns final merge decision. | Full multi-shot audit completed 5/5 planned shots over 10 changed files. Global verifier still owns final merge decision.

Blocking

🔴 HIGH No MCP-tool-level test for steer_worker interrupt:true — src/mcp/tools/coordination.ts

steer_worker handler (line 377-384) now reads a.interrupt === true (line 381) and passes it to scope.send. The inbox and executor layers have interrupt:true tests (tests/loops/inbox.test.ts:33-40), but NO test calls the MCP tool with { workerId: 'w0', instruction: 'x', interrupt: true }. The existing test at tests/loops/coordination.test.ts:115 only exercises the default interrupt: false path. Without a test, an MCP input-schema parsing error on the interrupt boolean or a type coercion bug in the handler could go undete

Other

🟠 MEDIUM Asymmetric return shape in answer_question handler — src/mcp/tools/coordination.ts

The answer_question handler returns { question, delivered } when the answer branch is taken (line 504) but returns only { question } for the defer branch (lines 507-511) and escalate branch (lines 519-524). The test at tests/loops/coordination.test.ts:331 confirms the answer path returns delivered:true, confirming this asymmetry is intentional. However, callers parsing the result must ei

🟠 MEDIUM Stale comment in onEvent subscriber setup — src/mcp/tools/coordination.ts

Lines 135-137 say "The one child→parent pipe" and enumerate "question, settled, finding" — the comment describes only UP events. After this PR, the bus carries ALL event types (steer/answer/resume down-leg events too), and onEvent subscribers (line 141) receive every event including down-leg types. The code is correct — the bus.subscribe fires for all publishes — but the comment misrepresents the pipe now. Also: coordination-mcp.ts:55 JSDoc says "(settled / question / finding)" with the same stale enumeration.

🟠 MEDIUM maxTurns exhaustion drops pending inbox messages — violates the 'flush before settle' invariant — src/runtime/supervise/runtime.ts

The for-loop (lines 313-394) flushes pending messages only at the top of each iteration (line 316) and in the toolCalls==0 settle branch (line 360). There is NO flush after the loop exits via the t < maxTurns condition. Two concrete loss paths: (1) A forceful interrupt on the LAST iteration: deliver() pushes the msg to pending then aborts (inbox.ts:63-65); the catch at runtime.ts:340

🟡 LOW Driver prompt tool-summary line omits observe_worker and analyst verbs — bench/src/profiles.ts

Line 99's compact summary is 'spawn_worker(profile, task) / steer_worker(worker, instruction) / resume_worker(worker, message) / stop.' — it lists spawn/steer/resume/stop but not observe_worker, list_analysts, run_analyst, or define_analyst (those are detailed on lines 93-98). This asymmetry is pre-existing (the prior line also omitted observe_worker) and intentional as a summary, but a driver reading only line 99 could und

🟡 LOW System prompt EACH CYCLE omits resume_worker guidance — bench/src/profiles.ts

Line 113: 'Otherwise STEER the worker with a specific, verifiable instruction.' The TOOLS section (line 99) documents both steer_worker and resume_worker, but the EACH CYCLE workflow only tells the driver to STEER. A driver may never use resume_worker on parked/idle workers since the step-by-step instructions only route through steer. Impact: underutilization of the new verb until the RSI optimizer (gepaDriver) tunes the prompt. Fix: add a parallel clause — e.g. 'Otherwise STEER or RESUME the worker with a specific, verifiable instruction' — or ad

🟡 LOW Throwing onEvent subscriber causes down-leg tool to error after worker already received the message — src/mcp/tools/coordination.ts

In steer_worker (line 382-383), resume_worker (line 405-406), and answer_question (line 496-501), opts.scope.send delivers to the worker BEFORE await sendDown(...). If a subscriber throws inside bus.publish (event-bus.ts:112 — no try/catch around subscriber loop), the tool handler rejects after the worker already received the message. The MCP client sees an error and may retry, ca

🟡 LOW answer_question return shape is asymmetric across branches — src/mcp/tools/coordination.ts

The answer branch (line 504) returns { question, delivered }, but the defer branch (line 507) and escalate branch (line 519) return only { question }. An MCP client consuming the response cannot rely on delivered being present without branching on the input. This is intentional (only answers route down), but the schema doesn't document the conditional field. Impact: minor — a client checking `re

🟡 LOW resume_worker does not expose interrupt despite inbox support — src/mcp/tools/coordination.ts

steer_worker exposes an interrupt boolean (lines 368-373), but resume_worker does not (lines 393-399). The inbox parseDown function (src/runtime/supervise/inbox.ts:40-53) supports interrupt on ALL three message kinds (steer, answer, resume). This may be intentional — resumes are inherently non-forceful — but the inconsistency is undocumented. If a caller ever needs to force-resume a parked worker with an interrupt, resume_worker cannot deliver that.

🟡 LOW steer_worker interrupt:true path untested at coordination layer — src/mcp/tools/coordination.ts

The new interrupt flag on steer_worker (line 381-382) is verified only for the default false case at coordination.test.ts:117 ({ steer: 'do X next', interrupt: false }). No coordination-level test passes interrupt: true and asserts it reaches scope.send as { steer, interrupt: true }. The inbox-level interrupt signal IS tested (inbox.test.ts:33-41), but the one-line forwarding through the MCP tool handler is unverified. A regression that drops the flag (e.g., hardcoding interrupt: false) would pass all current tests. Fix: add a test case calling steer_worker with { workerId: 'w0', instruction: 'stop', interrupt: true } and assert `sent[0]

🟡 LOW AbortSignal.any not available on Node 20.0-20.2 — src/runtime/supervise/runtime.ts

Line 319: AbortSignal.any([signal, controller.signal, interruptSig]) was added to Node.js in 20.3.0 (June 2023). The project's engines field says node >= 20 (package.json:128), which includes 20.0-20.2 where this method would throw TypeError. Most users are on 20.x LTS (20.3+) or 22+, but the engine range technically doesn't guarantee the API. Consider tightening engines.node to >=20.3.0 or adding a runtime guard.

🟡 LOW AbortSignal.any() newly introduced; raises effective Node floor above declared engines — src/runtime/supervise/runtime.ts

Line 319 uses AbortSignal.any([signal, controller.signal, interruptSig]), replacing the prior linkSignals(signal, controller.signal) (still used by routerInlineExecutor:200 and bridgeExecutor:742). AbortSignal.any requires Node >=20.3 (stable) while package.json declares engines.node: >=20. The rest of the file uses the listener-based linkSignals helper for exactly this portability reason. Either bump engines to >=20.3 or compose linkSignals with the interrupt signal for consistency.

🟡 LOW Catch handler discriminates by signal state, not error type — src/runtime/supervise/runtime.ts

Lines 337-341: the try/catch around fetch checks interruptSig.aborted && !signal.aborted && !controller.signal.aborted to decide whether to continue (interrupt) or throw (external abort/network error). This is signal-state-based discrimination rather than error-type-based (e.g. checking e instanceof DOMException && e.name === 'AbortError'). In a tight race where fetch rejects with a non-abort error at the same instant a forceful interrupt fires, the network error is silently swallowed. In practice, the window is vanishingly small: the interrupt must land between fetch dispatch and error rejection. Not a practical concern, but a NameError-styl

🟡 LOW Forceful-interrupt path through the executor is untested — src/runtime/supervise/runtime.ts

inbox.test.ts tests the inbox-level signal abort (lines 33-51) and the queued settle-flush through the executor (lines 66-103), but there is NO executor-level test for the forceful interrupt flow: fetch throws → catch at runtime.ts:337-342 → continue → next iteration flush folds the message. This is the riskiest control-flow in the diff (distinguishing inbox-interrupt from fatal external abort) and it is exercised only by reasoning. Add a test that stubs fetch to reject with an AbortError after an interrupt deliver,

🟡 LOW Interrupted turn inflates iterations in spend (accounting nit) — src/runtime/supervise/runtime.ts

turns += 1 runs at line 314 before the fetch. When a forceful interrupt aborts the fetch, the catch does continue (line 340) without parsing usage, so that turn contributes +1 to spent.iterations (line 397) but +0 tokens. The equal-k / budget assertions key off iterations, so a chatty interrupter can make a worker look like it used more compute than it did. Consider decrementing `turn

🟡 LOW Settle-point flush continues consume maxTurns slots — src/runtime/supervise/runtime.ts

Lines 357-361: when toolCalls.length === 0 and flush() returns true (pending steer/answer messages), the code continues the for-loop. Both t (loop counter) and turns (spend counter) increment, consuming a maxTurns slot for what is a read-and-fold, not an inference call. The maxTurns default is 200 (line 281), so a worker trapped in 'finish → steer arrives → flush → continue' without making tool calls could exhaust maxTurns before producing terminal output. A trivial fix: don't increment turns for the settle-flus

🟡 LOW Executor forceful-interrupt path not covered by integration test — tests/loops/inbox.test.ts

The executor integration test (line 66-103) validates the QUEUED delivery mode (flush-before-settle at runtime.ts:360) but not the FORCEFUL interrupt mode (interrupt:true → AbortSignal aborts fetch → catch at runtime.ts:340 → continue → re-plan). The interrupt primitive is unit-tested at inbox.test.ts:33-51, but the executor-level round-trip (deliver interrupt mid-fetch → fetch throws → loop continues with folded message) has no coverage. This is the more complex async path and the one most likely to break under changes to AbortSignal.any or the catch logic.

🟡 LOW Resume fold format untested in fold unit test — tests/loops/inbox.test.ts

The 'folds queued messages into one operator turn' test (line 23-31) delivers steer + answer and checks their fold substrings, but never delivers a resume message. The resume fold path ('- Continue: ') at inbox.ts:79 is exercised indirectly by the executor integration test but has no direct unit assertion on its fold output. Add inbox.deliver({ resume: 'keep going' }) to the fold test and assert expect(folded).toContain('Continue: keep going').


tangletools · 2026-06-16T23:03:46Z · trace

@tangletools tangletools left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ 1 Blocking Finding — 83da8dfa

Full multi-shot audit completed 5/5 planned shots over 10 changed files. Global verifier still owns final merge decision. | Full multi-shot audit completed 5/5 planned shots over 10 changed files. Global verifier still owns final merge decision.

Full immutable report for this review: trace

Summary comment for this run: full summary


tangletools · 2026-06-16T23:03:46Z · immutable trace

Simplify without losing capability:
- MERGE steer_worker + resume_worker → one steer_worker (any live worker; the only
  real axis was interrupt forceful-vs-queued, already a param). 'Resume' = a non-
  interrupt steer. Removes a redundant verb + dissolves the resume-vs-steer prompt nits.
- REMOVE await_next — it was a strict subset of await_event({kinds:['settled']}).
  One wait-verb now; callers/prompts pass kinds:['settled'] for the next finished worker.
- DROP bus.peek() — speculative, only its own test used it (YAGNI).

Down-leg event union + inbox shed the dead 'resume' kind. Full suite 1007 pass;
typecheck/build/lint clean.

@tangletools tangletools left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Auto-approved PR — 809e53ce

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-16T23:17:38Z

@tangletools tangletools left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Value Audit — sound-with-nits

Verdict sound-with-nits
Concerns 3 (3 weak-concern)
Heuristic 0.0s
Duplication 0.0s
Interrogation 102.9s (2 bridge agents)
Total 102.9s

💰 Value — sound

Completes the Scope.send→deliver pipe that was designed but inert — adds per-worker inbox receiving, records down-leg on the bus, and consolidates 3 tools into 1 — a clean, in-grain completion of the existing architecture.

  • What it does: Wires the down-leg (parent→child communication) into the existing event bus/Scope.send infrastructure: (1) adds createInbox — a per-worker message buffer the executor loop drains at step boundaries + before settling, supporting queued and forceful (interrupt) delivery; (2) records down-leg events on the bus with queue:false so they're visible in history()/subscribers but never pulled by the pa
  • Goals it achieves: Achieves true bidirectional communication with a single observability spine — both upward (child→parent, queued+pullable) and downward (parent→child, record-only+delivered) events are stamped, ordered, and replayable in one history(). The worker now actually receives and processes steering/answers mid-run, with the guarantee it cannot settle while unread messages are pending. Tool surface shrinks
  • Assessment: The change completes the bus architecture coherently. The Scope.send→deliver pipe was already designed into the codebase (src/runtime/supervise/scope.ts:353-360 at the merge base) but no standard executor implemented deliver — so the down-leg was inert. inbox.ts is the minimal implementation needed to close that loop (83 lines, well-tested). The queue:false publish option is a 3-line additio
  • Better / existing approach: none — this is the right approach. The Scope.send→deliver→inbox chain was the designed path from the start (see docs/canonical-api.md:242 documenting send as the steer primitive, and scope.ts:108-109 storing deliver on spawn). The PR does not introduce a new architecture; it implements the executor-side of a pipe that was already laid out. The queue:false option is additive to the existing E
  • Model: opencode/deepseek/deepseek-v4-pro
  • Bridge attempts: 1

🎯 Usefulness — sound-with-nits

Bidirectional coordination bus completes the pre-existing design: the down-leg is correctly wired Scope.send→inbox→executor-loop, observed via queue:false on the unified event bus, with only the router-tools executor receiving it today (by design).

  • Integration: Reachable through two MCP tools: steer_worker (coordination.ts:360-388) and answer_question (coordination.ts:449-468), both routing through Scope.send (scope.ts:353-360) → child.deliver → Inbox.deliver (inbox.ts:59-64). The worker side drains the inbox at step boundaries and before settling in routerToolsInlineExecutor.execute() (runtime.ts:307-360). Bus recording via sendDown (coordination.ts:206
  • Fit with existing patterns: Fits the codebase grain exactly: the EventBus was always designed to be bidirectional (event-bus.ts docstring lines 15-17 describe the pass-through lane), Executor.deliver was already an optional interface member in supervise/types.ts:94, and Scope.send was already defined in scope.ts:353. This change connects seams that were deliberately held open. The queue:false PublishOption is a minimal, non-
  • Real-world viability: Concurrency: single-threaded event loop, Inbox is a plain array push/splice, safe by construction. Error paths: parseDown ignores malformed messages (inbox.ts:40-53), Scope.send returns false for unknown/settled/deliver-less children (scope.ts:356-358), the caller receives delivered:false — not a crash. The forceful interrupt path in the executor catches inbox-originated aborts and re-plans (runti
  • Model: opencode/deepseek/deepseek-v4-pro
  • Bridge attempts: 1

💰 Value Audit

🟡 history() comment still references 'resume' after removal [maintenance] ``

src/mcp/tools/coordination.ts line ~117: the JSDoc for history() says (steer / answer / resume) but the resume event kind was removed in the final refactor (merged into steer). The actual CoordinationEvent union only has steer and answer. This is a stale comment, not a functional bug — history() shows actual events, not what the comment says.

🎯 Usefulness Audit

🟡 resume_worker tool claimed but not present [ergonomics] ``

PR title says resume_worker is a new tool, but no such tool exists in coordination.ts. The InboxMessage kind is steer|answer (inbox.ts:18-24), never resume. A parked-worker resume IS achievable via steer_worker with a continuation instruction, so the capability exists — but a driver LLM expecting a dedicated resume_worker tool will not find one. Either add the tool (it would be a thin wrapper over steer_worker or a dedicated inbox kind) or correct the title. File: src/mcp/tools/coordination.ts t

🟡 subscriber delivery is sequential, so one slow/throwing subscriber blocks all others [robustness] ``

EventBus.publish iterates subscribers sequentially with await (event-bus.ts:110). The comment says this is by design for ordering (line 108-109). A throwing subscriber will prevent subsequent subscribers from receiving events. In the down-leg path, sendDown calls bus.publish with queue:false — if an onEvent subscriber throws, the MCP handler returning {delivered} could have already sent the message to the child inbox but fail to record it. This is pre-existing event-bus design, not new in this P


What this audit checks

It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.

Pass What it asks
Heuristic Vague title? Whitespace-only or cruft-bearing diff? (content signals only)
Duplication Do added function/class names already exist elsewhere in the repo?
Value Audit What does it do? What goal does it achieve? Is it good? Better architecture or already-exists?
Usefulness Audit Does it integrate and fit? Will it hold up in real use and actually get used?

Findings are concerns, not blocks — the human reviewer decides what to do with them.

value-audit · 20260616T232110Z

@tangletools

Copy link
Copy Markdown
Contributor

❌ Needs Work — 809e53ce

Readiness 29/100 · Confidence 85/100 · 18 findings (1 high, 4 medium, 13 low)

deepseek glm aggregate
Readiness 29 48 29
Confidence 85 85 85
Correctness 29 48 29
Security 29 48 29
Testing 29 48 29
Architecture 29 48 29

Full multi-shot audit completed 5/5 planned shots over 17 changed files. Global verifier still owns final merge decision. | Full multi-shot audit completed 5/5 planned shots over 17 changed files. Global verifier still owns final merge decision.

Blocking

🔴 HIGH AbortSignal.any requires Node >=20.3.0 but engines says >=20 — src/runtime/supervise/runtime.ts

Line 319 uses AbortSignal.any([signal, controller.signal, interruptSig]). This static method was added in Node.js 20.3.0. The package.json engines field says ">=20", which includes 20.0.0–20.2.x where AbortSignal.any does not exist. On those Node versions the executor will throw TypeError: AbortSignal.any is not a function on the first turn. The tsconfig lib: ["ES2022"] also doesn't include this API natively, though @types/node ^25.9.3 may cover it via global augmentation. Fix: tighten engines to ">=20.3.0" or use the existing linkSignals helper (still alive in the file at [line 980](https://github.com/tangle-network/agent-runtime/blob/809

Other

🟠 MEDIUM await_next MCP tool removed — breaking API change — src/mcp/tools/coordination.ts

The await_next tool is fully removed from the tools array (old L337-355). await_event({ kinds: ['settled'] }) provides equivalent behavior, but any driver hard-coded to call await_next will break. Grep confirms 17 references remain in bench/src/mcp-mount-probe.mts:89-91, bench/src/atom-humaneval.mts:146, bench/src/atom-mcp-e2e.mts:178, docs/glossary.md:31,36,64,68, docs/execution-model.md:52,56,116, docs/architecture-visual.md:110, skills/supervise/SKILL.md:18, skills/loop-writer/SKILL.md:113, and CLAUDE.md:65. These stale references are outside this shot but document the breaking surface.

🟠 MEDIUM Forceful interrupt only aborts the inference fetch, not tool execution — docstring overpromises — src/runtime/supervise/runtime.ts

runtime.ts:317-336: turnSignal (which includes interruptSig) is passed ONLY to the fetch call at line 335. Tool execution at lines 375-393 calls await seam.executeToolCall(name, args, task) with NO signal — the RouterToolsSeam.executeToolCall interface (runtime.ts:246) doesn't accept one. So a forceful interrupt arriving during tool execution: (1) fires live.abort() which sets interruptSig.aborted=true, (2) but the fetch already completed so turnSignal's abort has no consumer, (3) the tool runs to completion, (4) the

🟠 MEDIUM deliver inbox only wired for routerToolsInlineExecutor — src/runtime/supervise/runtime.ts

Only routerToolsInlineExecutor (line 297) implements deliver. The sandboxExecutor, bridgeExecutor, cliExecutor, and cliWorktreeExecutor omit it. This means steer_worker/answer_question can only reach tool-using router workers. Scope.send correctly returns false when deliver is absent (scope.ts:357), so non-inbox workers silently ignore the down-leg. This appears deliberate (only multi-turn executors are steerable) but the asymmetry should be documented — a supervisor calling steer_worker on a sandbox worker will see delivered: false with no explanation that the backend doesn't support it.

🟠 MEDIUM Forceful-interrupt executor path (fetch-abort→catch→continue) is untested end-to-end — tests/loops/inbox.test.ts

The 'a worker may not settle while a steer is pending' test (line 57) exercises only the QUEUED path: a non-interrupt steer delivered mid-turn triggers flush-before-settle. The forceful path — deliver({steer,interrupt:true}) mid-fetch → AbortSignal.any fires → fetch throws → catch at runtime.ts:337 checks interruptSig.aborted&&!signal.aborted&&!controller.signal.aborted → continue → next turn flushes the message — has NO integration test. The unit test at line 33 verifies freshInterrupt().aborted becomes true, but nothing verifies the execu

🟡 LOW Documented positional arg name diverges from tool schema — bench/src/profiles.ts

Line 98 documents steer_worker(worker, instruction, interrupt?) but the actual MCP schema in src/mcp/tools/coordination.ts:369 names the first property 'workerId', not 'worker'. The prompt shorthand is consistent with sibling entries (spawn_worker(profile, task), observe_worker(worker)) and was already this way before the PR, so it is not a regression — purely a cosmetic prompt/schema naming mismatch. No action required for this PR; flagging only for awareness if prompt/schema parity is ever asserted.

🟡 LOW Integration regression: await_next removed but bench/* prompts still reference it — src/mcp/tools/coordination.ts

This file removed the await_next tool (folded into await_event). However bench/src/mcp-mount-probe.mts:89-91, bench/src/atom-humaneval.mts:146, and bench/src/atom-mcp-e2e.mts:178 still inject LLM system/user prompts instructing the model to call 'await_next' — a tool that no longer exists in the served MCP surface. Those benchmarks will fail or produce no-ops because the model calls a nonexistent tool. The bench files are unchanged by this PR (confirmed via git diff --stat), so this is a regression introduced by coordination.ts. Fix is in bench/* (outside this shot) but the root cause is here. Flag for the global verifier to assign to the bench shot.

🟡 LOW Stale 'resume' in history() docstring — no resume event kind exists — src/mcp/tools/coordination.ts

Line 116: the history() JSDoc says 'DOWN (steer / answer / resume)' but the CoordinationEvent union (lines 85-90) defines only 'steer' and 'answer' as down-leg kinds — there is no 'resume' member. This is a leftover from the earlier resume_worker iteration (commit b6eaf6d) that was removed in the 12→10 tool unification (commit 809e53c). A reader inspecting history() output would expect a 'resume' type that never appears. Fix: drop '/ resume' from [line 116](https://github.com/tangle-network/agent-runtime/blob/809e53ceb08d4a008c

🟡 LOW answer_question return shape is inconsistent across answer/defer/escalate paths — src/mcp/tools/coordination.ts

The answer path (line 468) returns {question, delivered} but the defer path (line 471) and escalate path (line 483) return {question} with no delivered field. A consumer destructuring delivered gets undefined for defer/escalate. This is an experimental API so not breaking, but the inconsistency is a latent footgun — defer/escalate don't deliver to a worker, so returning delivered:false (or omittin

🟡 LOW sendDown answer path has dead questionId fallback that silently masks a bug — src/mcp/tools/coordination.ts

Line 212: questionId: questionId ?? ''. The sole caller (answer_question handler, line 461-465) always passes questionId explicitly, so the ?? '' fallback is dead. If a future caller forgot questionId, the recorded 'answer' event would carry questionId:'' silently rather than failing — corrupting the audit trail with an untraceable answer. Prefer making questionId required for type==='answer' (e.g., a typed overload or a runtime assert) instead of defaulting to empty string.

🟡 LOW peek removal from EventBus is a breaking API change — src/runtime/supervise/event-bus.ts

The peek() method was removed from the EventBus<E> interface (was at line 61-62 in the old commit). No internal callers exist (verified via grep of src/ and tests/), and the module is tagged @experimental, so the blast radius is small. However, any external consumer calling .peek() on a bus instance will get a runtime TypeError. Consider keeping peek with a @deprecated tag for one release cycle, or documenting the removal in the PR/release notes.

🟡 LOW answer_question down-message always queued (no interrupt path) — may stall a blocked worker — src/runtime/supervise/inbox.ts

inbox.ts parseDown (lines 40-53) supports interrupt:true on both steer and answer kinds. But coordination.ts answer_question handler calls scope.send(question.from, { answer, questionId }) WITHOUT setting interrupt:true (verified in the coordination.ts diff, not in this shot). A worker that asked a blocking question may be parked in a long inference call waiting for the answer; without interrupt, the answer only flushes at the next step boundary, which may not arrive until the current (potentially wrong-direction) inference completes. For a truly blocking question the answer should arguably be forceful (interrupt:true) to unpark the worker immediately. I

🟡 LOW test coverage: no test for concurrent inbox drain + deliver during settle flush — src/runtime/supervise/inbox.ts

The inbox tests (tests/loops/inbox.test.ts) cover parsing, folding, interrupt signaling, and the router-tools executor drain loop. However, no test exercises a deliver arriving AFTER drain() is called but before the model response resolves — the exact race that the pre-settle flush (runtime.ts:360) is designed to prevent. While JS's single-threaded event loop makes this safe in practice, a test with a controlled deliver-during-settle window would harden the invariant that a worker never settles with unread messages.

🟡 LOW Interrupted turns inflate the iteration count against maxTurns — src/runtime/supervise/runtime.ts

runtime.ts:313-314: turns += 1 executes before the fetch at line 322. When a forceful interrupt aborts the fetch (catch at line 337, continue at line 340), the turn counter has already incremented but no inference completed. Over many interrupts, a supervisor can burn through the worker's maxTurns budget (default 200) without the worker completing useful work — the worker hits the maxTurns

🟡 LOW No test for answer_question delivered:false when worker settles after asking — tests/loops/coordination.test.ts

The 'answer_question routes the answer down to a LIVE worker' test (line 306) covers delivered:true (live w0). The failClosed test covers delivered:false for a non-existent 'driver-1'. But the race — worker w0 asks a question, then settles before the driver answers — is untested. scope.send returns false when child.delivered is true (scope.ts:357). A regression where answer_question reported delivered:true for a settled worker (e.g., checking the wrong flag) would not be caught. This is a minor gap because the scope.send contract is the guard, but the coordination layer's handling of this race (answer recorded in trail but delivered:false) deserves a direct

🟡 LOW answer_question return value unasserted in failClosed test — tests/loops/coordination.test.ts

Line 183: answer_question now returns { question, delivered } but the failClosed test only awaits it without asserting the return shape. The delivered: false path (mock scope rejects 'driver-1') is observable in the emitted trail but the tool return is unchecked. The new dedicated test at line 306 covers the delivered: true path. Minor gap — could add expect(result.delivered).toBe(false) at [line 183](https://github.com/tangle-network/agent-runtime/blob/809e53ceb08d4a008c8372ae7b1688dee0996181/tests/loops/coordinat

🟡 LOW await_event idle semantics with mismatched kinds filter not tested — tests/loops/coordination.test.ts

Line 392-417: The 'await_event with kinds filter' test verifies { idle: true } when the matched-kind queue is empty AND no settlements drain. It does NOT test the case where a settlement IS drained but the kinds filter excludes 'settled' — the handler at coordination.ts:422 returns { idle: !drained } which would be { idle: false } despite nothing pullable of the requested kind. This could drive a busy loop in a caller with a non-settled filter. A test asserting this edge case would improve coverage.

🟡 LOW Fold-format assertions use toContain, not exact match — tests/loops/inbox.test.ts

Lines 31-33 assert the folded operator turn with three separate toContain checks ('[SUPERVISOR]', 'New instruction from your supervisor: switch to recursion', 'Answer to your question (q7): v2'). A malformed fold that interleaves garbage between these substrings would still pass. Minor, but since fold() is the worker's only view of out-of-band messages, an exact toEqual on the full string would be more robust against format regressions.


tangletools · 2026-06-16T23:32:37Z · trace

@tangletools tangletools left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ 1 Blocking Finding — 809e53ce

Full multi-shot audit completed 5/5 planned shots over 17 changed files. Global verifier still owns final merge decision. | Full multi-shot audit completed 5/5 planned shots over 17 changed files. Global verifier still owns final merge decision.

Full immutable report for this review: trace

Summary comment for this run: full summary


tangletools · 2026-06-16T23:32:37Z · immutable trace

…gent-eval kernel)

createDetectorMonitor (supervise/detector-monitor.ts) — the online analyst on the live
worker pipe. Folds each tool step through agent-eval 0.93.0's published streaming kernel
(repeatedActionDetector/errorStreakDetector — the SAME kernel control-runtime folds; no
detection logic reimplemented) and fires onSignal → a finding on the bus the moment a
worker loops or error-storms. routerToolsInlineExecutor feeds it via a new onToolStep seam.

Bumps agent-eval ^0.93.0. monitor tests (4); full suite 1011 pass; typecheck/build/lint clean.
tangletools
tangletools previously approved these changes Jun 17, 2026

@tangletools tangletools left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Auto-approved PR — 6f8b82d0

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-17T00:25:42Z

Last mile: createCoordinationTools.raiseFinding (exposed on the MCP handle) — the seam
an ONLINE detector uses to publish a finding on the live bus mid-run. Proven end-to-end:
a stuck-loop on the worker pipe → monitor → raiseFinding → await_event surfaces it.

Review fixes (audit on the earlier commit):
- HIGH: AbortSignal.any (needs Node 20.3, floor is 20) → portable mergeAbortSignals.
- forceful interrupt: docstring no longer overpromises (aborts in-flight inference, a
  tool mid-exec finishes first); interrupted turns no longer count toward maxTurns;
  added the e2e test (forceful steer aborts the turn, re-plans, aborted turn is free).
- answer to a BLOCKING question is now delivered forcefully (interrupt) to unpark the
  worker immediately, not at its next boundary.
- sendDown 'answer' now REQUIRES questionId (overload; no silent ?? '' mask).
- tool-step status captured (error vs ok) for the error-streak detector.
- stale await_next purged from bench prompts + docs; history() docstring drops 'resume'.
- added tests: answer delivered:false + return asserted; await_event idle-on-mismatch.

full suite 1014 pass; typecheck/build/lint clean.
tangletools
tangletools previously approved these changes Jun 17, 2026

@tangletools tangletools left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Auto-approved PR — d92fe1a6

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-17T00:36:33Z

@drewstone

Copy link
Copy Markdown
Contributor Author

Final-review pass (d92fe1a) — addressed every relevant finding from the audit + finished the last mile.

Last mile: raiseFinding wires the online detector to the live bus — proven end-to-end (stuck-loop on the worker pipe → monitor → raiseFindingawait_event surfaces the finding mid-run).

Resolved:

  • 🔴 AbortSignal.any (needs Node 20.3, floor is 20) → portable mergeAbortSignals.
  • 🟠 interrupt docstring softened to the truth (aborts in-flight inference; a tool mid-exec finishes; owned-loop only); interrupted turns no longer count toward maxTurns; forceful-interrupt path now tested e2e.
  • 🟠 deliver router-tools-only → by design (the owned loop), documented.
  • 🟡 blocking-question answers now forceful (unpark immediately); sendDown requires questionId (no ?? '' mask); tool-step error status captured; stale await_next/resume purged from bench prompts + docs; added answer-delivered:false + await_event idle-on-mismatch tests.

Intentional (this @experimental surface): await_next/peek removed in the 12→10 tool simplification.

Full suite 1014 pass; typecheck/build/lint clean; merges cleanly into main; depends on agent-eval 0.93.0 (published).

…es agent-eval)

createTrajectoryRecorder (supervise/trajectory-recorder.ts) — the post-hoc half of the
analyst pipe. Replays a worker's captured tool steps as agent-eval spans (InMemoryTraceStore)
and runs its PUBLISHED batch analyzers — buildTrajectory (structured run summary),
stuckLoopView (full-run repeated-call view, complementing the online consecutive detector),
toolWasteView. No analysis reimplemented; the thin bridge from live tool steps to the
substrate trace model. Feeds from the same onToolStep seam as the online monitor.

3 recorder tests (real spans → real agent-eval findings); full suite 1017 pass;
typecheck/build/lint clean. Closes both legs: online (mid-run) + settle (post-hoc).
tangletools
tangletools previously approved these changes Jun 17, 2026

@tangletools tangletools left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Auto-approved PR — 27dd2cec

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-17T00:44:16Z

@drewstone

Copy link
Copy Markdown
Contributor Author

Last mile #2 done (27dd2ce) — the settle-time half is wired, completing the analyst pipe both ways.

createTrajectoryRecorder replays a worker's captured tool steps as agent-eval spans and runs its published batch analyzersbuildTrajectory (structured run summary), stuckLoopView (full-run repeated-call view, catching loops the online consecutive detector interleaves past), toolWaseView. No analysis reimplemented — it's the thin bridge from live tool steps into the substrate's trace model, fed from the same onToolStep seam as the online monitor.

So the pipe is now complete:

  • ONLINE (mid-run): createDetectorMonitor → raises a finding the moment a worker loops/error-storms.
  • SETTLE (post-hoc): createTrajectoryRecorder → the full-trajectory analysis at finish.

Both reuse agent-eval 0.93.0's published analyzers; both feed from onToolStep. 3 new recorder tests (real spans → real agent-eval findings). Full suite 1017 pass; typecheck/build/lint clean; merges cleanly into main.

@tangletools tangletools left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Value Audit — sound

Verdict sound
Concerns 1 (1 weak-concern)
Heuristic 0.0s
Duplication 0.0s
Interrogation 328.9s (2 bridge agents)
Total 328.9s

💰 Value — sound

Makes the coordination bus truly bidirectional by adding the down-leg (parent→child steer/answer/resume) on the same EventBus via a queue:false record-only flag, plus two analysis adapters that reuse agent-eval — clean architecture, no duplication, no better alternative found.

  • What it does: The PR completes the bidirectional coordination surface for the supervise system. Previously (#316/#317) the EventBus was one-directional: child→parent events (settled/question/finding) queued for the driver to pull. This PR adds: (1) PublishOptions.queue on EventBus — default true (queued/pullable), false means record-only (lands in history() + reaches subscribe observers but never ente
  • Goals it achieves: ONE observability spine for both directions. Before this PR, UP messages (child→parent) were stamped, ordered, and replayable in history(), but DOWN messages (parent→child) were fire-and-forget — delivered via scope.send with no audit trail. Now both directions share the same EventBus.history() record, so a single history() call shows the full ordered conversation in both directions. Secon
  • Assessment: Coherent and well-architected, built firmly in the grain of the codebase. The queue:false design is elegant: a single boolean on PublishOptions distinguishes the two access patterns (UP = pull-queue, DOWN = addressed routing) without splitting into two buses, preserving unified history() ordering. The sendDown overload that REQUIRES questionId for answer events (no silent ?? '' fallb
  • Better / existing approach: none — this is the right approach. The alternative (two separate buses, one per direction) would lose the unified history() ordering trail and the single subscriber model. The single-bus-with-queue-flag design is simpler and gives both directions one stamp/seq/timestamp spine. I searched for existing parent→child delivery mechanisms (scope.send, Executor.deliver, RootSignal resume/cancel
  • Model: opencode/zai-coding-plan/glm-5.1
  • Bridge attempts: 2
  • Bridge warning: opencode/deepseek/deepseek-v4-pro: Bridge returned 503: {"error":{"message":"cli-bridge admission timed out after 30000ms","type":"admission_rejected","reason":"queue_timeout","admission":{"active":24,"queued":0,"maxActive":24,"maxQueue":32}}}

🎯 Usefulness — error

usefulness agent produced no parseable value-audit JSON.

  • Model: kimi-code/kimi-for-coding
  • Bridge attempts: 3
  • Bridge error: opencode/deepseek/deepseek-v4-pro: Bridge returned 503: {"error":{"message":"cli-bridge admission timed out after 30000ms","type":"admission_rejected","reason":"queue_timeout","admission":{"active":24,"queued":1,"maxActive":24,"maxQueue":32}}}; opencode/zai-coding-plan/glm-5.1: Bridge returned 503: {"error":{"message":"cli-bridge admission timed out after 30000ms","type":"admission_rejected","reas

💰 Value Audit

🟡 Analysis modules exported but not wired into any default path [maintenance] ``

createDetectorMonitor and createTrajectoryRecorder are exported from runtime/index.ts and have full test coverage, but neither is constructed or connected inside runtime.ts or coordination.ts. The onToolStep seam exists (runtime.ts:249) and raiseFinding is exposed on CoordinationMcpHandle, so a consumer CAN wire them — but there is no default wiring, meaning these capabilities are inert unless someone explicitly assembles them. This follows the codebase's established 'export pr


What this audit checks

It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.

Pass What it asks
Heuristic Vague title? Whitespace-only or cruft-bearing diff? (content signals only)
Duplication Do added function/class names already exist elsewhere in the repo?
Value Audit What does it do? What goal does it achieve? Is it good? Better architecture or already-exists?
Usefulness Audit Does it integrate and fit? Will it hold up in real use and actually get used?

Findings are concerns, not blocks — the human reviewer decides what to do with them.

value-audit · 20260617T005843Z

@tangletools

Copy link
Copy Markdown
Contributor

✅ No Blockers — 27dd2cec

Readiness 55/100 · Confidence 95/100 · 14 findings (3 medium, 11 low)

deepseek glm aggregate
Readiness 63 55 55
Confidence 95 95 95
Correctness 63 55 55
Security 63 55 55
Testing 63 55 55
Architecture 63 55 55

Full multi-shot audit completed 8/8 planned shots over 29 changed files. Global verifier still owns final merge decision. | Full multi-shot audit completed 8/8 planned shots over 29 changed files. Global verifier still owns final merge decision.

🟠 MEDIUM Comment claims interrupted turns don't count toward maxTurns, but for-loop counter still increments — src/runtime/supervise/runtime.ts

Lines 344-347: The continue after an interrupt skips turns += 1 (spend tracking), but the for-loop for (let t = 0; t < maxTurns; t += 1) unconditionally increments t. So each interrupted turn DOES consume a maxTurns budget slot. An attacker who can repeatedly send interrupt messages could DOS the worker by consuming all maxTurns slots without the worker making progress. Either decrement t before continue, switch to a while loop, or update the comment to reflect the actual behavior.

🟠 MEDIUM Per-turn mergeAbortSignals leaks listeners on long-lived signals — src/runtime/supervise/runtime.ts

runtime.ts:325 calls mergeAbortSignals(signal, controller.signal, interruptSig) INSIDE the per-turn for-loop, whereas the old code called linkSignals(signal, controller.signal) once before the loop (runtime.ts:1015-1026). Each mergeAbortSignals call registers { once: true } abort listeners on signal and controller.signal via s.addEventListener('abort', onAbort, { once: true }). In normal operation (no interrupt, no teardown), these listeners never fire, so they are never removed. Over N turns, N closures accumulate on each long-lived signal. For default maxTurns=200 that's 200 leaked closures on controller.signal alone; for a long workflow with maxTurns=10000 it's 10000. They DO self-clean when controller.abort() fires on teardown (each fires once and self-removes), but t

🟠 MEDIUM mergeAbortSignals called per-turn accumulates AbortSignal listeners on parent signals — src/runtime/supervise/runtime.ts

Line 325: mergeAbortSignals(signal, controller.signal, interruptSig) is called inside every for-loop iteration. Each call registers new { once: true } listeners on signal and controller.signal. The pre-PR code called linkSignals(signal, controller.signal) ONCE before the loop (line 200 in old code). With default maxTurns=200, this adds up to 400 cumulative listeners on the parent signals, only cleaned up at teardown when the signals fire. Fix: pre-link signal and controller.signal once before the loop, then per

🟡 LOW Paragraph density — single ~1000-word line — CLAUDE.md

The agent-driver bullet is now one continuous line covering bidirectional bus, inbox, executor deliver, interrupt, online detector, settle recorder, and the existing substrate notes. Pre-existing style, but this change nearly doubles its length. Impact: future maintainers will have trouble locating a specific claim. Fix: split into 2-3 logical lines (bus/down-leg | online+settle analysts | substrate pointers) — optional, cosmetic.

🟡 LOW Redundant restatement of the single-wait-verb description — CLAUDE.md

The phrase describing 'await_event({kinds?}) — the ONE wait verb; kinds:[settled] = next finished worker, omit = also questions/findings' appears twice within the same paragraph (once near the top in the bus description, once again mid-paragraph after the interrupt sentence). The second occurrence ('The driver waits on ONE verb — await_event({kinds?})...') adds nothing new. Impact: readability only. Fix: drop the second occurrence or merge the interrupt context into the first.

🟡 LOW Tool-name/prompt coupling has no compile-time or test guard — bench/src/atom-humaneval.mts

The driverSystem prompt at line 146 hardcodes the tool name await_event as a string. The coordination driver (coordination-driver.ts:165-168) dispatches by byName.get(tc.name) — a mismatch yields a silent unknown tool error at runtime, not a type error at build. This PR fixed the mismatch (await_next→await_event) but nothing prevents it from recurring if the canonical tool name changes again. Consider deriving tool names from a single const (e.g. COORDINATION_TOOL.AWAIT_EVENT) shared between coordination.ts and the bench prompts, or adding a smoke test that asserts every tool name mentioned in a prompt exists in createCoordinationTools(...).tools. Low

🟡 LOW DownMessageEvent.instruction is semantically misnamed for the answer kind — src/mcp/tools/coordination.ts

coordination.ts:76-80 + :476 — DownMessageEvent.instruction carries the raw answer text for answer events (instruction: answer). A reader auditing history() sees {toWorker, instruction, delivered} with no signal that the payload is an answer to a specific question rather than a steer. The questionId lives only on the outer answer event wrapper (:90), not on the DownMessageEvent. Cosmetic/readability only; behavior is correct. Optional: rename to payload or add a discriminating field on DownMessageEvent.

🟡 LOW projectEvent defensive branch drops questionId for answer events — src/mcp/tools/coordination.ts

coordination.ts:245 — the fallback return { type: ev.type, ...ev.down } projects a steer/answer event without its questionId. The branch is effectively unreachable in normal flow because sendDown publishes with {queue:false} (event-bus.ts:105), so steer/answer never enter the pull queue and pull() can never return them; the comment at :243-244 acknowledges this. Impact: none in practice. If a future caller ever routes a down-event through the queue (e.g. a replayer), the audit projection would lose the questionId linkage. Fix when touching this: if (ev.type === 'answer') return { type: 'answer', ...ev.down, questionId: ev.questionId } before the fallback.

🟡 LOW Adversarial interrupt spam can exhaust maxTurns with zero useful work — src/runtime/supervise/runtime.ts

runtime.ts:347: when an interrupt aborts the fetch, the catch does continue, which consumes a for-loop iteration (t += 1) but does NOT increment turns. A driver (or bug) that sends forceful interrupts in a tight loop causes every fetch to abort-and-continue, burning all maxTurns iterations with turns === 0 and producing no output. The loop IS bounded (terminates after maxTurns) and billing is not inflated, so this is acceptable degradation rather than a hang. But the worker silently settles empty with no error signal. Consider a dedicated interrupt-count backstop (e.g., if consecutive interrupts exceed a threshold, surface a finding or throw) so the failure mode is observable.

🟡 LOW Non-AbortError failures coinciding with an interrupt signal could be silently swallowed — src/runtime/supervise/runtime.ts

Line 347: The catch block only checks interruptSig.aborted && !signal.aborted && !controller.signal.aborted. If a non-abort fetch error (e.g., network timeout, DNS failure) occurs at the same moment an interrupt signal fires, the error is silently swallowed via continue because interruptSig.aborted is true. The worker re-plans and retries, which is correct for the interrupt case, but a real network fault would be lost and retried verbatim. Consider also checking e instanceof DOMException && e.name === 'AbortError' or inspecting the error type to distinguish genuine aborts from coincident network failures.

🟡 LOW onToolStep exceptions crash the worker loop — src/runtime/supervise/runtime.ts

runtime.ts:412 calls seam.onToolStep?.({ toolName, args, status }) synchronously with no try/catch inside the tool-call inner loop (lines 384-413). If the handler throws — e.g., a createDetectorMonitor().observeToolStep whose argHash(step.args) chokes on a circular or non-serializable args object — the exception propagates through the for-loop, aborting the entire worker execute(). A monitoring/observability side-channel should not be able to crash the production inference loop. The detector-monitor's observeToolStep (detector-monitor.ts:53-61) also lacks an internal try/catch around observeAll/argHash. Fix: wrap the `seam.onToolStep?.(...)

🟡 LOW Trajectory recorder records zero-duration spans (startedAt === endedAt) — src/runtime/supervise/trajectory-recorder.ts

trajectory-recorder.ts:68-69 sets both startedAt: s.at and endedAt: s.at to the same timestamp (the push time, captured AFTER the tool call resolved at runtime.ts:412). Individual span durations are always 0. The docstring (line 31) promises buildTrajectory yields 'total duration' as a meaningful metric, but per-span duration is lost. This does not affect loop/waste detection (those key on call patterns and cross-span windows, verified by the test asserting windowMs > 0 using the fake clock's per-step increment). Fix: capture start/end timestamps at the call site and pass them through RecordedToolStep, or have the recorder accept an opt

🟡 LOW Stale await_next references in skill docs (outside shot scope, FYI only) — tests/loops/coordination.test.ts

Two skill markdown files (skills/supervise/SKILL.md:18, skills/loop-writer/SKILL.md:113) still reference the old tool name 'await_next' after the rename to 'await_event'. These are NOT in this shot's changed files so are not blocking, but they will mislead any agent that reads them. Flag for a follow-up doc-sync pass. No code impact — all test and source files are consistently renamed.

🟡 LOW inbox executor test relies on synchronous fetch mock timing for interrupt delivery — tests/loops/inbox.test.ts

The 'FORCEFUL steer aborts' test (L101-146) calls deliver() synchronously inside the fetch mock before throwing AbortError. This works because the mock is synchronous up to the throw, but a real fetch abort is asynchronous — the abort signal fires, then fetch rejects on a later microtask. The test correctly validates the catch-block logic (runtime.ts:347: interruptSig.aborted && !signal.aborted && !controller.signal.aborted → continue), but the synchronous delivery masks any race between signal-abortion and fetch-rejection timing. Not a defect in the test's correctness for the current implementation, just a fragility note: if the executor's abort handling becomes more timing-sensitive, this test may need async delivery.


tangletools · 2026-06-17T01:12:05Z · trace

…, comment accuracy)

- mergeAbortSignals listener leak: pre-link external signals ONCE; per-turn add+remove the
  listener (no accumulation on long-lived signals over maxTurns).
- interrupt catch now requires a real AbortError (DOMException) — a network fault coincident
  with an interrupt is no longer swallowed; rethrown.
- corrected the comment: an interrupted+re-planned turn DOES consume a maxTurns slot (bounded
  backstop, not a hang) — it just doesn't bill a turn.
- onToolStep is an observability side-channel: wrapped so a throwing monitor can't crash the
  worker loop; detector-monitor.observeToolStep also defends argHash on circular/unhashable args.
- projectEvent preserves questionId on the answer branch.
- stale await_next purged from skills/{supervise,loop-writer}; trimmed CLAUDE.md redundancy;
  softened the recorder's per-span-duration claim.

full suite 1018 pass; typecheck/build/lint clean.

@tangletools tangletools left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Auto-approved PR — 13cecb5f

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-17T09:25:21Z

@drewstone

Copy link
Copy Markdown
Contributor Author

Addressed the audit (13cecb5). It was a valid No-Blockers pass; triaged each:

Fixed (real):

  • 🟠 mergeAbortSignals listener leak → external signals merged ONCE before the loop; per-turn listener added + removed (no accumulation across maxTurns).
  • 🟠 maxTurns comment was wrong → corrected: an interrupted+re-planned turn DOES consume a loop slot (bounded backstop, not a hang); it just doesn't bill a turn.
  • 🟡 interrupt catch now requires a real AbortError — a network fault coincident with an interrupt is rethrown, not swallowed.
  • 🟡 onToolStep wrapped (an observability side-channel can't crash the worker loop); detector-monitor also defends argHash on circular args (+ test).
  • 🟡 projectEvent preserves questionId on the answer branch.
  • 🟡 stale await_next purged from skills/{supervise,loop-writer}; trimmed the duplicated CLAUDE.md phrase; softened the recorder's per-span-duration note.

Opted out (cosmetic, with reason): DownMessageEvent.instruction naming + code-map density (style); the sync-fetch test-timing note (not a defect); a tool-name/prompt const guard (more machinery than the risk warrants).

Full suite 1018 pass; typecheck/build/lint clean; merges cleanly.

@drewstone drewstone merged commit 252315a into main Jun 17, 2026
1 check failed
drewstone added a commit that referenced this pull request Jun 17, 2026
…ent) — main was red (#322)

#318 renamed the coordination wait verb await_next → await_event; #319's metering test
(merged around the same time) still scripted await_next. Since the tool no longer exists,
the scripted driver's 'collect the worker' turn returned {error:'unknown tool: await_next'}
and the worker was never drained → result 'no-winner' → the 2 winner-expecting tests failed
on main. Updated all 6 refs to await_event (the driver already fails loud on the unknown
tool — only the fixed script couldn't recover, unlike a real LLM).

Full suite 1030 pass (was 2 failing on main); no stale await_next remains anywhere.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants