feat(loops): leak-free steering drivers (naive, dumb)#372
Add two non-LLM steering Drivers to the driven-loop set as the leak-free controls for the refine reference driver. They differ only in how much of the prior verdict `plan()` reads:
- `naiveDriver`: reads nothing from the verdict; issues a fixed continuation.
- `dumbDriver`: reads ONLY `verdict.valid`; issues `onPass`/`onFail`. Never touches notes/scores. That boundary is the firewall, enforced by a tripwire test.

The dumb->refine gap isolates how much the grader's findings inflate a result over a bare pass/fail bit. Continuation strings are parameters and the Task shape is opaque (caller supplies `applyContinuation`), so the builders carry zero domain coupling. They plug into `runLoop` unchanged; no interface addition is needed since `Driver.plan` already receives history with verdicts.
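A minimal sketch of the two builders described above, under simplified assumptions. `Verdict`, `Iteration`, `PlanOnlyDriver`, `CommonOptions`, `maxShots`, and the `*Sketch` names are hypothetical stand-ins rather than the PR's actual `Driver` interface (which also carries `decide` and `describePlan`). Only `plan()` is shown, and stopping on a valid shot is assumed to be handled elsewhere.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface Verdict {
  valid: boolean;
  notes?: string[];
  scores?: Record<string, number>;
}
interface Iteration<Task> {
  task: Task;
  verdict?: Verdict;
}
interface PlanOnlyDriver<Task> {
  plan(original: Task, history: ReadonlyArray<Iteration<Task>>): Task[];
}
// Caller-supplied: the builders never inspect the Task shape themselves.
type ApplyContinuation<Task> = (task: Task, continuation: string) => Task;

interface CommonOptions<Task> {
  applyContinuation: ApplyContinuation<Task>;
  maxShots: number; // assumed cap on total shots
}

// naive: never looks at the verdict; always issues the same continuation.
function naiveDriverSketch<Task>(
  opts: CommonOptions<Task> & { continuation: string },
): PlanOnlyDriver<Task> {
  return {
    plan(original, history) {
      if (history.length === 0) return [original]; // shot 0: the task as given
      if (history.length >= opts.maxShots) return []; // cap reached: stop
      return [opts.applyContinuation(original, opts.continuation)];
    },
  };
}

// dumb: reads ONLY verdict.valid; notes and scores are never touched.
function dumbDriverSketch<Task>(
  opts: CommonOptions<Task> & { onPass: string; onFail: string },
): PlanOnlyDriver<Task> {
  return {
    plan(original, history) {
      if (history.length === 0) return [original];
      if (history.length >= opts.maxShots) return [];
      const passed = history[history.length - 1]?.verdict?.valid === true;
      return [opts.applyContinuation(original, passed ? opts.onPass : opts.onFail)];
    },
  };
}

// Usage with a string task; the continuation is appended on its own line.
const append: ApplyContinuation<string> = (task, c) => `${task}\n${c}`;
const naive = naiveDriverSketch<string>({
  applyContinuation: append,
  maxShots: 3,
  continuation: "keep going",
});
const dumb = dumbDriverSketch<string>({
  applyContinuation: append,
  maxShots: 3,
  onPass: "passed",
  onFail: "failed, try again",
});
const failedOnce: Iteration<string>[] = [
  { task: "fix bug", verdict: { valid: false, notes: ["off by one"] } },
];
console.log(naive.plan("fix bug", failedOnce));
console.log(dumb.plan("fix bug", failedOnce));
```

In this sketch the grader's `notes` are present in the history but neither builder reads them, which is the property the PR calls the firewall.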
tangletools
✅ Auto-approved PR — b6fbf3ce
Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.
tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-24T12:34:18Z
tangletools
✅ Auto-approved PR — 3670434d
tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-24T12:39:55Z
tangletools
🟡 Value Audit — sound-with-nits
| Verdict | sound-with-nits |
| Concerns | 2 (2 weak-concern) |
| Heuristic | 0.0s |
| Duplication | 0.0s |
| Interrogation | 196.7s (2 bridge agents) |
| Total | 196.7s |
💰 Value — sound-with-nits
Adds two minimal, non-LLM Driver controls (naive fixed-continuation and dumb pass/fail-only) to the runLoop primitive so benchmarks can isolate how much grader findings inflate multi-shot results; clean, grain-fitting addition with a small structural duplication nit.
- What it does: Introduces `naiveDriver` and `dumbDriver` in `src/runtime/steering-drivers.ts` and re-exports them from `src/runtime/index.ts:236-243`. Both implement the existing `Driver<Task, Output, SteeringDecision>` interface used by `runLoop` (`src/runtime/types.ts:219`). `naiveDriver` runs the original task at shot 0 and then issues the same caller-supplied `continuation` string every round, ignoring the v…
- Goals it achieves: Provides leak-free experimental controls for multi-round evals: by comparing `naive` → `dumb` → `refine`, a caller can attribute loop improvement to (1) the bare pass/fail bit, or (2) the grader's findings/notes, instead of assuming a coached loop is better for free. It keeps the loop kernel unchanged and domain-agnostic by making continuation strings and `applyContinuation` caller parameters, not…
- Assessment: The change is coherent and fits the codebase's grain. It extends the existing `Driver`/`runLoop` primitive without touching the kernel or adding benchmark coupling. The tests enforce the leak-free firewall with a getter trap (`src/runtime/steering-drivers.test.ts:81-93`) and exercise stop/cap behavior. It mirrors the reference refine driver and is a worthwhile, low-risk addition.
- Better / existing approach: No materially better approach or existing equivalent was found. The only nearby primitive is `singleShotDriver` in `src/runtime/supervise/runtime.ts:1272`, which merely repeats the same task up to a cap and performs no verdict-aware steering, so it does not serve the same control purpose. A single generic `fixedContinuationDriver` with `naive` and `dumb` as thin presets could remove duplicated `pl…`
- Model: opencode/kimi-for-coding/k2p7
- Bridge attempts: 1
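The getter-trap idea mentioned in the assessment can be sketched roughly as follows. This is an illustration of the technique, not the PR's test at `steering-drivers.test.ts:81-93`; `trappedVerdict`, `pickContinuation`, and `leakyContinuation` are hypothetical names, and the `Verdict` shape is assumed.

```typescript
// Assumed verdict shape for this sketch.
interface Verdict {
  valid: boolean;
  notes: string[];
  scores: Record<string, number>;
}

// Build a verdict whose notes/scores throw on ANY read, recording the attempt.
function trappedVerdict(valid: boolean): { verdict: Verdict; touched: string[] } {
  const touched: string[] = [];
  const verdict = { valid } as Verdict;
  for (const key of ["notes", "scores"] as const) {
    Object.defineProperty(verdict, key, {
      enumerable: true, // so object spreads and JSON.stringify trip the wire too
      get(): never {
        touched.push(key);
        throw new Error(`firewall breach: read verdict.${key}`);
      },
    });
  }
  return { verdict, touched };
}

// Firewall-respecting selector: reads only the pass/fail bit.
function pickContinuation(v: Verdict | undefined, onPass: string, onFail: string): string {
  return v?.valid === true ? onPass : onFail;
}

// A deliberately leaky selector, included only to show the trap firing.
function leakyContinuation(v: Verdict): string {
  return `fix these: ${v.notes.join("; ")}`;
}

const safe = trappedVerdict(false);
const chosen = pickContinuation(safe.verdict, "ok", "retry");
console.log(chosen, safe.touched.length); // "retry" and 0: nothing forbidden was read

const leaky = trappedVerdict(false);
let tripped = false;
try {
  leakyContinuation(leaky.verdict);
} catch {
  tripped = true;
}
console.log(tripped); // true: the read of notes hit the trap
```

Making the trapped properties enumerable is a design choice in this sketch: it means indirect reads (copying or serializing the verdict) also fail loudly rather than slipping past the tripwire.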
🎯 Usefulness — sound-with-nits
Two clean leak-free steering controls built squarely in the grain of the existing Driver interface; the one nit is a required option (onPass) that is provably unreachable in any conformant loop.
- Integration: Reachable and correctly wired. Both builders return a `Driver<Task, Output, SteeringDecision>` consumed by `runLoop({ driver })` (run-loop.ts:73, :229, :331); exported as public package API from src/runtime/index.ts:236-243. They conform to the contract exactly: `plan`/`decide`/`describePlan` are all present, and `SteeringDecision` values map correctly onto `isTerminalDecision` (run-loop.ts:1131: 'pi…
- Fit with existing patterns: Strong. The `plan`/`decide`/`describePlan` shape is a near-verbatim lift of the reference refine driver (examples/driver-loop/driver-loop.ts:125-168), including the identical `decide` semantics (history.some(valid) → pick-winner; else refine-until-cap → fail). `describePlan` returns `kind: 'refine'`, matching the kernel's count-based inference for a single planned task (run-loop.ts:247). The gener…
- Real-world viability: Robust on the edges that matter: missing/undefined verdict is treated as not-valid (total, never throws) in both plan and decide; the shot cap is enforced in both paths; concurrency/abort are kernel-owned and untouched. The one structural wrinkle: `dumbDriver`'s `onPass` branch is dead at runtime. Per the kernel's round ordering (plan at :229 → workers → decide at :331 → terminate-on-terminal at :…
- Model: opencode/zai-coding-plan/glm-5.2
- Bridge attempts: 1
💰 Value Audit
🟡 The two driver bodies duplicate the same scaffold [duplication]
`naiveDriver` (src/runtime/steering-drivers.ts:108-132) and `dumbDriver` (src/runtime/steering-drivers.ts:167-191) repeat the same history-length checks, cap handling, `decideUntilValidOrCapped` wiring, and `describePlan` return. This could be collapsed into one internal builder parameterized by a `(lastVerdict?) => string` continuation selector, with `naiveDriver` and `dumbDriver` as thin presets. That would make the firewall boundary (what is allowed to be read from the verdict) live in…
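One way the suggested collapse might look, as a rough sketch rather than the repository's code. `continuationDriver`, `BuilderOptions`, `PresetBase`, `naivePreset`, `dumbPreset`, and `maxShots` are hypothetical names, and the types are simplified assumptions; the decision values follow the `pick-winner` / refine-until-cap / `fail` semantics quoted in the review.

```typescript
// Simplified, assumed shapes.
interface Verdict {
  valid: boolean;
  notes?: string[];
  scores?: Record<string, number>;
}
interface Iteration<Task> {
  task: Task;
  verdict?: Verdict;
}
type Decision = "refine" | "pick-winner" | "fail";

// The ONLY place a verdict is turned into a continuation: the firewall lives here.
type ContinuationSelector = (lastVerdict: Verdict | undefined) => string;

interface BuilderOptions<Task> {
  select: ContinuationSelector;
  applyContinuation: (task: Task, continuation: string) => Task;
  maxShots: number; // assumed cap on total shots
}

// Shared scaffold: history-length checks, cap handling, and the decide rule.
function continuationDriver<Task>(opts: BuilderOptions<Task>) {
  return {
    plan(original: Task, history: ReadonlyArray<Iteration<Task>>): Task[] {
      if (history.length === 0) return [original];
      if (history.length >= opts.maxShots) return [];
      const last = history[history.length - 1];
      return [opts.applyContinuation(original, opts.select(last?.verdict))];
    },
    decide(history: ReadonlyArray<Iteration<Task>>): Decision {
      if (history.some((it) => it.verdict?.valid === true)) return "pick-winner";
      return history.length >= opts.maxShots ? "fail" : "refine";
    },
  };
}

type PresetBase<Task> = Omit<BuilderOptions<Task>, "select">;

// naive preset: the selector ignores its argument entirely.
function naivePreset<Task>(base: PresetBase<Task>, continuation: string) {
  return continuationDriver<Task>({ ...base, select: () => continuation });
}

// dumb preset: the selector reads only the pass/fail bit.
function dumbPreset<Task>(base: PresetBase<Task>, onPass: string, onFail: string) {
  return continuationDriver<Task>({
    ...base,
    select: (v) => (v?.valid === true ? onPass : onFail),
  });
}

const base: PresetBase<string> = {
  applyContinuation: (task, c) => `${task} | ${c}`,
  maxShots: 3,
};
const naiveP = naivePreset(base, "keep going");
const dumbP = dumbPreset(base, "passed", "failed");
const oneFail: Iteration<string>[] = [{ task: "t", verdict: { valid: false } }];
console.log(naiveP.plan("t", oneFail), dumbP.plan("t", oneFail), dumbP.decide(oneFail));
```

With this layout, auditing what each control reads from the verdict reduces to reading two one-line selectors.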
🎯 Usefulness Audit
🟡 dumbDriver's required onPass is unreachable; naive and dumb collapse in stop-on-pass [ergonomics]
Because `decide()` (steering-drivers.ts:76) returns terminal `pick-winner` as soon as any shot is valid, and the kernel calls `decide()` after workers and terminates before the next `plan()` (run-loop.ts:331-348), `plan()` is only ever called when no prior iteration passed. So `dumbDriver.plan`'s `passed` is always false: the `if (passed) return []` early-return (line 177) and the `passed ? onPass : onFail` select (line 181) are dead, meaning the required `onPass` option can never be exercised,…
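The reachability argument can be checked with a toy loop. The sketch below assumes the round ordering described in the finding (plan, then worker, then decide, then stop on a terminal decision); `runMiniLoop`, `Shot`, and the scripted grader are hypothetical and much simpler than the real `runLoop`.

```typescript
type Decision = "refine" | "pick-winner" | "fail";
interface Shot {
  valid: boolean;
  continuation: string | null; // null = the original task, no continuation
}

const ON_PASS = "ON_PASS";
const ON_FAIL = "ON_FAIL";

// dumb-style plan without the early return: null = original task, undefined = stop.
function plan(history: readonly Shot[], maxShots: number): string | null | undefined {
  if (history.length === 0) return null;
  if (history.length >= maxShots) return undefined;
  const passed = history[history.length - 1]?.valid === true;
  return passed ? ON_PASS : ON_FAIL;
}

// Stop-on-pass decide: any valid shot is terminal; otherwise refine until the cap.
function decide(history: readonly Shot[], maxShots: number): Decision {
  if (history.some((s) => s.valid)) return "pick-winner";
  return history.length >= maxShots ? "fail" : "refine";
}

// Assumed kernel ordering: plan -> worker -> decide -> terminate on terminal.
function runMiniLoop(verdicts: readonly boolean[], maxShots: number): Shot[] {
  const history: Shot[] = [];
  while (true) {
    const continuation = plan(history, maxShots);
    if (continuation === undefined) break;
    const valid = verdicts[history.length] ?? false; // scripted grader
    history.push({ valid, continuation });
    if (decide(history, maxShots) !== "refine") break;
  }
  return history;
}

// Enumerate every pass/fail script of length 4 and collect issued continuations.
const issued = new Set<string>();
for (let mask = 0; mask < 16; mask++) {
  const script = [0, 1, 2, 3].map((bit) => (mask & (1 << bit)) !== 0);
  for (const shot of runMiniLoop(script, 4)) {
    if (shot.continuation !== null) issued.add(shot.continuation);
  }
}
console.log(Array.from(issued)); // only ON_FAIL is ever issued
```

Even with the early return removed from `plan`, `ON_PASS` never appears, because under this ordering `plan` is only reached after a non-terminal `decide`, which implies every prior shot failed.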
What this audit checks
It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.
| Pass | What it asks |
|---|---|
| Heuristic | Vague title? Whitespace-only or cruft-bearing diff? (content signals only) |
| Duplication | Do added function/class names already exist elsewhere in the repo? |
| Value Audit | What does it do? What goal does it achieve? Is it good? Better architecture or already-exists? |
| Usefulness Audit | Does it integrate and fit? Will it hold up in real use and actually get used? |
Findings are concerns, not blocks — the human reviewer decides what to do with them.
✅ No Blockers —
| | opencode-kimi | glm | deepseek | aggregate |
|---|---|---|---|---|
| Readiness | 79 | 76 | 79 | 76 |
| Confidence | 65 | 65 | 65 | 65 |
| Correctness | 79 | 76 | 79 | 76 |
| Security | 79 | 76 | 79 | 76 |
| Testing | 79 | 76 | 79 | 76 |
| Architecture | 79 | 76 | 79 | 76 |
Full multi-shot audit completed 1/1 planned shots over 3 changed files. Global verifier still owns final merge decision.
🟠 MEDIUM describePlan() always reports 'refine' even when plan() returns [] (stop) — src/runtime/steering-drivers.ts
naiveDriver (lines 128-130) and dumbDriver (lines 187-189) return `{ kind: 'refine', ... }` from `describePlan()` unconditionally. However both `plan()` implementations return `[]` when the last shot is valid (lines 120, 177) or the cap is reached ([lines 122](https://github.com/tangle-network/agent-runtime/blob/3670434db6d21b8ce16850a66e62ab1e0b622f93/src/runtime/steering-drivers.ts#L
🟠 MEDIUM naiveDriver docstring claim contradicts code — src/runtime/steering-drivers.ts
The file-level docstring at lines 17-19 states naiveDriver 'reads NOTHING from the verdict', but both plan() (line 120: `last?.verdict?.valid`) and decide() (line 76 via decideUntilValidOrCapped: `it.verdict?.valid`) read verdict.valid for the stop gate and terminal decision. The inline comment at [lines 117-119](https://github.com/tangle-network/agent-runtime/blob/3670434db6d21b8ce16850a66e62
🟠 MEDIUM naive→dumb experimental gap is structurally zero; onPass is dead code — src/runtime/steering-drivers.ts
The PR's stated axis (header JSDoc lines 12-14): 'The naive → dumb gap isolates the value of the pass/fail bit alone.' It cannot. Both drivers share decideUntilValidOrCapped(), which returns terminal 'pick-winner' on ANY valid iteration (run-loop.ts:1131-1133 treats 'pick-winner' as terminal). So plan() at round N+1 is only reached when round N was non-terminal, i.e. no iteration was valid ⇒ history[last].verdict.valid is always false at plan() entry. In dumbDriver.plan() (lines 158-167) the early `if (passed) return [
🟡 LOW No test exercises the naive-vs-dumb equivalence (or exposes it) — src/runtime/steering-drivers.test.ts
The 8 tests pass (verified: `pnpm vitest run src/runtime/steering-drivers.test.ts` → 8 passed, 232ms). The tripwire test (lines 67-86) is a good regression guard for the notes/scores firewall. But there is no comparative test asserting the documented experimental axes, e.g. 'for any failing verdict, naive.plan === dumb.plan given matching continuation' would have exposed the naive≈dumb equivalence from finding #1 before merge. For a substrate whose value proposition is the leak-free three-way attribution, at least one property test should pin the intended invariant (or, once the design is fixed, pin the intended differential). As-is, the test suite…
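A comparative check of the kind this finding asks for might look like the sketch below. It is written as plain TypeScript rather than as a vitest case, and `naivePlan`, `dumbPlan`, `plansAgreeOnFailures`, and `MAX_SHOTS` are hypothetical simplifications of the real drivers, assumed to apply the continuation by appending a line.

```typescript
// Assumed minimal shapes: only the pass/fail bit matters for this property.
interface Verdict {
  valid: boolean;
}
type History = ReadonlyArray<{ verdict?: Verdict }>;
type Plan = (task: string, history: History) => string[];

const MAX_SHOTS = 4; // assumed shot cap
const withLine = (task: string, c: string): string => `${task}\n${c}`;

// naive-style plan: fixed continuation, verdict never consulted.
const naivePlan = (continuation: string): Plan => (task, history) => {
  if (history.length === 0) return [task];
  if (history.length >= MAX_SHOTS) return [];
  return [withLine(task, continuation)];
};

// dumb-style plan: continuation chosen from verdict.valid only.
const dumbPlan = (onPass: string, onFail: string): Plan => (task, history) => {
  if (history.length === 0) return [task];
  if (history.length >= MAX_SHOTS) return [];
  const passed = history[history.length - 1]?.verdict?.valid === true;
  return [withLine(task, passed ? onPass : onFail)];
};

// Property: on every all-failing (or verdict-less) history, the two plans are identical
// when dumb's onFail equals naive's continuation.
function plansAgreeOnFailures(continuation: string): boolean {
  const naive = naivePlan(continuation);
  const dumb = dumbPlan("unused", continuation);
  for (let n = 0; n <= MAX_SHOTS; n++) {
    const failing = Array.from({ length: n }, () => ({ verdict: { valid: false } }));
    const missing = Array.from({ length: n }, () => ({}));
    for (const history of [failing, missing]) {
      const a = JSON.stringify(naive("task", history));
      const b = JSON.stringify(dumb("task", history));
      if (a !== b) return false;
    }
  }
  return true;
}

const agree = plansAgreeOnFailures("keep going");
console.log(agree); // true in this sketch: the two controls are indistinguishable on failures
```

Pinning this as an explicit test makes the equivalence a documented fact of the design, or a failing test once the intended differential between the two controls is introduced.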
🟡 LOW naiveDriver.decide() and describePlan() untested — src/runtime/steering-drivers.test.ts
The naiveDriver test block (lines 37-69) only tests plan(), never decide(). The shared decide describe block (lines 107-125) tests only dumbDriver.decide(). While both use the same decideUntilValidOrCapped, there is no verification that naiveDriver wires it correctly. Additionally, describePlan() is never tested for either driver despite being consumed by runLoop (line 233
🟡 LOW dumbDriver onPass JSDoc says 'rarely reached'; it is never reached — src/runtime/steering-drivers.ts
DumbDriverOptions.onPass doc (lines 128-134): 'In a stop-on-pass loop this is rarely reached (a valid shot ends the loop), but it is required so the driver is total over the pass/fail bit.' Given the kernel's plan→decide ordering (verified in run-loop.ts:230-355: decide runs after each batch and terminates on any valid), onPass is not 'rarely' reached — it is NEVER reached. The 'total over the pass/fail bit' justification is hollow because the ternary that would use onPass sits behind an early return on the same condition. Either reword to 'never issued in a stop-on-pass loop; retained so the option type reflects the pass/fail bit the driver reads' or r
🟡 LOW dumbDriver onPass parameter is unreachable dead path — src/runtime/steering-drivers.ts
In dumbDriver.plan() at line 176-181: `passed` is computed on line 176, checked for early return on line 177 (`if (passed) return []`), so `passed` is always false when line 181 is reached (`const continuation = passed ? onPass : onFail`). The onPass option (
🟡 LOW dumbDriver requires onPass continuation that is never emitted — src/runtime/steering-drivers.ts
dumbDriver options require `onPass` (lines 142, 170). In `plan()` at line 176-177 the code returns `[]` as soon as `passed` is true, before line 181 computes `const continuation = passed ? onPass : onFail`. Because the passed branch exits early, `onPass` is unreachable dead code under the current stop-on-pass semantics. Either make `onPass` optional or remove it (and the unreachable ternary
🟡 LOW naiveDriver JSDoc claims 'reads NOTHING from verdict' but reads .valid for stop — src/runtime/steering-drivers.ts
Function doc (lines 84-93) says: 'It reads NOTHING from history[last].verdict — not .valid, not .notes, not .scores.' The code (lines 99-108) reads `last?.verdict?.valid` to decide whether to stop planning. The inline comment on lines 102-104 is honest ('The verdict is read ONLY for .valid here, never to compose the prompt'), contradicting the JSDoc above it. Real impact on the exper
🟡 LOW naiveDriver docs claim it reads no verdict, but plan() reads .valid — src/runtime/steering-drivers.ts
The module-level comment (lines 17-18) and the function comment (lines 102-103) state that naiveDriver reads NOTHING from the verdict. The implementation at line 120 reads `last?.verdict?.valid` to decide whether to stop. This is only used for termination, not prompt composition, but the documentation overstates the 'no-signal' contract. Update the comments to say it reads only `.valid`
tangletools · 2026-06-24T13:13:51Z · trace
…orts The naive/dumb steering drivers add exports; regenerating the catalog is what makes the CLASS-7 freshness gate pass (this was the #372 CI failure).
tangletools
✅ Auto-approved PR — c3ce8d79
tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-24T13:21:10Z
What
Adds two non-LLM steering drivers to the loop set, alongside `refine`/`blind`/`fanout`/`dynamic`:
- `naive`: fixed continuation ("keep going"), conveys no grade signal.
- `dumb`: pass/fail-only from the prior verdict, no grader findings.

These are the leak-free steering controls: between rounds they hand the coder no information derived from the grader, so the gap between `dumb` and the findings-aware `refine` coach measures how much grader-derived coaching inflates a result. This is a control any multi-round eval wants, not just one benchmark.

Design
The steering lives in the driver's `plan()` (via a continuation callback the driver applies), so the driver produces the next-round task and the consumer's loop doesn't special-case steering. Extends the existing `Driver`/loop primitives; does not fork the loop. Zero benchmark coupling (a tool/grader-agnostic continuation function).

Tests / checks
`tsc --noEmit` 0 errors; vitest 8/8.