feat(loops): leak-free steering drivers (naive, dumb) by drewstone · Pull Request #372 · tangle-network/agent-runtime

drewstone · 2026-06-24T12:34:10Z

What

Adds two non-LLM steering drivers to the loop set, alongside refine/blind/fanout/dynamic:

naive — fixed continuation ("keep going"), conveys no grade signal.
dumb — pass/fail-only from the prior verdict, no grader findings.

These are the leak-free steering controls: between rounds they hand the coder no information derived from the grader, so the gap between dumb and the findings-aware refine coach measures how much grader-derived coaching inflates a result — a control any multi-round eval wants, not just one benchmark.
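The comparison described above reduces to a difference of pass rates between two driver arms. The sketch below assumes each arm is summarized as a list of per-task pass/fail outcomes; `passRate` and `coachingInflation` are hypothetical helper names, and the sample arrays are placeholders, not measured results.

```typescript
// Fraction of tasks whose final outcome was a pass.
function passRate(outcomes: boolean[]): number {
  if (outcomes.length === 0) return 0;
  return outcomes.filter(Boolean).length / outcomes.length;
}

// Gap between the findings-aware arm and the pass/fail-only control.
function coachingInflation(refine: boolean[], dumb: boolean[]): number {
  return passRate(refine) - passRate(dumb);
}

// Placeholder outcomes for four tasks per arm (illustrative, not real data).
const refineArm = [true, true, true, false];
const dumbArm = [true, false, true, false];
console.log(coachingInflation(refineArm, dumbArm)); // 0.25
```

A positive gap under this framing would be attributed to the grader's findings rather than to the bare pass/fail bit, since the dumb arm already receives that bit.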

Design

The steering lives in the driver's plan() (via a continuation callback the driver applies), so the driver produces the next-round task — the consumer's loop doesn't special-case steering. Extends the existing Driver/loop primitives; does not fork the loop. Zero benchmark coupling (a tool/grader-agnostic continuation function).

Tests / checks

tsc --noEmit: 0 errors; vitest: 8/8 passing.

Add two non-LLM steering Drivers to the driven-loop set as the leak-free controls for the refine reference driver. They differ only in how much of the prior verdict plan() reads:

- naiveDriver: reads nothing from the verdict; issues a fixed continuation.
- dumbDriver: reads ONLY verdict.valid; issues onPass/onFail. Never touches notes/scores; that boundary is the firewall, enforced by a tripwire test.

The dumb->refine gap isolates how much the grader's findings inflate a result over a bare pass/fail bit.

Continuation strings are parameters and the Task shape is opaque (caller supplies applyContinuation), so the builders carry zero domain coupling. They plug into runLoop unchanged; no interface addition needed since Driver.plan already receives history with verdicts.
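As a rough illustration of the boundary the commit message describes, the following sketch puts the two `plan()` bodies side by side. The types (`Verdict`, `Iteration`, `MiniDriver`) and the `continuation` and `maxShots` option names are simplified stand-ins invented for this example, not the package's actual `Driver` interface; only `naiveDriver`, `dumbDriver`, `onPass`, `onFail`, and `applyContinuation` are names taken from the PR text. Stop-on-pass handling is deliberately left out and assumed to live in a separate decide step.

```typescript
// Simplified stand-ins for illustration only; the real Driver interface differs.
interface Verdict {
  valid: boolean;
  notes?: string[];
  scores?: Record<string, number>;
}

interface Iteration<Task> {
  task: Task;
  verdict?: Verdict;
}

interface MiniDriver<Task> {
  plan(task: Task, history: Iteration<Task>[]): Task[];
}

type ApplyContinuation<Task> = (task: Task, continuation: string) => Task;

// naive: the next prompt never depends on the verdict.
function naiveDriver<Task>(opts: {
  continuation: string; // hypothetical option name
  maxShots: number; // hypothetical option name
  applyContinuation: ApplyContinuation<Task>;
}): MiniDriver<Task> {
  return {
    plan(task, history) {
      if (history.length === 0) return [task]; // shot 0 runs the original task
      if (history.length >= opts.maxShots) return []; // stop at the cap
      return [opts.applyContinuation(task, opts.continuation)];
    },
  };
}

// dumb: reads ONLY verdict.valid and picks onPass or onFail from it.
function dumbDriver<Task>(opts: {
  onPass: string;
  onFail: string;
  maxShots: number; // hypothetical option name
  applyContinuation: ApplyContinuation<Task>;
}): MiniDriver<Task> {
  return {
    plan(task, history) {
      if (history.length === 0) return [task];
      if (history.length >= opts.maxShots) return [];
      const last = history[history.length - 1];
      const passed = last?.verdict?.valid === true; // missing verdict counts as not valid
      return [opts.applyContinuation(task, passed ? opts.onPass : opts.onFail)];
    },
  };
}
```

Because the caller supplies `applyContinuation`, neither builder needs to know whether a task is a prompt string, a structured request, or something else.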

tangletools

✅ Auto-approved PR — `b6fbf3ce`

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

_{tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-24T12:34:18Z}

tangletools

✅ Auto-approved PR — `3670434d`

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

_{tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-24T12:39:55Z}

tangletools

🟡 Value Audit — sound-with-nits


Verdict	sound-with-nits
Concerns	2 (2 weak-concern)
Heuristic	0.0s
Duplication	0.0s
Interrogation	196.7s (2 bridge agents)
Total	196.7s

💰 Value — sound-with-nits

Adds two minimal, non-LLM Driver controls (naive fixed-continuation and dumb pass/fail-only) to the runLoop primitive so benchmarks can isolate how much grader findings inflate multi-shot results; clean, grain-fitting addition with a small structural duplication nit.

What it does: Introduces naiveDriver and dumbDriver in src/runtime/steering-drivers.ts and re-exports them from src/runtime/index.ts:236-243. Both implement the existing Driver<Task, Output, SteeringDecision> interface used by runLoop (src/runtime/types.ts:219). naiveDriver runs the original task at shot 0 and then issues the same caller-supplied continuation string every round, ignoring the v
Goals it achieves: Provides leak-free experimental controls for multi-round evals: by comparing naive → dumb → refine, a caller can attribute loop improvement to (1) the bare pass/fail bit, or (2) the grader's findings/notes, instead of assuming a coached loop is better for free. It keeps the loop kernel unchanged and domain-agnostic by making continuation strings and applyContinuation caller parameters, not
Assessment: The change is coherent and fits the codebase's grain. It extends the existing Driver/runLoop primitive without touching the kernel or adding benchmark coupling. The tests enforce the leak-free firewall with a getter trap (src/runtime/steering-drivers.test.ts:81-93) and exercise stop/cap behavior. It mirrors the reference refine driver and is a worthwhile, low-risk addition.
Better / existing approach: No materially better approach or existing equivalent was found. The only nearby primitive is singleShotDriver in src/runtime/supervise/runtime.ts:1272, which merely repeats the same task up to a cap and performs no verdict-aware steering, so it does not serve the same control purpose. A single generic fixedContinuationDriver with naive and dumb as thin presets could remove duplicated `pl
Model: opencode/kimi-for-coding/k2p7
Bridge attempts: 1
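The getter-trap technique the assessment refers to can be sketched as follows. This is not the test from `steering-drivers.test.ts`; `trappedVerdict`, `planFromValidOnly`, and `leakyPlan` are hypothetical names, and the point is only to show how property getters that throw turn any read of `notes` or `scores` into a failure.

```typescript
interface Verdict {
  valid: boolean;
  notes: string[];
  scores: Record<string, number>;
}

// Build a verdict whose findings fields throw as soon as they are read.
function trappedVerdict(valid: boolean): Verdict {
  const verdict = { valid } as Verdict;
  for (const key of ["notes", "scores"] as const) {
    Object.defineProperty(verdict, key, {
      enumerable: true,
      get(): never {
        throw new Error(`firewall breach: driver read verdict.${key}`);
      },
    });
  }
  return verdict;
}

// Stand-in for a pass/fail-only planner: it touches nothing but verdict.valid.
function planFromValidOnly(verdict: Verdict, onPass: string, onFail: string): string {
  return verdict.valid ? onPass : onFail;
}

// A leaky planner, for contrast: it reads the grader's notes and trips the trap.
function leakyPlan(verdict: Verdict): string {
  return `address these findings: ${verdict.notes.join("; ")}`;
}
```

A test built this way fails loudly on the first forbidden read, so the firewall is checked by execution rather than by code review.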

🎯 Usefulness — sound-with-nits

Two clean leak-free steering controls built squarely in the grain of the existing Driver interface; the one nit is a required option (onPass) that is provably unreachable in any conformant loop.

Integration: Reachable and correctly wired. Both builders return a Driver<Task, Output, SteeringDecision> consumed by runLoop({ driver }) (run-loop.ts:73, :229, :331); exported as public package API from src/runtime/index.ts:236-243. They conform to the contract exactly — plan/decide/describePlan all present, and SteeringDecision values map correctly onto isTerminalDecision (run-loop.ts:1131: 'pi
Fit with existing patterns: Strong. The plan/decide/describePlan shape is a near-verbatim lift of the reference refine driver (examples/driver-loop/driver-loop.ts:125-168), including the identical decide semantics (history.some(valid) → pick-winner; else refine-until-cap → fail). describePlan returns kind: 'refine', matching the kernel's count-based inference for a single planned task (run-loop.ts:247). The gener
Real-world viability: Robust on the edges that matter: missing/undefined verdict is treated as not-valid (total, never throws) in both plan and decide; the shot cap is enforced in both paths; concurrency/abort are kernel-owned and untouched. The one structural wrinkle: dumbDriver's onPass branch is dead at runtime. Per the kernel's round ordering (plan at :229 → workers → decide at :331 → terminate-on-terminal at :
Model: opencode/zai-coding-plan/glm-5.2
Bridge attempts: 1

💰 Value Audit

🟡 The two driver bodies duplicate the same scaffold [duplication]

naiveDriver (src/runtime/steering-drivers.ts:108-132) and dumbDriver (src/runtime/steering-drivers.ts:167-191) repeat the same history-length checks, cap handling, decideUntilValidOrCapped wiring, and describePlan return. This could be collapsed into one internal builder parameterized by a (lastVerdict?) => string continuation selector, with naiveDriver and dumbDriver as thin presets. That would make the firewall boundary — what is allowed to be read from the verdict — live in
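One possible shape for the refactor this finding suggests is sketched below. The types are simplified stand-ins, and `continuationDriver`, `naivePreset`, `dumbPreset`, and `maxShots` are hypothetical names; the only element taken from the finding is the `(lastVerdict?) => string` selector with the two drivers as thin presets.

```typescript
interface Verdict {
  valid: boolean;
  notes?: string[];
}

interface Iteration<Task> {
  task: Task;
  verdict?: Verdict;
}

type Selector = (lastVerdict?: Verdict) => string;

interface BuilderOptions<Task> {
  maxShots: number;
  applyContinuation: (task: Task, continuation: string) => Task;
}

// Shared scaffold: the history-length check and the cap live here, once.
function continuationDriver<Task>(select: Selector, opts: BuilderOptions<Task>) {
  return {
    plan(task: Task, history: Iteration<Task>[]): Task[] {
      if (history.length === 0) return [task];
      if (history.length >= opts.maxShots) return [];
      const last = history[history.length - 1];
      return [opts.applyContinuation(task, select(last?.verdict))];
    },
  };
}

// Presets: each selector is the single place that decides what is read from the verdict.
const naivePreset = <Task>(continuation: string, opts: BuilderOptions<Task>) =>
  continuationDriver(() => continuation, opts);

const dumbPreset = <Task>(onPass: string, onFail: string, opts: BuilderOptions<Task>) =>
  continuationDriver((v) => (v?.valid === true ? onPass : onFail), opts);
```

With this layout, auditing the firewall means reading two one-line selectors instead of two full driver bodies.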

🎯 Usefulness Audit

🟡 dumbDriver's required onPass is unreachable; naive and dumb collapse in stop-on-pass [ergonomics]

Because decide() (steering-drivers.ts:76) returns terminal pick-winner as soon as any shot is valid, and the kernel calls decide() after workers and terminates before the next plan() (run-loop.ts:331-348), plan() is only ever called when no prior iteration passed. So dumbDriver.plan's passed is always false: the if (passed) return [] early-return (line 177) and the passed ? onPass : onFail select (line 181) are dead, meaning the required onPass option can never be exercised,
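The reachability argument can be checked with a toy loop that follows the ordering the finding describes: plan, then workers, then decide, terminating before the next plan. Everything below is a hypothetical miniature, not `runLoop`; `miniLoop`, the `grade` callback, and the counters exist only for this illustration.

```typescript
interface Iteration {
  shot: number;
  valid: boolean;
}

// Counts how often plan() observes a passing last verdict under stop-on-pass.
function miniLoop(grade: (shot: number) => boolean, maxShots: number) {
  const history: Iteration[] = [];
  let planCalls = 0;
  let planSawPass = 0;

  while (true) {
    // plan(): in a dumb-style driver this is where onPass would be selected.
    planCalls += 1;
    const last = history[history.length - 1];
    if (last?.valid === true) planSawPass += 1;
    if (history.length >= maxShots) break; // cap reached, nothing planned

    // workers + grader produce the next iteration
    const shot = history.length;
    history.push({ shot, valid: grade(shot) });

    // decide(): terminal as soon as any shot is valid, before the next plan().
    if (history.some((it) => it.valid)) break;
  }

  return { planCalls, planSawPass, shots: history.length };
}

// A grader that first passes on shot 2: the loop stops right after that shot.
console.log(miniLoop((shot) => shot === 2, 5)); // planSawPass stays 0
```

Under this ordering `planSawPass` stays at zero whatever the grader returns, which is the sense in which the `onPass` branch is dead in a stop-on-pass loop.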

What this audit checks

It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.

| Pass | What it asks |
| --- | --- |
| Heuristic | Vague title? Whitespace-only or cruft-bearing diff? (content signals only) |
| Duplication | Do added function/class names already exist elsewhere in the repo? |
| Value Audit | What does it do? What goal does it achieve? Is it good? Better architecture or already-exists? |
| Usefulness Audit | Does it integrate and fit? Will it hold up in real use and actually get used? |

Findings are concerns, not blocks — the human reviewer decides what to do with them.

_{value-audit · 20260624T131015Z}

tangletools · 2026-06-24T13:13:55Z

✅ No Blockers — `3670434d`

Readiness 76/100 · Confidence 65/100 · 10 findings (3 medium, 7 low)

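| | opencode-kimi | glm | deepseek | aggregate |
| --- | --- | --- | --- | --- |
| Readiness | 79 | 76 | 79 | 76 |
| Confidence | 65 | 65 | 65 | 65 |
| Correctness | 79 | 76 | 79 | 76 |
| Security | 79 | 76 | 79 | 76 |
| Testing | 79 | 76 | 79 | 76 |
| Architecture | 79 | 76 | 79 | 76 |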

Full multi-shot audit completed 1/1 planned shots over 3 changed files. Global verifier still owns final merge decision.

🟠 MEDIUM describePlan() always reports 'refine' even when plan() returns [] (stop) — src/runtime/steering-drivers.ts

naiveDriver (lines 128-130) and dumbDriver (lines 187-189) return { kind: 'refine', ... } from describePlan() unconditionally. However both plan() implementations return [] when the last shot is valid (lines 120, 177) or the cap is reached ([lines 122](https://github.com/tangle-network/agent-runtime/blob/3670434db6d21b8ce16850a66e62ab1e0b622f93/src/runtime/steering-drivers.ts#L

🟠 MEDIUM naiveDriver docstring claim contradicts code — src/runtime/steering-drivers.ts

The file-level docstring at lines 17-19 states naiveDriver 'reads NOTHING from the verdict', but both plan() (line 120: last?.verdict?.valid) and decide() (line 76 via decideUntilValidOrCapped: it.verdict?.valid) read verdict.valid for the stop gate and terminal decision. The inline comment at [lines 117-119](https://github.com/tangle-network/agent-runtime/blob/3670434db6d21b8ce16850a66e62

🟠 MEDIUM naive→dumb experimental gap is structurally zero; onPass is dead code — src/runtime/steering-drivers.ts

The PR's stated axis (header JSDoc lines 12-14): 'The naive → dumb gap isolates the value of the pass/fail bit alone.' It cannot. Both drivers share decideUntilValidOrCapped(), which returns terminal 'pick-winner' on ANY valid iteration (run-loop.ts:1131-1133 treats 'pick-winner' as terminal). So plan() at round N+1 is only reached when round N was non-terminal, i.e. no iteration was valid ⇒ history[last].verdict.valid is always false at plan() entry. In dumbDriver.plan() (lines 158-167) the early `if (passed) return [

🟡 LOW No test exercises the naive-vs-dumb equivalence (or exposes it) — src/runtime/steering-drivers.test.ts

The 8 tests pass (verified: pnpm vitest run src/runtime/steering-drivers.test.ts → 8 passed, 232ms). The tripwire test (lines 67-86) is a good regression guard for the notes/scores firewall. But there is no comparative test asserting the documented experimental axes — e.g. 'for any failing verdict, naive.plan === dumb.plan given matching continuation' would have exposed the naive≈dumb equivalence from finding #1 before merge. For a substrate whose value proposition is the leak-free three-way attribution, at least one property test should pin the intended invariant (or, once the design is fixed, pin the intended differential). As-is, the test suite
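A comparative check of the kind this finding asks for might look like the sketch below. The two functions are simplified stand-ins for the prompt selection inside each driver's `plan()` (hypothetical names `naiveContinuation` and `dumbContinuation`), so this shows the shape of the property rather than a test against the real exports.

```typescript
interface Verdict {
  valid: boolean;
  notes?: string[];
  scores?: Record<string, number>;
}

// naive: the verdict is accepted but never consulted.
function naiveContinuation(_verdict: Verdict | undefined, continuation: string): string {
  return continuation;
}

// dumb: only the pass/fail bit selects the continuation.
function dumbContinuation(verdict: Verdict | undefined, onPass: string, onFail: string): string {
  return verdict?.valid === true ? onPass : onFail;
}

// Property: for every non-passing verdict, naive and dumb agree when onFail matches.
function agreeOnFailures(samples: (Verdict | undefined)[], text: string): boolean {
  return samples
    .filter((v) => v?.valid !== true)
    .every((v) => naiveContinuation(v, text) === dumbContinuation(v, "unused", text));
}

const failing: (Verdict | undefined)[] = [
  undefined,
  { valid: false },
  { valid: false, notes: ["off-by-one in loop bound"] },
  { valid: false, scores: { correctness: 0.2 } },
];
console.log(agreeOnFailures(failing, "keep going")); // true
```

Pinning this as a test would either document the equivalence as intended or fail once the design is changed to make the two drivers differ.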

🟡 LOW naiveDriver.decide() and describePlan() untested — src/runtime/steering-drivers.test.ts

The naiveDriver test block (lines 37-69) only tests plan(), never decide(). The shared decide describe block (lines 107-125) tests only dumbDriver.decide(). While both use the same decideUntilValidOrCapped, there is no verification that naiveDriver wires it correctly. Additionally, describePlan() is never tested for either driver despite being consumed by runLoop (line 233

🟡 LOW dumbDriver onPass JSDoc says 'rarely reached'; it is never reached — src/runtime/steering-drivers.ts

DumbDriverOptions.onPass doc (lines 128-134): 'In a stop-on-pass loop this is rarely reached (a valid shot ends the loop), but it is required so the driver is total over the pass/fail bit.' Given the kernel's plan→decide ordering (verified in run-loop.ts:230-355: decide runs after each batch and terminates on any valid), onPass is not 'rarely' reached — it is NEVER reached. The 'total over the pass/fail bit' justification is hollow because the ternary that would use onPass sits behind an early return on the same condition. Either reword to 'never issued in a stop-on-pass loop; retained so the option type reflects the pass/fail bit the driver reads' or r

🟡 LOW dumbDriver onPass parameter is unreachable dead path — src/runtime/steering-drivers.ts

In dumbDriver.plan() at line 176-181: passed is computed on line 176, checked for early return on line 177 (if (passed) return []), so passed is always false when line 181 is reached (const continuation = passed ? onPass : onFail). The onPass option (

🟡 LOW dumbDriver requires onPass continuation that is never emitted — src/runtime/steering-drivers.ts

dumbDriver options require onPass (lines 142, 170). In plan() at line 176-177 the code returns [] as soon as passed is true, before line 181 computes const continuation = passed ? onPass : onFail. Because the passed branch exits early, onPass is unreachable dead code under the current stop-on-pass semantics. Either make onPass optional or remove it (and the unreachable ternary

🟡 LOW naiveDriver JSDoc claims 'reads NOTHING from verdict' but reads .valid for stop — src/runtime/steering-drivers.ts

Function doc (lines 84-93) says: 'It reads NOTHING from history[last].verdict — not .valid, not .notes, not .scores.' The code (lines 99-108) reads last?.verdict?.valid to decide whether to stop planning. The inline comment on lines 102-104 is honest ('The verdict is read ONLY for .valid here, never to compose the prompt'), contradicting the JSDoc above it. Real impact on the exper

🟡 LOW naiveDriver docs claim it reads no verdict, but plan() reads .valid — src/runtime/steering-drivers.ts

The module-level comment (lines 17-18) and the function comment (lines 102-103) state that naiveDriver reads NOTHING from the verdict. The implementation at line 120 reads last?.verdict?.valid to decide whether to stop. This is only used for termination, not prompt composition, but the documentation overstates the 'no-signal' contract. Update the comments to say it reads only .valid

_{tangletools · 2026-06-24T13:13:51Z · trace}

tangletools

✅ Auto-approved PR — `c3ce8d79`

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

_{tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-24T13:21:10Z}

tangletools previously approved these changes Jun 24, 2026

style(loops): sort steering-driver exports (biome)

3670434

drewstone dismissed tangletools’s stale review via 3670434 June 24, 2026 12:39

tangletools previously approved these changes Jun 24, 2026

tangletools reviewed Jun 24, 2026

drewstone added 2 commits June 24, 2026 07:19

Merge remote-tracking branch 'origin/main' into fix/372

98403a1

chore(docs): regenerate primitive catalog for the steering-driver exports

c3ce8d7

The naive/dumb steering drivers add exports; regenerating the catalog is what makes the CLASS-7 freshness gate pass (this was the #372 CI failure).

drewstone dismissed tangletools’s stale review via c3ce8d7 June 24, 2026 13:21

tangletools approved these changes Jun 24, 2026

drewstone merged commit d8708f5 into main Jun 24, 2026
1 check passed

drewstone mentioned this pull request Jun 24, 2026

chore(hooks): auto-regenerate the primitive catalog on pre-commit #375

Merged

drewstone merged 4 commits into main from lift/steering-drivers-clean
