feat(loops): leak-free steering drivers (naive, dumb) #372

Merged: drewstone merged 4 commits into main from lift/steering-drivers-clean on Jun 24, 2026
@drewstone

What

Adds two non-LLM steering drivers to the loop set, alongside refine/blind/fanout/dynamic:

  • naive — fixed continuation ("keep going"), conveys no grade signal.
  • dumb — pass/fail-only from the prior verdict, no grader findings.

These are the leak-free steering controls: between rounds they hand the coder no information derived from the grader, so the gap between dumb and the findings-aware refine coach measures how much grader-derived coaching inflates a result — a control any multi-round eval wants, not just one benchmark.

Design

The steering lives in the driver's plan() (via a continuation callback the driver applies), so the driver produces the next-round task — the consumer's loop doesn't special-case steering. Extends the existing Driver/loop primitives; does not fork the loop. Zero benchmark coupling (a tool/grader-agnostic continuation function).
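
A minimal sketch of that design, assuming simplified stand-ins for the real types: `SketchDriver`, `naiveSketch`, and `runSketch` are hypothetical names invented for illustration, not the repository's `Driver` interface or `runLoop` kernel. It shows the driver producing the next-round task through a caller-supplied `applyContinuation`, while the consumer loop only runs whatever `plan()` returns.

```typescript
// Hypothetical, simplified shapes for illustration only; the real Driver
// interface and runLoop kernel in the repository may differ.
interface Verdict { valid: boolean }
interface Iteration<Task> { task: Task; verdict: Verdict }
interface SketchDriver<Task> {
  plan(task: Task, history: Iteration<Task>[]): Task[];
}

// Caller-supplied: folds a continuation string into an opaque Task.
type ApplyContinuation<Task> = (task: Task, continuation: string) => Task;

// Naive steering: shot 0 runs the original task, later shots get a fixed string.
// Termination is reduced to a shot cap in this sketch.
function naiveSketch<Task>(
  continuation: string,
  maxShots: number,
  applyContinuation: ApplyContinuation<Task>,
): SketchDriver<Task> {
  return {
    plan(task, history) {
      if (history.length === 0) return [task];
      if (history.length >= maxShots) return [];
      return [applyContinuation(task, continuation)];
    },
  };
}

// A consumer loop with no steering logic of its own: it runs what plan() returns
// and stops when plan() returns nothing.
function runSketch<Task>(
  driver: SketchDriver<Task>,
  task: Task,
  grade: (t: Task) => Verdict,
): Iteration<Task>[] {
  const history: Iteration<Task>[] = [];
  for (;;) {
    const planned = driver.plan(task, history);
    if (planned.length === 0) return history;
    for (const t of planned) history.push({ task: t, verdict: grade(t) });
  }
}
```

Because `Task` stays generic, the same sketch works whether a task is a plain prompt string or a richer object; only `applyContinuation` knows the shape.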

Tests / checks

tsc --noEmit 0 errors; vitest 8/8.

Add two non-LLM steering Drivers to the driven-loop set as the leak-free
controls for the refine reference driver. They differ only in how much of the
prior verdict plan() reads:

- naiveDriver: reads nothing from the verdict; issues a fixed continuation.
- dumbDriver: reads ONLY verdict.valid; issues onPass/onFail. Never touches
  notes/scores — that boundary is the firewall, enforced by a tripwire test.

The dumb->refine gap isolates how much the grader's findings inflate a result
over a bare pass/fail bit. Continuation strings are parameters and the Task
shape is opaque (caller supplies applyContinuation), so the builders carry zero
domain coupling. They plug into runLoop unchanged; no interface addition needed
since Driver.plan already receives history with verdicts.
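
One way such a tripwire could work, sketched under assumptions: `trappedVerdict`, `pickContinuation`, and `leakyContinuation` are hypothetical helpers written for this illustration, and the actual test in the PR may be structured differently. The idea is a verdict whose `notes` and `scores` getters throw, so any code path that reads them fails loudly.

```typescript
// Hypothetical verdict shape; the repository's type may carry more fields.
interface Verdict { valid: boolean; notes?: unknown; scores?: unknown }

// Builds a verdict whose notes/scores getters throw as soon as anything reads them.
function trappedVerdict(valid: boolean): Verdict {
  const verdict = { valid } as Verdict;
  for (const key of ["notes", "scores"] as const) {
    Object.defineProperty(verdict, key, {
      enumerable: true,
      get(): never {
        throw new Error(`firewall breach: read verdict.${key}`);
      },
    });
  }
  return verdict;
}

// Stand-in for a pass/fail-only selector: touches verdict.valid and nothing else.
function pickContinuation(verdict: Verdict | undefined, onPass: string, onFail: string): string {
  return verdict?.valid === true ? onPass : onFail;
}

// Deliberately leaky counter-example: reading notes trips the trap.
function leakyContinuation(verdict: Verdict | undefined): string {
  return `address these findings: ${String(verdict?.notes)}`;
}
```

Passing a trapped verdict through `pickContinuation` returns normally, while `leakyContinuation` throws, which is the behaviour a firewall regression test would pin.
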
tangletools previously approved these changes Jun 24, 2026

@tangletools left a comment

✅ Auto-approved PR — b6fbf3ce

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-24T12:34:18Z

tangletools previously approved these changes Jun 24, 2026

@tangletools left a comment

✅ Auto-approved PR — 3670434d

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-24T12:39:55Z

@tangletools left a comment

🟡 Value Audit — sound-with-nits

Verdict: sound-with-nits
Concerns: 2 (2 weak-concern)
Heuristic: 0.0s
Duplication: 0.0s
Interrogation: 196.7s (2 bridge agents)
Total: 196.7s

💰 Value — sound-with-nits

Adds two minimal, non-LLM Driver controls (naive fixed-continuation and dumb pass/fail-only) to the runLoop primitive so benchmarks can isolate how much grader findings inflate multi-shot results; clean, grain-fitting addition with a small structural duplication nit.

  • What it does: Introduces naiveDriver and dumbDriver in src/runtime/steering-drivers.ts and re-exports them from src/runtime/index.ts:236-243. Both implement the existing Driver<Task, Output, SteeringDecision> interface used by runLoop (src/runtime/types.ts:219). naiveDriver runs the original task at shot 0 and then issues the same caller-supplied continuation string every round, ignoring the v
  • Goals it achieves: Provides leak-free experimental controls for multi-round evals: by comparing naive, dumb, and refine, a caller can attribute loop improvement to (1) the bare pass/fail bit, or (2) the grader's findings/notes, instead of assuming a coached loop is better for free. It keeps the loop kernel unchanged and domain-agnostic by making continuation strings and applyContinuation caller parameters, not
  • Assessment: The change is coherent and fits the codebase's grain. It extends the existing Driver/runLoop primitive without touching the kernel or adding benchmark coupling. The tests enforce the leak-free firewall with a getter trap (src/runtime/steering-drivers.test.ts:81-93) and exercise stop/cap behavior. It mirrors the reference refine driver and is a worthwhile, low-risk addition.
  • Better / existing approach: No materially better approach or existing equivalent was found. The only nearby primitive is singleShotDriver in src/runtime/supervise/runtime.ts:1272, which merely repeats the same task up to a cap and performs no verdict-aware steering, so it does not serve the same control purpose. A single generic fixedContinuationDriver with naive and dumb as thin presets could remove duplicated `pl
  • Model: opencode/kimi-for-coding/k2p7
  • Bridge attempts: 1

🎯 Usefulness — sound-with-nits

Two clean leak-free steering controls built squarely in the grain of the existing Driver interface; the one nit is a required option (onPass) that is provably unreachable in any conformant loop.

  • Integration: Reachable and correctly wired. Both builders return a Driver<Task, Output, SteeringDecision> consumed by runLoop({ driver }) (run-loop.ts:73, :229, :331); exported as public package API from src/runtime/index.ts:236-243. They conform to the contract exactly — plan/decide/describePlan all present, and SteeringDecision values map correctly onto isTerminalDecision (run-loop.ts:1131: 'pi
  • Fit with existing patterns: Strong. The plan/decide/describePlan shape is a near-verbatim lift of the reference refine driver (examples/driver-loop/driver-loop.ts:125-168), including the identical decide semantics (history.some(valid) → pick-winner; else refine-until-cap → fail). describePlan returns kind: 'refine', matching the kernel's count-based inference for a single planned task (run-loop.ts:247). The gener
  • Real-world viability: Robust on the edges that matter: missing/undefined verdict is treated as not-valid (total, never throws) in both plan and decide; the shot cap is enforced in both paths; concurrency/abort are kernel-owned and untouched. The one structural wrinkle: dumbDriver's onPass branch is dead at runtime. Per the kernel's round ordering (plan at :229 → workers → decide at :331 → terminate-on-terminal at :
  • Model: opencode/zai-coding-plan/glm-5.2
  • Bridge attempts: 1

💰 Value Audit

🟡 The two driver bodies duplicate the same scaffold [duplication]

naiveDriver (src/runtime/steering-drivers.ts:108-132) and dumbDriver (src/runtime/steering-drivers.ts:167-191) repeat the same history-length checks, cap handling, decideUntilValidOrCapped wiring, and describePlan return. This could be collapsed into one internal builder parameterized by a (lastVerdict?) => string continuation selector, with naiveDriver and dumbDriver as thin presets. That would make the firewall boundary — what is allowed to be read from the verdict — live in
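
A rough sketch of the collapse suggested above, assuming simplified types: `continuationPlanner`, `naiveSelector`, and `dumbSelector` are hypothetical names, and only `plan()` is modelled (no `decide` or `describePlan`). The shared scaffold appears once, and each preset is just a selector of the form `(lastVerdict?) => string`.

```typescript
// Hypothetical, simplified shapes; not the repository's actual types.
interface Verdict { valid: boolean; notes?: string[] }
interface Iteration<Task> { task: Task; verdict?: Verdict }
type ContinuationSelector = (lastVerdict?: Verdict) => string;

interface ScaffoldOptions<Task> {
  select: ContinuationSelector;
  maxShots: number;
  applyContinuation: (task: Task, continuation: string) => Task;
}

// Shared scaffold: the history-length check, stop-on-valid gate, and shot cap
// are written once instead of once per driver.
function continuationPlanner<Task>(opts: ScaffoldOptions<Task>) {
  return (task: Task, history: Iteration<Task>[]): Task[] => {
    if (history.length === 0) return [task];
    const last = history[history.length - 1];
    if (last?.verdict?.valid === true) return [];
    if (history.length >= opts.maxShots) return [];
    return [opts.applyContinuation(task, opts.select(last?.verdict))];
  };
}

// Presets: the selector is the only code that chooses what to read from the verdict.
const naiveSelector = (continuation: string): ContinuationSelector =>
  () => continuation; // ignores the verdict entirely
const dumbSelector = (onPass: string, onFail: string): ContinuationSelector =>
  (verdict) => (verdict?.valid === true ? onPass : onFail); // reads only .valid
```

With this shape, a firewall test would only need to trap the verdict passed to the selector rather than each driver body separately.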

🎯 Usefulness Audit

🟡 dumbDriver's required onPass is unreachable; naive and dumb collapse in stop-on-pass [ergonomics]

Because decide() (steering-drivers.ts:76) returns terminal pick-winner as soon as any shot is valid, and the kernel calls decide() after workers and terminates before the next plan() (run-loop.ts:331-348), plan() is only ever called when no prior iteration passed. So dumbDriver.plan's passed is always false: the if (passed) return [] early-return (line 177) and the passed ? onPass : onFail select (line 181) are dead, meaning the required onPass option can never be exercised,
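
The reachability argument can be checked with a small simulation. This is a sketch under stated assumptions, not the kernel itself: `countPlanCallsAfterPass` is a hypothetical helper that models only the round ordering described above (plan, then workers, then decide, then terminate on a terminal decision) and enumerates every pass/fail outcome sequence up to the cap.

```typescript
type Decision = "continue" | "pick-winner" | "fail";
interface Iteration { valid: boolean }

// Counts how often plan() is entered with a passing last verdict, across every
// possible pass/fail outcome sequence of length `maxShots`.
function countPlanCallsAfterPass(maxShots: number): number {
  let planSawPass = 0;
  for (let mask = 0; mask < 1 << maxShots; mask++) {
    const history: Iteration[] = [];
    for (let shot = 0; shot < maxShots; shot++) {
      // plan(): inspect the last verdict, as a pass/fail-only driver would.
      const last = history[history.length - 1];
      if (last?.valid === true) planSawPass++;

      // workers + grader: this shot's outcome is bit `shot` of the mask.
      history.push({ valid: ((mask >> shot) & 1) === 1 });

      // decide(): terminal as soon as any shot is valid, or when the cap is hit.
      const decision: Decision = history.some((it) => it.valid)
        ? "pick-winner"
        : history.length >= maxShots
          ? "fail"
          : "continue";
      if (decision !== "continue") break;
    }
  }
  return planSawPass;
}
```

Under this ordering the count is 0 for any cap, which matches the claim that a pass-side continuation is never selected in a stop-on-pass loop.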


What this audit checks

It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.

| Pass | What it asks |
| --- | --- |
| Heuristic | Vague title? Whitespace-only or cruft-bearing diff? (content signals only) |
| Duplication | Do added function/class names already exist elsewhere in the repo? |
| Value Audit | What does it do? What goal does it achieve? Is it good? Better architecture or already-exists? |
| Usefulness Audit | Does it integrate and fit? Will it hold up in real use and actually get used? |

Findings are concerns, not blocks — the human reviewer decides what to do with them.

value-audit · 20260624T131015Z

@tangletools


✅ No Blockers — 3670434d

Readiness 76/100 · Confidence 65/100 · 10 findings (3 medium, 7 low)

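| | opencode-kimi | glm | deepseek | aggregate |
| --- | --- | --- | --- | --- |
| Readiness | 79 | 76 | 79 | 76 |
| Confidence | 65 | 65 | 65 | 65 |
| Correctness | 79 | 76 | 79 | 76 |
| Security | 79 | 76 | 79 | 76 |
| Testing | 79 | 76 | 79 | 76 |
| Architecture | 79 | 76 | 79 | 76 |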

Full multi-shot audit completed 1/1 planned shots over 3 changed files. Global verifier still owns final merge decision.

🟠 MEDIUM describePlan() always reports 'refine' even when plan() returns [] (stop) — src/runtime/steering-drivers.ts

naiveDriver (lines 128-130) and dumbDriver (lines 187-189) return { kind: 'refine', ... } from describePlan() unconditionally. However both plan() implementations return [] when the last shot is valid (lines 120, 177) or the cap is reached ([lines 122](https://github.com/tangle-network/agent-runtime/blob/3670434db6d21b8ce16850a66e62ab1e0b622f93/src/runtime/steering-drivers.ts#L

🟠 MEDIUM naiveDriver docstring claim contradicts code — src/runtime/steering-drivers.ts

The file-level docstring at lines 17-19 states naiveDriver 'reads NOTHING from the verdict', but both plan() (line 120: last?.verdict?.valid) and decide() (line 76 via decideUntilValidOrCapped: it.verdict?.valid) read verdict.valid for the stop gate and terminal decision. The inline comment at [lines 117-119](https://github.com/tangle-network/agent-runtime/blob/3670434db6d21b8ce16850a66e62

🟠 MEDIUM naive→dumb experimental gap is structurally zero; onPass is dead code — src/runtime/steering-drivers.ts

The PR's stated axis (header JSDoc lines 12-14): 'The naive → dumb gap isolates the value of the pass/fail bit alone.' It cannot. Both drivers share decideUntilValidOrCapped(), which returns terminal 'pick-winner' on ANY valid iteration (run-loop.ts:1131-1133 treats 'pick-winner' as terminal). So plan() at round N+1 is only reached when round N was non-terminal, i.e. no iteration was valid ⇒ history[last].verdict.valid is always false at plan() entry. In dumbDriver.plan() (lines 158-167) the early `if (passed) return [

🟡 LOW No test exercises the naive-vs-dumb equivalence (or exposes it) — src/runtime/steering-drivers.test.ts

The 8 tests pass (verified: pnpm vitest run src/runtime/steering-drivers.test.ts → 8 passed, 232ms). The tripwire test (lines 67-86) is a good regression guard for the notes/scores firewall. But there is no comparative test asserting the documented experimental axes — e.g. 'for any failing verdict, naive.plan === dumb.plan given matching continuation' would have exposed the naive≈dumb equivalence from finding #1 before merge. For a substrate whose value proposition is the leak-free three-way attribution, at least one property test should pin the intended invariant (or, once the design is fixed, pin the intended differential). As-is, the test suite
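
A comparative check of the kind described here might look like the following sketch. `naivePlan`, `dumbPlan`, and `plansAgreeOnReachableHistories` are hypothetical stand-ins modelled on the behaviour the findings describe, not the code under review; the check enumerates every history made of failing or missing verdicts (the only histories `plan()` can see in a stop-on-pass loop) and compares the two plans.

```typescript
interface Verdict { valid: boolean }
interface Iteration { task: string; verdict?: Verdict }
type Plan = (task: string, history: Iteration[]) => string[];

const applyContinuation = (task: string, c: string): string => `${task}\n\n${c}`;

// Stand-in for the naive driver's plan(): fixed continuation, .valid used only as a stop gate.
function naivePlan(continuation: string, maxShots: number): Plan {
  return (task, history) => {
    if (history.length === 0) return [task];
    if (history[history.length - 1]?.verdict?.valid === true) return [];
    if (history.length >= maxShots) return [];
    return [applyContinuation(task, continuation)];
  };
}

// Stand-in for the dumb driver's plan(), keeping the early return and the ternary
// in the order the findings describe.
function dumbPlan(onPass: string, onFail: string, maxShots: number): Plan {
  return (task, history) => {
    if (history.length === 0) return [task];
    const passed = history[history.length - 1]?.verdict?.valid === true;
    if (passed) return [];
    if (history.length >= maxShots) return [];
    return [applyContinuation(task, passed ? onPass : onFail)];
  };
}

// True when both plans agree on every reachable history up to the cap.
function plansAgreeOnReachableHistories(maxShots: number): boolean {
  const naive = naivePlan("try again", maxShots);
  const dumb = dumbPlan("unused on pass", "try again", maxShots);
  const entries: Iteration[] = [{ task: "t", verdict: { valid: false } }, { task: "t" }];
  let histories: Iteration[][] = [[]];
  for (let len = 0; len <= maxShots; len++) {
    const longer: Iteration[][] = [];
    for (const history of histories) {
      if (JSON.stringify(naive("t", history)) !== JSON.stringify(dumb("t", history))) return false;
      for (const entry of entries) longer.push(history.concat([entry]));
    }
    histories = longer;
  }
  return true;
}
```

When the naive continuation matches `onFail`, the check returns true, which is the equivalence the findings point at; once the design settles, the same harness could instead assert the intended difference between the two drivers.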

🟡 LOW naiveDriver.decide() and describePlan() untested — src/runtime/steering-drivers.test.ts

The naiveDriver test block (lines 37-69) only tests plan(), never decide(). The shared decide describe block (lines 107-125) tests only dumbDriver.decide(). While both use the same decideUntilValidOrCapped, there is no verification that naiveDriver wires it correctly. Additionally, describePlan() is never tested for either driver despite being consumed by runLoop (line 233

🟡 LOW dumbDriver onPass JSDoc says 'rarely reached'; it is never reached — src/runtime/steering-drivers.ts

DumbDriverOptions.onPass doc (lines 128-134): 'In a stop-on-pass loop this is rarely reached (a valid shot ends the loop), but it is required so the driver is total over the pass/fail bit.' Given the kernel's plan→decide ordering (verified in run-loop.ts:230-355: decide runs after each batch and terminates on any valid), onPass is not 'rarely' reached — it is NEVER reached. The 'total over the pass/fail bit' justification is hollow because the ternary that would use onPass sits behind an early return on the same condition. Either reword to 'never issued in a stop-on-pass loop; retained so the option type reflects the pass/fail bit the driver reads' or r

🟡 LOW dumbDriver onPass parameter is unreachable dead path — src/runtime/steering-drivers.ts

In dumbDriver.plan() at line 176-181: passed is computed on line 176, checked for early return on line 177 (if (passed) return []), so passed is always false when line 181 is reached (const continuation = passed ? onPass : onFail). The onPass option (

🟡 LOW dumbDriver requires onPass continuation that is never emitted — src/runtime/steering-drivers.ts

dumbDriver options require onPass (lines 142, 170). In plan() at line 176-177 the code returns [] as soon as passed is true, before line 181 computes const continuation = passed ? onPass : onFail. Because the passed branch exits early, onPass is unreachable dead code under the current stop-on-pass semantics. Either make onPass optional or remove it (and the unreachable ternary

🟡 LOW naiveDriver JSDoc claims 'reads NOTHING from verdict' but reads .valid for stop — src/runtime/steering-drivers.ts

Function doc (lines 84-93) says: 'It reads NOTHING from history[last].verdict — not .valid, not .notes, not .scores.' The code (lines 99-108) reads last?.verdict?.valid to decide whether to stop planning. The inline comment on lines 102-104 is honest ('The verdict is read ONLY for .valid here, never to compose the prompt'), contradicting the JSDoc above it. Real impact on the exper

🟡 LOW naiveDriver docs claim it reads no verdict, but plan() reads .valid — src/runtime/steering-drivers.ts

The module-level comment (lines 17-18) and the function comment (lines 102-103) state that naiveDriver reads NOTHING from the verdict. The implementation at line 120 reads last?.verdict?.valid to decide whether to stop. This is only used for termination, not prompt composition, but the documentation overstates the 'no-signal' contract. Update the comments to say it reads only .valid


tangletools · 2026-06-24T13:13:51Z · trace

…orts

The naive/dumb steering drivers add exports; regenerating the catalog is what
makes the CLASS-7 freshness gate pass (this was the #372 CI failure).

@tangletools left a comment

✅ Auto-approved PR — c3ce8d79

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-24T13:21:10Z

@drewstone merged commit d8708f5 into main on Jun 24, 2026
1 check passed