feat(loops): runtime steer-firewall + dynamicLoopRunner analyze forwarding (RSI Gen-1) #139
Merged
…rding (RSI Gen-1)

selector ≠ judge, enforced at runtime. The diagnosis the dynamic driver steers from must be TRACE-derived, never judge-derived: assertTraceDerivedFindings (called in runAnalyze, fail-loud) rejects a finding whose evidence is a judge/verdict score (an EvidenceRef kind:'metric' with a verdict|judge|score uri scheme). span/event/artifact/finding refs and empty-evidence findings stay legal, so existing analysts + the 4 fixtures are unaffected. Provenance, not content: the one coupling the architecture forbids (the external write-only judge leaking back into steering) can no longer reach the planner.

Also: dynamicLoopRunner forwards an optional `analyze` hook to createDynamicDriver, closing the gap that kept the runLoop convenience wrapper from running f(trace, findings).

tests/loops/dynamic.test.ts +4: artifact-ref PASSES, empty-refs PASSES, verdict-scheme metric ref REJECTED, non-judge (latency) metric ref PASSES. tsc + biome clean; 30/30 loops tests.
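The `analyze` forwarding described in the commit message above can be pictured with a minimal sketch. Everything here is a simplified stand-in: the `Trace`, `Finding`, `Analyze`, `Driver`, and option types are hypothetical, and the bodies of `createDynamicDriver` and `dynamicLoopRunner` are illustrative only, not the repository's actual signatures. The only point shown is that the wrapper hands the optional hook to the driver factory instead of dropping it.

```typescript
// Hypothetical shapes; the real types in the repo may differ.
type Trace = { spans: string[] };
type Finding = { id: string; evidence: { kind: string; uri: string }[] };

// The f(trace, findings) hook. Its return type is an assumption made for this sketch.
type Analyze = (trace: Trace, findings: Finding[]) => Finding[];

interface DriverOptions {
  analyze?: Analyze;
}

interface Driver {
  step(trace: Trace, findings: Finding[]): Finding[];
}

// Stand-in for the driver factory: runs the hook when one is supplied,
// otherwise passes the findings through unchanged.
function createDynamicDriver(opts: DriverOptions = {}): Driver {
  return {
    step(trace, findings) {
      return opts.analyze ? opts.analyze(trace, findings) : findings;
    },
  };
}

interface RunnerOptions {
  analyze?: Analyze;
}

// Stand-in for the convenience wrapper. The gap being closed: the optional
// `analyze` hook is forwarded to the driver rather than silently ignored.
function dynamicLoopRunner(opts: RunnerOptions = {}): Driver {
  return createDynamicDriver(opts.analyze ? { analyze: opts.analyze } : {});
}

// Usage: the hook supplied to the wrapper is the one the driver runs.
const runner = dynamicLoopRunner({
  analyze: (_trace, findings) => [...findings, { id: "from-hook", evidence: [] }],
});
const result = runner.step({ spans: ["s1"] }, []);
console.log(result.map((f) => f.id));
```

Building the options object conditionally keeps the sketch valid under strict optional-property checking; a real implementation may simply pass the field through.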
drewstone added a commit that referenced this pull request on Jun 6, 2026
Cuts the 58-commit backlog on main into a published release. Headline surface:
- runToolLoop / streamToolLoop: bounded turn-level tool-dispatch loop (#137)
- RSI agent tree: recursive Agent.act, Supervisor keystone, runProgram, the adaptive-driver channel (#139/#151/#165)
- optimization API collapsed onto agent-eval selfImprove; the runtime keeps the CODE-surface ImprovementDriver you pass as driver (#172)
- deployable benchmark adapters: AppWorld, commit0, aec-bench, EnterpriseOps-Gym; runBenchmarks over one ADAPTERS registry (#153/#156/#157)
- agent-eval floor raised to >=0.83.0 (#175)
RSI Gen-1, PR-A (runtime closures; vitest-verified on src, no dist coupling).

Steer-firewall (selector ≠ judge), enforced at runtime. `assertTraceDerivedFindings` (in `runAnalyze`, fail-loud) rejects a finding whose evidence is a judge/verdict score: an `EvidenceRef` `kind:'metric'` with a `verdict|judge|score` uri scheme. `span`/`event`/`artifact`/`finding` refs and empty evidence stay legal (existing analysts + 4 fixtures unaffected). Provenance, not content: the external write-only judge can no longer leak back into steering.

`dynamicLoopRunner.analyze` forwarding: closes the gap that kept the runLoop wrapper from running `f(trace, findings)`.

Tests (+4): artifact-ref PASSES · empty-refs PASSES · `verdict:score` metric REJECTED · non-judge `latency_ms` metric PASSES. tsc + biome clean; 30/30 loops tests.

Designed via a full /pursue cycle (5-seam audit + 4-lens adversarial review). The planner-directive surface (outer-loop seam) was deliberately deferred: gated on the inner-loop verdict, no optimizer consumes it yet. The bench live-loop runner + the measurement pilot are PR-B.
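The firewall rule above is a provenance check, which a minimal sketch can make concrete. This is an illustration under stated assumptions, not the merged implementation: the `EvidenceRef` and `Finding` shapes, the `uriScheme` and `isJudgeDerived` helpers, the error message, and the example URIs (including `metric:latency_ms`) are all hypothetical. Only the rule itself comes from the description: a `metric` ref whose uri scheme is `verdict`, `judge`, or `score` is rejected, and everything else passes.

```typescript
// Hypothetical shapes; the repo's real EvidenceRef / Finding types may differ.
type EvidenceRef = {
  kind: "span" | "event" | "artifact" | "finding" | "metric";
  uri: string;
};

type Finding = {
  id: string;
  evidence: EvidenceRef[];
};

// URI schemes treated as judge-derived, taken from the PR text.
const JUDGE_SCHEMES = new Set(["verdict", "judge", "score"]);

// Hypothetical helper: the text before the first ":" in a uri, lowercased.
function uriScheme(uri: string): string {
  const idx = uri.indexOf(":");
  return idx === -1 ? "" : uri.slice(0, idx).toLowerCase();
}

// Only metric refs can be judge-derived; the check looks at provenance
// (kind + scheme) and never at the content of the finding.
function isJudgeDerived(ref: EvidenceRef): boolean {
  return ref.kind === "metric" && JUDGE_SCHEMES.has(uriScheme(ref.uri));
}

// Fail-loud: throws on the first finding that cites judge-derived evidence.
// Findings with empty evidence pass, as do span/event/artifact/finding refs.
function assertTraceDerivedFindings(findings: readonly Finding[]): void {
  for (const finding of findings) {
    const bad = finding.evidence.find(isJudgeDerived);
    if (bad !== undefined) {
      throw new Error(
        `steer-firewall: finding "${finding.id}" cites judge-derived evidence "${bad.uri}"`,
      );
    }
  }
}

// The four cases from the test list, with hypothetical ids and URIs.
assertTraceDerivedFindings([
  { id: "artifact-ref", evidence: [{ kind: "artifact", uri: "artifact:report.json" }] },
  { id: "empty-refs", evidence: [] },
  { id: "latency", evidence: [{ kind: "metric", uri: "metric:latency_ms" }] },
]); // passes silently

try {
  assertTraceDerivedFindings([
    { id: "leak", evidence: [{ kind: "metric", uri: "verdict:score" }] },
  ]);
} catch (err) {
  console.log((err as Error).message); // the judge-derived ref is rejected
}
```

Keying the check on `kind` plus uri scheme is what lets a non-judge metric such as latency through while still blocking verdict scores.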