feat(loops): make the remaining optimizer coordinates addressable by drewstone · Pull Request #224 · tangle-network/agent-runtime

drewstone · 2026-06-10T12:43:51Z

Supersedes #221 (auto-closed when its base branch was deleted post-#220-merge). Contains #221's changes + the author-parity fix (maxTokens 8192 + flash fallback — main's #220 squash predates it) + the merge of main.

What

Four unlocks so every genome coordinate is reachable by the one author→gate pipeline:

Persona reaches the LLM author: strategyAuthorContract documents ShotSpec.persona — authored strategies can now be multi-agent.
The author prompt is a coordinate: AuthorStrategyOptions.contract (caller-supplied contract text) — the authoring contract is meta-optimizable, gated like any candidate.
AgenticOptions.analystModel: the firewalled critic can run on a different model than the worker.
BenchmarkConfig.hooks: RuntimeHooks flow through runBenchmark to every cell (the observability seam was unreachable from the benchmark path).
fix(bench): author parity — maxTokens: 8192 restored at the call sites (deepseek-v4-pro returns empty content without it; reproduced live) + fallback default deepseek-v4-flash (fast enough to clear the edge-524 mode; verified authors loadable strategies).
vitest excludes **/.claude/worktrees/**.
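The analystModel and hooks unlocks above can be sketched as follows. This is a minimal illustration, not the package's code: the `*Sketch` interfaces, `resolveModels`, `cellOptions`, and the `onCellStart` hook are hypothetical names, and the fallback of the critic to the worker model when `analystModel` is unset is an assumption.

```typescript
// Hypothetical shapes for illustration only; the real AgenticOptions,
// BenchmarkConfig and RuntimeHooks in agent-runtime may differ.
interface RuntimeHooksSketch {
  onCellStart?: (cellId: string) => void; // assumed hook name
}

interface AgenticOptionsSketch {
  model: string; // worker model
  analystModel?: string; // optional model for the firewalled critic
  hooks?: RuntimeHooksSketch;
}

interface BenchmarkConfigSketch {
  cells: string[];
  agentic: AgenticOptionsSketch;
  hooks?: RuntimeHooksSketch;
}

// The critic uses analystModel when set, otherwise the worker model (assumption).
function resolveModels(opts: AgenticOptionsSketch): { worker: string; analyst: string } {
  return { worker: opts.model, analyst: opts.analystModel ?? opts.model };
}

// Forward benchmark-level hooks into the options built for every cell.
function cellOptions(config: BenchmarkConfigSketch): AgenticOptionsSketch[] {
  return config.cells.map(() => ({
    ...config.agentic,
    hooks: config.hooks ?? config.agentic.hooks,
  }));
}
```

With this shape, a stronger critic and a cheaper worker differ only in one option, and a watchdog attached at the benchmark level is seen by every cell.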

Verification

typecheck ✓, lint ✓, 702 tests ✓ (+4: persona-in-contract pin, analystModel routing, hooks pass-through, contract override). Verified live: the relaunched clean flywheel authored critique-refine through this exact path and completed end-to-end.

- promotionGate: the statistical promotion decision as a package primitive: seeded paired bootstrap (agent-eval heldoutSignificance) over per-task holdout deltas, deterministic verdict, minimum-evidence floor (6 paired tasks), CI lower bound must clear the threshold. Replaces the bench-local unseeded pairedBootstrap whose verdict varied re-run to re-run.
- authorStrategy: named fallbackModel retry (one attempt when the primary fails or returns no code block), temperature/maxTokens now passed through.
- assertAuthoredCodeSafe -> assertStrategyContract: the lint enforces the harness's measurement invariants (author blindness + conserved dose) at the module boundary; docstring now says so in those terms.
- bench: strategy-author.mts drops its duplicate authorStrategy/contractDoc and becomes the R0->R2 ladder CLI over the package primitive; flywheel-run authors and gates through the package; authored run artifacts gitignored and excluded from typecheck.
- tests: regression coverage for harness-verified scoring, the empty-messages rule, the contract lint, and the gate's determinism/floor.
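The gate described above (seeded resampling, a minimum-evidence floor, a CI lower bound compared to a threshold) can be sketched as below. This is a stand-in under stated assumptions, not agent-eval's heldoutSignificance: the percentile bootstrap, the mulberry32 PRNG, the 2000 resamples, the 95% interval, the strict `>` comparison, and every name ending in `Sketch` are choices made for illustration.

```typescript
// Small seeded PRNG (mulberry32) so the resampling is reproducible.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

interface GateVerdictSketch {
  promote: boolean;
  reason: string;
  ciLower: number | null;
}

interface GateOptionsSketch {
  seed: number;
  threshold: number;
  minPairs?: number; // evidence floor, 6 paired tasks per the description
  resamples?: number; // assumed default
  alpha?: number; // assumed two-sided level
}

// deltas[i] = candidate score minus baseline score on holdout task i.
function promotionGateSketch(deltas: number[], opts: GateOptionsSketch): GateVerdictSketch {
  const minPairs = opts.minPairs ?? 6;
  const resamples = opts.resamples ?? 2000;
  const alpha = opts.alpha ?? 0.05;
  if (deltas.length < minPairs) {
    return { promote: false, reason: "insufficient-evidence", ciLower: null };
  }
  const rand = mulberry32(opts.seed);
  const means: number[] = [];
  for (let r = 0; r < resamples; r++) {
    // Resample task deltas with replacement and record the mean.
    let sum = 0;
    for (let i = 0; i < deltas.length; i++) {
      sum += deltas[Math.floor(rand() * deltas.length)];
    }
    means.push(sum / deltas.length);
  }
  means.sort((x, y) => x - y);
  const ciLower = means[Math.floor((alpha / 2) * resamples)];
  const promote = ciLower > opts.threshold;
  return {
    promote,
    reason: promote ? "ci-lower-clears-threshold" : "ci-lower-below-threshold",
    ciLower,
  };
}
```

Because the PRNG is seeded, two calls with the same deltas and seed return the same verdict, which is the property the unseeded bench-local bootstrap lacked.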

- strategyAuthorContract documents ShotSpec.persona: the LLM author can now write multi-agent strategies (researcher/engineer hand-offs, persona panels) over the same conserved budget; previously the suite's own multi-agent primitive was invisible to the authored path.
- AuthorStrategyOptions.contract: caller-supplied contract text, making the author prompt itself a gateable optimization coordinate.
- AgenticOptions.analystModel: the critic can run on a different model than the worker (stronger critic, cheaper worker).
- BenchmarkConfig.hooks: RuntimeHooks pass through runBenchmark to every cell's runAgentic (the watchdog/route-auditor seam was unreachable from the benchmark path).
- vitest excludes .claude/worktrees/** (worktree agents' copies were swept into the root test run).
- tests: persona-in-contract pin, analystModel routing, hooks pass-through, contract override.
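The vitest exclusion mentioned above would typically look like the fragment below. It is a sketch of a root vitest config, assuming the glob given in the PR description; the repository's actual config file and its other settings are not shown in this PR.

```typescript
import { configDefaults, defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    // Keep vitest's default exclusions and also skip worktree copies,
    // so agents' checkouts are not swept into the root test run.
    exclude: [...configDefaults.exclude, "**/.claude/worktrees/**"],
  },
});
```

Spreading `configDefaults.exclude` matters because setting `exclude` replaces the defaults rather than extending them.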

The convergence onto the package authorStrategy dropped the transport-level max_tokens the bench client sent by default; deepseek-v4-pro returns EMPTY content on the authoring prompt without it (reproduced), and with it can still hit the edge 524 on a long generation. maxTokens restored at the call sites; the fallback default becomes deepseek-v4-flash — fast enough to clear both failure modes (verified: authors a loadable strategy with and without maxTokens).
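The retry behavior described above (one fallback attempt when the primary fails or returns no code block, with maxTokens passed through) can be sketched like this. It is a hedged illustration, not the package's authorStrategy: `Complete`, `extractCodeBlock`, and `authorWithFallbackSketch` are hypothetical names, and the model client is injected so no real provider API is assumed.

```typescript
// Hypothetical model client: resolves to the raw completion text.
type Complete = (model: string, prompt: string, maxTokens: number) => Promise<string>;

// Three backticks, built at runtime so this example can sit inside a fence.
const FENCE = "`".repeat(3);

// Return the body of the first fenced code block, or null if there is none.
function extractCodeBlock(text: string): string | null {
  const pattern = new RegExp(FENCE + "[^\\n]*\\n([\\s\\S]*?)" + FENCE);
  const match = pattern.exec(text);
  return match ? match[1] : null;
}

interface AuthorOptionsSketch {
  model: string;
  fallbackModel: string;
  maxTokens: number;
}

// Try the primary model once, then the fallback once.
async function authorWithFallbackSketch(
  complete: Complete,
  prompt: string,
  opts: AuthorOptionsSketch,
): Promise<{ code: string; model: string }> {
  for (const model of [opts.model, opts.fallbackModel]) {
    try {
      const reply = await complete(model, prompt, opts.maxTokens);
      const code = extractCodeBlock(reply);
      if (code !== null) return { code, model };
    } catch {
      // A failed call is treated like a reply without code: try the next model.
    }
  }
  throw new Error("no code block from primary or fallback model");
}
```

An empty reply from the primary and a thrown transport error (such as the edge 524) take the same path here, so a single fallback covers both failure modes named in the commit message.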

tangletools

✅ Auto-approved PR — `024c43ee`

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-10T12:43:58Z

drewstone · 2026-06-10T12:45:05Z

Consolidated into #223 (one PR carrying the full line: addressability unlocks + author-parity fix + agent-eval 0.89 + runStrategyEvolution + review fixes).

drewstone added 5 commits June 10, 2026 04:16

merge: author parity fix from the base branch

e600187

merge main (squashed #220) — branch side carries the superset

024c43e

tangletools approved these changes Jun 10, 2026


drewstone closed this Jun 10, 2026

drewstone deleted the feat/author-surface-unlocks branch June 10, 2026 12:47

drewstone wants to merge 5 commits into main from feat/author-surface-unlocks
