feat(loops): seeded promotion gate + authored-path convergence by drewstone · Pull Request #220 · tangle-network/agent-runtime

drewstone · 2026-06-10T10:17:20Z

What

promotionGate as a package primitive (src/runtime/promotion-gate.ts, exported from ./loops): the promotion decision over a holdout BenchmarkReport — SEEDED paired bootstrap via agent-eval heldoutSignificance, deterministic verdict, minimum-evidence floor (default 6 paired tasks), CI lower bound must clear the threshold. Replaces the bench-local unseeded pairedBootstrap gate, whose verdict was non-deterministic re-run to re-run and accepted n=2.
authorStrategy hardening: named fallbackModel retry (one attempt when the primary call fails or returns no code block — the edge-524/thinking-model case), and temperature/maxTokens are now actually passed through (temperature was declared but silently ignored).
assertAuthoredCodeSafe → assertStrategyContract: same checks, honest framing — the lint enforces the harness's two measurement invariants at the module boundary (author blindness: no out-of-band reads/mutations of verifier state; conserved dose: no out-of-band compute), which is what keeps harness-verified scores and equal-budget comparisons meaningful. It is a contract lint for measurement integrity, not a security boundary.
Bench convergence: bench/src/strategy-author.mts drops its duplicate authorStrategy + drifted contractDoc (it lacked the empty-messages rule from #219, "fix(runtime): authored-code import enforcement + empty-messages foot-gun") and becomes the R0→R2 ladder CLI over the package primitive; flywheel-run.mts authors and gates through the package. Authored run artifacts are gitignored + excluded from bench typecheck.
Regression tests (tests/loops/strategy-suite.test.ts): harness-verified scoring (a {score:1}-doing-nothing body scores 0; keep-best overrides under-reporting), the messages: []-is-fresh rule, the contract lint's accept/reject set, and the gate's determinism + floor + fail-loud pairing.
bench/HARNESS.md synced (author default deepseek-v4-pro, rotating holdout, packaged gate).
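
To make the gate's rules concrete, here is a minimal sketch of a seeded paired-bootstrap promotion decision with an evidence floor and a CI-lower-bound check. It is not the code in src/runtime/promotion-gate.ts and it does not call agent-eval's heldoutSignificance; the function name `sketchPromotionGate`, the option names, the mulberry32 PRNG, the percentile interval, and every default other than the 6-task floor are assumptions made for illustration.

```typescript
// Hypothetical sketch: names, defaults, and PRNG choice are illustrative only.
interface GateOptions {
  seed?: number;       // fixed seed, so repeated runs give the same verdict
  resamples?: number;  // number of bootstrap resamples (assumed default)
  minPairs?: number;   // minimum-evidence floor (the PR states a default of 6)
  threshold?: number;  // the CI lower bound must exceed this value
  confidence?: number; // two-sided confidence level (assumed default)
}

interface GateVerdict {
  promote: boolean;
  reason: string;
  n: number;
  meanDelta: number;
  ciLower: number | null;
}

// mulberry32: a small deterministic PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function sketchPromotionGate(
  baseline: Record<string, number>,
  candidate: Record<string, number>,
  opts: GateOptions = {},
): GateVerdict {
  const { seed = 1, resamples = 2000, minPairs = 6, threshold = 0, confidence = 0.95 } = opts;

  // Fail loudly when the two reports do not cover the same holdout tasks.
  const baseIds = Object.keys(baseline).sort();
  const candIds = Object.keys(candidate).sort();
  if (baseIds.length !== candIds.length || baseIds.some((id, i) => id !== candIds[i])) {
    throw new Error("promotion gate: baseline and candidate task sets differ");
  }

  // Per-task paired deltas (candidate minus baseline).
  const deltas = baseIds.map((id) => candidate[id] - baseline[id]);
  const n = deltas.length;
  const meanDelta = n === 0 ? 0 : deltas.reduce((s, d) => s + d, 0) / n;

  // Minimum-evidence floor: too few pairs never promotes.
  if (n < minPairs) {
    const reason = `insufficient evidence: ${n} < ${minPairs} paired tasks`;
    return { promote: false, reason, n, meanDelta, ciLower: null };
  }

  // Seeded bootstrap over the deltas: resample with replacement, record each mean.
  const rand = mulberry32(seed);
  const means: number[] = [];
  for (let b = 0; b < resamples; b++) {
    let sum = 0;
    for (let i = 0; i < n; i++) sum += deltas[Math.floor(rand() * n)];
    means.push(sum / n);
  }
  means.sort((x, y) => x - y);

  // Percentile lower bound of the two-sided interval.
  const ciLower = means[Math.floor(((1 - confidence) / 2) * resamples)];
  const promote = ciLower > threshold;
  const reason = promote ? "CI lower bound clears threshold" : "CI lower bound does not clear threshold";
  return { promote, reason, n, meanDelta, ciLower };
}
```

Calling `sketchPromotionGate(baselineScores, candidateScores, { seed: 42 })` twice on the same inputs returns the same verdict, because the only source of randomness is the seeded PRNG. With fewer than `minPairs` tasks the sketch refuses to promote regardless of how large the deltas are, which is the n=2 case the PR calls out.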

Why

The flywheel is the measurement instrument for the self-improvement program; its promotion verdict must be reproducible and its author path must have one implementation. This closes the bench/package duplication (drifted contracts), the unseeded-gate hole, and puts the #217/#219 correctness claims under test.

Verification

pnpm run typecheck clean, pnpm run lint clean (13 pre-existing warnings untouched), pnpm test 698 passed / 1 skipped (+18 new).
cd bench && npx tsc --noEmit -p tsconfig.json clean against the rebuilt dist.

- promotionGate: the statistical promotion decision as a package primitive. Seeded paired bootstrap (agent-eval heldoutSignificance) over per-task holdout deltas, deterministic verdict, minimum-evidence floor (6 paired tasks), CI lower bound must clear the threshold. Replaces the bench-local unseeded pairedBootstrap whose verdict varied re-run to re-run.
- authorStrategy: named fallbackModel retry (one attempt when the primary fails or returns no code block), temperature/maxTokens now passed through.
- assertAuthoredCodeSafe -> assertStrategyContract: the lint enforces the harness's measurement invariants (author blindness + conserved dose) at the module boundary; docstring now says so in those terms.
- bench: strategy-author.mts drops its duplicate authorStrategy/contractDoc and becomes the R0->R2 ladder CLI over the package primitive; flywheel-run authors and gates through the package; authored run artifacts gitignored and excluded from typecheck.
- tests: regression coverage for harness-verified scoring, the empty-messages rule, the contract lint, and the gate's determinism/floor.
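
The fallbackModel retry described above can be sketched as follows. This is an illustration under stated assumptions, not the package's authorStrategy: the `Complete` callback, the `AuthorOptions` shape, and the helper names `extractCodeBlock` and `authorWithFallback` are hypothetical, and only the behaviour (one fallback attempt when the primary call fails or returns no code block, with temperature and maxTokens forwarded) comes from the PR text.

```typescript
// Hypothetical request shape; the real client interface is not shown in the PR.
interface CompletionRequest {
  model: string;
  prompt: string;
  temperature?: number;
  maxTokens?: number;
}
type Complete = (req: CompletionRequest) => Promise<string>;

interface AuthorOptions {
  model: string;
  fallbackModel?: string; // tried once if the primary fails or returns no code block
  temperature?: number;
  maxTokens?: number;
}

// Built at runtime so this example can itself sit inside a fenced block.
const FENCE = "`".repeat(3);

// Returns the body of the first fenced code block, or null if there is none.
function extractCodeBlock(text: string): string | null {
  const re = new RegExp(FENCE + "[\\w-]*\\n([\\s\\S]*?)" + FENCE);
  const m = re.exec(text);
  return m && m[1].trim().length > 0 ? m[1] : null;
}

async function authorWithFallback(
  complete: Complete,
  prompt: string,
  opts: AuthorOptions,
): Promise<{ code: string; model: string }> {
  // Primary first, then at most one fallback attempt.
  const models = opts.fallbackModel ? [opts.model, opts.fallbackModel] : [opts.model];
  let lastError: unknown = new Error("no model attempted");
  for (const model of models) {
    try {
      // Sampling parameters are forwarded on every attempt, not silently dropped.
      const text = await complete({
        model,
        prompt,
        temperature: opts.temperature,
        maxTokens: opts.maxTokens,
      });
      const code = extractCodeBlock(text);
      if (code !== null) return { code, model };
      lastError = new Error(`model ${model} returned no code block`);
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```

Treating "no code block" the same as a thrown error keeps the retry logic in one loop, and returning the model name alongside the code lets a caller record which model actually authored the strategy.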

tangletools

✅ Auto-approved PR — `5151694e`

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

_tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-10T10:17:27Z_

tangletools approved these changes Jun 10, 2026


drewstone merged commit 83a296b into main Jun 10, 2026
1 check passed

This was referenced Jun 10, 2026

feat(loops): make the remaining optimizer coordinates addressable #221

Closed

feat(loops): runStrategyEvolution — population × multi-generation strategy search #223

Merged

drewstone added a commit that referenced this pull request Jun 10, 2026

merge main (squashed #220) — branch side carries the superset

024c43e

drewstone mentioned this pull request Jun 10, 2026

feat(loops): make the remaining optimizer coordinates addressable #224

Closed

drewstone merged 1 commit into main from feat/loops-promotion-gate-author-convergence
