docs: post-POWER-16 go-live plan + verifiably-smarter-and-cheaper thesis #278

Merged
drewstone merged 3 commits into main from docs/go-live-rederivation on Jun 13, 2026

@drewstone

What

The rederivation Drew asked for after POWER-16 retracted the depth>breadth keystone: the +16.4pp uplift seen at n=16 shrank to +4.1pp, CI [−1.6, +10.2], a tie at n=72. Two docs:
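The retraction rests on a simple decision rule. An uplift counts as a win only when its confidence interval excludes zero, and the n=72 interval [−1.6, +10.2] does not exclude zero. Below is a minimal TypeScript sketch of that rule, assuming intervals are reported in percentage points. `classifyDelta` and `Delta` are hypothetical names, not SDK code.

```typescript
// Hypothetical helper: classify a measured uplift (in percentage points)
// by whether its confidence interval excludes zero.
type Verdict = "win" | "loss" | "tie";

interface Delta {
  pointPp: number; // point estimate of the uplift, in pp
  ciLowPp: number; // lower CI bound, in pp
  ciHighPp: number; // upper CI bound, in pp
  n: number; // number of paired tasks behind the estimate
}

function classifyDelta(d: Delta): Verdict {
  if (d.ciLowPp > d.ciHighPp) {
    throw new Error("CI lower bound exceeds upper bound");
  }
  if (d.ciLowPp > 0) return "win"; // whole interval above zero
  if (d.ciHighPp < 0) return "loss"; // whole interval below zero
  return "tie"; // interval straddles zero: no detectable difference
}

// The retracted keystone at n=72: +4.1pp, CI [-1.6, +10.2].
const keystone: Delta = { pointPp: 4.1, ciLowPp: -1.6, ciHighPp: 10.2, n: 72 };
const verdict = classifyDelta(keystone); // "tie"
```

Note that the +4.1pp point estimate plays no part in the verdict. Only the interval bounds decide it.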

The thesis

Verifiably smarter and cheaper. POWER-16 killed within-run-cleverness-beats-blind-at-equal-compute, NOT the ability to make agents smarter. Smarter comes from gated search (spend compute, certify the winner) + cheap serving of the certified artifact (the cost flywheel, −12 to −31%). 'Smarter' is allowed with its gate, CI, and n attached: the gate is the moat (everyone else sells lucky streaks), not a ban on the word. Proven now: a certified program transfers +31/+36pp at lower cost (a one-off step up). Prove next: that it compounds (E4).
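Below is a minimal TypeScript sketch of "spend compute, certify the winner". It assumes per-task scores and a held-out task split. `gatedSearch`, `Candidate`, `Claim`, and the gate label are hypothetical names. The 95% normal-approximation interval stands in for whatever interval the real gate uses. The returned claim carries its gate, CI, and n, so a "smarter" result never travels without them.

```typescript
// Hypothetical sketch of gated search: spend compute searching candidate
// programs, then certify the winner on a held-out split before claiming.

interface Candidate {
  id: string;
  // Per-task score, e.g. 1 = pass, 0 = fail (the evaluator is assumed).
  score: (taskId: string) => number;
}

interface Claim {
  candidateId: string;
  gate: string;
  deltaPp: number; // mean holdout uplift over baseline, in percentage points
  ciPp: [number, number]; // 95% CI of that uplift, in percentage points
  n: number; // number of paired holdout tasks
  certified: boolean;
}

function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// 95% normal-approximation CI for the mean of paired differences.
function ci95(diffs: number[]): [number, number] {
  const m = mean(diffs);
  const variance =
    diffs.reduce((acc, x) => acc + (x - m) * (x - m), 0) / (diffs.length - 1);
  const halfWidth = 1.96 * Math.sqrt(variance / diffs.length);
  return [m - halfWidth, m + halfWidth];
}

function gatedSearch(
  candidates: Candidate[],
  baseline: Candidate,
  searchTasks: string[],
  holdoutTasks: string[],
): Claim {
  if (candidates.length === 0) throw new Error("no candidates");
  if (searchTasks.length === 0 || holdoutTasks.length < 2) {
    throw new Error("need search tasks and at least 2 holdout tasks");
  }

  // Search phase: spend compute, keep the best mean score on the search split.
  let best = candidates[0];
  let bestMean = -Infinity;
  for (const c of candidates) {
    const m = mean(searchTasks.map((t) => c.score(t)));
    if (m > bestMean) {
      best = c;
      bestMean = m;
    }
  }

  // Certification phase: paired deltas vs. baseline on tasks the search never saw.
  const diffs = holdoutTasks.map((t) => best.score(t) - baseline.score(t));
  const [lo, hi] = ci95(diffs);
  return {
    candidateId: best.id,
    gate: "holdout CI95 lower bound > 0",
    deltaPp: 100 * mean(diffs),
    ciPp: [100 * lo, 100 * hi],
    n: diffs.length,
    certified: lo > 0, // the gate: certify only if the CI excludes zero
  };
}
```

Certifying on a split the search never touched is what gives the gate its meaning. The search phase selects the luckiest candidate, so re-scoring it on the same tasks would overstate its uplift.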

…n thesis, Mode 0 (intelligence-off), do-not-claim firewall, Verified→Gated

The depth>breadth keystone retracted to a tie at n=72; the SDK sells cost +
verification + transfer, not quality improvement. Adds the honest-claim firewall
(normative do-not-claim block), the intelligence-OFF billing floor (sandbox-stream,
inference+compute only, billing-line-on-the-spawn-line), and renames Verified PRs →
Gated PRs. Full plan: docs/go-live-plan.md (5-lens rederivation, code-verified).
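One way to make the firewall mechanical rather than purely normative is a lint over outbound claim text. The sketch below is hypothetical: `firewallViolations`, `ClaimEvidence`, and the word list are illustrations, not code from this PR. It encodes the rule as the thesis states it: a quality word passes only with a gate, a CI that excludes zero, and an n.

```typescript
// Hypothetical claim-firewall lint: quality words pass only with evidence.

interface ClaimEvidence {
  gate?: string; // name of the gate the result passed
  ciPp?: [number, number]; // CI of the uplift, in percentage points
  n?: number; // sample size behind the CI
}

// Illustrative word list; the actual do-not-claim block is a prose document.
const QUALITY_WORDS = ["smarter", "better", "more accurate"];

function firewallViolations(text: string, evidence: ClaimEvidence): string[] {
  const lower = text.toLowerCase();
  const usesQualityWord = QUALITY_WORDS.some((w) => lower.indexOf(w) !== -1);
  if (!usesQualityWord) return []; // cost and verification claims pass through

  const problems: string[] = [];
  if (!evidence.gate) problems.push("missing gate");
  if (!evidence.ciPp) problems.push("missing CI");
  else if (evidence.ciPp[0] <= 0) problems.push("CI does not exclude zero");
  if (evidence.n === undefined || evidence.n <= 0) problems.push("missing n");
  return problems;
}
```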
…not a ban on 'smarter')

Corrects the over-rotation: POWER-16 killed within-run-cleverness-beats-blind-at-equal-compute,
NOT the ability to make agents smarter. Smarter comes from gated search
(spend compute, certify the winner) + cheap serving of the certified artifact. 'Smarter'
is allowed with its gate+CI+n attached; that's the differentiation. Aligns the billing
boundary to usage-classification + spawn-gating (not budget-pool surgery).
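A minimal sketch of what "usage-classification + spawn-gating" could look like, with every name hypothetical. Mode 0 bills only inference and compute, and a sandbox cannot spawn unless a billing line opens at the same point. The usage kinds other than inference and compute are assumptions, as is the choice to leave them unmetered in Mode 0.

```typescript
// Hypothetical sketch: the billing boundary is set by usage classification and
// spawn-gating, not by splitting a shared budget pool.

type Mode = "mode0" | "intelligence"; // "mode0" = intelligence off
// Kinds other than inference and compute are assumptions for illustration.
type UsageKind = "inference" | "compute" | "search" | "verification";

interface UsageRecord {
  kind: UsageKind;
  units: number;
}

interface BillingLine {
  id: string;
  mode: Mode;
  records: UsageRecord[];
}

// Usage classification: in Mode 0 only inference and compute are billable.
function isBillable(mode: Mode, kind: UsageKind): boolean {
  if (mode === "mode0") return kind === "inference" || kind === "compute";
  return true;
}

// Spawn-gating: no sandbox starts unless a billing line opens at the spawn site.
function spawnSandbox(
  mode: Mode,
  openLine: (mode: Mode) => BillingLine | null,
): BillingLine {
  const line = openLine(mode);
  if (line === null) throw new Error("spawn refused: no billing line");
  return line;
}

// Attach usage to the line it was spawned under; returns false if not billed.
function meterUsage(line: BillingLine, usage: UsageRecord): boolean {
  if (!isBillable(line.mode, usage.kind)) return false;
  line.records.push(usage);
  return true;
}
```

Because every usage record hangs off the line created at spawn, the boundary is decided in two places: where a sandbox starts and where usage is classified. Nothing has to be subtracted from a shared pool afterwards.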
Replaces the minimal 4-sentence baseline in trata-gate.mts and
trata-gepa.mts with the surface found by GEPA across 9 runs
(+8.6pp holdout on deepseek-v4-flash, confirmed twice independently).

Adds five sentences: labeled-section structure; named-peer benchmarking
with EV/EBITDA, P/E, and margin metrics; verbatim guidance citation; IRR
computation with arithmetic; and an explicit SYSTEM_PROMPT env override
for future experimentation.
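A minimal sketch of the SYSTEM_PROMPT override pattern. `resolveSystemPrompt` and `DEFAULT_SYSTEM_PROMPT` are hypothetical names, and the placeholder string stands in for the GEPA-found prompt, whose text lives in `trata-gate.mts` / `trata-gepa.mts` and is not reproduced here. The function takes the environment as a parameter so it stays testable without Node's globals.

```typescript
// Hypothetical sketch of an env override for the system prompt.
// The placeholder below stands in for the GEPA-found prompt text.
const DEFAULT_SYSTEM_PROMPT = "<GEPA-found system prompt>";

type Env = { [key: string]: string | undefined };

// Use SYSTEM_PROMPT when it is set to a non-blank value; otherwise fall back.
function resolveSystemPrompt(env: Env): string {
  const override = env["SYSTEM_PROMPT"];
  if (override !== undefined && override.trim() !== "") return override;
  return DEFAULT_SYSTEM_PROMPT;
}
```

In Node this would be called as `resolveSystemPrompt(process.env)`. Treating a blank value as unset is a design choice. It keeps an accidentally empty variable from silently replacing the tuned prompt with nothing.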

@tangletools left a comment

✅ Auto-approved PR — 6488bac8

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-13T20:58:27Z

@drewstone merged commit 9168002 into main on Jun 13, 2026
1 check passed