docs: post-POWER-16 go-live plan + verifiably-smarter-and-cheaper thesis #278
Merged
…n thesis, Mode 0 (intelligence-off), do-not-claim firewall, Verified→Gated
The depth>breadth keystone retracted to a tie at n=72; the SDK sells cost + verification + transfer, not quality improvement. Adds the honest-claim firewall (normative do-not-claim block), the intelligence-OFF billing floor (sandbox-stream, inference+compute only, billing-line-on-the-spawn-line), and renames Verified PRs → Gated PRs. Full plan: docs/go-live-plan.md (5-lens rederivation, code-verified).
…not a ban on 'smarter')
Corrects the over-rotation: POWER-16 killed within-run-cleverness-beats-blind-at-equal-compute, NOT the ability to make agents smarter. Smarter comes from gated search (spend compute, certify the winner) + cheap serving of the certified artifact. 'Smarter' is allowed with its gate+CI+n attached; that's the differentiation. Aligns the billing boundary to usage-classification + spawn-gating (not budget-pool surgery).
Replaces the minimal 4-sentence baseline in trata-gate.mts and trata-gepa.mts with the surface found by GEPA across 9 runs (+8.6pp holdout on deepseek-v4-flash, confirmed twice independently). Adds five sentences: labeled-section structure, named-peer benchmarking with EV/EBITDA/P/E/margin metrics, verbatim guidance citation, IRR computation with arithmetic, and explicit SYSTEM_PROMPT env override for future experimentation.
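The SYSTEM_PROMPT env override mentioned above could be resolved along these lines. This is a minimal sketch, not the code in trata-gate.mts or trata-gepa.mts: the function name `resolveSystemPrompt`, the placeholder default text, and the treatment of a blank value as "unset" are all assumptions made for illustration.

```typescript
// Placeholder only: the real default prompt lives in the scripts and is not reproduced here.
const DEFAULT_SYSTEM_PROMPT = "<default system prompt>";

// Hypothetical helper: return the SYSTEM_PROMPT override if one is set and non-blank,
// otherwise fall back to the built-in default.
function resolveSystemPrompt(env: Record<string, string | undefined>): string {
  const override = env["SYSTEM_PROMPT"]?.trim();
  return override ? override : DEFAULT_SYSTEM_PROMPT;
}
```

In a Node script this would be called as `resolveSystemPrompt(process.env)`. Taking the environment as a parameter keeps the helper easy to exercise in tests without mutating global state.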
tangletools
approved these changes
Jun 13, 2026
✅ Auto-approved PR — 6488bac8
Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.
tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-13T20:58:27Z
What
The rederivation Drew asked for after POWER-16 retracted the depth>breadth keystone (+16.4pp n=16 → +4.1pp CI[−1.6,+10.2] tie at n=72). Two docs:
docs/go-live-plan.md: the full plan (5-lens workflow → synthesis → adversarial hardening, every code claim verified). It covers the honest thesis, SDK roadmap deltas, the off/eco/standard/thorough/max tier table, the 2-week slice (Mode-0 + Observe on gtm), the do-not-claim firewall, the #1 build (shipped in #277, "feat(intelligence): Observe + Mode-0 Intelligence SDK wrapper + effort/billing boundary"), and the #1 experiment (E4: does the cost flywheel compound).
docs/intelligence-sdk.md: the contract gains the honest-claim block, Mode 0 (intelligence-off), and the Verified→Gated rename.
The thesis
Verifiably smarter and cheaper. POWER-16 killed within-run-cleverness-beats-blind-at-equal-compute, NOT the ability to make agents smarter. Smarter comes from gated search (spend compute, certify the winner) + cheap serving of the certified artifact (the cost flywheel, −12 to −31%). 'Smarter' is allowed with its gate, CI, and n attached — the gate is the moat (everyone else sells lucky streaks), not a ban on the word. Proven now: a certified program transfers +31/+36pp at lower cost (a step). Prove next: that it compounds (E4).
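The "'smarter' only with its gate, CI, and n attached" rule can be sketched as a small check. This is an illustrative sketch, not the SDK's firewall: the `Measurement` shape, the function names, and the rule "claim only when the whole confidence interval is above zero" are assumptions. The POWER-16 figures are the ones quoted above (+4.1pp, CI[−1.6,+10.2], n=72).

```typescript
// Hypothetical shape for a measured effect; field names are illustrative.
interface Measurement {
  deltaPp: number;  // point estimate, in percentage points
  ciLowPp: number;  // lower bound of the confidence interval
  ciHighPp: number; // upper bound of the confidence interval
  n: number;        // sample size
  gate: string;     // label of the gate that produced the result
}

type Verdict = "smarter" | "tie" | "worse";

// Assumed rule: the interval must exclude zero before a direction is claimed.
function classify(m: Measurement): Verdict {
  if (m.ciLowPp > 0) return "smarter";
  if (m.ciHighPp < 0) return "worse";
  return "tie";
}

// Emit a claim string only when the verdict is "smarter", always with gate, CI, and n attached.
function claim(m: Measurement): string | null {
  if (classify(m) !== "smarter") return null;
  return `smarter: +${m.deltaPp}pp, CI[${m.ciLowPp},${m.ciHighPp}], n=${m.n}, gate=${m.gate}`;
}

// The POWER-16 result quoted in this PR: the interval straddles zero, so it is a tie.
const power16: Measurement = { deltaPp: 4.1, ciLowPp: -1.6, ciHighPp: 10.2, n: 72, gate: "POWER-16" };
console.log(classify(power16)); // prints "tie"
```

Returning `null` rather than a weaker string makes an uncertified claim impossible to print by accident, which is the point of the do-not-claim firewall as described here.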