docs(examples): three-persona cleanup — newcomer/senior/junior #376
Merged
Apply the genuine-defect fixes from the three-persona example verdict, leaving good/great examples untouched.

- self-improving-loop: add a loud minimum-evidence-floor caveat at the gate and in the README. The demo gates at n=3 for runnability, but the production gate floors at minSamples (8 in heldoutSignificance) / minProductiveRuns; never ship a real change on n=3 (the small-n mirage).
- delegate: add a README; split a lean teaching delegate.ts + shared.ts from the regression proof (moved to tests/delegate-example.test.ts: env-gated live e2e + an always-on offline fail-loud assertion); drop the test-in-example clothing (E2E PASSED / process.exit) and the internal history from the header.
- improve: fix README symbol drift, ImprovementDriver → SurfaceProposer and gepaDriver → gepaProposer (matching what improve() actually builds); fix the test path to src/improvement/improve.test.ts.
- knowledge-gating: wire the headline onKnowledgeBlocked hook into the adapter so the README's documented hook is demonstrated (the blocked run now converts the gap into a "would ask the user" decision).
- coding-benchmark: simplify offlineSolutions → offlineAgentScripts; keep the rate-limiter cheat/real pair inline (the one anti-cheat teaching moment) and move the csv/lru real impls to fixtures.ts as readable template literals (no escaped strings). The held-out anti-cheat, smoke, and firewall tests stay intact.
- supervise: wrap the flagship in main().catch (matching the siblings) and uncomment the completion-oracle deliverable so the headline models the safe path.
- ui-audit: LENSES_TO_RUN → lensesToRun (the publish-safe module-global convention).
- driver-loop / researcher-loop: add a one-line justification at the offline-box casts.
- strategy-evolution README: note that promoted:false at toy scale is the gate working.
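The minimum-evidence floor can be illustrated with a gate that refuses to decide below a sample floor instead of promoting on a handful of runs. This is a hypothetical sketch, not the repo's `gate()`, `heldoutSignificance`, or `HeldOutGate`: the names `gateWithFloor`, `GateOptions`, and `GateDecision` are assumptions, the default of 8 only mirrors the `minSamples` default described in this PR, and the mean-delta check past the floor is a placeholder for a real significance test.

```typescript
// Hypothetical sketch of an evidence floor; not the repo's gate(),
// heldoutSignificance, or HeldOutGate API.
type GateDecision =
  | { promote: true; n: number; meanDelta: number }
  | { promote: false; n: number; reason: "few_runs" | "no_improvement" };

interface GateOptions {
  // Evidence floor. 8 mirrors the minSamples default described in the PR;
  // the demo's n=3 exists only so the example runs quickly.
  minSamples: number;
}

// deltas: paired (candidate - baseline) scores on the same held-out tasks.
function gateWithFloor(
  deltas: readonly number[],
  opts: GateOptions = { minSamples: 8 },
): GateDecision {
  const n = deltas.length;
  if (n < opts.minSamples) {
    // Too little evidence: refuse to decide rather than ship a small-n mirage.
    return { promote: false, n, reason: "few_runs" };
  }
  // Placeholder decision rule; a production gate would run a significance test here.
  const meanDelta = deltas.reduce((sum, d) => sum + d, 0) / n;
  return meanDelta > 0
    ? { promote: true, n, meanDelta }
    : { promote: false, n, reason: "no_improvement" };
}

// Three positive deltas still fall below the floor, so nothing is promoted.
const demo = gateWithFloor([0.2, 0.1, 0.3]);
console.log(demo);
```

The point of the shape is that "not enough runs" is a distinct, named outcome (`few_runs`) rather than a weak "yes", so code that lifts the demo verbatim still cannot promote on n=3 unless someone lowers the floor on purpose.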
tangletools approved these changes on Jun 24, 2026
tangletools (Contributor) left a comment:
✅ Auto-approved PR — efb6b428
Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.
tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-24T15:09:11Z
Applies the genuine-defect fixes from a three-persona (newcomer / senior / junior) review of the 22 examples. Scope: fix the things that actively mislead a reader and clean up the rough edges that teach the wrong habit, without churning the examples judged good/great. Every example still typechecks (`typecheck:examples`), the held-out coding-benchmark anti-cheat and its smoke test stay intact, and `docs:check` stays green.

Genuine defects fixed (these mislead a reader)
- self-improving-loop: added a loud minimum-evidence-floor caveat at the `gate()` definition AND in the README. The demo gates at n=3 for runnability, but the production gate floors the evidence (`heldoutSignificance` won't report a pair under `minSamples`, default 8; `HeldOutGate` rejects below `minProductiveRuns` with `few_runs`), so never ship a real change on n=3. (Helps the junior most: the persona most likely to lift `gate()` verbatim.)
- delegate: the example was a test (`E2E PASSED`, `process.exit`) wearing an example's clothes, with internal history in the header. Added `README.md`; split a lean teaching `delegate.ts` plus a reusable `shared.ts` from the regression proof, which moved to `tests/delegate-example.test.ts` (env-gated: a paid live e2e when `TANGLE_API_KEY` is set, an always-on offline fail-loud assertion otherwise); stripped the history per the repo's no-history-in-source rule. (newcomer + senior)
- improve: the README named `ImprovementDriver` / `gepaDriver`, but `improve()` actually builds `SurfaceProposer` / `gepaProposer`. Corrected both, plus the stale test path (`src/improvement/improve.test.ts`). (A README grep now resolves.)
- knowledge-gating: the README documented `adapter.onKnowledgeBlocked` as the headline hook, but the adapter never defined it (doc/code drift). Wired the hook into the adapter so the blocked run demonstrates it: it converts the gap into a "would ask the user" decision that flows through as the stop reason.

Quality wins (remove ceremony / amateur tells, no behavior change)
- coding-benchmark: `offlineSolutions` → `offlineAgentScripts`, with a clear 2-line header; kept the rate-limiter cheat/real pair inline (the one anti-cheat teaching moment) and moved the round-invariant `csv-parser` / `lru-cache` real impls to `fixtures.ts` as readable template literals (no `+ '\n' +` escaped-string ceremony). The held-out anti-cheat, the firewall test, the reps-don't-fake-n regression, and the BH-corrected stats are all unchanged and still pass.
- supervise: wrapped the flagship in `main().catch` (matching the sibling examples) and uncommented the completion-oracle `deliverable` so the headline models the safe path.
- ui-audit: `LENSES_TO_RUN` → `lensesToRun` (the publish-safe module-global convention; an UPPERCASE module-global trips the Tangle obfuscator).
- driver-loop / researcher-loop: added a one-line justification at the offline-box `as unknown as SandboxInstance` casts (the other casts already carried one).
- strategy-evolution README: noted that `promoted: false` at toy scale is the gate working, not a break.

Left alone (already good/great, no churn)
driver-loop, strategy-suite, supervisor-loop, chat-handler, recursive-supervisor, runtime-run, stream-backends, sanitized-telemetry-streaming, mcp-delegation, fleet-delegation, intelligence-recommend, intelligence-drop-in, agents-of-all-shapes, product-eval. The verdict's older snapshot flagged a few of these (mcp-delegation's `delegate_ui_audit`, the fleet-delegation casts), but they already carry the right framing/justification on current `main`; forcing fake-complete `SandboxInstance` helpers would add ceremony against the cleanup goal.

Verification
- `pnpm run build`: clean
- `pnpm run typecheck` (src + examples): clean
- `pnpm run lint`: clean (333 files)
- `pnpm run docs:check`: green (0 errors; freshness OK)
- `pnpm test`: 115 files / 1120 passed, 2 skipped (the env-gated live e2es)
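The two skipped tests follow an env-gating pattern: the paid live e2e runs only when a key is present, while a key-free offline check fails loudly rather than passing silently. A minimal sketch of that split is below; `selectMode`, `assertDeliverable`, and `DelegateResult` are hypothetical names and do not reproduce `tests/delegate-example.test.ts`. Only `TANGLE_API_KEY` comes from this PR.

```typescript
// Hypothetical sketch of the env-gating pattern; not the contents of
// tests/delegate-example.test.ts.
type CheckMode = "live" | "offline";

interface DelegateResult {
  ok: boolean;
  deliverable?: string;
}

// The paid live e2e is selected only when a non-blank key is present;
// a test runner would report the live test as skipped otherwise.
function selectMode(env: Record<string, string | undefined>): CheckMode {
  const key = env["TANGLE_API_KEY"];
  return key !== undefined && key.trim() !== "" ? "live" : "offline";
}

// The offline check needs no key and throws instead of passing silently
// when the delegate produced nothing usable.
function assertDeliverable(result: DelegateResult): string {
  if (!result.ok || result.deliverable === undefined || result.deliverable === "") {
    throw new Error(`delegate produced no deliverable: ${JSON.stringify(result)}`);
  }
  return result.deliverable;
}
```

In a Node test file, `selectMode(process.env)` would pick the path. The design point is that a missing key downgrades the live test to a reported skip, never to a green pass that proves nothing.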