docs(api): generate the primitive catalog so the anti-reinvention reference cannot go stale #371
…erence cannot go stale
The hand-listed primitive inventory in docs/canonical-api.md drifted from source: it had zero mentions of live exports (scoreAuthenticity, gateRealness, MultiLayerVerifier, wilson, pairedTTest, runProfileMatrix, extractUsage, …). Anything derivable from source must be generated, not hand-written; only judgment stays curated.
- scripts/gen-primitive-catalog.mjs reads the LIVE exports of (a) this package's own public subpaths (from package.json `exports`) and (b) a curated category->subpath map of the @tangle-network/agent-eval substrate surfaces agents should reuse (judge, authenticity, verification, statistics, campaign, token/usage). Extraction is via the TypeScript compiler API over a virtual re-export entry, so it follows aliased re-exports and content-hashed bundle files, the exact things that rot a hand list. Emits docs/api/primitive-catalog.md with a GENERATED header (name, import path, one-line summary per export, grouped by surface).
- Wired into `docs:api` (runs after TypeDoc). The freshness gate gains a seventh class (CATALOG): it regenerates the catalog to a temp file and byte-compares to the committed copy, so a new/removed/renamed live export absent from the catalog is a RED BUILD.
- Shrank canonical-api.md: removed the export-inventory enumeration from the banner and the §2 preamble, replaced with pointers to docs/api/primitive-catalog.md. Kept all the judgment: the decision gate, §1.5 AgentProfile law, the §2 "I want to -> use -> NOT" table and every "Do NOT". The version + substrate-peer pins stay (gate-enforced).
- MAINTAINING.md documents the generated-inventory layer, CLASS 7, and its fix path.
tangletools
left a comment
✅ Auto-approved PR — dd248315
Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.
tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-24T08:58:26Z
agent-eval 0.99.0 adds llmJudge (+ the full current judge/auth/verify/stats surface); regenerating the generated catalog picks it up with zero hand-work, which is the point of the generator. Lockfile was pinned at 0.97.0 (pre-llmJudge) despite agent-eval already being in minimumReleaseAgeExclude.
tangletools
left a comment
✅ Auto-approved PR — 22dedbfd
tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-24T09:04:12Z
Only the examples (devDependency) need 0.99.0 for llmJudge; agent-runtime's src does not, so the peer floor must not force consumers onto 0.99.0. Catalog + lockfile stay on the resolved 0.99.0 so the examples get llmJudge.
tangletools
left a comment
✅ Auto-approved PR — 2a6e393b
tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-24T09:07:05Z
…fImprove not a bare mean gate Folds the one real example fix from #370 (otherwise superseded by the generated catalog) into this PR: self-improving-loop hand-rolled the ship gate as a bare mean comparison instead of the real HeldOutGate/selfImprove primitives.
tangletools
left a comment
✅ Auto-approved PR — be4c1288
tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-24T09:09:47Z
tangletools
left a comment
🟢 Value Audit — sound
| Verdict | sound |
| Concerns | 2 (2 low) |
| Heuristic | 0.0s |
| Duplication | 0.0s |
| Interrogation | 206.6s (2 bridge agents) |
| Total | 206.6s |
💰 Value — sound
This PR adds a generated, CI-gated primitive catalog and updates the self-improving-loop example to use real substrate primitives (makeFinding, pairedBootstrap) instead of local one-offs — a coherent anti-staleness improvement.
- What it does: It introduces scripts/gen-primitive-catalog.mjs:1, which reads the live exports of (a) every package.json exports subpath and (b) a curated set of @tangle-network/agent-eval substrate subpaths via the TypeScript compiler API, then emits docs/api/primitive-catalog.md with name, import path, kind, and one-line TSDoc summary per symbol. scripts/check-docs-freshness.mjs:568 adds CLASS 7 [CATALOG], whi
- Goals it achieves: Eliminate the hand-maintained export inventory that had already gone stale (the PR notes canonical-api.md missed scoreAuthenticity, gateRealness, MultiLayerVerifier, wilson, pairedTTest, runProfileMatrix, extractUsage, etc.), enforce the existing anti-staleness law (CLAUDE.md:42), keep canonical-api.md as curated judgment while moving the mechanical inventory to generated docs, and stop the exampl
- Assessment: Good on its merits. The change is in the grain of the codebase: it extends the existing docs-check pipeline and the same fail-loud freshness-gate pattern, commits the generated file alongside the existing TypeDoc output, and uses the TypeScript compiler API so it follows aliased re-exports and content-hashed bundle filenames. The example de-reinvention is consistent with the canonical-api decision
- Better / existing approach: none — this is the right approach. I checked scripts/ (check-docs-freshness.mjs, verify-package-exports.mjs), docs/api/, and git log; there is no existing cross-package primitive inventory generator or catalog. TypeDoc already owns per-module signature pages but excludes externals, so it cannot produce the agent-eval reuse inventory without either adding external entry points or still writing subs
- Model: opencode/kimi-for-coding/k2p7
- Bridge attempts: 1
🎯 Usefulness — sound
A generated, CI-gated primitive catalog that replaces a rotting hand-listed inventory, plus an example de-reinvention — coherent, fully wired, and complementary to the existing TypeDoc pattern.
- Integration: Fully reachable now, not ahead-of-caller. The generator (scripts/gen-primitive-catalog.mjs) is wired into `docs:api` (package.json: `"docs:api": "typedoc && node scripts/gen-primitive-catalog.mjs"`), which runs in CI via `docs:check` (.github/workflows/ci.yml:41 `pnpm run docs:check`). The new CATALOG class in check-docs-freshness.mjs:577-613 regenerates-to-temp + byte-compares, and is itself invo
- Fit with existing patterns: Fits the codebase's grain precisely. The repo already had a generate-don't-hand-maintain pattern (TypeDoc emits per-module pages under docs/api/); this adds a flat grouped INDEX that is complementary, not competing: canonical-api.md §2 is the WHICH-to-reach-for judgment, primitive-catalog.md is the WHAT-exists inventory, and the split is explicitly documented (canonical-api.md:3 header comment).
- Real-world viability: Holds up under realistic use, not just the happy path. Determinism: export enumeration is declaration-order stable (same TS compiler + same input); the substrate version is pinned by pnpm-lock.yaml + CI's `--frozen-lockfile`, and the catalog header embeds the exact resolved version (catalog:12 `agent-eval@0.99.0`) so a divergently-resolved local copy fails loudly (the correct behavior). The genera
- Model: opencode/zai-coding-plan/glm-5.2
- Bridge attempts: 1
🔎 Heuristic Signals
🟡 Cruft: console debug added examples/self-improving-loop/self-improving-loop.ts
- console.log(`root cause: ${finding.claim}`)
🟡 Cruft: commented out code scripts/check-docs-freshness.mjs
+// CLASS 7 — PRIMITIVE-CATALOG: the generated anti-reinvention inventory
What this audit checks
It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.
| Pass | What it asks |
|---|---|
| Heuristic | Vague title? Whitespace-only or cruft-bearing diff? (content signals only) |
| Duplication | Do added function/class names already exist elsewhere in the repo? |
| Value Audit | What does it do? What goal does it achieve? Is it good? Better architecture or already-exists? |
| Usefulness Audit | Does it integrate and fit? Will it hold up in real use and actually get used? |
Findings are concerns, not blocks — the human reviewer decides what to do with them.
❌ Needs Work —
| | opencode-kimi | glm | deepseek | aggregate |
|---|---|---|---|---|
| Readiness | 32 | 62 | 51 | 32 |
| Confidence | 95 | 95 | 95 | 95 |
| Correctness | 32 | 62 | 51 | 32 |
| Security | 32 | 62 | 51 | 32 |
| Testing | 32 | 62 | 51 | 32 |
| Architecture | 32 | 62 | 51 | 32 |
Full multi-shot audit completed 8/8 planned shots over 9 changed files. Global verifier still owns final merge decision.
Blocking
🔴 HIGH Pre-existing high-severity transitive dependency ws@8.20.1 — pnpm-lock.yaml
pnpm audit reports GHSA-96hv-2xvq-fx4p / CVE-2026-48779: ws@8.20.1 (Memory exhaustion DoS from tiny fragments). Path: .>@tangle-network/agent-eval>@tangle-network/tcloud>viem>ws. The base lockfile (cfacf1c) already pinned ws@8.20.1, so this is not a regression from the agent-eval bump, but the lockfile still contains an unpatched high-severity transitive dependency. Recommend updating tcloud/viem or adding a pnpm override to ws>=8.21.0 in a follow-up.
Other
🟠 MEDIUM JSDoc {@link} tags leak into plain markdown summaries — docs/api/primitive-catalog.md
14 symbol summaries contain JSDoc `{@link ...}` tags that do not render as hyperlinks in standard markdown viewers (GitHub, editors). Examples: line 27 `{@link buildLoopOtelSpans}`, line 30 `{@link RuntimeHooks}`, line 73 `{@link runPersonaConversation}`. The generator's firstDocLine() passes TSDoc raw through `ts.displayPartsToString()` without stripping `{@link}` markup. Fix: add a regex pass in the ge
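A minimal sketch of the kind of regex pass this finding suggests. `stripInlineLinks` is a hypothetical helper, not the generator's code, and the choice to render a bare link target as backticked text is an assumption about the desired output:

```typescript
// Hypothetical helper (not part of scripts/gen-primitive-catalog.mjs).
// Rewrites inline TSDoc link tags so they read well in plain markdown:
//   {@link Foo}          -> `Foo`
//   {@link Foo | label}  -> label
//   {@link Foo label}    -> label
function stripInlineLinks(summary: string): string {
  return summary.replace(
    /\{@link(?:code|plain)?\s+([^\s|}]+)\s*(?:\|\s*)?([^}]*)\}/g,
    (_match: string, target: string, label: string) => {
      const text = label.trim();
      // Prefer the author's label; otherwise show the symbol name as code.
      return text.length > 0 ? text : '`' + target + '`';
    },
  );
}

console.log(stripInlineLinks('Wraps {@link buildLoopOtelSpans} output.'));
console.log(stripInlineLinks('Passed to {@link RuntimeHooks | the runtime hooks}.'));
```

Running the pass on the one-line summary only, after `ts.displayPartsToString()`, would keep the change local to the catalog and leave the TypeDoc pages untouched.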
🟠 MEDIUM agent-eval dev vs peer dependency floor mismatch — package.json
devDependencies pins @tangle-network/agent-eval to >=0.99.0 <1.0.0 (line 92) while peerDependencies still allows >=0.97.0 <1.0.0 (line 120). The repo is built and tested against 0.99.0 (pnpm-lock.yaml resolves to 0.99.0), but consumers are told 0.97.0 is acceptable. Because this package imports dozens of agent-eval runtime symbols (src/runtime/index.ts re-exports AnalystFinding/computeFindingId/makeFinding; src/lifecycle/gate.ts imports HeldOutGate; src/run.ts, src/runtime/strategy.ts, and many others import ChatClient/estimateCost/etc.) and agent-eval uses 0.x ver
🟠 MEDIUM dev/peer dependency floor mismatch leaves 0.97.0-0.98.x untested — package.json
package.json:92 (devDependencies): `@tangle-network/agent-eval` floor is `>=0.99.0`. package.json:120 (peerDependencies): `@tangle-network/agent-eval` floor is `>=0.97.0`. CI resolves dev version 0.99.0 (pnpm-lock.yaml:19 confirms 0.99.0 is locked). Consumers can install 0.97.0-0.98.x and won't hit CI failures. The codebase has 64 `import ... from '@tangle-network/agent-eval'` sites using both types (safe, erased) and runtime values (HeldOutGate, createChatClient, estimateCost, gepaProposer, etc.). Any runtime symbol added in 0.98.0-0.99.0 that's used here will crash consumers on 0.97.x. Commit 2a6e393 acknowledges the intent, but there is no CI matrix running the peer floor. Mitigation: either add a CI matrix entry resolving agent-eval@0.97.0, or bump the peer floor to 0.99.0 to match.
🟠 MEDIUM Predictable temp path and leak on generator failure — scripts/check-docs-freshness.mjs
Line 587 builds `tmpOut` with ``join(tmpdir(), `primitive-catalog-check-${process.pid}.md`)``. The name is predictable (only PID), lives in a shared directory, and `writeFileSync` follows symlinks, so a symlink planted at that path can cause the generator to overwrite an arbitrary file that the check then reads. Also, on the `res.status !== 0` branch ([lines 593-599](https://github.com/tangle-network/agent-runtime/blob/be4c1288cdfffe5cd7be6c151da3d62ea4d51909/scripts/check-docs-freshness.mjs#L593-L599)) `tmpOut` is never removed, leaking the temp file. Fix: create a private temp directory with `mkdtempSync` (or a random filename with `O_CREAT|O_EXCL`) and d
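One way the `mkdtempSync` suggestion could look, as a sketch only. `withPrivateTempFile` is a hypothetical wrapper and the `primitive-catalog-check-` prefix is reused here purely for illustration; the real script's structure may differ:

```typescript
import { existsSync, mkdtempSync, readFileSync, rmSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical wrapper (not the script's actual code): runs `work` against a
// file inside a freshly created private directory and always removes it,
// including when `work` throws.
function withPrivateTempFile<T>(fileName: string, work: (path: string) => T): T {
  // mkdtempSync appends random characters to the prefix and creates a new
  // directory, so the path is not guessable from the PID alone.
  const dir = mkdtempSync(join(tmpdir(), 'primitive-catalog-check-'));
  try {
    return work(join(dir, fileName));
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
}

let usedPath = '';
const roundTrip = withPrivateTempFile('catalog.md', (path) => {
  usedPath = path;
  writeFileSync(path, 'generated catalog');
  return readFileSync(path, 'utf8');
});
console.log(roundTrip, existsSync(usedPath));
```

Because cleanup sits in `finally`, the generator-failure branch and the success branch share one removal path, which also addresses the leak described in this finding.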
🟡 LOW Section heading + intro say 'two layers' but table now lists three — docs/MAINTAINING.md
Line 5 ('splits its API documentation into two layers') and line 9 ('## The two layers') were not updated when the new 'Generated inventory' row was added at line 14. The table now has 3 rows (Generated reference, Generated inventory, Judgment) while the heading still says two. The intro's 'two layers with opposite maintenance rules' is still conceptually correct (generated-vs-judgment), since both generated rows share the same mainten
🟡 LOW Catalog is mostly placeholder summaries — docs/api/primitive-catalog.md
580 of 1258 catalog rows contain `_(no summary — add a TSDoc line at the declaration)_` (e.g., lines 22, 24, 32, 34). This is generated faithfully from source declarations that lack a one-line TSDoc comment, so the file is not wrong, but it substantially reduces the usefulness of the anti-reinvention inventory. The fix is to add one-line TSDoc summaries to the underlying source declarations and rerun `pnpm run docs:api`; no hand-editing of the generated catalog.
🟡 LOW Cross-category symbol duplication (llmJudge, JudgeConfig) — docs/api/primitive-catalog.md
`llmJudge` and `JudgeConfig` appear both in JUDGE (root `.` barrel, lines ~1057/1053) and CAMPAIGN (`./campaign` subpath, lines 1196/1281) because both subpaths genuinely re-export them. This is accurate (importable from either path) but a reader scanning the inventory sees the same symbol twice under different 'import from' labels with no cross-reference. Cosmetic only; the canonical-api.md decision table resolves which to use. No fix required for correctness.
🟡 LOW Filtered root-barrel categories share one import label — docs/api/primitive-catalog.md
JUDGE/VERIFICATION/STATISTICS/TOKEN categories all read the root `@tangle-network/agent-eval` barrel with name-pattern filters, so each prints 'Import from `@tangle-network/agent-eval` — N exports' where N is the filtered subset (e.g. VERIFICATION shows 10 of the barrel's ~300). A reader could misread this as the barrel having only 10 exports total. This is by-design judgment (category->subpath mapping) and the §2 intro states symbols are filtered, but the per-section 'N exports' wording undersells the barrel size. Documentation clarity nit only.
🟡 LOW Large fraction of symbols carry no TSDoc summary — docs/api/primitive-catalog.md
Many rows render '(no summary — add a TSDoc line at the declaration)' (e.g. agenticGenerator:22, auditLoopRunner:24, createConversationBackend:32, and the majority of the MCP/campaign surfaces). The generator deliberately surfaces missing docs rather than hiding them, so this is honest output, but it reduces the catalog's value as an anti-reinvention reference — a reader learns a symbol exists but not what it does. Impact is documentation quality, not correctness. Fix is upstream: add first-line TSDoc at the declarations; the catalog regenerates automatically. Not a merge blocker.
🟡 LOW Forward-reference to catalog's §2 is fragile if catalog is regenerated with renumbered sections — docs/canonical-api.md
Line 5 says 'the catalog's §2 shows exactly which subpath each lives under' and line 31 lists the agent-eval surfaces by name. Section numbers in a GENERATED file are not contractually stable — if scripts/gen-primitive-catalog.mjs ever inserts a new top-level ## section, '§2' silently points at the wrong group. The freshness gate (check-docs-freshness.mjs) checks for export/symbol drift and stale file:line, NOT for section-number drift in canonical-api.md's prose. Low impact (the named surface list still works as a fallback identifier), but a future
🟡 LOW Mermaid diagram still shows mean flow, now gate uses paired arrays — examples/self-improving-loop/README.md
Line 55-56: Diagram shows `v0mean -.v0 paired.-> gate` and `v1mean --> gate`, but the gate function now takes `v0Scores[]` and `v1Scores[]` (per-score arrays, not means). The text was updated to say 'pairedBootstrap(v0, v1).low > 0?' which is correct, but the diagram nodes feeding the gate are labeled as means. Visual-only inconsistency; the runtime code at self-improving-loop.ts:259-261 correctly extracts per-run scores.
🟡 LOW README references 'Phase 6' (re-run multishot) in the prose '7-phase' framing but the mermaid only renders P1-P5 — examples/self-improving-loop/README.md
Header comment in self-improving-loop.ts:5-14 lists 7 phases (baseline, multishot, judges, analyst, apply mutation, re-run, gate), but README mermaid (lines 23-60) only renders Phase 1, 2, 3, 4, 5 — Phase 4 in the diagram is the v1 re-run, conflating the code's phases 6 and 4. Pre-existing (the diff only touched P2/P5 box labels and the ship/hold edge text), not introduced here, but worth aligning if you ever do another docs pass.
🟡 LOW No typecheck verification possible in this worktree — examples/self-improving-loop/self-improving-loop.ts
Line 21: `import { type AnalystFinding, makeFinding, pairedBootstrap } from '@tangle-network/agent-eval'`. node_modules is absent from this worktree, so tsc cannot verify the import paths. Static grep confirms these are real exports used in production code (src/runtime/run-benchmark.ts:16, src/runtime/index.ts:15). The import pattern matches existing callers exactly. Low risk; CI should catch any real mismatch.
🟡 LOW Paired bootstrap at n=3 is statistically degenerate; 'production statistical core' framing oversells what the demo exercises — examples/self-improving-loop/self-improving-loop.ts
gate() at line 166 calls pairedBootstrap(v0Scores, v1Scores, { seed: 42 }) with n=3 (one shot per persona). With n=3 the bootstrap resamples can only recombine the 3 observed deltas [5,6,5], so the CI degenerates to a permutation of those values ([5.00, 6.00] here). The demo reports n=3 honestly in the verdict string, but the README (line 20) and the gate comment ([lines 154-160](https://github.com/tangle-network/agent-runtime/blob/be4c1288cdfffe5cd7be6c151da3d62ea4d51909/examples/self-im
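To make the n=3 point concrete, the sketch below enumerates every possible resample of the three deltas quoted in this finding. It is an exhaustive enumeration written for illustration, not the algorithm `pairedBootstrap` uses internally, and `allResampleMeans` is a hypothetical name:

```typescript
// Enumerate every bootstrap resample (drawing n values with replacement from
// n observations) and return the mean of each. There are n^n resamples.
function allResampleMeans(deltas: number[]): number[] {
  const n = deltas.length;
  const total = Math.pow(n, n);
  const means: number[] = [];
  for (let code = 0; code < total; code++) {
    // Read `code` as an n-digit base-n number; each digit picks one delta.
    let sum = 0;
    let rest = code;
    for (let draw = 0; draw < n; draw++) {
      sum += deltas[rest % n];
      rest = Math.floor(rest / n);
    }
    means.push(sum / n);
  }
  return means;
}

const means = allResampleMeans([5, 6, 5]);
const distinctMeans = means
  .map((m) => m.toFixed(3))
  .filter((value, index, all) => all.indexOf(value) === index)
  .sort();
console.log(means.length, Math.min(...means), Math.max(...means), distinctMeans);
```

With only 27 resamples and four distinct means, any interval drawn from this distribution is bounded by the observed deltas themselves, which is why the finding calls the result degenerate.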
🟡 LOW gate() has no length-mismatch guard on input arrays — examples/self-improving-loop/self-improving-loop.ts
Line 162-166: `gate(v0Scores, v1Scores)` passes both arrays to `pairedBootstrap` without checking they have equal length. In current usage both derive from the same PERSONAS loop so lengths always match, but the function is a named export within the file and could be called with misaligned arrays. Adding `if (v0Scores.length !== v1Scores.length) throw ...` would make the contract explicit. Not a runtime bug in the current demo.
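A sketch of the guard this finding proposes. `assertPaired` and `pairedDeltas` are hypothetical helpers; `pairedBootstrap` itself is not called here, so nothing is assumed about its signature:

```typescript
// Hypothetical guard: fail loudly when the two score arrays cannot be paired.
function assertPaired(v0Scores: readonly number[], v1Scores: readonly number[]): void {
  if (v0Scores.length !== v1Scores.length) {
    throw new Error(
      'gate: paired inputs must have equal length ' +
        '(v0=' + v0Scores.length + ', v1=' + v1Scores.length + ')',
    );
  }
}

// Example consumer: per-pair improvement of v1 over v0.
function pairedDeltas(v0Scores: readonly number[], v1Scores: readonly number[]): number[] {
  assertPaired(v0Scores, v1Scores);
  return v1Scores.map((score, index) => score - v0Scores[index]);
}

console.log(pairedDeltas([1, 2, 3], [6, 8, 8]));
```

Placing the check at the top of `gate()` would make the pairing contract explicit for anyone who copies the example.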
🟡 LOW recommended_action ?? '' fallback would silently produce a degenerate v1 profile — examples/self-improving-loop/self-improving-loop.ts
Line 227: `const mutation = finding.recommended_action ?? ''`. If a future analyst body emits an AnalystFinding without recommended_action (the field is optional on the canonical type per src/analyst/types.ts:50), applyMutation would produce a systemPrompt ending in '\n\nIMPROVED v1: ', a no-op mutation that v1 would then 'promote' on the paired bootstrap of unchanged scores. runAnalyst in this file always sets recommended_action so it cannot fire today, but the silent-fallback pattern is the wrong default for a pedagogical template other analysts will copy. Prefer `if (!finding.recommended_action) throw new Error('analyst: finding missi
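A sketch of the fail-loud alternative. `FindingLike` is a minimal stand-in for the real `AnalystFinding` type (only the two fields used here are assumed), and the error wording is illustrative rather than the reviewer's exact text:

```typescript
// Minimal stand-in for the finding shape; the real AnalystFinding has more fields.
interface FindingLike {
  claim: string;
  recommended_action?: string;
}

// Hypothetical helper: return the mutation text or throw, instead of silently
// falling back to an empty string that would produce a no-op v1 profile.
function requireRecommendedAction(finding: FindingLike): string {
  const action = (finding.recommended_action || '').trim();
  if (action.length === 0) {
    throw new Error('analyst: finding "' + finding.claim + '" has no recommended_action');
  }
  return action;
}

console.log(requireRecommendedAction({ claim: 'too terse', recommended_action: 'Add a worked example.' }));
```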
🟡 LOW Asymmetric agent-eval devDep (0.99.0) vs peerDep (0.97.0) floor — package.json
Lines 92 vs 120: devDep is `>=0.99.0 <1.0.0` while peerDep is `>=0.97.0 <1.0.0`. Commit 2a6e393 makes the intent explicit and correct ('Only the examples need 0.99.0 for llmJudge; agent-runtime's src does not'). Validated by grepping src/ for agent-eval imports: all use stable APIs (AgentEvalError, AnalystFinding, gepaProposer, createChatClient, etc.) present at 0.97.0. Minor risk: a future src/ change that starts using a 0.99.0-only API would silently compile against the devDep but break consumers on 0.97.0/0.98.0. No action required for this PR; consider adding a CI assertion (e.g. typecheck against the peer floor) if this pattern recurs.
🟡 LOW No automated test for the new CATALOG class — scripts/check-docs-freshness.mjs
Lines 567-613 add a non-trivial check that shells out to the generator and compares outputs. No tests in the repo exercise this script (grep found none), so regressions are only caught by the full CI `docs:check` step. Consider a minimal test that stubs the generator to assert stale/missing catalog detection.
🟡 LOW PID-only temp filename risks collision under concurrent CI — scripts/check-docs-freshness.mjs
Line 587: ``join(tmpdir(), `primitive-catalog-check-${process.pid}.md`)`` uses only the PID for uniqueness. In containerized CI where PIDs can recycle across jobs, two concurrent runs of this script could step on each other's temp file. Low probability in practice; adding `Date.now()` or `crypto.randomUUID()` would make it deterministic-safe. The same pattern appears in `gen-primitive-catalog.mjs:114` for its virtual entry file.
🟡 LOW Success output omits catalog check — scripts/check-docs-freshness.mjs
Lines 622-633: the success message enumerates every checked CLASS (version, substrate peers, citations, §2 table, §3 signatures, exports↔typedoc, prose symbols) but omits any mention of the CLASS 7 primitive-catalog check. Cosmetic: the check still runs, but a user reading CI output won't see `primitive catalog: OK` in the report.
🟡 LOW Temp file not cleaned on generator failure — scripts/check-docs-freshness.mjs
Lines 593-599: when `res.status !== 0`, the error is reported but `rmSync(tmpOut)` is never called. The temp file (`primitive-catalog-check-<pid>.md` in `/tmp`) is orphaned. Low impact: small file, `/tmp` is ephemeral. The success path (line 603) does clean up explicitly.
🟡 LOW Temp file not cleaned on generator-failure branch — scripts/check-docs-freshness.mjs
At scripts/check-docs-freshness.mjs:593-599, when `res.status !== 0` the block reports CATALOG and continues without `rmSync(tmpOut)`. The success branch (line 603) does clean up. In practice gen-primitive-catalog.mjs exits before its `writeFileSync(outPath, rendered)` (line 339) on every error path, so the leak is effectively unreachable; still, for symmetry and defense-in-depth, rmSync the tmp path in the failure branch too. Impact: negligible, a leftover empty/missing file in tmpdir.
🟡 LOW spawnSync error/signal cases not reported — scripts/check-docs-freshness.mjs
Line 593 only checks `res.status !== 0`. If the child is killed by a signal, `res.status` is `null` and the message prints 'exit null'. If `spawnSync` itself fails, `res.error` is set but ignored. Handle `res.error` and `res.signal` for clearer failure diagnostics.
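A sketch of how the three failure modes could be distinguished. `SpawnResultLike` mirrors only the `status`, `signal`, and `error` fields that `spawnSync` returns, and `describeSpawnFailure` is a hypothetical helper with illustrative message wording:

```typescript
// Structural subset of what child_process.spawnSync returns.
interface SpawnResultLike {
  status: number | null;
  signal: string | null;
  error?: Error;
}

// Hypothetical helper: returns a diagnostic string, or null when the child
// exited cleanly with status 0.
function describeSpawnFailure(res: SpawnResultLike): string | null {
  if (res.error) return 'failed to start: ' + res.error.message; // e.g. ENOENT
  if (res.signal) return 'killed by signal ' + res.signal;       // status is null here
  if (res.status !== 0) return 'exit ' + res.status;
  return null;
}

console.log(describeSpawnFailure({ status: null, signal: 'SIGKILL' }));
console.log(describeSpawnFailure({ status: 1, signal: null }));
```

Checking `error` first matters because a spawn that never started also reports a `null` status.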
🟡 LOW ./runtime alias silently suppresses any future export — scripts/gen-primitive-catalog.mjs
Line 61 adds `./runtime` to `ownSubpathAliases`, so the loop at lines 196-199 skips it without requiring a label. If a future maintainer adds a `./runtime` export to package.json, it will be omitted from the catalog with no error. Consider removing the alias or adding a guard that errors if both `./loops` and `./runtime` exist.
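A sketch of the guard option. The alias-to-canonical map shape and the name `assertAliasesNotShadowing` are assumptions made for illustration; the generator's real `ownSubpathAliases` structure is not shown in this review:

```typescript
// Hypothetical guard: throw if an aliased subpath and the subpath it stands
// for are both present in package.json `exports`, since the alias would then
// hide a real surface from the catalog.
function assertAliasesNotShadowing(
  exportKeys: readonly string[],
  aliases: { [alias: string]: string }, // alias subpath -> canonical subpath (assumed shape)
): void {
  Object.keys(aliases).forEach((alias) => {
    const canonical = aliases[alias];
    if (exportKeys.indexOf(alias) !== -1 && exportKeys.indexOf(canonical) !== -1) {
      throw new Error(
        'catalog: "' + alias + '" is skipped as an alias of "' + canonical +
          '" but both exist in package.json exports',
      );
    }
  });
}

// Passes: only the canonical subpath is exported.
assertAliasesNotShadowing(['.', './loops'], { './runtime': './loops' });
console.log('alias guard passed');
```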
🟡 LOW Build tooling excludes scripts — scripts/gen-primitive-catalog.mjs
package.json lint is `biome check src tests examples` and tsconfig includes only `src`, while biome.json includes only `src/**`, `tests/**`, and `examples/**`. The new generator is therefore not covered by lint or typecheck. Evidence: package.json line 85, tsconfig.json `include: ['src']`, biome.json `files.includes`.
🟡 LOW Log message hardcodes output path; misleading when PRIMITIVE_CATALOG_OUT is set — scripts/gen-primitive-catalog.mjs
Line 344–347: the console.error summary says "wrote docs/api/primitive-catalog.md" regardless of outPath. When the freshness gate sets PRIMITIVE_CATALOG_OUT to a temp path, this message is incorrect: the actual write target is e.g. /tmp/primitive-catalog-check-12345.md. Fix: reference outPath in the log message (e.g. `wrote ${outPath}`) so it reflects the real output destination.
🟡 LOW No unit-level tests for the generator script — scripts/gen-primitive-catalog.mjs
The script is integration-tested via check-docs-freshness.mjs CLASS 7 (regenerates + diffs), but has no direct tests for its extraction, filtering, or rendering logic. If the TS compiler API usage (extractModules, resolveAlias, declKind, firstDocLine) regresses, it would be caught as a catalog diff in CI — but the root cause would require manual diagnosis. Consider adding a vitest test that calls extractModules with a known fixture package to verify symbol extraction, filter application, and markdown rendering independently.
🟡 LOW Orphaned virtual-entry dotfile not in .gitignore — scripts/gen-primitive-catalog.mjs
L114 writes `repoRoot/.primcat-entry-${process.pid}.mts` and L149's `rmSync(entry, {force:true})` only runs in the `finally` block. On SIGKILL / OOM / CI timeout / laptop force-shutdown the file survives in repoRoot as an untracked dotfile. Verified `.gitignore` (9 lines) has no `.primcat*` entry, so a careless `git add .` sweeps it in. The repoRoot location is correctly justified (L110-112: Node module resolution walks up from the entry's dir, so /tmp can't see this repo's node_modules and the self-reference fails), so the fix is not to relocate, but to add `.primcat-entry-*.mts` to `.gitignore`. Cheap, durable.
🟡 LOW Substrate filters use unanchored substring matches — scripts/gen-primitive-catalog.mjs
Line 73: `judgeFilter` matches `Calibration` anywhere, and line 91: `usageFilter` matches `UsageEvent` anywhere. A future symbol such as `CalibrationData` or `MyUsageEventHandler` could be miscategorized into JUDGE or TOKEN/USAGE. Tighten to anchored patterns where the intent is a whole-symbol match.
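The difference between a substring match and an anchored whole-symbol match can be sketched as follows. The two patterns named `looseJudge` and `looseUsage` are stand-ins inferred from this finding, not copies of the generator's actual filters, and `wholeSymbol` is a hypothetical helper:

```typescript
// Stand-ins for the unanchored filters described above (assumed shapes).
const looseJudge = /Calibration/;
const looseUsage = /UsageEvent/;

// Hypothetical helper: build a filter that matches only the exact names given.
function wholeSymbol(names: readonly string[]): RegExp {
  // Escape regex metacharacters so names are matched literally.
  const escaped = names.map((name) => name.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
  return new RegExp('^(?:' + escaped.join('|') + ')$');
}

const strict = wholeSymbol(['Calibration', 'UsageEvent']);

console.log(looseJudge.test('CalibrationData'), strict.test('CalibrationData'));
console.log(looseUsage.test('MyUsageEventHandler'), strict.test('MyUsageEventHandler'));
console.log(strict.test('Calibration'), strict.test('UsageEvent'));
```

Where a filter is meant to catch a family of names (a shared prefix, for example), anchoring only the start with `^` keeps that intent while still excluding matches in the middle of an unrelated symbol.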
tangletools · 2026-06-24T11:11:58Z · trace
tangletools
left a comment
❌ 1 Blocking Finding — be4c1288
Full multi-shot audit completed 8/8 planned shots over 9 changed files. Global verifier still owns final merge decision.
Full immutable report for this review: trace
Summary comment for this run: full summary
tangletools · 2026-06-24T11:11:58Z · immutable trace
Problem
`docs/canonical-api.md` hand-listed which primitives exist, and it went stale. It had zero mentions of live exports: `scoreAuthenticity`, `gateRealness`, `MultiLayerVerifier`, `wilson`, `pairedTTest`, `runProfileMatrix`, `extractUsage`, `extractLlmCallEvent`, the whole `JudgeConfig`/`ensembleJudge` judge surface. A hand-maintained inventory always rots. The principle: anything derivable from source must be generated, not hand-written; only non-derivable judgment stays curated.
What this does
1. A generator: `scripts/gen-primitive-catalog.mjs` (runs inside `docs:api`, after TypeDoc) that reads the live exports of:
- `package.json` `exports` (9 surfaces, 944 exports).
For each export it emits name, import path, one-line TSDoc summary, grouped by surface, into `docs/api/primitive-catalog.md` with a `GENERATED — do not edit; run pnpm docs:api` header (1254 symbols total).
Extraction is via the TypeScript compiler API (the same compiler TypeDoc uses) over a virtual re-export entry resolved through real Node resolution, so it follows aliased re-exports (`S as wilson`) and content-hashed bundle filenames (`statistics-<hash>.d.ts`). Those are exactly what rots a hand-written list, and exactly what this is immune to.
2. Gate enforcement: a seventh class in `scripts/check-docs-freshness.mjs` (`[CATALOG]`). It re-runs the generator to a temp file and byte-compares to the committed catalog. A live export added/removed/renamed (or a summary changed) without regenerating = RED BUILD. Belt-and-suspenders with the existing `git diff --exit-code -- docs/api` step now that the catalog is a tracked file under `docs/api/`.
3. Shrank `docs/canonical-api.md`: removed the export-inventory enumeration from the banner (`selfImprove`/`gepaProposer`/… list) and the §2 preamble ("Every symbol below is a LOCAL export…"); replaced with pointers to `docs/api/primitive-catalog.md`. Kept all the judgment: the decision gate, §1.5 AgentProfile law, the §2 "I want to ___ → use ___ → NOT ___" table and every "Do NOT". Version + substrate-peer pins stay (the gate asserts them).
`docs/MAINTAINING.md` documents the new generated-inventory layer, CLASS 7, and its fix path.
Finding surfaced by doing this
The task's own hand-named symbol list was itself partly stale against the pinned substrate (`agent-eval@0.97.0`, the floor this repo depends on): `llmJudge` and `tokenUsageField` are not public exports at 0.97.0; they appear only in later versions / internal bundles. The generated catalog reads live exports, so it correctly includes what exists (`ensembleJudge`, `JudgeConfig`, `extractUsage*`, the full statistics surface) and omits what doesn't. That is the whole point: the inventory can no longer claim a symbol the pinned code doesn't export.
Verify
- `pnpm run docs:api` regenerates the catalog cleanly (typedoc → generator, 1254 symbols).
- `pnpm docs:check` passes (typedoc + generator + `git diff` + freshness gate, all green).
- Removing the `wilson` (and `ensembleJudge`) row from the catalog makes both the freshness gate (`[CATALOG]`, exit 1) and `git diff --exit-code -- docs/api` (exit 1) go RED; reverting → green.
- `pnpm run build`, `pnpm run typecheck`, `pnpm run lint` green.
- `pnpm test`: 1102 passed / 1 skipped.
Operator review requested: do not merge.
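The comparison step of the `[CATALOG]` gate described under "Gate enforcement" can be sketched as a pure function. This is an illustration under assumptions: `firstStaleLine` is a hypothetical name, it compares decoded strings rather than raw bytes, and reporting the first differing line is an addition for diagnosis rather than something the PR states the gate does:

```typescript
// Hypothetical helper: compare the committed catalog with a regenerated copy.
// Returns null when they are identical, otherwise the 1-based number of the
// first line that differs.
function firstStaleLine(committed: string, regenerated: string): number | null {
  if (committed === regenerated) return null;
  const a = committed.split('\n');
  const b = regenerated.split('\n');
  const shared = Math.min(a.length, b.length);
  for (let i = 0; i < shared; i++) {
    if (a[i] !== b[i]) return i + 1;
  }
  // All shared lines match, so one side has extra trailing lines.
  return shared + 1;
}

// Simulates the manual check from the Verify list: drop a row and compare.
const committedCatalog = '| scoreAuthenticity |\n| wilson |\n| pairedTTest |';
const regeneratedCatalog = '| scoreAuthenticity |\n| pairedTTest |';
console.log(firstStaleLine(committedCatalog, regeneratedCatalog));
```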