docs(api): generate the primitive catalog so the anti-reinvention reference cannot go stale by drewstone · Pull Request #371 · tangle-network/agent-runtime

drewstone · 2026-06-24T08:58:19Z

Problem

docs/canonical-api.md hand-listed which primitives exist — and it went stale. It had zero mentions of live exports: scoreAuthenticity, gateRealness, MultiLayerVerifier, wilson, pairedTTest, runProfileMatrix, extractUsage, extractLlmCallEvent, the whole JudgeConfig/ensembleJudge judge surface. A hand-maintained inventory always rots. The principle: anything derivable from source must be generated, not hand-written; only non-derivable judgment stays curated.

What this does

1. A generator — scripts/gen-primitive-catalog.mjs (runs inside docs:api, after TypeDoc) that reads the live exports of:

(a) agent-runtime's own public surface — every subpath in package.json exports (9 surfaces, 944 exports).
(b) the agent-eval substrate primitives to reuse — a small curated category → exports-map subpath map (JUDGE / AUTHENTICITY / VERIFICATION / STATISTICS / CAMPAIGN / TOKEN+USAGE — 6 surfaces, 310 exports). The category→subpath mapping is the only hand-curated part; the symbol list under each is generated.

For each export it emits name, import path, one-line TSDoc summary, grouped by surface, into docs/api/primitive-catalog.md with a GENERATED — do not edit; run pnpm docs:api header (1254 symbols total).

Extraction is via the TypeScript compiler API (the same compiler TypeDoc uses) over a virtual re-export entry resolved through real Node resolution — so it follows aliased re-exports (S as wilson) and content-hashed bundle filenames (statistics-<hash>.d.ts). Those are exactly what rots a hand-written list, and exactly what this is immune to.

2. Gate enforcement — a seventh class in scripts/check-docs-freshness.mjs ([CATALOG]): it re-runs the generator to a temp file and byte-compares the result to the committed catalog. A live export added, removed, or renamed (or a summary changed) without regenerating = RED BUILD. This is belt-and-suspenders alongside the existing git diff --exit-code -- docs/api step, which now covers the catalog too because it is a tracked file under docs/api/.

3. Shrank docs/canonical-api.md — removed the export-inventory enumeration from the banner (selfImprove/gepaProposer/… list) and the §2 preamble ("Every symbol below is a LOCAL export…"); replaced with pointers to docs/api/primitive-catalog.md. Kept all the judgment: the decision gate, §1.5 AgentProfile law, the §2 "I want to ___ → use ___ → NOT ___" table and every "Do NOT". Version + substrate-peer pins stay (the gate asserts them).
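Here is a minimal sketch of the rendering step from item 1. It assumes extraction has already produced one record per export. The `CatalogEntry` shape, the `renderCatalog` name, the table layout, and the header wording are illustrative assumptions. Only the fields (name, import path, one-line summary), the grouping by surface, and the existence of a GENERATED header come from the description above.

```typescript
// Hypothetical record shape: one row per live export.
interface CatalogEntry {
  surface: string; // e.g. "STATISTICS"
  importPath: string; // e.g. "@tangle-network/agent-eval"
  name: string;
  summary?: string; // first TSDoc line, if the declaration has one
}

// Illustrative header; the real generator's exact wording may differ.
const HEADER = "<!-- GENERATED: do not edit; run `pnpm docs:api` -->";

function renderCatalog(entries: CatalogEntry[]): string {
  // Group by surface, keeping first-seen order so output is stable
  // whenever extraction order is stable.
  const groups = new Map<string, CatalogEntry[]>();
  for (const e of entries) {
    const bucket = groups.get(e.surface) ?? [];
    bucket.push(e);
    groups.set(e.surface, bucket);
  }
  const lines: string[] = [HEADER, "", "# Primitive catalog", ""];
  groups.forEach((rows, surface) => {
    lines.push(`## ${surface}`, "");
    lines.push("| Symbol | Import from | Summary |", "| --- | --- | --- |");
    for (const r of rows) {
      // Escape pipes so a summary cannot break the table row.
      const summary = (r.summary?.trim() || "_(no summary)_").replace(/\|/g, "\\|");
      lines.push(`| \`${r.name}\` | \`${r.importPath}\` | ${summary} |`);
    }
    lines.push("");
  });
  return lines.join("\n");
}
```

The sketch preserves first-seen order rather than sorting, so the output is only as deterministic as the extraction order.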

docs/MAINTAINING.md documents the new generated-inventory layer, CLASS 7, and its fix path.

Finding surfaced by doing this

The task's own hand-named symbol list was itself partly stale against the pinned substrate (agent-eval@0.97.0, the floor this repo depends on): llmJudge and tokenUsageField are not public exports at 0.97.0 — they appear only in later versions / internal bundles. The generated catalog reads live exports, so it correctly includes what exists (ensembleJudge, JudgeConfig, extractUsage*, the full statistics surface) and omits what doesn't. That is the whole point: the inventory can no longer claim a symbol the pinned code doesn't export.

Verify

pnpm run docs:api regenerates the catalog cleanly (typedoc → generator, 1254 symbols).
pnpm docs:check passes (typedoc + generator + git diff + freshness gate, all green).
Gate proof: deleting the wilson (and ensembleJudge) row from the catalog makes both the freshness gate ([CATALOG], exit 1) and git diff --exit-code -- docs/api (exit 1) go RED; reverting → green.
pnpm run build, pnpm run typecheck, pnpm run lint green. pnpm test: 1102 passed / 1 skipped.
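The gate proof above relies on an exact byte comparison between the committed catalog and a fresh regeneration. Here is a sketch of that comparison, assuming both sides are available as buffers. `catalogDrift` and `checkCatalog` are hypothetical names. Reporting the first differing line is an addition that makes a RED build easier to act on; the PR does not describe it.

```typescript
import { readFileSync } from "node:fs";

// Returns null when the committed catalog matches the regenerated one
// byte-for-byte; otherwise a short description of the first mismatch.
function catalogDrift(committed: Buffer, regenerated: Buffer): string | null {
  if (committed.equals(regenerated)) return null;
  const a = committed.toString("utf8").split("\n");
  const b = regenerated.toString("utf8").split("\n");
  const n = Math.max(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) {
      const left = JSON.stringify(a[i] ?? "<EOF>");
      const right = JSON.stringify(b[i] ?? "<EOF>");
      return `line ${i + 1}: committed ${left}, regenerated ${right}`;
    }
  }
  // Bytes differ but decoded lines match (e.g. invalid UTF-8): still drift.
  return "catalog bytes differ";
}

// `regenerate` stands in for running the generator and returning its output.
function checkCatalog(committedPath: string, regenerate: () => string): string | null {
  return catalogDrift(readFileSync(committedPath), Buffer.from(regenerate(), "utf8"));
}
```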

Operator review requested — do not merge.

…erence cannot go stale

The hand-listed primitive inventory in docs/canonical-api.md drifted from source: it had zero mentions of live exports (scoreAuthenticity, gateRealness, MultiLayerVerifier, wilson, pairedTTest, runProfileMatrix, extractUsage, …). Anything derivable from source must be generated, not hand-written — only judgment stays curated.

- scripts/gen-primitive-catalog.mjs reads the LIVE exports of (a) this package's own public subpaths (from package.json `exports`) and (b) a curated category->subpath map of the @tangle-network/agent-eval substrate surfaces agents should reuse (judge, authenticity, verification, statistics, campaign, token/usage). Extraction is via the TypeScript compiler API over a virtual re-export entry, so it follows aliased re-exports and content-hashed bundle files — the exact things that rot a hand list. Emits docs/api/primitive-catalog.md with a GENERATED header (name, import path, one-line summary per export, grouped by surface).
- Wired into `docs:api` (runs after TypeDoc). The freshness gate gains a seventh class (CATALOG): it regenerates the catalog to a temp file and byte-compares to the committed copy, so a new/removed/renamed live export absent from the catalog is a RED BUILD.
- Shrank canonical-api.md: removed the export-inventory enumeration from the banner and the §2 preamble, replaced with pointers to docs/api/primitive-catalog.md. Kept all the judgment — the decision gate, §1.5 AgentProfile law, the §2 "I want to -> use -> NOT" table and every "Do NOT". The version + substrate-peer pins stay (gate-enforced).
- MAINTAINING.md documents the generated-inventory layer, CLASS 7, and its fix path.

tangletools

✅ Auto-approved PR — `dd248315`

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

_{tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-24T08:58:26Z}

agent-eval 0.99.0 adds llmJudge (+ the full current judge/auth/verify/stats surface); regenerating the catalog picks it up with zero hand-work, which is the point of the generator. The lockfile was pinned at 0.97.0 (pre-llmJudge) even though agent-eval was already in minimumReleaseAgeExclude.

tangletools

✅ Auto-approved PR — `22dedbfd`


_{tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-24T09:04:12Z}

Only the examples (devDependency) need 0.99.0 for llmJudge; agent-runtime's src does not, so the peer floor must not force consumers onto 0.99.0. Catalog + lockfile stay on the resolved 0.99.0 so the examples get llmJudge.

tangletools

✅ Auto-approved PR — `2a6e393b`


_{tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-24T09:07:05Z}

…fImprove not a bare mean gate Folds the one real example fix from #370 (otherwise superseded by the generated catalog) into this PR: self-improving-loop hand-rolled the ship gate as a bare mean comparison instead of the real HeldOutGate/selfImprove primitives.

tangletools

✅ Auto-approved PR — `be4c1288`


_{tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-24T09:09:47Z}

tangletools

🟢 Value Audit — sound


| Verdict | sound |
| --- | --- |
| Concerns | 2 (2 low) |
| Heuristic | 0.0s |
| Duplication | 0.0s |
| Interrogation | 206.6s (2 bridge agents) |
| Total | 206.6s |

💰 Value — sound

This PR adds a generated, CI-gated primitive catalog and updates the self-improving-loop example to use real substrate primitives (makeFinding, pairedBootstrap) instead of local one-offs — a coherent anti-staleness improvement.

What it does: It introduces scripts/gen-primitive-catalog.mjs:1, which reads the live exports of (a) every package.json exports subpath and (b) a curated set of @tangle-network/agent-eval substrate subpaths via the TypeScript compiler API, then emits docs/api/primitive-catalog.md with name, import path, kind, and one-line TSDoc summary per symbol. scripts/check-docs-freshness.mjs:568 adds CLASS 7 [CATALOG], whi
Goals it achieves: Eliminate the hand-maintained export inventory that had already gone stale (the PR notes canonical-api.md missed scoreAuthenticity, gateRealness, MultiLayerVerifier, wilson, pairedTTest, runProfileMatrix, extractUsage, etc.), enforce the existing anti-staleness law (CLAUDE.md:42), keep canonical-api.md as curated judgment while moving the mechanical inventory to generated docs, and stop the exampl
Assessment: Good on its merits. The change is in the grain of the codebase: it extends the existing docs-check pipeline and the same fail-loud freshness-gate pattern, commits the generated file alongside the existing TypeDoc output, and uses the TypeScript compiler API so it follows aliased re-exports and content-hashed bundle filenames. The example de-reinvention is consistent with the canonical-api decision
Better / existing approach: none — this is the right approach. I checked scripts/ (check-docs-freshness.mjs, verify-package-exports.mjs), docs/api/, and git log; there is no existing cross-package primitive inventory generator or catalog. TypeDoc already owns per-module signature pages but excludes externals, so it cannot produce the agent-eval reuse inventory without either adding external entry points or still writing subs
Model: opencode/kimi-for-coding/k2p7
Bridge attempts: 1

🎯 Usefulness — sound

A generated, CI-gated primitive catalog that replaces a rotting hand-listed inventory, plus an example de-reinvention — coherent, fully wired, and complementary to the existing TypeDoc pattern.

Integration: Fully reachable now, not ahead-of-caller. The generator (scripts/gen-primitive-catalog.mjs) is wired into docs:api (package.json: "docs:api": "typedoc && node scripts/gen-primitive-catalog.mjs"), which runs in CI via docs:check (.github/workflows/ci.yml:41 pnpm run docs:check). The new CATALOG class in check-docs-freshness.mjs:577-613 regenerates-to-temp + byte-compares, and is itself invo
Fit with existing patterns: Fits the codebase's grain precisely. The repo already had a generate-don't-hand-maintain pattern (TypeDoc emits per-module pages under docs/api/); this adds a flat grouped INDEX that is complementary, not competing — canonical-api.md §2 is the WHICH-to-reach-for judgment, primitive-catalog.md is the WHAT-exists inventory, and the split is explicitly documented (canonical-api.md:3 header comment).
Real-world viability: Holds up under realistic use, not just the happy path. Determinism: export enumeration is declaration-order stable (same TS compiler + same input); the substrate version is pinned by pnpm-lock.yaml + CI's --frozen-lockfile, and the catalog header embeds the exact resolved version (catalog:12 agent-eval@0.99.0) so a divergently-resolved local copy fails loudly — the correct behavior. The genera
Model: opencode/zai-coding-plan/glm-5.2
Bridge attempts: 1

🔎 Heuristic Signals

🟡 Cruft: console debug added examples/self-improving-loop/self-improving-loop.ts

`` console.log(` root cause: ${finding.claim}`) ``

🟡 Cruft: commented out code scripts/check-docs-freshness.mjs

+// CLASS 7 — PRIMITIVE-CATALOG: the generated anti-reinvention inventory

What this audit checks

It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.

| Pass | What it asks |
| --- | --- |
| Heuristic | Vague title? Whitespace-only or cruft-bearing diff? (content signals only) |
| Duplication | Do added function/class names already exist elsewhere in the repo? |
| Value Audit | What does it do? What goal does it achieve? Is it good? Better architecture or already-exists? |
| Usefulness Audit | Does it integrate and fit? Will it hold up in real use and actually get used? |

Findings are concerns, not blocks — the human reviewer decides what to do with them.

_{value-audit · 20260624T104715Z}

tangletools · 2026-06-24T11:12:01Z

❌ Needs Work — `be4c1288`

Readiness 32/100 · Confidence 95/100 · 30 findings (1 high, 4 medium, 25 low)

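| | opencode-kimi | glm | deepseek | aggregate |
| --- | --- | --- | --- | --- |
| Readiness | 32 | 62 | 51 | 32 |
| Confidence | 95 | 95 | 95 | 95 |
| Correctness | 32 | 62 | 51 | 32 |
| Security | 32 | 62 | 51 | 32 |
| Testing | 32 | 62 | 51 | 32 |
| Architecture | 32 | 62 | 51 | 32 |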

Full multi-shot audit completed 8/8 planned shots over 9 changed files. Global verifier still owns final merge decision.

Blocking

🔴 HIGH Pre-existing high-severity transitive dependency ws@8.20.1 — pnpm-lock.yaml

pnpm audit reports GHSA-96hv-2xvq-fx4p / CVE-2026-48779: ws@8.20.1 (Memory exhaustion DoS from tiny fragments). Path: .>@tangle-network/agent-eval>@tangle-network/tcloud>viem>ws. The base lockfile (cfacf1c) already pinned ws@8.20.1, so this is not a regression from the agent-eval bump, but the lockfile still contains an unpatched high-severity transitive dependency. Recommend updating tcloud/viem or adding a pnpm override to ws>=8.21.0 in a follow-up.

Other

🟠 MEDIUM JSDoc {@link} tags leak into plain markdown summaries — docs/api/primitive-catalog.md

14 symbol summaries contain JSDoc {@link ...} tags that do not render as hyperlinks in standard markdown viewers (GitHub, editors). Examples: line 27 {@link buildLoopOtelSpans}, line 30 {@link RuntimeHooks}, line 73 {@link runPersonaConversation}. The generator's firstDocLine() passes TSDoc raw through ts.displayPartsToString() without stripping {@link} markup. Fix: add a regex pass in the ge
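The finding's proposed fix is a regex pass over each summary before rendering. Here is a minimal sketch, assuming summaries arrive as plain strings from `firstDocLine()`. `stripInlineLinks` is a hypothetical name. Turning bare links into code spans while keeping only the label of labeled links is one reasonable rendering choice among several.

```typescript
// Matches TSDoc/JSDoc inline links:
//   {@link Foo}          -> `Foo`
//   {@link Foo | label}  -> label
//   {@link Foo label}    -> label
// {@linkcode ...} and {@linkplain ...} are handled the same way.
const INLINE_LINK = /\{@link(?:code|plain)?\s+([^\s|}]+)\s*(?:\|\s*)?([^}]*)\}/g;

function stripInlineLinks(summary: string): string {
  return summary.replace(INLINE_LINK, (_match: string, target: string, label: string) => {
    const text = label.trim();
    // No label: keep the symbol name, formatted as inline code.
    return text.length > 0 ? text : `\`${target}\``;
  });
}
```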

🟠 MEDIUM agent-eval dev vs peer dependency floor mismatch — package.json

devDependencies pins @tangle-network/agent-eval to >=0.99.0 <1.0.0 (line 92) while peerDependencies still allows >=0.97.0 <1.0.0 (line 120). The repo is built and tested against 0.99.0 (pnpm-lock.yaml resolves to 0.99.0), but consumers are told 0.97.0 is acceptable. Because this package imports dozens of agent-eval runtime symbols (src/runtime/index.ts re-exports AnalystFinding/computeFindingId/makeFinding; src/lifecycle/gate.ts imports HeldOutGate; src/run.ts, src/runtime/strategy.ts, and many others import ChatClient/estimateCost/etc.) and agent-eval uses 0.x ver

🟠 MEDIUM dev/peer dependency floor mismatch leaves 0.97.0-0.98.x untested — package.json

package.json:92 (devDependencies): @tangle-network/agent-eval floor is >=0.99.0. package.json:120 (peerDependencies): @tangle-network/agent-eval floor is >=0.97.0. CI resolves dev version 0.99.0 (pnpm-lock.yaml:19 confirms 0.99.0 is locked). Consumers can install 0.97.0-0.98.x and won't hit CI failures. The codebase has 64 import ... from '@tangle-network/agent-eval' sites using both types (safe, erased) and runtime values (HeldOutGate, createChatClient, estimateCost, gepaProposer, etc.). Any runtime symbol added in 0.98.0-0.99.0 that's used here will crash consumers on 0.97.x. Commit 2a6e393 acknowledges the intent, but there is no CI matrix running the peer floor. Mitigation: either add a CI matrix entry resolving agent-eval@0.97.0, or bump the peer floor to 0.99.0 to match.
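One of the two mitigations is a CI assertion. The sketch below only detects the gap between the dev and peer floors; it does not run a matrix job against the peer floor. It assumes both ranges use the simple `>=X.Y.Z <A.B.C` shape quoted above. `floorOf` and `untestedPeerGap` are hypothetical helpers. A real check would more likely rely on a full semver parser than on this hand-rolled regex.

```typescript
type Version = [number, number, number];

// Extracts the ">=X.Y.Z" lower bound from a range like ">=0.97.0 <1.0.0".
// Only this one range shape is supported; anything else throws.
function floorOf(range: string): Version {
  const m = /^>=(\d+)\.(\d+)\.(\d+)(?:\s|$)/.exec(range.trim());
  if (!m) throw new Error(`unsupported range: ${range}`);
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

function compareVersions(a: Version, b: Version): number {
  return a[0] - b[0] || a[1] - b[1] || a[2] - b[2];
}

// Flags the case described above: CI builds against a newer floor than the
// one consumers are promised, so the versions in between go untested.
function untestedPeerGap(devRange: string, peerRange: string): string | null {
  const dev = floorOf(devRange);
  const peer = floorOf(peerRange);
  return compareVersions(dev, peer) > 0
    ? `peer floor ${peer.join(".")} is below dev floor ${dev.join(".")}; versions in between are not exercised by CI`
    : null;
}
```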

🟠 MEDIUM Predictable temp path and leak on generator failure — scripts/check-docs-freshness.mjs

Line 587 builds `tmpOut` with ``join(tmpdir(), `primitive-catalog-check-${process.pid}.md`)``. The name is predictable (only the PID), lives in a shared directory, and `writeFileSync` follows symlinks, so a symlink planted at that path can cause the generator to overwrite an arbitrary file that the check then reads. Also, on the `res.status !== 0` branch ([lines 593-599](https://github.com/tangle-network/agent-runtime/blob/be4c1288cdfffe5cd7be6c151da3d62ea4d51909/scripts/check-docs-freshness.mjs#L593-L599)) `tmpOut` is never removed, leaking the temp file. Fix: create a private temp directory with `mkdtempSync` (or a random filename with `O_CREAT|O_EXCL`) and d
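Here is a sketch of the private-directory fix. It assumes the gate keeps passing the generator an output path through the `PRIMITIVE_CATALOG_OUT` variable mentioned in a later finding. `withPrivateTempFile` and `regenerateCatalog` are hypothetical names. Removing the whole directory in `finally` covers both the success branch and the failure branch.

```typescript
import { spawnSync } from "node:child_process";
import { existsSync, mkdtempSync, readFileSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Runs `fn` with a path inside a fresh private directory, then always
// removes that directory, whether `fn` returns or throws.
function withPrivateTempFile<T>(name: string, fn: (path: string) => T): T {
  // mkdtempSync appends six random characters to the prefix; on POSIX the
  // directory is created owner-only, so no one can pre-plant a symlink in it.
  const dir = mkdtempSync(join(tmpdir(), "primitive-catalog-check-"));
  try {
    return fn(join(dir, name));
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
}

// Hypothetical wiring of the generator run into the helper above.
function regenerateCatalog(generatorScript: string): string {
  return withPrivateTempFile("primitive-catalog.md", (outPath) => {
    const res = spawnSync(process.execPath, [generatorScript], {
      env: { ...process.env, PRIMITIVE_CATALOG_OUT: outPath },
      encoding: "utf8",
    });
    if (res.error) throw res.error;
    if (res.status !== 0) throw new Error(`generator failed: status ${res.status}, signal ${res.signal}`);
    if (!existsSync(outPath)) throw new Error("generator exited 0 but wrote nothing");
    return readFileSync(outPath, "utf8");
  });
}
```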

🟡 LOW Section heading + intro say 'two layers' but table now lists three — docs/MAINTAINING.md

Line 5 ('splits its API documentation into two layers') and line 9 ('## The two layers') were not updated when the new 'Generated inventory' row was added at line 14. The table now has 3 rows (Generated reference, Generated inventory, Judgment) while the heading still says two. The intro's 'two layers with opposite maintenance rules' is still conceptually correct (generated-vs-judgment), since both generated rows share the same mainten

🟡 LOW Catalog is mostly placeholder summaries — docs/api/primitive-catalog.md

580 of 1258 catalog rows contain _(no summary — add a TSDoc line at the declaration)_ (e.g., lines 22, 24, 32, 34). This is generated faithfully from source declarations that lack a one-line TSDoc comment, so the file is not wrong, but it substantially reduces the usefulness of the anti-reinvention inventory. The fix is to add one-line TSDoc summaries to the underlying source declarations and rerun pnpm run docs:api; no hand-editing of the generated catalog.

🟡 LOW Cross-category symbol duplication (llmJudge, JudgeConfig) — docs/api/primitive-catalog.md

llmJudge and JudgeConfig appear both in JUDGE (root . barrel, lines ~1057/1053) and CAMPAIGN (./campaign subpath, lines 1196/1281) because both subpaths genuinely re-export them. This is accurate (importable from either path) but a reader scanning the inventory sees the same symbol twice under different 'import from' labels with no cross-reference. Cosmetic only; the canonical-api.md decision table resolves which to use. No fix required for correctness.

🟡 LOW Filtered root-barrel categories share one import label — docs/api/primitive-catalog.md

JUDGE/VERIFICATION/STATISTICS/TOKEN categories all read the root @tangle-network/agent-eval barrel with name-pattern filters, so each prints 'Import from @tangle-network/agent-eval — N exports' where N is the filtered subset (e.g. VERIFICATION shows 10 of the barrel's ~300). A reader could misread this as the barrel having only 10 exports total. This is by-design judgment (category->subpath mapping) and the §2 intro states symbols are filtered, but the per-section 'N exports' wording undersells the barrel size. Documentation clarity nit only.

🟡 LOW Large fraction of symbols carry no TSDoc summary — docs/api/primitive-catalog.md

Many rows render '(no summary — add a TSDoc line at the declaration)' (e.g. agenticGenerator:22, auditLoopRunner:24, createConversationBackend:32, and the majority of the MCP/campaign surfaces). The generator deliberately surfaces missing docs rather than hiding them, so this is honest output, but it reduces the catalog's value as an anti-reinvention reference — a reader learns a symbol exists but not what it does. Impact is documentation quality, not correctness. Fix is upstream: add first-line TSDoc at the declarations; the catalog regenerates automatically. Not a merge blocker.

🟡 LOW Forward-reference to catalog's §2 is fragile if catalog is regenerated with renumbered sections — docs/canonical-api.md

Line 5 says 'the catalog's §2 shows exactly which subpath each lives under' and line 31 lists the agent-eval surfaces by name. Section numbers in a GENERATED file are not contractually stable — if scripts/gen-primitive-catalog.mjs ever inserts a new top-level ## section, '§2' silently points at the wrong group. The freshness gate (check-docs-freshness.mjs) checks for export/symbol drift and stale file:line, NOT for section-number drift in canonical-api.md's prose. Low impact (the named surface list still works as a fallback identifier), but a future

🟡 LOW Mermaid diagram still shows the mean flow, but the gate now takes paired arrays — examples/self-improving-loop/README.md

Lines 55-56: the diagram shows v0mean -.v0 paired.-> gate and v1mean --> gate, but the gate function now takes v0Scores[] and v1Scores[] (per-score arrays, not means). The text was updated to say 'pairedBootstrap(v0, v1).low > 0?', which is correct, but the diagram nodes feeding the gate are labeled as means. Visual-only inconsistency; the runtime code at self-improving-loop.ts:259-261 correctly extracts per-run scores.

🟡 LOW README references 'Phase 6' (re-run multishot) in the prose '7-phase' framing but the mermaid only renders P1-P5 — examples/self-improving-loop/README.md

Header comment in self-improving-loop.ts:5-14 lists 7 phases (baseline, multishot, judges, analyst, apply mutation, re-run, gate), but README mermaid (lines 23-60) only renders Phase 1, 2, 3, 4, 5 — Phase 4 in the diagram is the v1 re-run, conflating the code's phases 6 and 4. Pre-existing (the diff only touched P2/P5 box labels and the ship/hold edge text), not introduced here, but worth aligning if you ever do another docs pass.

🟡 LOW No typecheck verification possible in this worktree — examples/self-improving-loop/self-improving-loop.ts

Line 21: import { type AnalystFinding, makeFinding, pairedBootstrap } from '@tangle-network/agent-eval' — node_modules is absent from this worktree so tsc cannot verify the import paths. Static grep confirms these are real exports used in production code (src/runtime/run-benchmark.ts:16, src/runtime/index.ts:15). The import pattern matches existing callers exactly. Low risk; CI should catch any real mismatch.

🟡 LOW Paired bootstrap at n=3 is statistically degenerate; 'production statistical core' framing oversells what the demo exercises — examples/self-improving-loop/self-improving-loop.ts

gate() at line 166 calls pairedBootstrap(v0Scores, v1Scores, { seed: 42 }) with n=3 (one shot per persona). With n=3 the bootstrap resamples can only recombine the 3 observed deltas [5,6,5], so the CI degenerates to a permutation of those values ([5.00, 6.00] here). The demo reports n=3 honestly in the verdict string, but the README (line 20) and the gate comment ([lines 154-160](https://github.com/tangle-network/agent-runtime/blob/be4c1288cdfffe5cd7be6c151da3d62ea4d51909/examples/self-im
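Enumerating the exact bootstrap distribution, instead of sampling it, makes the degeneracy easy to see. For the three deltas quoted above there are only 3^3 = 27 resamples. Their means take just four distinct values, so any percentile interval is pinned inside [5, 6]. `exactBootstrapMeans` is a hypothetical illustration and is feasible only for very small n.

```typescript
// Enumerates every bootstrap resample (n^n of them) of a tiny sample and
// returns the distinct resample means in ascending order.
function exactBootstrapMeans(deltas: number[]): number[] {
  const n = deltas.length;
  const total = Math.pow(n, n); // explodes quickly; only for very small n
  const seen = new Set<string>();
  const means: number[] = [];
  for (let code = 0; code < total; code++) {
    // Decode `code` in base n: digit j picks which observation fills slot j.
    let rest = code;
    let sum = 0;
    for (let j = 0; j < n; j++) {
      sum += deltas[rest % n]!;
      rest = Math.floor(rest / n);
    }
    const mean = sum / n;
    const key = mean.toFixed(9); // dedupe despite floating-point noise
    if (!seen.has(key)) {
      seen.add(key);
      means.push(mean);
    }
  }
  return means.sort((a, b) => a - b);
}
```

With `[5, 6, 5]`, the result is the four means 5, 16/3, 17/3, and 6.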

🟡 LOW gate() has no length-mismatch guard on input arrays — examples/self-improving-loop/self-improving-loop.ts

Line 162-166: gate(v0Scores, v1Scores) passes both arrays to pairedBootstrap without checking they have equal length. In current usage both derive from the same PERSONAS loop so lengths always match, but the function is a named export within the file and could be called with misaligned arrays. Adding if (v0Scores.length !== v1Scores.length) throw ... would make the contract explicit. Not a runtime bug in the current demo.
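For illustration only, here is a self-contained paired percentile bootstrap with the suggested length guard. This is not agent-eval's `pairedBootstrap`, whose options and return shape beyond `seed` and `.low` are not shown here. The point of the PR is to call that primitive rather than hand-roll one; the sketch just shows what the gate computes and where the guard belongs. The PRNG (mulberry32), the default iteration count, and the nearest-rank percentile choice are assumptions.

```typescript
// Small deterministic PRNG (mulberry32) so results are reproducible per seed.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

interface PairedInterval { mean: number; low: number; high: number; n: number }

function pairedBootstrapSketch(
  before: number[],
  after: number[],
  opts: { seed: number; iterations?: number; alpha?: number },
): PairedInterval {
  // The guard from the finding: pairing misaligned arrays is meaningless.
  if (before.length !== after.length) {
    throw new Error(`paired bootstrap: length mismatch (${before.length} vs ${after.length})`);
  }
  if (before.length === 0) throw new Error("paired bootstrap: empty input");
  const deltas = after.map((x, i) => x - before[i]!);
  const n = deltas.length;
  const mean = deltas.reduce((s, d) => s + d, 0) / n;
  const iterations = opts.iterations ?? 2000;
  const alpha = opts.alpha ?? 0.05;
  const rand = mulberry32(opts.seed);
  const resampled: number[] = [];
  for (let b = 0; b < iterations; b++) {
    // Resample the paired deltas with replacement and record their mean.
    let sum = 0;
    for (let j = 0; j < n; j++) sum += deltas[Math.floor(rand() * n)]!;
    resampled.push(sum / n);
  }
  resampled.sort((x, y) => x - y);
  // Nearest-rank percentile interval over the resampled means.
  const lowIdx = Math.floor((alpha / 2) * (iterations - 1));
  const highIdx = Math.ceil((1 - alpha / 2) * (iterations - 1));
  return { mean, low: resampled[lowIdx]!, high: resampled[highIdx]!, n };
}
```

A ship gate in this style promotes v1 only when `low > 0`, which matches the `pairedBootstrap(v0, v1).low > 0?` check quoted in the README finding.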

🟡 LOW recommended_action ?? '' fallback would silently produce a degenerate v1 profile — examples/self-improving-loop/self-improving-loop.ts

Line 227: const mutation = finding.recommended_action ?? ''. If a future analyst body emits an AnalystFinding without recommended_action (the field is optional on the canonical type per src/analyst/types.ts:50), applyMutation would produce a systemPrompt ending in '\n\nIMPROVED v1: ' — a no-op mutation that v1 would then 'promote' on the paired bootstrap of unchanged scores. runAnalyst in this file always sets recommended_action so it cannot fire today, but the silent-fallback pattern is the wrong default for a pedagogical template other analysts will copy. Prefer `if (!finding.recommended_action) throw new Error('analyst: finding missi
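Here is a sketch of the fail-loud alternative. It uses a minimal hypothetical `FindingLike` type covering only the two fields involved, not the real `AnalystFinding`. Treating a whitespace-only action as missing is an extra assumption beyond the finding.

```typescript
// Hypothetical stand-in with only the fields this check touches; the real
// AnalystFinding type has more fields.
interface FindingLike {
  claim: string;
  recommended_action?: string;
}

// Throws instead of defaulting to '' (which would yield a no-op v1 mutation).
function requireMutation(finding: FindingLike): string {
  const action = finding.recommended_action?.trim();
  if (!action) {
    throw new Error(`analyst: recommended_action is required to build the v1 mutation (claim: ${finding.claim})`);
  }
  return action;
}
```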

🟡 LOW Asymmetric agent-eval devDep (0.99.0) vs peerDep (0.97.0) floor — package.json

Lines 92 vs 120: devDep is >=0.99.0 <1.0.0 while peerDep is >=0.97.0 <1.0.0. Commit 2a6e393 makes the intent explicit and correct ('Only the examples need 0.99.0 for llmJudge; agent-runtime's src does not'). Validated by grepping src/ for agent-eval imports — all use stable APIs (AgentEvalError, AnalystFinding, gepaProposer, createChatClient, etc.) present at 0.97.0. Minor risk: a future src/ change that starts using a 0.99.0-only API would silently compile against the devDep but break consumers on 0.97.0/0.98.0. No action required for this PR; consider adding a CI assertion (e.g. typecheck against the peer floor) if this pattern recurs.

🟡 LOW No automated test for the new CATALOG class — scripts/check-docs-freshness.mjs

Lines 567-613 add a non-trivial check that shells out to the generator and compares outputs. No tests in the repo exercise this script (grep found none), so regressions are only caught by the full CI docs:check step. Consider a minimal test that stubs the generator to assert stale/missing catalog detection.

🟡 LOW PID-only temp filename risks collision under concurrent CI — scripts/check-docs-freshness.mjs

Line 587: join(tmpdir(), primitive-catalog-check-${process.pid}.md) uses only the PID for uniqueness. In containerized CI, where PIDs can recycle across jobs, two concurrent runs of this script could step on each other's temp file. Low probability in practice; adding crypto.randomUUID() (or at least Date.now()) would make collisions practically impossible. The same pattern appears in gen-primitive-catalog.mjs:114 for its virtual entry file.

🟡 LOW Success output omits catalog check — scripts/check-docs-freshness.mjs

Lines 622-633: the success message enumerates every checked CLASS (version, substrate peers, citations, §2 table, §3 signatures, exports↔typedoc, prose symbols) but omits any mention of the CLASS 7 primitive-catalog check. Cosmetic — the check still runs, but a user reading CI output won't see primitive catalog: OK in the report.

🟡 LOW Temp file not cleaned on generator failure — scripts/check-docs-freshness.mjs

Lines 593-599: when res.status !== 0, the error is reported but rmSync(tmpOut) is never called. The temp file (primitive-catalog-check-<pid>.md in /tmp) is orphaned. Low impact — small file, /tmp is ephemeral. The success path (line 603) does clean up explicitly.

🟡 LOW Temp file not cleaned on generator-failure branch — scripts/check-docs-freshness.mjs

At scripts/check-docs-freshness.mjs:593-599, when res.status !== 0 the block reports CATALOG and continues without rmSync(tmpOut). The success branch (line 603) does clean up. In practice gen-primitive-catalog.mjs exits before its writeFileSync(outPath, rendered) (line 339) on every error path, so the leak is effectively unreachable; still, for symmetry and defense-in-depth, rmSync the tmp path in the failure branch too. Impact: negligible — leftover empty/missing file in tmpdir.

🟡 LOW spawnSync error/signal cases not reported — scripts/check-docs-freshness.mjs

Line 593 only checks res.status !== 0. If the child is killed by a signal, res.status is null and the message prints 'exit null'. If spawnSync itself fails, res.error is set but ignored. Handle res.error and res.signal for clearer failure diagnostics.
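Here is a sketch of fuller diagnostics, assuming the caller passes the object `spawnSync` returns. `describeChildFailure` is a hypothetical helper. Its three branches match the cases listed above: `spawnSync` reporting its own error, a signal kill, and a non-zero exit status.

```typescript
import type { SpawnSyncReturns } from "node:child_process";

// Maps a spawnSync result to a readable failure reason, or null on success.
function describeChildFailure(
  res: Pick<SpawnSyncReturns<string>, "error" | "status" | "signal">,
): string | null {
  // spawnSync reported an error of its own (e.g. the binary could not be
  // spawned, or a timeout fired); checking status alone would print "exit null".
  if (res.error) return `generator run failed: ${res.error.message}`;
  // The child was killed by a signal, so there is no exit status.
  if (res.signal) return `generator killed by signal ${res.signal}`;
  if (res.status !== 0) return `generator exited with status ${res.status}`;
  return null;
}
```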

🟡 LOW ./runtime alias silently suppresses any future export — scripts/gen-primitive-catalog.mjs

Line 61 adds ./runtime to ownSubpathAliases, so the loop at lines 196-199 skips it without requiring a label. If a future maintainer adds a ./runtime export to package.json, it will be omitted from the catalog with no error. Consider removing the alias or adding a guard that errors if both ./loops and ./runtime exist.
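Here is a sketch of the guard option. It makes two simplifying assumptions: export targets are plain strings (real package.json `exports` entries are often condition objects), and the alias map points each alias subpath at the subpath it duplicates. Mapping `./runtime` to `./loops` in the usage below is illustrative and is not taken from the script. `subpathsToCatalog` is a hypothetical name.

```typescript
// Returns the subpaths that get their own catalog section. An alias is
// skipped only when it provably resolves to the same entry as its target;
// otherwise a distinct export under the alias name would silently vanish.
function subpathsToCatalog(
  exportsMap: Record<string, string>,
  aliases: Record<string, string>,
): string[] {
  const out: string[] = [];
  for (const subpath of Object.keys(exportsMap)) {
    const target = aliases[subpath];
    if (target === undefined) {
      out.push(subpath);
      continue;
    }
    if (exportsMap[target] !== exportsMap[subpath]) {
      throw new Error(`${subpath} is listed as an alias of ${target} but does not resolve to the same entry`);
    }
  }
  return out;
}
```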

🟡 LOW Build tooling excludes scripts — scripts/gen-primitive-catalog.mjs

package.json lint is biome check src tests examples and tsconfig includes only src, while biome.json includes only src/**, tests/**, and examples/**. The new generator is therefore not covered by lint or typecheck. Evidence: package.json line 85, tsconfig.json include: ['src'], biome.json files.includes.

🟡 LOW Log message hardcodes output path; misleading when PRIMITIVE_CATALOG_OUT is set — scripts/gen-primitive-catalog.mjs

Line 344–347: the console.error summary says "wrote docs/api/primitive-catalog.md" regardless of outPath. When the freshness gate sets PRIMITIVE_CATALOG_OUT to a temp path, this message is incorrect — the actual write target is e.g. /tmp/primitive-catalog-check-12345.md. Fix: reference outPath in the log message (e.g. wrote ${outPath}) so it reflects the real output destination.

🟡 LOW No unit-level tests for the generator script — scripts/gen-primitive-catalog.mjs

The script is integration-tested via check-docs-freshness.mjs CLASS 7 (regenerates + diffs), but has no direct tests for its extraction, filtering, or rendering logic. If the TS compiler API usage (extractModules, resolveAlias, declKind, firstDocLine) regresses, it would be caught as a catalog diff in CI — but the root cause would require manual diagnosis. Consider adding a vitest test that calls extractModules with a known fixture package to verify symbol extraction, filter application, and markdown rendering independently.

🟡 LOW Orphaned virtual-entry dotfile not in .gitignore — scripts/gen-primitive-catalog.mjs

L114 writes repoRoot/.primcat-entry-${process.pid}.mts and L149's rmSync(entry, {force:true}) only runs in the finally block. On SIGKILL / OOM / CI timeout / laptop force-shutdown the file survives in repoRoot as an untracked dotfile. Verified .gitignore (9 lines) has no .primcat* entry, so a careless git add . sweeps it in. The repoRoot location is correctly justified (L110-112: Node module resolution walks up from the entry's dir, so /tmp can't see this repo's node_modules and the self-reference fails) — so the fix is not to relocate, but to add .primcat-entry-*.mts to .gitignore. Cheap, durable.
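Here is a sketch that combines this finding with the PID-collision one. It keeps the entry in the repo root, so Node resolution still finds the repo's node_modules. It gives the entry a random suffix, creates it exclusively, and removes it in `finally`. `withVirtualEntry` and `matchesEntryIgnore` are hypothetical names. The `.gitignore` pattern is the one the finding proposes, and the helper only checks names of that single shape.

```typescript
import { randomUUID } from "node:crypto";
import { rmSync, writeFileSync } from "node:fs";
import { basename, join } from "node:path";

// Writes a uniquely named entry file inside `repoRoot`, runs `fn`, and
// removes the file afterwards. The 'wx' flag refuses to overwrite anything
// already at that path (including a planted symlink).
function withVirtualEntry<T>(repoRoot: string, source: string, fn: (entryPath: string) => T): T {
  const entry = join(repoRoot, `.primcat-entry-${randomUUID()}.mts`);
  writeFileSync(entry, source, { flag: "wx" });
  try {
    return fn(entry);
  } finally {
    rmSync(entry, { force: true });
  }
}

// `finally` does not run on SIGKILL, so a .gitignore line such as
// `.primcat-entry-*.mts` is the durable backstop; this mirrors that glob.
function matchesEntryIgnore(filePath: string): boolean {
  return /^\.primcat-entry-.*\.mts$/.test(basename(filePath));
}
```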

🟡 LOW Substrate filters use unanchored substring matches — scripts/gen-primitive-catalog.mjs

Line 73: judgeFilter matches Calibration anywhere, and line 91: usageFilter matches UsageEvent anywhere. A future symbol such as CalibrationData or MyUsageEventHandler could be miscategorized into JUDGE or TOKEN/USAGE. Tighten to anchored patterns where the intent is a whole-symbol match.
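Here is a minimal illustration of the difference, using hypothetical symbol names. Wrapping an explicit alternation in `^` and `$` categorizes only whole-name matches. The alternation contents are placeholders, not the generator's real filter lists.

```typescript
// Unanchored: matches the intended symbol and any name that merely contains it.
const looseJudgeFilter = /Calibration/;

// Anchored alternation: only exact, whole-symbol matches land in JUDGE.
// The listed names are placeholders, not the real filter's contents.
const anchoredJudgeFilter = /^(?:Calibration|JudgeCalibration)$/;

function categorize(names: string[], filter: RegExp): string[] {
  return names.filter((name) => filter.test(name));
}
```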

_{tangletools · 2026-06-24T11:11:58Z · trace}

tangletools

❌ 1 Blocking Finding — `be4c1288`


Full immutable report for this review: trace

Summary comment for this run: full summary

_{tangletools · 2026-06-24T11:11:58Z · immutable trace}

tangletools previously approved these changes Jun 24, 2026


drewstone dismissed tangletools’s stale review via 22dedbf June 24, 2026 09:04

tangletools previously approved these changes Jun 24, 2026


fix(deps): keep agent-eval peer floor at >=0.97.0

2a6e393

Only the examples (devDependency) need 0.99.0 for llmJudge; agent-runtime's src does not, so the peer floor must not force consumers onto 0.99.0. Catalog + lockfile stay on the resolved 0.99.0 so the examples get llmJudge.

drewstone dismissed tangletools’s stale review via 2a6e393 June 24, 2026 09:06

tangletools previously approved these changes Jun 24, 2026


drewstone dismissed tangletools’s stale review via be4c128 June 24, 2026 09:09

tangletools approved these changes Jun 24, 2026


drewstone mentioned this pull request Jun 24, 2026

docs(canonical-api): close the anti-reinvention gaps + de-reinvent the examples #370

Closed

tangletools reviewed Jun 24, 2026


tangletools requested changes Jun 24, 2026


drewstone merged commit 3a9acb6 into main Jun 24, 2026
1 check passed

drewstone merged 4 commits into main from docs/generated-primitive-catalog

drewstone commented Jun 24, 2026

Uh oh!

tangletools left a comment

Uh oh!

tangletools left a comment

Uh oh!

tangletools left a comment

Uh oh!

tangletools left a comment

Uh oh!

tangletools left a comment

Uh oh!

tangletools commented Jun 24, 2026

Uh oh!

tangletools left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

drewstone commented Jun 24, 2026

Problem

What this does

Finding surfaced by doing this

Verify

Uh oh!

tangletools left a comment

Choose a reason for hiding this comment

✅ Auto-approved PR — dd248315

Uh oh!

tangletools left a comment

Choose a reason for hiding this comment

✅ Auto-approved PR — 22dedbfd

Uh oh!

tangletools left a comment

Choose a reason for hiding this comment

✅ Auto-approved PR — 2a6e393b

Uh oh!

tangletools left a comment

Choose a reason for hiding this comment

✅ Auto-approved PR — be4c1288

Uh oh!

tangletools left a comment

Choose a reason for hiding this comment

🟢 Value Audit — sound

💰 Value — sound

🎯 Usefulness — sound

🔎 Heuristic Signals

Uh oh!

tangletools commented Jun 24, 2026

❌ Needs Work — be4c1288

Blocking

Other

Uh oh!

tangletools left a comment

Choose a reason for hiding this comment

❌ 1 Blocking Finding — be4c1288

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

✅ Auto-approved PR — `dd248315`

✅ Auto-approved PR — `22dedbfd`

✅ Auto-approved PR — `2a6e393b`

✅ Auto-approved PR — `be4c1288`

❌ Needs Work — `be4c1288`

❌ 1 Blocking Finding — `be4c1288`