chore: update Tangle agent packages by drewstone · Pull Request #2 · tangle-network/agent-runtime

drewstone · 2026-05-08T10:32:33Z

Updates published Tangle agent packages:\n\n- @tangle-network/agent-eval to 0.20.12\n- @tangle-network/agent-knowledge to 1.2.0 where used\n\nLockfiles were refreshed where present.

Sweep removes commentary that describes what code used to do, what bug it replaces, or which audit found a pattern — per the CLAUDE.md doc discipline. Trims "(NEW in 0.7.0)" markers and legal-agent migration narrative from README and example docs. Deletes two orphan docs under docs/ that no source or doc references; both were point-in-time release/issue snapshots that no longer describe current state. - src/runtime-run.ts: drop "replaces legal-agent's bespoke..." paragraph from module doc; tighten complete() and randomSuffix comments. - src/trace-bridge.ts: drop "Before this module, consumers hand-rolled..." paragraph; reword tool_call args-omission and text_delta drop comments to describe current behaviour. - src/sanitize.ts: drop "the unified-union alternative was rejected because..." narrative on createRuntimeStreamEventCollector. - src/chat-turn.ts: drop "Caller pattern (replaces ~400 lines of legal/gtm/creative chat-runtime wrappers)" and tax-agent file:line reference; reword transport / fallback comments. - src/profile-conformance.ts: strip "from the canonical audit" / "#2 anti-pattern in the canonical audit" from issue messages and reword system-prompt-too-short message; trim docstring. - src/profile-conformance.test.ts: rename "the gtm-agent anti-pattern audit-found is caught" -> describes current behaviour; same for the describe block + shell-cap test. - src/index.ts: drop "(compat surface)" and "(new in 0.7.0)" section banners. - README.md: drop "(NEW in 0.7.0)" markers from quickstart table and section headers; drop legal-agent migration narrative. - examples/runtime-run: same treatment in README + .ts header. - docs/domain-agent-runtime-integration-issues.md: deleted (165 lines of issue drafts referencing "GitHub connector returns 404"; zero references in tree). - docs/product-runtime-kernel.md: deleted (326-line completion record for 0.5.0-0.5.2 release process; zero references in tree). - package.json: drop "docs" from files (directory is gone). Verification: pnpm typecheck, pnpm test (68 passing, unchanged), pnpm build all pass.

…-loud session continuity Resolve all six findings from the review (none blocked landing; #1 gated enabling, #3/#4 wanted documenting). Lineage remains default-OFF and byte-identical to the fresh-box path when both flags are unset. - #1 sessionContinuity silent no-op: `continue` now asserts the session is still known to the sandbox via `box.session(id).status()` before streaming. A `null` (platform never honored the client-minted id, or it was reaped) raises a ValidationError, which executeIteration now propagates as a hard structural failure instead of degrading to a soft empty iteration — so a non-honoring platform errors loudly rather than running contextless turns. - #2 unbounded fork creation: `fork` provisions child boxes through `mapWithConcurrency` bounded by the loop's `maxConcurrency`, not a single `Promise.all` over all N branches. - #3 fork ignores per-branch specs: documented on `fork` and `LoopLineageOptions.forkFanout` that a real CRIU fork inherits the parent image/profile (per-branch specs apply only on the degraded fresh path). - #4 lineage holds every box to loop end: kernel prunes boxes no future round can descend from after each round, gated on a kernel-inferred (monotonic) branch point — skipped when the driver authors its own `parentIndex`. The unprunable case is documented as the box ceiling. - #5 abort during fork: documented the SDK's signal-less fork; abort is now checked per branch (between bounded waves) + an abort-under-lineage test. - #6 export order: alphabetized the loops barrel. Adds `mapWithConcurrency` util and six lineage tests (session-liveness pass/ fail, bounded-fork peak, mid-loop prune, no-prune-under-authored-parent, abort-under-lineage). 627 tests pass, typecheck + biome clean.

@experimental

…r runLoop (backend-blind) (#150) * feat(loops): opt-in session continuation + checkpoint-fork lineage (backend-blind) Two @experimental, default-OFF seams on runLoop so a loop can CONTINUE a sandbox session across iterations (same box + sessionId, no prompt-text replay) and FORK fanout branches from a parent checkpoint (shared context prefix) — both behind a capability probe so the kernel asks 'can I fork?' (client.criuStatus) and never names Docker/Firecracker, degrading to fresh boxes when CRIU is absent. - sandbox-capabilities.ts: memoized, fail-closed criuStatus probe -> {canFork}. - sandbox-lineage.ts: createSandboxLineage owns box+session handles with start/continue/fork/teardown; reuses the kernel's acquireSandbox / buildBackendOptions / deleteBoxSafe; fail-loud if the probe says canFork but the box has no fork(). - run-loop.ts: RunLoopOptions.lineage (sessionContinuity / forkFanout); refine continues, fanout forks-once, else fresh-through-lineage. Default OFF is byte-identical to today, so random@k stays N independent fresh boxes (the compute-control invariant). Rejects lineage + onWorkerBox (both own boxes). - 7 new unit tests (continuation reuses session; fork when canFork; fresh fallback; default-off invariant). Full suite 621 pass, typecheck clean. * fix(loops): address PR #150 review — bound forks, prune lineage, fail-loud session continuity Resolve all six findings from the review (none blocked landing; #1 gated enabling, #3/#4 wanted documenting). Lineage remains default-OFF and byte-identical to the fresh-box path when both flags are unset. - #1 sessionContinuity silent no-op: `continue` now asserts the session is still known to the sandbox via `box.session(id).status()` before streaming. A `null` (platform never honored the client-minted id, or it was reaped) raises a ValidationError, which executeIteration now propagates as a hard structural failure instead of degrading to a soft empty iteration — so a non-honoring platform errors loudly rather than running contextless turns. - #2 unbounded fork creation: `fork` provisions child boxes through `mapWithConcurrency` bounded by the loop's `maxConcurrency`, not a single `Promise.all` over all N branches. - #3 fork ignores per-branch specs: documented on `fork` and `LoopLineageOptions.forkFanout` that a real CRIU fork inherits the parent image/profile (per-branch specs apply only on the degraded fresh path). - #4 lineage holds every box to loop end: kernel prunes boxes no future round can descend from after each round, gated on a kernel-inferred (monotonic) branch point — skipped when the driver authors its own `parentIndex`. The unprunable case is documented as the box ceiling. - #5 abort during fork: documented the SDK's signal-less fork; abort is now checked per branch (between bounded waves) + an abort-under-lineage test. - #6 export order: alphabetized the loops barrel. Adds `mapWithConcurrency` util and six lineage tests (session-liveness pass/ fail, bounded-fork peak, mid-loop prune, no-prune-under-authored-parent, abort-under-lineage). 627 tests pass, typecheck + biome clean.

…es agent-eval) createTrajectoryRecorder (supervise/trajectory-recorder.ts) — the post-hoc half of the analyst pipe. Replays a worker's captured tool steps as agent-eval spans (InMemoryTraceStore) and runs its PUBLISHED batch analyzers — buildTrajectory (structured run summary), stuckLoopView (full-run repeated-call view, complementing the online consecutive detector), toolWasteView. No analysis reimplemented; the thin bridge from live tool steps to the substrate trace model. Feeds from the same onToolStep seam as the online monitor. 3 recorder tests (real spans → real agent-eval findings); full suite 1017 pass; typecheck/build/lint clean. Closes both legs: online (mid-run) + settle (post-hoc).

…orker (#318) * feat(supervise): bidirectional bus — down-leg (steer/answer/resume) + resume_worker Close the bus to 100% bidirectional. The parent→child down-leg routes to the child inbox (scope.send→deliver) AND records a queue:false event on the same bus: it lands in history() + reaches subscribers for the audit trail, but is never pulled back by the parent. New: resume_worker (continue a parked worker — the protocol had {resume} but no verb); answer_question now routes the answer DOWN to the asking worker, unparking it. EventBus gains PublishOptions.queue for record-only events. down-leg + bidirectional history tests; full suite 1000 pass; typecheck/build/lint clean. * fix(supervise): answer_question returns delivered; close down-leg review gaps Address PR #318 review: - BLOCKING: answer_question computed `delivered` but returned only { question } — now returns { question, delivered }, consistent with steer_worker/resume_worker (no longer hides whether the answer reached a live worker). - tests: answer routed down to a LIVE worker (delivered:true happy path); resume_worker delivered:false path; a focused event-bus queue:false unit test (history+subscribers see it, pull queue never does). - resume_worker added to OPERATOR_TOOLS + the driver system prompt so the driver is actually prompted to use it. * feat(supervise): functional down-leg — workers drain a steerable inbox Make the down-leg actually move a live worker (was observable-only). New createInbox (supervise/inbox.ts) is the receive end an executor exposes as Executor.deliver; the owned tool-loop (routerToolsInlineExecutor) drains it two ways: - QUEUED (default): flush at each step boundary AND before the worker may settle — it can't finish while a steer/answer it never read is pending. - FORCEFUL (steer_worker interrupt:true): aborts the in-flight turn so the worker re-plans immediately, breaking it off a wrong path mid-task. Black-box CLI harnesses can't be interrupted mid-step → down-leg degrades to next spawn. inbox 4 + executor-drains-inbox integration test (flush-before-settle proven end to end through the real executor); full suite 1008 pass; typecheck/build/lint clean. * chore(supervise): address review nits — accurate resume_worker desc, sendDown covers answer PR #318 audit follow-ups (non-blocking): - resume_worker description no longer implies a park/resume lifecycle the scope model lacks — a settled (drained) worker is gone; says so and points to spawning fresh. - sendDown now covers the 'answer' down-leg too (removes the inline bus.publish duplication; one helper for all three down kinds). - history() docstring lists the down-leg event kinds. full suite 1008 pass; typecheck/lint clean. * refactor(supervise): unify the coordination surface (12→10 tools) Simplify without losing capability: - MERGE steer_worker + resume_worker → one steer_worker (any live worker; the only real axis was interrupt forceful-vs-queued, already a param). 'Resume' = a non- interrupt steer. Removes a redundant verb + dissolves the resume-vs-steer prompt nits. - REMOVE await_next — it was a strict subset of await_event({kinds:['settled']}). One wait-verb now; callers/prompts pass kinds:['settled'] for the next finished worker. - DROP bus.peek() — speculative, only its own test used it (YAGNI). Down-leg event union + inbox shed the dead 'resume' kind. Full suite 1007 pass; typecheck/build/lint clean. * feat(supervise): online detector monitor on the worker pipe (reuses agent-eval kernel) createDetectorMonitor (supervise/detector-monitor.ts) — the online analyst on the live worker pipe. Folds each tool step through agent-eval 0.93.0's published streaming kernel (repeatedActionDetector/errorStreakDetector — the SAME kernel control-runtime folds; no detection logic reimplemented) and fires onSignal → a finding on the bus the moment a worker loops or error-storms. routerToolsInlineExecutor feeds it via a new onToolStep seam. Bumps agent-eval ^0.93.0. monitor tests (4); full suite 1011 pass; typecheck/build/lint clean. * fix(supervise): address #318 review + wire raiseFinding (last mile) Last mile: createCoordinationTools.raiseFinding (exposed on the MCP handle) — the seam an ONLINE detector uses to publish a finding on the live bus mid-run. Proven end-to-end: a stuck-loop on the worker pipe → monitor → raiseFinding → await_event surfaces it. Review fixes (audit on the earlier commit): - HIGH: AbortSignal.any (needs Node 20.3, floor is 20) → portable mergeAbortSignals. - forceful interrupt: docstring no longer overpromises (aborts in-flight inference, a tool mid-exec finishes first); interrupted turns no longer count toward maxTurns; added the e2e test (forceful steer aborts the turn, re-plans, aborted turn is free). - answer to a BLOCKING question is now delivered forcefully (interrupt) to unpark the worker immediately, not at its next boundary. - sendDown 'answer' now REQUIRES questionId (overload; no silent ?? '' mask). - tool-step status captured (error vs ok) for the error-streak detector. - stale await_next purged from bench prompts + docs; history() docstring drops 'resume'. - added tests: answer delivered:false + return asserted; await_event idle-on-mismatch. full suite 1014 pass; typecheck/build/lint clean. * feat(supervise): settle-time trajectory analyzer (last mile #2 — reuses agent-eval) createTrajectoryRecorder (supervise/trajectory-recorder.ts) — the post-hoc half of the analyst pipe. Replays a worker's captured tool steps as agent-eval spans (InMemoryTraceStore) and runs its PUBLISHED batch analyzers — buildTrajectory (structured run summary), stuckLoopView (full-run repeated-call view, complementing the online consecutive detector), toolWasteView. No analysis reimplemented; the thin bridge from live tool steps to the substrate trace model. Feeds from the same onToolStep seam as the online monitor. 3 recorder tests (real spans → real agent-eval findings); full suite 1017 pass; typecheck/build/lint clean. Closes both legs: online (mid-run) + settle (post-hoc). * fix(supervise): address audit on 27dd2ce (listener leak, crash-safety, comment accuracy) - mergeAbortSignals listener leak: pre-link external signals ONCE; per-turn add+remove the listener (no accumulation on long-lived signals over maxTurns). - interrupt catch now requires a real AbortError (DOMException) — a network fault coincident with an interrupt is no longer swallowed; rethrown. - corrected the comment: an interrupted+re-planned turn DOES consume a maxTurns slot (bounded backstop, not a hang) — it just doesn't bill a turn. - onToolStep is an observability side-channel: wrapped so a throwing monitor can't crash the worker loop; detector-monitor.observeToolStep also defends argHash on circular/unhashable args. - projectEvent preserves questionId on the answer branch. - stale await_next purged from skills/{supervise,loop-writer}; trimmed CLAUDE.md redundancy; softened the recorder's per-span-duration claim. full suite 1018 pass; typecheck/build/lint clean.

chore: update Tangle agent packages

9313058

drewstone merged commit 596e351 into main May 8, 2026

drewstone deleted the codex/update-agent-packages-20260508 branch May 8, 2026 23:43

drewstone mentioned this pull request May 17, 2026

chore(tech-debt): strip historical narrative, delete dead docs #12

Merged

This was referenced Jun 10, 2026

fix(bench): grid compression cache stores the promise — one call per key #247

Merged

feat(lifecycle): profile-artifact lifecycle as a runtime primitive — generate/eval/promote/maintain skills, tools, MCPs, hooks, subagents for free #267

Closed

drewstone mentioned this pull request Jun 17, 2026

feat(supervise): bidirectional coordination bus — down-leg + resume_worker #318

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

chore: update Tangle agent packages#2

chore: update Tangle agent packages#2
drewstone merged 1 commit into
mainfrom
codex/update-agent-packages-20260508

drewstone commented May 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

drewstone commented May 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant