chore: update Tangle agent packages #2

Merged
drewstone merged 1 commit into main from codex/update-agent-packages-20260508 on May 8, 2026
@drewstone

Updates published Tangle agent packages:

- @tangle-network/agent-eval to 0.20.12
- @tangle-network/agent-knowledge to 1.2.0 where used

Lockfiles were refreshed where present.

@drewstone drewstone merged commit 596e351 into main May 8, 2026
@drewstone drewstone deleted the codex/update-agent-packages-20260508 branch May 8, 2026 23:43
tangletools pushed a commit that referenced this pull request May 17, 2026
Sweep removes commentary that describes what code used to do, what bug it
replaces, or which audit found a pattern — per the CLAUDE.md doc discipline.
Trims "(NEW in 0.7.0)" markers and legal-agent migration narrative from
README and example docs. Deletes two orphan docs under docs/ that no
source or doc references; both were point-in-time release/issue snapshots
that no longer describe current state.

- src/runtime-run.ts: drop "replaces legal-agent's bespoke..." paragraph
  from module doc; tighten complete() and randomSuffix comments.
- src/trace-bridge.ts: drop "Before this module, consumers hand-rolled..."
  paragraph; reword tool_call args-omission and text_delta drop comments
  to describe current behaviour.
- src/sanitize.ts: drop "the unified-union alternative was rejected
  because..." narrative on createRuntimeStreamEventCollector.
- src/chat-turn.ts: drop "Caller pattern (replaces ~400 lines of
  legal/gtm/creative chat-runtime wrappers)" and tax-agent file:line
  reference; reword transport / fallback comments.
- src/profile-conformance.ts: strip "from the canonical audit" / "#2
  anti-pattern in the canonical audit" from issue messages and reword
  system-prompt-too-short message; trim docstring.
- src/profile-conformance.test.ts: rename "the gtm-agent anti-pattern
  audit-found is caught" -> describes current behaviour; same for the
  describe block + shell-cap test.
- src/index.ts: drop "(compat surface)" and "(new in 0.7.0)" section
  banners.
- README.md: drop "(NEW in 0.7.0)" markers from quickstart table and
  section headers; drop legal-agent migration narrative.
- examples/runtime-run: same treatment in README + .ts header.
- docs/domain-agent-runtime-integration-issues.md: deleted (165 lines of
  issue drafts referencing "GitHub connector returns 404"; zero
  references in tree).
- docs/product-runtime-kernel.md: deleted (326-line completion record
  for 0.5.0-0.5.2 release process; zero references in tree).
- package.json: drop "docs" from files (directory is gone).

Verification: pnpm typecheck, pnpm test (68 passing, unchanged), pnpm build all pass.
drewstone added a commit that referenced this pull request May 17, 2026
tangletools pushed a commit that referenced this pull request Jun 4, 2026
…-loud session continuity

Resolve all six findings from the review (none blocked landing; #1 gated
enabling, #3/#4 wanted documenting). Lineage remains default-OFF and
byte-identical to the fresh-box path when both flags are unset.

- #1 sessionContinuity silent no-op: `continue` now asserts the session is
  still known to the sandbox via `box.session(id).status()` before streaming.
  A `null` (platform never honored the client-minted id, or it was reaped)
  raises a ValidationError, which executeIteration now propagates as a hard
  structural failure instead of degrading to a soft empty iteration — so a
  non-honoring platform errors loudly rather than running contextless turns.
- #2 unbounded fork creation: `fork` provisions child boxes through
  `mapWithConcurrency` bounded by the loop's `maxConcurrency`, not a single
  `Promise.all` over all N branches.
- #3 fork ignores per-branch specs: documented on `fork` and
  `LoopLineageOptions.forkFanout` that a real CRIU fork inherits the parent
  image/profile (per-branch specs apply only on the degraded fresh path).
- #4 lineage holds every box to loop end: kernel prunes boxes no future round
  can descend from after each round, gated on a kernel-inferred (monotonic)
  branch point — skipped when the driver authors its own `parentIndex`. The
  unprunable case is documented as the box ceiling.
- #5 abort during fork: documented the SDK's signal-less fork; abort is now
  checked per branch (between bounded waves) + an abort-under-lineage test.
- #6 export order: alphabetized the loops barrel.
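
Finding #1 above can be illustrated with a minimal sketch of a liveness assertion. Only `box.session(id).status()` and `ValidationError` are named in the commit; the `Box` and `SessionHandle` interfaces, the `assertSessionLive` helper, and the error text are hypothetical stand-ins, not the SDK's actual types.

```typescript
// Hypothetical error class; the commit names ValidationError but not its shape.
class ValidationError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ValidationError";
  }
}

// Assumed shapes for illustration only.
interface SessionHandle {
  status(): Promise<string | null>;
}
interface Box {
  session(id: string): SessionHandle;
}

// Throws instead of silently continuing when the sandbox does not know the session.
async function assertSessionLive(box: Box, sessionId: string): Promise<void> {
  const status = await box.session(sessionId).status();
  if (status === null) {
    throw new ValidationError(
      `session ${sessionId} is not known to the sandbox; refusing to run a contextless turn`,
    );
  }
}
```

The caller would let this error propagate rather than catch it and record an empty iteration, which is the behaviour change the finding describes.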

Adds `mapWithConcurrency` util and six lineage tests (session-liveness pass/
fail, bounded-fork peak, mid-loop prune, no-prune-under-authored-parent,
abort-under-lineage). 627 tests pass, typecheck + biome clean.
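
As a rough illustration of the bounded provisioning in finding #2, here is one way a `mapWithConcurrency` helper could look. The commit does not show the real signature, and it describes bounded waves, whereas this sketch uses a worker pool; the parameter order and the pool strategy are assumptions.

```typescript
// Hypothetical sketch: runs `fn` over `items` with at most `limit` calls in flight.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item

  // Each worker claims items until none remain; results keep input order.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i] as T, i);
    }
  }

  const workerCount = Math.min(limit, items.length);
  const workers: Promise<void>[] = [];
  for (let w = 0; w < workerCount; w++) workers.push(worker());
  // The first rejection rejects the whole call; calls already in flight still finish.
  await Promise.all(workers);
  return results;
}
```

Compared with a single `Promise.all` over all N branches, the peak number of concurrent calls is capped at `limit` regardless of N.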
drewstone added a commit that referenced this pull request Jun 4, 2026
…r runLoop (backend-blind) (#150)

* feat(loops): opt-in session continuation + checkpoint-fork lineage (backend-blind)

Two @experimental, default-OFF seams on runLoop so a loop can CONTINUE a sandbox
session across iterations (same box + sessionId, no prompt-text replay) and FORK
fanout branches from a parent checkpoint (shared context prefix) — both behind a
capability probe so the kernel asks 'can I fork?' (client.criuStatus) and never
names Docker/Firecracker, degrading to fresh boxes when CRIU is absent.

- sandbox-capabilities.ts: memoized, fail-closed criuStatus probe -> {canFork}.
- sandbox-lineage.ts: createSandboxLineage owns box+session handles with
  start/continue/fork/teardown; reuses the kernel's acquireSandbox /
  buildBackendOptions / deleteBoxSafe; fail-loud if the probe says canFork but
  the box has no fork().
- run-loop.ts: RunLoopOptions.lineage (sessionContinuity / forkFanout); refine
  continues, fanout forks-once, else fresh-through-lineage. Default OFF is
  byte-identical to today, so random@k stays N independent fresh boxes (the
  compute-control invariant). Rejects lineage + onWorkerBox (both own boxes).
- 7 new unit tests (continuation reuses session; fork when canFork; fresh
  fallback; default-off invariant). Full suite 621 pass, typecheck clean.
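
The "memoized, fail-closed" probe described for sandbox-capabilities.ts might be sketched as follows. Only `client.criuStatus` and the `{canFork}` result come from the commit; the `CriuClient` interface, the `available` field, and the `probeCapabilities` name are assumptions made for the example.

```typescript
interface SandboxCapabilities {
  canFork: boolean;
}

// Assumed client shape; the real criuStatus return type is not shown in the commit.
interface CriuClient {
  criuStatus(): Promise<{ available: boolean }>;
}

// One cached probe per client object, so repeated calls do not hit the platform again.
const probeCache = new WeakMap<object, Promise<SandboxCapabilities>>();

function probeCapabilities(client: CriuClient): Promise<SandboxCapabilities> {
  let cached = probeCache.get(client);
  if (!cached) {
    cached = Promise.resolve()
      .then(() => client.criuStatus())
      .then((status) => ({ canFork: status.available === true }))
      // Fail closed: any probe error is treated as "cannot fork".
      .catch(() => ({ canFork: false }));
    probeCache.set(client, cached);
  }
  return cached;
}
```

Because the failure case is also memoized, a client whose probe errored once keeps degrading to fresh boxes for its lifetime in this sketch; whether the real module retries is not stated in the source.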

* fix(loops): address PR #150 review — bound forks, prune lineage, fail-loud session continuity

drewstone added a commit that referenced this pull request Jun 17, 2026
…es agent-eval)

createTrajectoryRecorder (supervise/trajectory-recorder.ts) — the post-hoc half of the
analyst pipe. Replays a worker's captured tool steps as agent-eval spans (InMemoryTraceStore)
and runs its PUBLISHED batch analyzers — buildTrajectory (structured run summary),
stuckLoopView (full-run repeated-call view, complementing the online consecutive detector),
toolWasteView. No analysis reimplemented; the thin bridge from live tool steps to the
substrate trace model. Feeds from the same onToolStep seam as the online monitor.

3 recorder tests (real spans → real agent-eval findings); full suite 1017 pass;
typecheck/build/lint clean. Closes both legs: online (mid-run) + settle (post-hoc).
drewstone added a commit that referenced this pull request Jun 17, 2026
…orker (#318)

* feat(supervise): bidirectional bus — down-leg (steer/answer/resume) + resume_worker

Close the bus to 100% bidirectional. The parent→child down-leg routes to the child
inbox (scope.send→deliver) AND records a queue:false event on the same bus: it lands
in history() + reaches subscribers for the audit trail, but is never pulled back by
the parent. New: resume_worker (continue a parked worker — the protocol had {resume}
but no verb); answer_question now routes the answer DOWN to the asking worker, unparking
it. EventBus gains PublishOptions.queue for record-only events.

down-leg + bidirectional history tests; full suite 1000 pass; typecheck/build/lint clean.
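
A minimal sketch of record-only publishing, assuming an in-memory bus: `PublishOptions.queue` and `history()` are named in the commit, while the `BusEvent` shape and the `pull` and `subscribe` method names are illustrative guesses.

```typescript
interface BusEvent {
  kind: string;
  payload?: unknown;
}

// `queue` comes from the commit; treating it as default-true is an assumption.
interface PublishOptions {
  queue?: boolean;
}

class EventBus {
  private readonly log: BusEvent[] = [];
  private readonly pending: BusEvent[] = [];
  private readonly subscribers = new Set<(event: BusEvent) => void>();

  publish(event: BusEvent, options: PublishOptions = {}): void {
    this.log.push(event); // always recorded for the audit trail
    if (options.queue !== false) this.pending.push(event); // record-only events skip the pull queue
    this.subscribers.forEach((notify) => notify(event));
  }

  subscribe(notify: (event: BusEvent) => void): () => void {
    this.subscribers.add(notify);
    return () => {
      this.subscribers.delete(notify);
    };
  }

  // What the parent pulls; never yields queue:false events.
  pull(): BusEvent | undefined {
    return this.pending.shift();
  }

  history(): BusEvent[] {
    return this.log.slice();
  }
}
```

With this split, a down-leg event published with `{ queue: false }` shows up in `history()` and reaches subscribers, but the parent never pulls its own message back.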

* fix(supervise): answer_question returns delivered; close down-leg review gaps

Address PR #318 review:
- BLOCKING: answer_question computed `delivered` but returned only { question } —
  now returns { question, delivered }, consistent with steer_worker/resume_worker
  (no longer hides whether the answer reached a live worker).
- tests: answer routed down to a LIVE worker (delivered:true happy path); resume_worker
  delivered:false path; a focused event-bus queue:false unit test (history+subscribers
  see it, pull queue never does).
- resume_worker added to OPERATOR_TOOLS + the driver system prompt so the driver is
  actually prompted to use it.

* feat(supervise): functional down-leg — workers drain a steerable inbox

Make the down-leg actually move a live worker (was observable-only). New createInbox
(supervise/inbox.ts) is the receive end an executor exposes as Executor.deliver; the
owned tool-loop (routerToolsInlineExecutor) drains it two ways:
- QUEUED (default): flush at each step boundary AND before the worker may settle — it
  can't finish while a steer/answer it never read is pending.
- FORCEFUL (steer_worker interrupt:true): aborts the in-flight turn so the worker
  re-plans immediately, breaking it off a wrong path mid-task.
Black-box CLI harnesses can't be interrupted mid-step → down-leg degrades to next spawn.

inbox 4 + executor-drains-inbox integration test (flush-before-settle proven end to end
through the real executor); full suite 1008 pass; typecheck/build/lint clean.
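
The two drain modes can be sketched with a small inbox. `createInbox` and `deliver` are named in the commit; the `drain`, `hasPending`, and `beginTurn` methods, the message shape, and the `interrupt` option on `deliver` are hypothetical and only illustrate the queued versus forceful distinction.

```typescript
interface DownMessage {
  kind: "steer" | "answer";
  text: string;
}

interface Inbox {
  deliver(message: DownMessage, options?: { interrupt?: boolean }): void;
  drain(): DownMessage[];
  hasPending(): boolean;
  beginTurn(): AbortSignal;
}

function createInbox(): Inbox {
  let pending: DownMessage[] = [];
  let inFlight: AbortController | null = null;
  return {
    deliver(message, options = {}) {
      pending.push(message);
      // FORCEFUL: abort the current turn so the worker re-plans right away.
      if (options.interrupt && inFlight) inFlight.abort();
    },
    // QUEUED: the executor calls this at each step boundary and before settling.
    drain() {
      const out = pending;
      pending = [];
      return out;
    },
    hasPending() {
      return pending.length > 0;
    },
    // Called at the start of each turn; the signal is passed to the inference call.
    beginTurn() {
      inFlight = new AbortController();
      return inFlight.signal;
    },
  };
}
```

An executor loop built on this would refuse to settle while `hasPending()` is true, which mirrors the flush-before-settle rule described above.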

* chore(supervise): address review nits — accurate resume_worker desc, sendDown covers answer

PR #318 audit follow-ups (non-blocking):
- resume_worker description no longer implies a park/resume lifecycle the scope model
  lacks — a settled (drained) worker is gone; says so and points to spawning fresh.
- sendDown now covers the 'answer' down-leg too (removes the inline bus.publish
  duplication; one helper for all three down kinds).
- history() docstring lists the down-leg event kinds.

full suite 1008 pass; typecheck/lint clean.

* refactor(supervise): unify the coordination surface (12→10 tools)

Simplify without losing capability:
- MERGE steer_worker + resume_worker → one steer_worker (any live worker; the only
  real axis was interrupt forceful-vs-queued, already a param). 'Resume' = a non-
  interrupt steer. Removes a redundant verb + dissolves the resume-vs-steer prompt nits.
- REMOVE await_next — it was a strict subset of await_event({kinds:['settled']}).
  One wait-verb now; callers/prompts pass kinds:['settled'] for the next finished worker.
- DROP bus.peek() — speculative, only its own test used it (YAGNI).

Down-leg event union + inbox shed the dead 'resume' kind. Full suite 1007 pass;
typecheck/build/lint clean.

* feat(supervise): online detector monitor on the worker pipe (reuses agent-eval kernel)

createDetectorMonitor (supervise/detector-monitor.ts) — the online analyst on the live
worker pipe. Folds each tool step through agent-eval 0.93.0's published streaming kernel
(repeatedActionDetector/errorStreakDetector — the SAME kernel control-runtime folds; no
detection logic reimplemented) and fires onSignal → a finding on the bus the moment a
worker loops or error-storms. routerToolsInlineExecutor feeds it via a new onToolStep seam.

Bumps agent-eval ^0.93.0. monitor tests (4); full suite 1011 pass; typecheck/build/lint clean.
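
To make the online detection idea concrete, here is a self-contained sketch of a consecutive repeated-action check. It is not agent-eval's `repeatedActionDetector`, whose API is not shown in the source; `createRepeatMonitor`, the `ToolStep` and `RepeatSignal` shapes, and the threshold semantics are all assumptions.

```typescript
interface ToolStep {
  tool: string;
  args: unknown;
}

interface RepeatSignal {
  kind: "repeated-action";
  tool: string;
  count: number;
}

// Fires onSignal once when the same tool+args pair is seen `threshold` times in a row.
function createRepeatMonitor(threshold: number, onSignal: (signal: RepeatSignal) => void) {
  let lastKey: string | null = null;
  let streak = 0;
  return {
    observeToolStep(step: ToolStep): void {
      let argKey: string;
      try {
        const json = JSON.stringify(step.args);
        argKey = typeof json === "string" ? json : "undefined";
      } catch {
        // Circular or otherwise unserialisable args all share one key in this sketch.
        argKey = "[unhashable]";
      }
      const key = step.tool + ":" + argKey;
      streak = key === lastKey ? streak + 1 : 1;
      lastKey = key;
      if (streak === threshold) {
        onSignal({ kind: "repeated-action", tool: step.tool, count: streak });
      }
    },
  };
}
```

In the wiring the commit describes, the executor would call `observeToolStep` from its `onToolStep` seam and the `onSignal` callback would publish a finding on the bus.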

* fix(supervise): address #318 review + wire raiseFinding (last mile)

Last mile: createCoordinationTools.raiseFinding (exposed on the MCP handle) — the seam
an ONLINE detector uses to publish a finding on the live bus mid-run. Proven end-to-end:
a stuck-loop on the worker pipe → monitor → raiseFinding → await_event surfaces it.

Review fixes (audit on the earlier commit):
- HIGH: AbortSignal.any (needs Node 20.3, floor is 20) → portable mergeAbortSignals.
- forceful interrupt: docstring no longer overpromises (aborts in-flight inference, a
  tool mid-exec finishes first); interrupted turns no longer count toward maxTurns;
  added the e2e test (forceful steer aborts the turn, re-plans, aborted turn is free).
- answer to a BLOCKING question is now delivered forcefully (interrupt) to unpark the
  worker immediately, not at its next boundary.
- sendDown 'answer' now REQUIRES questionId (overload; no silent ?? '' mask).
- tool-step status captured (error vs ok) for the error-streak detector.
- stale await_next purged from bench prompts + docs; history() docstring drops 'resume'.
- added tests: answer delivered:false + return asserted; await_event idle-on-mismatch.

full suite 1014 pass; typecheck/build/lint clean.
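
Since `AbortSignal.any` is only available from Node 20.3, a portable merge can be built from `AbortController` and event listeners. The sketch below is an assumption about how a `mergeAbortSignals` helper could look; the returned `dispose` function and the array parameter are not taken from the source.

```typescript
// Hypothetical signature: returns a signal that aborts when any input aborts,
// plus a dispose() that detaches listeners so long-lived inputs do not accumulate them.
function mergeAbortSignals(
  signals: ReadonlyArray<AbortSignal | undefined>,
): { signal: AbortSignal; dispose: () => void } {
  const controller = new AbortController();
  const cleanups: Array<() => void> = [];

  const dispose = (): void => {
    while (cleanups.length > 0) {
      const cleanup = cleanups.pop();
      if (cleanup) cleanup();
    }
  };

  for (const source of signals) {
    if (!source) continue;
    if (source.aborted) {
      controller.abort();
      break;
    }
    const onAbort = (): void => {
      controller.abort();
      dispose();
    };
    source.addEventListener("abort", onAbort, { once: true });
    cleanups.push(() => source.removeEventListener("abort", onAbort));
  }

  // If an input was already aborted, drop any listeners attached before it was seen.
  if (controller.signal.aborted) dispose();
  return { signal: controller.signal, dispose };
}
```

A per-turn caller would invoke `dispose()` when the turn ends normally, so each turn adds and removes its own listeners.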

* feat(supervise): settle-time trajectory analyzer (last mile #2 — reuses agent-eval)


* fix(supervise): address audit on 27dd2ce (listener leak, crash-safety, comment accuracy)

- mergeAbortSignals listener leak: pre-link external signals ONCE; per-turn add+remove the
  listener (no accumulation on long-lived signals over maxTurns).
- interrupt catch now requires a real AbortError (DOMException) — a network fault coincident
  with an interrupt is no longer swallowed; rethrown.
- corrected the comment: an interrupted+re-planned turn DOES consume a maxTurns slot (bounded
  backstop, not a hang) — it just doesn't bill a turn.
- onToolStep is an observability side-channel: wrapped so a throwing monitor can't crash the
  worker loop; detector-monitor.observeToolStep also defends argHash on circular/unhashable args.
- projectEvent preserves questionId on the answer branch.
- stale await_next purged from skills/{supervise,loop-writer}; trimmed CLAUDE.md redundancy;
  softened the recorder's per-span-duration claim.

full suite 1018 pass; typecheck/build/lint clean.
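
Two of the fixes above, the crash-safe `onToolStep` side-channel and the stricter interrupt catch, can be sketched as small helpers. `safeObserver` and `isAbortError` are hypothetical names, and the name-based check is a simplification of the "real AbortError (DOMException)" requirement in the commit.

```typescript
// Wraps an observability callback so a throwing monitor cannot crash the worker loop.
function safeObserver<T>(
  observer: ((step: T) => void) | undefined,
  onError?: (error: unknown) => void,
): (step: T) => void {
  return (step: T): void => {
    if (!observer) return;
    try {
      observer(step);
    } catch (error) {
      try {
        if (onError) onError(error);
      } catch {
        // The error reporter is also a side-channel; ignore its failures.
      }
    }
  };
}

// Assumption: an abort is recognised by its `name`; the real check may also test DOMException.
function isAbortError(error: unknown): boolean {
  return (
    typeof error === "object" &&
    error !== null &&
    (error as { name?: unknown }).name === "AbortError"
  );
}
```

In an interrupt handler, anything for which `isAbortError` returns false would be rethrown, so a network fault that coincides with an interrupt is not swallowed.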