
fix(agents): preserve accepted spawn terminal success #85054

Closed
samzong wants to merge 2 commits into openclaw:main from samzong:fix/accepted-spawn-terminal

Conversation

@samzong
Contributor

@samzong samzong commented May 21, 2026

Summary

  • Problem: pure-relay agents that successfully accepted a sessions_spawn child session could still be classified as an incomplete empty parent turn.
  • Solution: carry a structured accepted-spawn fact from tool completion through the embedded runner and fallback/replay classifiers.
  • What changed: valid accepted sessions_spawn results now require status: "accepted", non-empty runId, and non-empty childSessionKey; that evidence suppresses only the false incomplete-turn path and prevents duplicate/replay fallback paths.
  • CI drift fixed after rebasing to current main: two qa-lab Codex lifecycle fixtures are now optional Knip unused-file entries so the dependency shard matches the scenario-driven fixture contract.
  • What did NOT change (scope boundary): failed/malformed spawns, parent prompt timeouts, messaging delivery errors, and unrelated tool side effects still use the existing safety/error paths.
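
The acceptance rule described above (status: "accepted", non-empty runId, non-empty childSessionKey) can be sketched as a small guard. This is a minimal illustration only: the names AcceptedSessionSpawn and parseAcceptedSessionSpawn are hypothetical and are not taken from the PR's accepted-session-spawn.ts, whose real shape may differ.

```typescript
// Hypothetical shape of the accepted-spawn fact; the PR's real type may differ.
interface AcceptedSessionSpawn {
  runId: string;
  childSessionKey: string;
}

function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.length > 0;
}

// Returns the accepted-spawn fact only when all three conditions hold;
// anything failed or malformed yields undefined and stays on the existing paths.
function parseAcceptedSessionSpawn(details: unknown): AcceptedSessionSpawn | undefined {
  if (typeof details !== "object" || details === null) {
    return undefined;
  }
  const record = details as Record<string, unknown>;
  const runId = record["runId"];
  const childSessionKey = record["childSessionKey"];
  if (record["status"] !== "accepted") {
    return undefined;
  }
  if (!isNonEmptyString(runId) || !isNonEmptyString(childSessionKey)) {
    return undefined;
  }
  return { runId, childSessionKey };
}
```

Returning undefined rather than throwing keeps malformed results on whatever error path already handles them, which matches the scope boundary stated above.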

Motivation

  • Fixes a real session-state false negative where the runtime accepted a child session but the parent run surfaced "Agent couldn't generate a response" anyway.

Change Type (select all)

  • Bug fix
  • Refactor required for the fix
  • Feature
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

Real behavior proof (required for external PRs)

  • Behavior addressed: accepted sessions_spawn terminal success no longer trips incomplete-turn fallback, abandoned lifecycle state, replay/fallback retries, or duplicate child-session spawn paths.
  • Real environment tested: OpenClaw QA Lab host runner on macOS Darwin 25.4.0, Node v24.14.0, pnpm 11.1.0, using qa-channel + qa-lab bus + real gateway child + mock-openai provider.
  • Exact steps or command run after this patch:
pnpm openclaw qa suite --scenario runtime-tool-sessions-spawn --provider-mode mock-openai --transport qa-channel --concurrency 1 --output-dir .artifacts/real-proof/issue-72541-runtime-tool-sessions-spawn-rebased-final
pnpm openclaw qa suite --scenario subagent-handoff --provider-mode mock-openai --transport qa-channel --concurrency 1 --output-dir .artifacts/real-proof/issue-72541-subagent-handoff-rebased-final
  • Evidence after fix:
Runtime tool fixture - sessions_spawn
Status: pass
Passed: 1
Failed: 0
Details: sessions_spawn happy planned args and denied-input failure path were exercised through the QA channel.

Subagent handoff
Status: pass
Passed: 1
Failed: 0
Result: { "status": "accepted", "childSessionKey": "agent:qa:subagent:10b50ee2-860b-45e4-9a9b-1737a6b28d7a", "runId": "f4a2e737-0922-458d-bb44-98a4ad2c3d45", "mode": "run", "modelApplied": true }
Foldback result: The child result was folded back into the main thread exactly once.
  • Observed result after fix: the real gateway/QA-channel path preserves the accepted child session result and reports it without producing "Agent couldn't generate a response" or re-running the spawn.
  • What was not tested: live provider credentials, Discord thread-bound behavior, and the broader subagent-completion-direct-fallback QA path; that broader path was not used as final proof because a separate announce empty-response timeout was observed outside this patch's changed surface.
  • Before evidence (optional but encouraged): issue #72541 ("Gateway completeness check false-negative on pure-relay agents") documents the pre-fix false negative; regression tests now cover accepted spawn suppression and the parent-timeout edge case.

Root Cause (if applicable)

  • Root cause: the runtime produced an accepted child-session fact but dropped it before the incomplete-turn classifier, replay metadata, fallback classifier, and terminal trajectory checks could use it.
  • Missing detection / guardrail: no structured accepted-spawn evidence existed on EmbeddedRunAttemptResult or final run results.
  • Contributing context (if known): the classifier correctly avoided treating arbitrary successful tools as completion, but had no narrow contract for accepted sessions_spawn.
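
To illustrate the root cause, here is a rough sketch of an incomplete-turn check that treats an accepted spawn as one narrow exemption alongside committed messaging delivery. The AttemptResult fields and classifyTurn are hypothetical stand-ins, not the real EmbeddedRunAttemptResult or classifier from the PR.

```typescript
// Hypothetical, simplified stand-in for the attempt result; field names are assumptions.
interface AttemptResult {
  assistantText: string;
  // Committed messaging delivery, the exemption that already exists on main.
  messagingDelivered: boolean;
  // Structured accepted-spawn evidence; absent for failed or malformed spawns.
  acceptedSessionSpawn?: { runId: string; childSessionKey: string };
  // Other tools that succeeded; deliberately NOT treated as completion.
  otherSuccessfulTools: number;
}

type TurnVerdict = "complete" | "incomplete";

function classifyTurn(attempt: AttemptResult): TurnVerdict {
  if (attempt.assistantText.trim().length > 0) {
    return "complete";
  }
  if (attempt.messagingDelivered) {
    return "complete";
  }
  // The narrow contract: only a structured accepted spawn counts as terminal progress.
  if (attempt.acceptedSessionSpawn !== undefined) {
    return "complete";
  }
  // Arbitrary successful tools still do not make an empty turn complete.
  return "incomplete";
}
```

In this sketch, dropping acceptedSessionSpawn before the classifier runs reproduces the pre-fix behavior: an empty parent turn is reported as incomplete even though the child session was accepted.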

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/agents/pi-embedded-runner/run.incomplete-turn.test.ts, src/agents/pi-embedded-subscribe.handlers.tools.test.ts, src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.subscribeembeddedpisession.test.ts, src/agents/pi-embedded-runner/result-fallback-classifier.test.ts, src/agents/pi-embedded-runner/run/attempt-trajectory-status.test.ts.
  • Scenario the test should lock in: accepted sessions_spawn is terminal outbound progress, malformed/error spawns are not, replay/fallback retries are blocked after acceptance, and prompt timeout still returns timeout payload.
  • Why this is the smallest reliable guardrail: it exercises the contract where the tool fact is produced and every runner classifier that previously lost it.
  • Existing test that already covers this (if any): none before this PR.
  • If no new test is added, why not: N/A.
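
The scenarios the tests lock in can be summarized as a small decision model. This is a sketch under stated assumptions, not the runner's actual control flow: AttemptFacts, resolveOutcome, and canReplayOrFallback are hypothetical names, and checking the timeout first is one assumed way to keep an accepted spawn from masking it.

```typescript
// Hypothetical facts about one parent attempt; not the PR's real types.
interface AttemptFacts {
  promptTimedOut: boolean;
  hasVisibleReply: boolean;
  acceptedSessionSpawn?: { runId: string; childSessionKey: string };
}

type Outcome = "timeout" | "terminal-success" | "incomplete";

function resolveOutcome(facts: AttemptFacts): Outcome {
  // Assumed ordering: a prompt timeout still yields the timeout payload,
  // even when a spawn was accepted earlier in the same attempt.
  if (facts.promptTimedOut) {
    return "timeout";
  }
  if (facts.hasVisibleReply || facts.acceptedSessionSpawn !== undefined) {
    return "terminal-success";
  }
  return "incomplete";
}

// Once a child session has been accepted, replaying the turn or falling back
// to another model could spawn a duplicate child, so both are blocked.
function canReplayOrFallback(facts: AttemptFacts): boolean {
  return facts.acceptedSessionSpawn === undefined;
}
```

Keeping the retry decision separate from the outcome lets the timeout case report a timeout while still refusing to re-run a spawn that already succeeded.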

User-visible / Behavior Changes

Pure-relay subagent handoffs no longer show a false "Agent couldn't generate a response" message after a child session was accepted.

Diagram (if applicable)

Before:
sessions_spawn accepted -> accepted child fact dropped -> empty parent turn -> false incomplete-turn error / retry risk

After:
sessions_spawn accepted -> structured accepted-spawn evidence -> terminal success / replay blocked -> child result folded once

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: macOS Darwin 25.4.0
  • Runtime/container: local OpenClaw QA suite, Node v24.14.0, pnpm 11.1.0
  • Model/provider: mock-openai/gpt-5.5
  • Integration/channel (if any): qa-channel + qa-lab bus + real gateway child
  • Relevant config (redacted): no secrets used

Steps

  1. Run the focused regression tests and type/format checks.
  2. Run the two QA suite scenarios listed in Real behavior proof.
  3. Confirm both QA reports show Passed: 1, Failed: 0.

Expected

  • Accepted sessions_spawn is preserved as terminal evidence.
  • The parent does not synthesize the false incomplete-turn error.
  • Prompt timeout after accepted spawn still returns a timeout payload.

Actual

  • Matches expected.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

  • Verified scenarios:
    • codex review --commit HEAD found no actionable regressions after fixes.
    • node scripts/run-vitest.mjs src/agents/pi-embedded-runner/run/attempt-trajectory-status.test.ts src/agents/pi-embedded-subscribe.handlers.tools.test.ts src/agents/pi-embedded-runner/run.incomplete-turn.test.ts src/agents/pi-embedded-subscribe.handlers.lifecycle.test.ts src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.subscribeembeddedpisession.test.ts src/agents/pi-embedded-runner/result-fallback-classifier.test.ts --run passed: 10 files, 425 tests.
    • pnpm exec tsgo -p test/tsconfig/tsconfig.core.test.json --noEmit --pretty false passed.
    • pnpm exec oxfmt --check ... passed on all touched files.
    • git diff --check passed.
    • pnpm deadcode:dependencies, pnpm deadcode:unused-files, and pnpm deadcode:report:ci:ts-unused passed after rebasing to current upstream/main.
    • node scripts/run-vitest.mjs test/scripts/check-deadcode-unused-files.test.ts --run passed: 1 file, 7 tests.
    • Both QA suite Real Proof commands passed.
  • Edge cases checked: malformed accepted spawns, errored tool results, compaction retry preservation, replay invalidation, model fallback classification, terminal trajectory status, lifecycle abandonment, and timeout after accepted spawn.
  • What you did not verify: live provider credentials, Discord thread-bound behavior, and broad full-suite/Testbox gates.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: accepting sessions_spawn as terminal evidence could hide unrelated timeout failures.
    • Mitigation: timeout guard remains tied to real messaging delivery, and a regression test proves parent prompt timeout still returns a timeout payload after accepted spawn.

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: M triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. labels May 21, 2026
@clawsweeper
Contributor

clawsweeper Bot commented May 21, 2026

Codex review: needs maintainer review before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
The branch carries accepted sessions_spawn evidence through embedded subscribe, runner replay/fallback/trajectory classification, and focused regression coverage.

Reproducibility: yes, at source level: current main can accept a child session but the incomplete-turn classifier only recognizes committed messaging delivery before emitting the false warning. I did not run the current-main repro in this read-only review.

PR rating
Overall: 🦞 diamond lobster
Proof: 🦞 diamond lobster
Patch quality: 🦞 diamond lobster
Summary: Strong terminal proof, focused implementation, and broad regression coverage make this an above-average merge signal with no blocking findings.

What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

Real behavior proof
Sufficient (terminal): The PR body supplies after-fix terminal QA output from a real OpenClaw QA Lab/gateway child path showing accepted spawn success without the false incomplete-turn error or duplicate spawn.

Risk before merge

  • The supplied real behavior proof uses QA-channel with mock-openai; live provider credentials and Discord thread-bound behavior were not exercised.

Maintainer options:

  1. Decide the mitigation before merge
    Land this structured accepted-spawn evidence fix after normal maintainer and CI validation, keeping the terminal-success contract limited to accepted status with non-empty run and child session identifiers.
  2. Pause or close
    Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge
No repair lane is needed; the remaining action is ordinary maintainer and CI merge judgment.

Security
Cleared: The diff adds internal runtime state, tests, and a deadcode optional allowlist entry without new dependencies, permissions, secret handling, or external code execution.

Review details

Best possible solution:

Land this structured accepted-spawn evidence fix after normal maintainer and CI validation, keeping the terminal-success contract limited to accepted status with non-empty run and child session identifiers.

Do we have a high-confidence way to reproduce the issue?

Yes, at source level: current main can accept a child session but the incomplete-turn classifier only recognizes committed messaging delivery before emitting the false warning. I did not run the current-main repro in this read-only review.

Is this the best way to solve the issue?

Yes. Carrying a structured accepted-spawn fact through the runner is narrower than a config flag or broad tool-success exemption, and the PR preserves malformed/error spawn and timeout paths.

Label changes:

  • add proof: sufficient: Contributor real behavior proof is sufficient. The PR body supplies after-fix terminal QA output from a real OpenClaw QA Lab/gateway child path showing accepted spawn success without the false incomplete-turn error or duplicate spawn.

Label justifications:

  • P2: This is a normal-priority agent runtime/session-state bug fix with limited scope and focused proof.
  • rating: 🦞 diamond lobster: Current PR rating is 🦞 diamond lobster because proof is 🦞 diamond lobster, patch quality is 🦞 diamond lobster, and strong terminal proof, focused implementation, and broad regression coverage make this an above-average merge signal with no blocking findings.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (terminal): The PR body supplies after-fix terminal QA output from a real OpenClaw QA Lab/gateway child path showing accepted spawn success without the false incomplete-turn error or duplicate spawn.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body supplies after-fix terminal QA output from a real OpenClaw QA Lab/gateway child path showing accepted spawn success without the false incomplete-turn error or duplicate spawn.

What I checked:

  • Current-main accepted spawn contract: Native subagent spawn returns status: "accepted" with childSessionKey and runId, and jsonResult stores that payload as tool-result details for the handler to inspect. (src/agents/subagent-spawn.ts:1345, b25a0d013b64)
  • Current-main incomplete-turn path: Current main only exempts committed messaging delivery before returning the "Agent couldn't generate a response" warning, so accepted spawn identity is not a terminal-success fact in this classifier. (src/agents/pi-embedded-runner/run/incomplete-turn.ts:251, b25a0d013b64)
  • Current-main tool handler path: Current main sanitizes tool results and records generic tool metadata/replay state, but has no accepted-spawn state on the subscription result. (src/agents/pi-embedded-subscribe.handlers.tools.ts:928, b25a0d013b64)
  • PR implementation evidence: The PR adds accepted-session-spawn.ts, records accepted sessions_spawn results only when status, run id, and child session key are valid, and threads that evidence through incomplete-turn, replay, lifecycle, trajectory, and fallback classifiers. (src/agents/accepted-session-spawn.ts:1, 0261fcd7306f)
  • Proof supplied in PR body: The PR body includes terminal QA Lab output for runtime-tool-sessions-spawn and subagent-handoff, showing accepted spawn success without the false incomplete-turn error or duplicate spawn on the QA-channel/gateway path. (0261fcd7306f)
  • History and routing evidence: Blame and path history point the current incomplete-turn, delivery-evidence, and fallback classifier surfaces to recent agent-runner work by Shakker, with nearby recent agent/subagent work by Vincent Koc and Peter Steinberger. (src/agents/pi-embedded-runner/run/incomplete-turn.ts:202, e3b77d6d2c89)

Likely related people:

  • shakkernerd: Blame and git log -S tie the current incomplete-turn, delivery-evidence, and fallback classifier code to commit e3b77d6d2c897d8c6f83a921527c5c740504d7d9. (role: introduced current classifier surface; confidence: medium; commits: e3b77d6d2c89; files: src/agents/pi-embedded-runner/run/incomplete-turn.ts, src/agents/pi-embedded-runner/delivery-evidence.ts, src/agents/pi-embedded-runner/result-fallback-classifier.ts)
  • vincentkoc: Recent commits touched embedded runner and subscription-adjacent agent code, including session write fencing and deadcode helper cleanup. (role: recent area contributor; confidence: medium; commits: 2bb00f6726d4, 88c49f9e68fc; files: src/agents/pi-embedded-runner, src/agents/pi-embedded-subscribe.ts)
  • steipete: Recent subagent-spawn work is adjacent to the accepted sessions_spawn contract this PR relies on. (role: recent adjacent contributor; confidence: medium; commits: 02182d5a3031; files: src/agents/subagent-spawn.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against b25a0d013b64.

@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. labels May 21, 2026
Signed-off-by: samzong <samzong.lu@gmail.com>
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. P2 Normal backlog priority with limited blast radius. labels May 21, 2026
@clawsweeper
Contributor

clawsweeper Bot commented May 21, 2026

ClawSweeper PR egg

✨ Hatched: 🌱 uncommon Moonlit Shellbean

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

Rarity: 🌱 uncommon.
Trait: finds missing screenshots.
Image traits: location CI tidepool; accessory miniature diff map; palette moonlit blue and soft silver; mood celebratory; pose balancing on a branch marker; shell paper lantern shell; lighting soft studio lighting; background gentle dashboard dots.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

@samzong samzong force-pushed the fix/accepted-spawn-terminal branch from 0a1e304 to 0261fcd Compare May 21, 2026 18:32
@openclaw-barnacle openclaw-barnacle Bot added scripts Repository scripts and removed proof: sufficient ClawSweeper judged the real behavior proof convincing. labels May 21, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 21, 2026
@Takhoffman
Contributor

@clawsweeper automerge

@clawsweeper clawsweeper Bot added the clawsweeper:automerge Maintainer opted this PR into bounded ClawSweeper-reviewed automerge label May 22, 2026
@clawsweeper
Contributor

clawsweeper Bot commented May 22, 2026

🦞🔧
ClawSweeper automerge is enabled.

Draft PRs stay fix-only until GitHub marks them ready for review. Pause with /clawsweeper stop.

Automerge progress:

  • 2026-05-22 00:03:01 UTC review queued 0261fcd7306f (queued)

@clawsweeper
Contributor

clawsweeper Bot commented May 22, 2026

ClawSweeper 🐠 reef update

Thanks for the work on this. ClawSweeper did not have permission to update this branch directly, so it opened a narrow replacement PR instead. That's a branch access thing, not a knock on the contribution.

Why replacement: ClawSweeper could not update the source PR branch directly; GitHub did not grant sufficient push rights to the bot for that branch.
Replacement PR: #85135
Why close: this run explicitly closes the superseded source PR after the credited replacement PR is open, so review continues in one place.
Closing this source PR because this run explicitly enabled source-PR closeout.
The replacement PR carries the original credit trail forward.
Co-author credit kept:

fish notes: model gpt-5.5, reasoning high; reviewed against c1b3c30.

@clawsweeper clawsweeper Bot closed this May 22, 2026

Labels

  • agents: Agent runtime and tooling
  • clawsweeper:automerge: Maintainer opted this PR into bounded ClawSweeper-reviewed automerge
  • P2: Normal backlog priority with limited blast radius.
  • proof: sufficient: ClawSweeper judged the real behavior proof convincing.
  • proof: supplied: External PR includes structured after-fix real behavior proof.
  • rating: 🦞 diamond lobster: Very strong PR readiness with only minor maintainer review expected.
  • scripts: Repository scripts
  • size: M
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Gateway completeness check false-negative on pure-relay agents

2 participants