Skip to content

Retry app-server bridge drops safely#84219

Closed
VACInc wants to merge 4 commits into
openclaw:mainfrom
VACInc:repair-appserver-turn-lifecycle
Closed

Retry app-server bridge drops safely#84219
VACInc wants to merge 4 commits into
openclaw:mainfrom
VACInc:repair-appserver-turn-lifecycle

Conversation

@VACInc
Copy link
Copy Markdown
Contributor

@VACInc VACInc commented May 19, 2026

Summary

  • Retry Codex app-server bridge client-close and turn-completion idle failures once when the run has not produced visible output, persisted the queued user message, sent through the message tool, or started/completed a mutating tool.
  • Surface repeated or skipped app-server bridge failures as visible Codex app-server errors so group/channel replies do not silently disappear behind generic runner suppression.
  • Mark queued/direct block replies, partial/reasoning/tool-result output, compaction notices, messaging sends, and generic mutating tool side effects as retry blockers, including errored mutating tool results.

Root Cause

PR #83827 is still present upstream, and this is a different failure signature. The observed drops match Codex app-server lifecycle failures such as codex app-server client closed before turn completed and codex app-server turn idle timed out waiting for turn/completed; those were flowing through the generic before-reply failure path, which can become silent in group/channel conversations.

The safe recovery boundary is narrower than “before final reply”: once any user-visible output, transcript user-message persistence, message-tool send, or mutating tool side effect has happened, replaying the turn can duplicate messages or side effects. The P1 follow-up found that errored mutating tool results were still clearing the pending side-effect latch; this patch now treats completion of a mutating tool call as a replay blocker even when the tool reports an error.

Real behavior proof

Behavior addressed: app-server client-close/turn-completion idle failures before reply no longer disappear silently; clean pre-output failures retry once; post-output/post-side-effect failures do not replay.

Real environment tested: local tmp worktree after rebasing onto current origin/main; live OpenClaw Telegram DM production runtime after the fix path was deployed. The exact original production screenshot is attached as provided by the operator.

Exact steps or command run after this patch: ./node_modules/.bin/oxfmt --check --threads=1 src/auto-reply/reply/agent-runner-execution.ts src/auto-reply/reply/agent-runner-execution.test.ts src/auto-reply/reply/reply-delivery.ts src/auto-reply/reply/reply-delivery.test.ts; git diff --check origin/main..HEAD; node scripts/run-vitest.mjs src/auto-reply/reply/agent-runner-execution.test.ts src/auto-reply/reply/reply-delivery.test.ts; node scripts/run-vitest.mjs src/auto-reply/reply/agent-runner-execution.test.ts; AUTOREVIEW_AUTO_TESTS=0 .agents/skills/autoreview/scripts/autoreview --mode local; live production Telegram DM text prompt; live production Telegram DM prompt containing one JPEG screenshot attachment.

Evidence after fix: oxfmt reported all matched files use the correct format; git diff --check origin/main..HEAD passed; earlier Vitest proof reported Test Files 2 passed (2) and Tests 131 passed (131); the P1 follow-up focused Vitest proof reported Test Files 1 passed (1) and Tests 121 passed (121); autoreview reported autoreview clean: no accepted/actionable findings reported. In live production, a Telegram DM text-only prompt received a delivered OpenClaw reply, and a later screenshot/JPEG prompt also received a delivered OpenClaw reply. The local session record for the live production proof shows Telegram as the source channel, one media-bearing inbound message for the screenshot case, and successful message tool results (ok: true) for the text and media replies. Original screenshot proof is attached in the PR conversation: https://raw.githubusercontent.com/VACInc/openclaw/pr-84219-production-proof-artifacts/pr-assets/84219/live-production-telegram-proof-redacted.jpg?v=3

Observed result after fix: app-server bridge failures before output retry and can recover; repeated failures produce visible Codex app-server copy in group chats; retries are skipped after block/partial/reasoning/tool-result output, compaction notices, queued user persistence, messaging sends, pending messaging sends, completed mutating tools, errored completed mutating tools, and pending mutating tools. Live production Telegram DM text and screenshot prompts produced visible replies instead of being lost, confirming the issue is fixed in the production path covered by this PR.

What was not tested: full pnpm check/changed lanes or broad Testbox proof.

Verification

  • ./node_modules/.bin/oxfmt --check --threads=1 src/auto-reply/reply/agent-runner-execution.ts src/auto-reply/reply/agent-runner-execution.test.ts src/auto-reply/reply/reply-delivery.ts src/auto-reply/reply/reply-delivery.test.ts
  • git diff --check origin/main..HEAD
  • node scripts/run-vitest.mjs src/auto-reply/reply/agent-runner-execution.test.ts src/auto-reply/reply/reply-delivery.test.ts
  • node scripts/run-vitest.mjs src/auto-reply/reply/agent-runner-execution.test.ts
  • AUTOREVIEW_AUTO_TESTS=0 .agents/skills/autoreview/scripts/autoreview --mode local
  • Live production Telegram DM text prompt: delivered OpenClaw reply observed.
  • Live production Telegram DM screenshot/JPEG prompt: delivered OpenClaw reply observed.
  • Original live production screenshot proof attached in the PR conversation.

What was not tested

  • Full pnpm check/changed lanes or Testbox proof.

@openclaw-barnacle openclaw-barnacle Bot added size: L triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. labels May 19, 2026
@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented May 19, 2026

Codex review: needs real behavior proof before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
The PR adds a one-time Codex app-server bridge retry in auto-reply, blocks retry after visible output/message or mutating side effects, surfaces repeated/skipped failures, and adds focused regression coverage plus a changelog entry.

Reproducibility: Source-level yes, live no: current main has the app-server error strings and no bridge-specific retry/surfacing path, and the PR adds focused tests around that boundary. I did not live-induce a Codex app-server bridge drop in this read-only review.

PR rating
Overall: 🦪 silver shellfish
Proof: 🦪 silver shellfish
Patch quality: 🐚 platinum hermit
Summary: Patch quality looks normally mergeable from source review, but proof remains thin for the changed retry/drop behavior that can duplicate messages or state.

Rank-up moves:

  • Attach direct redacted proof of pre-output bridge failure retry and unsafe/repeated failure visibility without duplicate sends.
  • Run a broader changed-lane or Testbox proof if maintainers want more confidence on the auto-reply/channel surface before merge.
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

Real behavior proof
Needs stronger real behavior proof before merge: The screenshot is real Telegram proof of delivered replies, but it does not directly show the new app-server bridge retry, repeated-failure diagnostics, or duplicate-send guard; contributors should add redacted logs, terminal output, a recording, linked artifacts, or diagnostic screenshots and redact private data before posting. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Mantis proof suggestion
A live Telegram lane can verify that bridge-drop recovery produces one visible reply and does not silently drop or duplicate messages. A maintainer can ask Mantis to capture proof by posting a new PR comment that starts with the OpenClaw Mantis account mention, followed by:

telegram live proof: verify Codex app-server bridge failure before output retries once, and unsafe or repeated failures produce one visible Telegram reply without duplicate sends.

Risk before merge

  • The supplied screenshot proves Telegram replies were delivered after deploying the patched runtime, but not the induced app-server bridge-drop retry path or unsafe repeated-failure visibility.
  • The PR intentionally replays a whole embedded turn under a computed no-output/no-side-effect boundary; any missed blocker could duplicate channel messages or mutate session/workspace state twice.
  • The PR body reports focused tests and local checks, but no full changed-lane, Testbox, or live induced-failure proof for this auto-reply/channel surface.

Maintainer options:

  1. Require Direct Bridge-Drop Proof (recommended)
    Before merge, ask for redacted logs, terminal output, a recording, or linked artifacts showing a clean pre-output bridge failure retrying once and an unsafe or repeated failure surfacing one reply without duplicates.
  2. Accept The Production Screenshot Risk
    A maintainer can intentionally accept the screenshot plus reported production session notes as enough for an urgent drop fix, while recording that the induced retry path was not directly proved.
  3. Use A Telegram Live Proof Lane
    A maintainer can request a live Telegram proof to capture the user-visible reply path and confirm recovery does not silently drop or duplicate replies.

Next step before merge
No automated repair is appropriate because the code review found no narrow defect, while the remaining blocker is direct behavior proof or explicit maintainer acceptance of replay risk.

Security
Cleared: No dependency, workflow, credential, permission, install, or supply-chain surface changed; the remaining replay concern is functional message/session risk.

Review details

Best possible solution:

Land the focused retry boundary after direct redacted runtime or transport proof shows pre-output bridge-drop recovery and unsafe/repeated failures producing exactly one visible reply without duplicate sends or side effects.

Do we have a high-confidence way to reproduce the issue?

Source-level yes, live no: current main has the app-server error strings and no bridge-specific retry/surfacing path, and the PR adds focused tests around that boundary. I did not live-induce a Codex app-server bridge drop in this read-only review.

Is this the best way to solve the issue?

Yes, the code shape is the right narrow surface: auto-reply owns the user-visible retry/drop decision, while the Codex plugin continues to own the app-server lifecycle errors. The remaining blocker is proof and maintainer risk acceptance, not a clearer code fix found in review.

Label justifications:

  • P1: The PR targets a real channel-drop failure where Codex app-server lifecycle errors can leave Telegram/group replies invisible to users.
  • merge-risk: 🚨 message-delivery: The retry decision directly affects whether channel replies are lost, duplicated, or surfaced as one visible failure message.
  • merge-risk: 🚨 session-state: A whole-turn retry can repeat transcript/session or mutating tool effects if the no-side-effect boundary misses a runtime signal.
  • rating: 🦪 silver shellfish: Current PR rating is 🦪 silver shellfish because proof is 🦪 silver shellfish, patch quality is 🐚 platinum hermit, and Patch quality looks normally mergeable from source review, but proof remains thin for the changed retry/drop behavior that can duplicate messages or state.
  • status: 🚀 automerge armed: This PR is in ClawSweeper's automerge lane. Needs stronger real behavior proof before merge: The screenshot is real Telegram proof of delivered replies, but it does not directly show the new app-server bridge retry, repeated-failure diagnostics, or duplicate-send guard; contributors should add redacted logs, terminal output, a recording, linked artifacts, or diagnostic screenshots and redact private data before posting. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.
  • proof: 📸 screenshot: Contributor real behavior proof includes screenshot evidence. The screenshot is real Telegram proof of delivered replies, but it does not directly show the new app-server bridge retry, repeated-failure diagnostics, or duplicate-send guard; contributors should add redacted logs, terminal output, a recording, linked artifacts, or diagnostic screenshots and redact private data before posting.
  • mantis: telegram-visible-proof: Mantis should capture Telegram visible proof. The PR changes user-visible Telegram/group reply delivery behavior, so a short live Telegram proof would materially help review.

What I checked:

Likely related people:

  • steipete: Authored recent structured execution-item and embedded lifecycle work in the same auto-reply runner path that this retry logic observes. (role: recent area contributor; confidence: medium; commits: 996dccb19ca1, af81ee9fee8b; files: src/auto-reply/reply/agent-runner-execution.ts)
  • ahdernasr: Introduced serialized tool-result delivery in the same runner file, which is one of the user-visible output boundaries this PR now uses as a retry blocker. (role: adjacent behavior introducer; confidence: medium; commits: e321f21daaad; files: src/auto-reply/reply/agent-runner-execution.ts)
  • VACInc: Authored the merged queued Telegram topic followup fix that is linked from this PR as a distinct but adjacent auto-reply/channel delivery failure. (role: recent adjacent owner; confidence: medium; commits: a559ccc0844a; files: src/auto-reply/reply/followup-runner.ts, src/auto-reply/reply/get-reply-run.ts)
  • Bob: Carried a recent reply lifecycle refactor in the same auto-reply execution file, relevant to retry and failure-finalization boundaries. (role: adjacent lifecycle contributor; confidence: low; commits: 3f6840230b86; files: src/auto-reply/reply/agent-runner-execution.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against a13468320c63.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. mantis: telegram-visible-proof Mantis should capture Telegram visible proof. P1 High-priority user-facing bug, regression, or broken workflow. merge-risk: 🚨 message-delivery 🚨 May drop, duplicate, misroute, suppress, or wrongly target messages. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. labels May 19, 2026
@openclaw-barnacle openclaw-barnacle Bot added the app: android App: android label May 19, 2026
@VACInc VACInc changed the title Retry app-server bridge drops safely Preserve app-server and Android realtime replies May 19, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the app: android App: android label May 19, 2026
@VACInc VACInc changed the title Preserve app-server and Android realtime replies Retry app-server bridge drops safely May 19, 2026
@VACInc VACInc marked this pull request as ready for review May 19, 2026 18:42
@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. labels May 19, 2026
Copy link
Copy Markdown
Contributor Author

VACInc commented May 19, 2026

Production Telegram proof attached. This is the exact original screenshot provided by the operator from the live production Telegram DM verification.

Original live production Telegram proof

The production runtime is currently running this fix path. The live text prompt and screenshot/JPEG prompt both produced delivered replies, confirming the dropped/lost reply issue is fixed in the live production path covered by this PR.

Ready for re-review.

@clawsweeper clawsweeper Bot added status: 🛠️ actively grinding The PR author has acted after the latest ClawSweeper review and work remains. and removed status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels May 19, 2026
@Takhoffman
Copy link
Copy Markdown
Contributor

@clawsweeper automerge

@clawsweeper clawsweeper Bot added the clawsweeper:automerge Maintainer opted this PR into bounded ClawSweeper-reviewed automerge label May 19, 2026
@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented May 19, 2026

🦞🔧
ClawSweeper automerge is enabled.

Draft PRs stay fix-only until GitHub marks them ready for review. Pause with /clawsweeper stop.

Automerge progress:

  • 2026-05-19 18:58:08 UTC review queued 98cee127009c (queued)
  • 2026-05-21 22:33:21 UTC review queued ddcbe77ae631 (queued)

@clawsweeper clawsweeper Bot added proof: 📸 screenshot Contributor real behavior proof includes screenshot evidence. status: 🚀 automerge armed This PR is in ClawSweeper's automerge lane. and removed status: 🛠️ actively grinding The PR author has acted after the latest ClawSweeper review and work remains. labels May 19, 2026
@Takhoffman
Copy link
Copy Markdown
Contributor

looking into why it got stuck

@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented May 20, 2026

ClawSweeper PR egg

🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.

Where did the egg go?
  • The egg game starts only after the PR passes the real-behavior proof check.
  • Before that, no creature or rarity is rolled. The treat waits for real proof.
  • This is still just collectible flavor: proof affects review readiness, not creature quality.

@clawsweeper clawsweeper Bot added the clawsweeper:human-review Needs maintainer review before ClawSweeper can continue label May 20, 2026
@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented May 20, 2026

🦞✅
ClawSweeper is pausing this repair loop for human review.

Source: clawsweeper[bot]
Reason: Needs contributor or maintainer action rather than a repair lane because the PR still has a retry-safety blocker and the required real behavior proof is insufficient for automation to supply.; Needs attention: No dependency or workflow supply-chain change was found, but the command/tool retry path can replay errored mutating tools and needs a fail-closed guard before merge.; Review finding: [P1] Block retries after... (sha=b8c187d3467654c3a8ed994ca7b5414d16a8dc7d)

Why human review is needed:
ClawSweeper found a blocker that should be resolved or accepted by a maintainer before the repair or automerge loop continues.

Recommended next action:
Review the reason above, resolve the blocker or explicitly accept the risk, then ask ClawSweeper to continue if automation is still appropriate.

I added clawsweeper:human-review and left the final call with a maintainer.

@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. labels May 20, 2026
@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented May 20, 2026

🦞✅
ClawSweeper is pausing this repair loop for human review.

Source: clawsweeper[bot]
Reason: The PR has no discrete code finding after the latest fix, but missing direct real-behavior proof and message/session replay risk should stay with contributor or maintainer review rather than an automated repair lane.; Cleared: No dependency, workflow, credential, permission, or supply-chain change is present; the remaining replay concern is tracked as functional merge risk. (sha=c1d45956ca472bc61b8401b9403a4937a104d1e2)

Why human review is needed:
This item has security-sensitive risk. ClawSweeper is pausing instead of making an autonomous change that could affect trust, credentials, permissions, or exposure.

Recommended next action:
Have a maintainer review the security-sensitive detail and provide an explicit safe path before asking ClawSweeper to continue.

I added clawsweeper:human-review and left the final call with a maintainer.

@VACInc VACInc force-pushed the repair-appserver-turn-lifecycle branch from c1d4595 to ddcbe77 Compare May 20, 2026 15:39
@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented May 20, 2026

🦞✅
ClawSweeper is pausing this repair loop for human review.

Source: clawsweeper[bot]
Reason: No automated repair is appropriate because the code review found no narrow defect, while the remaining blocker is direct behavior proof or explicit maintainer acceptance of replay risk.; Cleared: No dependency, workflow, credential, permission, install, or supply-chain surface changed; the remaining replay concern is functional message/session risk. (sha=ddcbe77ae63126a48c55ad909d37f793b69985cc)

Why human review is needed:
This item has security-sensitive risk. ClawSweeper is pausing instead of making an autonomous change that could affect trust, credentials, permissions, or exposure.

Recommended next action:
Have a maintainer review the security-sensitive detail and provide an explicit safe path before asking ClawSweeper to continue.

I added clawsweeper:human-review and left the final call with a maintainer.

@Takhoffman
Copy link
Copy Markdown
Contributor

@clawsweeper approve

@clawsweeper clawsweeper Bot removed the clawsweeper:human-review Needs maintainer review before ClawSweeper can continue label May 21, 2026
@steipete steipete self-assigned this May 21, 2026
@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented May 21, 2026

ClawSweeper 🐠 reef update

Thanks for the work on this. ClawSweeper could not push to this branch with the permissions available, so it opened a narrow replacement PR to keep the fix swimming forward without losing the contributor trail. not your fault, just GitHub branch-permission tides.

Why replacement: ClawSweeper could not update the source PR branch directly; GitHub did not grant sufficient push rights to the bot for that branch.
Replacement PR: #85116
Why close: this run explicitly closes the superseded source PR after the credited replacement PR is open, so review continues in one place.
This source PR is being closed only under the explicit source-close setting for this ClawSweeper run.
The original contribution stays credited in the replacement PR context.
Co-author credit kept:

fish notes: model gpt-5.5, reasoning high; reviewed against 3b24634.

@steipete
Copy link
Copy Markdown
Contributor

Thanks @VACInc. I traced this through the Codex app-server stdio/websocket boundary and the app-server run loop. The underlying failure class is real, but I do not want to merge this branch as-is because the blanket retry can replay turns where the shared app-server may still have an active/hidden turn, especially idle turn/completed failures.

I opened a narrower replacement here: #85279

That replacement keeps your credit in the changelog and includes you as commit co-author. It preserves the useful behavior from this PR, but moves the retry decision behind structured app-server failure metadata: replay-safe stdio client-close turns retry once, websocket/shared-server failures do not retry, idle turn/completed failures do not retry or fall through to model fallback, and the user gets specific non-generic failure copy.

Proof on the replacement:

Closing this PR in favor of #85279. Thanks again for digging into the app-server bridge failure; the replacement is intentionally shaped around the same bug report but with the replay boundary tightened.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clawsweeper:automerge Maintainer opted this PR into bounded ClawSweeper-reviewed automerge mantis: telegram-visible-proof Mantis should capture Telegram visible proof. merge-risk: 🚨 message-delivery 🚨 May drop, duplicate, misroute, suppress, or wrongly target messages. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. P1 High-priority user-facing bug, regression, or broken workflow. proof: 📸 screenshot Contributor real behavior proof includes screenshot evidence. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. size: L status: 🚀 automerge armed This PR is in ClawSweeper's automerge lane.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants