fix: prevent macOS task hang — force-close connections, improve process cleanup#423
Conversation
…e process cleanup - killSidecar: pass close=true to force-close SSE/WebSocket connections on shutdown - processor: increase toolcall cleanup timeout from 250ms to 5s for child process teardown - pty: send explicit SIGTERM signal instead of undefined for reliable process termination - llm: add 10-minute stream timeout to prevent session stuck in busy state on network stall - retry: cap retry attempts at 10 to prevent infinite retry loops Fixes Astro-Han#421, Fixes Astro-Han#422
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughForce-closes the desktop sidecar connections; implements staged, cross-platform process-tree termination and helpers to detect/process descendants; reworks PTY teardown to track async cleanup and publish ordered events; adds an inactivity timeout for LLM streaming and a retry-attempt cap; buffers and applies pending processor tool updates/actions; and updates tests covering these behaviors. ChangesShutdown, PTY & Session Cleanup
Sequence Diagram(s)sequenceDiagram
participant App as rgba(52,160,52,0.5)
participant Sidecar as rgba(66,135,245,0.5)
participant PTY as rgba(245,166,35,0.5)
participant ChildProc as rgba(168,88,214,0.5)
participant LLM as rgba(220,57,18,0.5)
App->>Sidecar: killSidecar() → server.stop(true)
Sidecar-->>App: closeAllConnections (SSE/WebSocket closed)
App->>PTY: Pty.remove(id)
PTY->>PTY: closeSubscribers(session)
PTY->>ChildProc: terminateTree(SIGTERM → waitForExit)
PTY->>PTY: wait EXIT_WAIT_MS / TERMINATION_GRACE_MS
alt still alive after grace
PTY->>ChildProc: escalate SIGKILL (group/individual)
end
PTY-->>App: publish pty.exited → pty.deleted
App->>LLM: stream(..., streamTimeoutMs)
LLM->>App: stream events (reset inactivity timer)
alt no events within timeout
LLM-->>App: abort provider response body / cancel stream
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Review rate limit: 8/10 reviews remaining, refill in 10 minutes and 28 seconds. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces several stability and timeout improvements across the codebase, including explicit signal handling for process termination, increased timeouts for tool calls, and a 10-minute timeout for LLM streams. It also implements a maximum retry limit of 10 attempts. A critical issue was identified in the retry logic where an incorrect method was used to terminate the retry schedule, which should be corrected to ensure the schedule stops as intended.
|
补充一下 maintainer update:我在这个 PR 上追加了一个提交 这次主要做了三件事:
本地验证已通过: bun --cwd packages/opencode test test/pty/pty-session.test.ts test/tool/bash.test.ts --timeout 20000
bun --cwd packages/opencode typecheck这部分是对 #421 / #422 根因修复的补强,不再只是增加 cleanup/stream/retry 的等待时间。 |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/opencode/src/pty/index.ts`:
- Around line 183-206: The finalizer currently calls the sync path
(teardownSync/terminateSync) which schedules a detached SIGKILL via
setTimeout.unref(); change the finalizer to be effectful and await the async
teardown path by invoking the Effect-based teardown(session) (and
terminate(session) as used inside it) instead of teardownSync, ensuring the
finalizer yields until the child processes are torn down; update any other
finalizer occurrences (also around the 216-220 area) to use Effect.addFinalizer
or Effect.acquireRelease inside InstanceState.make and call/await
teardown(session) rather than the fire-and-forget teardownSync/terminateSync
helpers.
- Around line 164-196: The terminate and terminateSync functions send signals
even when a session has already exited, risking PID/PGID reuse races; update
terminate and terminateSync (referencing terminate, terminateSync, waitForExit,
signalProcess, signalGroup, session.info.status, sleep, and
TERMINATION_GRACE_MS) to short-circuit immediately if session.info.status ===
"exited" and to re-check liveness after the grace period before escalating to
SIGKILL (e.g., re-check waitForExit or verify the PID/PGID is still alive),
skipping any further signals if the session became exited in the meantime.
In `@packages/opencode/test/tool/bash.test.ts`:
- Line 1095: Update the test command so the background child installs its own
signal handlers before running sleep: instead of relying on the parent trap in
the command string `trap '' HUP TERM; sleep 30 & echo $! >
${JSON.stringify(pidFile)}; echo before; wait`, spawn the background job via a
subshell that runs its own `trap '' HUP TERM` (or at least `trap '' TERM`) then
writes its PID to `pidFile` and runs `sleep 30`, ensuring the child process
ignores TERM independently of the parent.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: db4a2e14-e79f-4489-b6c6-5b01a193b454
📒 Files selected for processing (3)
packages/opencode/src/pty/index.tspackages/opencode/test/pty/pty-session.test.tspackages/opencode/test/tool/bash.test.ts
|
Update pushed in 4494673. What changed:
Verified locally:
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
packages/opencode/test/pty/pty-session.test.ts (1)
98-135: UsetestEffect(...)/it.live(...)for this PTY Effect-service test.This test exercises Pty Effect services and depends on real child-process behavior and git; it should use the
testEffect+it.livepattern. Additionally, replace theInstance.provide(...)pattern withprovideTmpdirInstance(...)for better cleanup and Effect composition alignment.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opencode/test/pty/pty-session.test.ts` around lines 98 - 135, The test "remove terminates child processes started inside the pty" should be converted to use the Effect-based test helpers: replace the top-level test(...) with testEffect(...) and make the body an it.live(...) spec so it runs live against real processes; swap the Instance.provide({ directory: dir.path, fn: async () => { ... } }) usage for provideTmpdirInstance(...) to obtain the tmpdir-backed Instance within the Effect environment, and keep the same logic that calls Pty.create(...) and Pty.remove(...) (ensure id is tracked and removed in finally). Ensure the effect returns/awaits the same waits and assertions inside the live effect so cleanup and process isolation occur under the testEffect/it.live lifecycle.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@packages/opencode/test/pty/pty-session.test.ts`:
- Around line 98-135: The test "remove terminates child processes started inside
the pty" should be converted to use the Effect-based test helpers: replace the
top-level test(...) with testEffect(...) and make the body an it.live(...) spec
so it runs live against real processes; swap the Instance.provide({ directory:
dir.path, fn: async () => { ... } }) usage for provideTmpdirInstance(...) to
obtain the tmpdir-backed Instance within the Effect environment, and keep the
same logic that calls Pty.create(...) and Pty.remove(...) (ensure id is tracked
and removed in finally). Ensure the effect returns/awaits the same waits and
assertions inside the live effect so cleanup and process isolation occur under
the testEffect/it.live lifecycle.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: b05f00d1-87d6-4d55-86fb-7c8f4d20eeb0
📒 Files selected for processing (1)
packages/opencode/test/pty/pty-session.test.ts
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/opencode/src/session/processor.ts`:
- Around line 227-241: The applyPendingToolUpdates function currently deletes
ctx.pendingToolUpdates[toolCallID] before verifying the part exists, which loses
updates when readToolCall(toolCallID) returns undefined; change the flow in
applyPendingToolUpdates so you only delete ctx.pendingToolUpdates[toolCallID]
after a successful match and successful session.updatePart(part) (i.e., move the
delete to after the if (!match) check and after the part is created/updated),
ensuring readToolCall and session.updatePart complete before clearing the
pending queue and then update ctx.toolcalls accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 277a6c0d-f5be-4ce8-ac72-d00ea81d1ccb
📒 Files selected for processing (8)
packages/desktop-electron/src/main/index.tspackages/opencode/src/pty/index.tspackages/opencode/src/session/llm.tspackages/opencode/src/session/processor.tspackages/opencode/src/session/retry.tspackages/opencode/src/tool/bash.tspackages/opencode/test/pty/pty-session.test.tspackages/opencode/test/tool/bash.test.ts
|
Follow-up update for the latest review pass:
Local verification:
|
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
packages/opencode/test/session/retry.test.ts (1)
125-150: ⚡ Quick winUse
testEffect/it.effectfor this Effect workflow test.This new case is Effect-driven (
Effect.runPromiseExit,Effect.retry) but is still written with plaintest(...). Please move it to the Effect test harness for consistency with the test conventions in this package.As per coding guidelines: “Use
testEffect(...)fromtest/lib/effect.tsfor tests that exercise Effect services or Effect-based workflows. Useit.effect(...)when the test should run withTestClockandTestConsole.”🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opencode/test/session/retry.test.ts` around lines 125 - 150, Replace the plain Jest test(...) with the Effect test harness by converting the test titled "policy stops retrying after the configured max attempts" to use testEffect (from test/lib/effect.ts); instead of calling Effect.runPromiseExit imperatively inside the test body, return the Effect workflow to the harness (i.e., make the test function return the Effect that runs the retry logic and asserts via Exit/expect), keeping the same Effect pipeline (Effect.try + Effect.retry with SessionRetry.policy and references to SessionRetry.RETRY_MAX_ATTEMPTS and attempts array) so the harness manages execution and TestClock/TestConsole as required.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/opencode/src/pty/index.ts`:
- Around line 138-148: waitForExit currently resolves when either the process
actually exits or when EXIT_WAIT_MS elapses, which can cause a raced publish of
pty.deleted before the true onExit later publishes pty.exited; change
waitForExit (and the other similar call sites) to keep the timeout branch
separate from the real exit signal: attach the onExit handler and return a
promise that only resolves on the actual onExit, but also return or set a
boolean (e.g., timedOut) when the timeout fires so the caller can proceed with
deletion without treating the timeout as a real exit; additionally ensure the
global onExit handler (the session.process.onExit subscriber used elsewhere)
checks session.info.status or the timedOut flag and unsubscribes/disables its
publish of pty.exited if a prior timeout-driven deletion has already occurred,
and always dispose the temporary subscription created by waitForExit when either
branch wins.
In `@packages/opencode/src/session/llm.ts`:
- Line 404: The assignment to timeoutMs uses caller-controlled
input.streamTimeoutMs directly; guard it so 0, negative, or NaN values fall back
to the default SILENT_STREAM_TIMEOUT_MS. Replace the expression around const
timeoutMs = input.streamTimeoutMs ?? SILENT_STREAM_TIMEOUT_MS with a validation
that checks Number.isFinite(input.streamTimeoutMs) and input.streamTimeoutMs > 0
(or Math.max with a defined minimum) and otherwise assigns
SILENT_STREAM_TIMEOUT_MS, referencing the timeoutMs variable and
input.streamTimeoutMs in the fix.
---
Nitpick comments:
In `@packages/opencode/test/session/retry.test.ts`:
- Around line 125-150: Replace the plain Jest test(...) with the Effect test
harness by converting the test titled "policy stops retrying after the
configured max attempts" to use testEffect (from test/lib/effect.ts); instead of
calling Effect.runPromiseExit imperatively inside the test body, return the
Effect workflow to the harness (i.e., make the test function return the Effect
that runs the retry logic and asserts via Exit/expect), keeping the same Effect
pipeline (Effect.try + Effect.retry with SessionRetry.policy and references to
SessionRetry.RETRY_MAX_ATTEMPTS and attempts array) so the harness manages
execution and TestClock/TestConsole as required.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: a05d6956-24f4-4c96-aab1-a2ed12ee4e76
📒 Files selected for processing (6)
packages/desktop-electron/src/main/index-sidecar-source.test.tspackages/opencode/src/pty/index.tspackages/opencode/src/session/llm.tspackages/opencode/test/pty/pty-session.test.tspackages/opencode/test/session/llm.test.tspackages/opencode/test/session/retry.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/opencode/test/pty/pty-session.test.ts
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/opencode/src/session/processor.ts`:
- Around line 623-624: The queued running-state update in updateToolCall leaves
running undefined and skips SessionDiagnostics.observeToolCall when tool-call
precedes tool-input-start; modify updateToolCall (and the out-of-order handling
at the same block around lines 636-652) to either (a) record/serialize the
diagnostics metadata (the same data written by
SessionDiagnostics.observeToolCall, including diagnostics.loop) into the
pending-update structure so applyPendingToolUpdates will replay both state and
diagnostics, or (b) call SessionDiagnostics.observeToolCall when queuing so the
diagnostics write is executed immediately; ensure applyPendingToolUpdates also
applies any queued diagnostics write when it replays pending updates so
loopRecords, errorRecords and completeToolCall see the same metadata as the
in-order path.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: bd9371d9-19ba-4065-aaac-dc4d370fbd72
📒 Files selected for processing (1)
packages/opencode/src/session/processor.ts
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
packages/opencode/test/util/process.test.ts (1)
63-74: ⚡ Quick winThis regression test doesn't validate tree cleanup yet.
The spawned script has no child process, so a root-only kill would still make this pass. To lock in the
/Tbehavior, make the parent spawn a long-lived child and assert that the child PID is gone afterterminateTree(...).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opencode/test/util/process.test.ts` around lines 63 - 74, The test currently spawns a single process so it cannot verify tree-wide cleanup; update the test named "terminateTree uses the platform process-tree cleanup path on Windows" so the spawned parent process (created via Process.spawn) itself spawns a long-lived child, capture that child's PID from the parent (or have the parent print it and parse proc.stdout), call Process.terminateTree({ pid: proc.pid! }), then assert the child's PID is no longer running (e.g., wait for the child to exit or check that the child PID cannot be found) rather than only checking proc.exited; this ensures the `/T` tree-kill behavior is asserted.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/opencode/src/util/process.ts`:
- Around line 241-248: The TERM grace period is skipped when input.waitForExit
is not provided; change the rootExited calculation so that even if
input.waitForExit is undefined we still wait graceMs before escalating.
Specifically, update the rootExited logic (where input.waitForExit,
sleep(graceMs) and graceMs are referenced) to either race
input.waitForExit.then(() => true) against sleep(graceMs).then(() => false) when
present, or await sleep(graceMs) and set rootExited = false when
input.waitForExit is absent, so that the later groupSignaled && !rootExited &&
exists(input.pid) branch honors the grace period before calling
signalGroup(input.pid, "SIGKILL").
---
Nitpick comments:
In `@packages/opencode/test/util/process.test.ts`:
- Around line 63-74: The test currently spawns a single process so it cannot
verify tree-wide cleanup; update the test named "terminateTree uses the platform
process-tree cleanup path on Windows" so the spawned parent process (created via
Process.spawn) itself spawns a long-lived child, capture that child's PID from
the parent (or have the parent print it and parse proc.stdout), call
Process.terminateTree({ pid: proc.pid! }), then assert the child's PID is no
longer running (e.g., wait for the child to exit or check that the child PID
cannot be found) rather than only checking proc.exited; this ensures the `/T`
tree-kill behavior is asserted.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: f5968b70-02f0-47a0-b9c8-1a87930526dd
📒 Files selected for processing (4)
packages/opencode/src/pty/index.tspackages/opencode/src/tool/bash.tspackages/opencode/src/util/process.tspackages/opencode/test/util/process.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/opencode/src/tool/bash.ts
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/opencode/src/util/process.ts`:
- Around line 258-279: The group-kill branch currently calls
signalGroup(input.pid, "SIGKILL") and returns without touching the
previously-enumerated children, which can leave descendants orphaned; update the
branch in the function containing signalGroup, signalPid, children, exists, and
input.waitForExit so that after signalGroup(input.pid, "SIGKILL") you also
iterate over the children array and call signalPid(child, "SIGKILL") (optionally
checking exists(child) before signaling), emit a debug log similar to other
paths, and then await the same Promise.race([input.waitForExit.catch(() =>
undefined), sleep(graceMs)]) before returning.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 7fd32a75-b7d5-4851-8b98-670474d8fa37
📒 Files selected for processing (4)
packages/opencode/src/pty/index.tspackages/opencode/src/tool/bash.tspackages/opencode/src/util/process.tspackages/opencode/test/util/process.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- packages/opencode/src/tool/bash.ts
- packages/opencode/src/pty/index.ts
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/sdk/js/src/process.ts`:
- Around line 39-43: The shutdown code sends SIGTERM and SIGKILL immediately,
preventing graceful shutdown; modify the sequence in the block that calls
descendants(proc.pid) and signal to first send SIGTERM to proc.pid and each
child, then wait a short grace window (e.g., 500ms — same as packages/opencode
behavior) before sending SIGKILL to proc.pid and each child; implement the wait
with a Promise-based sleep (or setTimeout wrapped in a Promise) so the function
can await the delay and then proceed to send SIGKILL only after the grace
period.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: ef3f44f1-60fb-498f-a9f8-ec3b26f68af5
📒 Files selected for processing (3)
packages/opencode/src/util/process.tspackages/opencode/test/util/process.test.tspackages/sdk/js/src/process.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- packages/opencode/test/util/process.test.ts
- packages/opencode/src/util/process.ts
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/sdk/js/src/process.ts`:
- Around line 39-46: The delayed SIGKILL timer can fire after the child already
exited and its PID recycled; modify the shutdown logic around
descendants(proc.pid) / signal(proc.pid, ...) so that after creating killTimer
you register handlers on the child process (proc.on('exit'), proc.on('close')
and proc.on('error')) that clearTimeout(killTimer) (and call killTimer.unref if
needed) and remove those handlers so the timer is cancelled if the process exits
or errors before the 500ms window; keep the existing SIGTERM/SIGKILL signaling
and ensure you reference the existing symbols (proc.pid, descendants, signal,
killTimer) when adding/clearing the timer and listeners.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: e4611246-5e82-4c24-b239-54bae7a2d1dd
📒 Files selected for processing (1)
packages/sdk/js/src/process.ts
Summary
修复 macOS 上任务完成后 app 卡死的问题,并把后续 review 里暴露出的清理路径补齐:
killSidecar()传close=true— 强制关闭 SSE/WebSocket 连接,防止 Electron 进程带着半死不活的服务器挂住1s等待 — 只等待 Bash/PTY 正常落盘截断输出和退出状态;进程生命周期仍在 Bash/PTY 层显式终止Process.terminateTree()— 先尝试 process group,再 fallback 到 root + descendants,TERM 后统一 500ms grace 再 KILL;Windows 使用taskkill /f /tSILENT_STREAM_TIMEOUT_MS只限制两次 provider event 之间的静默时间,不是总运行时长Related Issues
Testing
bun test test/util/process.test.ts test/pty/pty-session.test.ts test/tool/bash.test.ts test/session/prompt-effect.test.ts --timeout 30000bun test test/session/llm.test.ts test/session/prompt-effect.test.ts --timeout 30000bun --cwd packages/opencode typecheck,bun --cwd packages/sdk/js typecheckunit-opencode,typecheck,unit-app,unit-desktop,smoke-macos-arm64,e2e-artifacts,analyze-js-ts, CodeQLSummary by CodeRabbit
Bug Fixes
Improvements
Tests