fix(claude): don't time out a turn while it waits on a slow tool approval by pjdoland · Pull Request #382 · plmbr/notebook-intelligence

pjdoland · 2026-06-24T23:58:01Z

Summary

Approving a tool after a long wait (around 30 minutes) threw a "Claude agent response timeout" error, and subsequent prompts then got stuck. The root cause: the response timeout (CLAUDE_AGENT_CLIENT_RESPONSE_TIMEOUT, default 1800s) counted the time the agent spent legitimately blocked on the user's tool approval, so a slow human reply timed out the turn while the worker was still mid-request awaiting the approval. The orphaned request then wedged the next prompt.

Solution

Exclude user-wait time from the response timeout. ChatResponse now tracks the wall-clock time it spends blocked on user input (every tool approval, plan confirm, and AskUserQuestion funnels through wait_for_chat_user_input), and _send_claude_agent_request subtracts that from the timeout window. A slow human reply no longer looks like an unresponsive agent, while the timeout still fires for a genuine post-approval hang.

Cancel in-flight requests on on_close. Because a parked approval no longer self-resolves via the timeout, an abandoned turn (the user closes the tab) would otherwise spin its worker thread and the Claude subprocess forever; cancelling makes the poll loop tear the subprocess down. The handler dict is snapshotted before iterating so a worker popping its own entry at the same instant can't raise mid-iteration.

wait_for_chat_user_input now disconnects its listener in a finally, fixing a latent subscription leak on a cancelled turn.

Testing

Unit tests cover the new ChatResponse wait accounting (starts at zero, counts an in-progress wait, accumulates completed waits, clears in-progress on answer, disconnects the listener) and the on_close cancellation (cancels each in-flight token, continues if one cancel raises, clears the dict). Full suite green: 1235 passed plus the 57-test Claude client suite.

Verified live in a running JupyterLab with a lowered timeout so a "long wait" is fast to reproduce: an approval held about 7x past the timeout did not error, completed on approval, and the next prompt responded normally (both symptoms fixed). Separately, closing the tab mid-approval fired on_close and tore the subprocess down within a poll cycle, so an abandoned approval no longer leaks; ungraceful drops are covered by Tornado's websocket ping timeout firing on_close.

Risks / follow-ups

The accounting assumes a single in-flight approval at a time, which holds because the Claude SDK serializes tool permission prompts; worth a note near wait_for_chat_user_input if that ever changes. The timing uses wall-clock time.time() rather than a monotonic clock, so a system clock adjustment mid-wait could skew the window slightly; cosmetic, optional to switch to a monotonic source.

Closes #381.

…oval (plmbr#381) A tool approval the user took a long time to confirm (around 30 min) was counted toward CLAUDE_AGENT_CLIENT_RESPONSE_TIMEOUT, so the turn errored with "Claude agent response timeout" while the worker was still mid-request awaiting the approval. The orphaned request then wedged the next prompt. - Track on ChatResponse the wall-clock time spent blocked on user input (every approval, plan confirm, and AskUserQuestion goes through wait_for_chat_user_input) and subtract it from the response-timeout window in _send_claude_agent_request. A slow human reply no longer looks like an unresponsive agent; the timeout still fires for a genuine post-approval hang. - wait_for_chat_user_input now disconnects its listener in a finally, so a cancelled turn can't leak the subscription. - Cancel in-flight requests in on_close. With the timeout no longer firing for a parked approval, an abandoned turn (the user closes the tab) would otherwise spin its worker and Claude subprocess forever; cancelling makes the poll loop tear the subprocess down. Snapshot the handler dict before iterating so a worker popping its own entry can't raise mid-iteration. Verified live with a lowered timeout: an approval held ~7x past the timeout did not error, completed on approval, and the next prompt responded normally; and a tab close mid-approval tore the subprocess down within a poll cycle.

mbektas

thanks a lot @pjdoland !

pjdoland added the bug Something isn't working label Jun 25, 2026

mbektas approved these changes Jun 25, 2026

View reviewed changes

mbektas merged commit 2f7e7a7 into plmbr:main Jun 25, 2026
4 of 5 checks passed

pjdoland deleted the fix/381-tool-approval-long-wait branch June 25, 2026 16:33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(claude): don't time out a turn while it waits on a slow tool approval#382

fix(claude): don't time out a turn while it waits on a slow tool approval#382
mbektas merged 1 commit into
plmbr:mainfrom
pjdoland:fix/381-tool-approval-long-wait

pjdoland commented Jun 24, 2026

Uh oh!

mbektas left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

pjdoland commented Jun 24, 2026

Summary

Solution

Testing

Risks / follow-ups

Uh oh!

mbektas left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants