Summary
These two bugs have been plaguing users for months (see #26224 — 28 comments, #6836 — 150+ reports), with no root cause analysis from the team. After yet another day of babysitting Claude Code and pressing ESC every few minutes to revive a hung agent, we decided to conduct our own deep investigation — reverse-engineering cli.js across 12 npm package versions and analyzing 1,571 session JSONL files containing 148,444 tool calls.
Here are the exact root causes and proposed fixes.
Claude Code hangs indefinitely when an SSE streaming connection silently dies. There is no client-side timeout or heartbeat detection, so the process waits forever for events that will never arrive. ESC partially works around this by aborting the dead connection, but the queue auto-restart mechanism (queue.length > 0 → n()) immediately starts the next queued prompt instead of returning control to the user.
Root cause identified in source code — two separate issues in cli.js:
- No streaming timeout: The messages.stream() call has no timeout. If the SSE connection dies silently (TCP half-open), the client waits forever.
- Queue auto-restart after abort: After ESC aborts a hung request, if (queue.length > 0) { n(); return; } immediately starts the next queued prompt. The user cannot fully cancel.
Environment
- Claude Code: 2.1.74 (also confirmed on 2.1.50–2.1.73)
- OS: Windows 10, Git Bash
- Model: Opus 4.6
- API: Anthropic direct (not Bedrock/Vertex)
Reproduction
- Start a Claude Code session
- Submit a prompt → agent starts processing
- Wait for a hang (0 tokens, timer running, no progress) — happens ~10-15% of prompts
- Submit another prompt while hung → goes to queue
- Press ESC
- Expected: Cancel everything, return to the ❯ prompt
- Actual: Cancels the hung prompt, immediately starts the queued one
Frequency
Measured across 1,571 sessions using a custom JSONL analyzer tool:
| Period | Versions | Orphan rate (lost tool calls) |
|---|---|---|
| Dec 2025 | 2.0.72–2.1.2 | 6–14% |
| Jan 2026 | 2.1.5–2.1.23 | 5–10% |
| Feb 2026 | 2.1.29–2.1.56 | 3–8% |
| Mar 2026 | 2.1.69–2.1.74 | 2.4–4% |
The orphan rate above has fallen with newer versions, but hang frequency is a separate measure and has been increasing over time: hangs were rare in fall 2025 and now affect roughly 10-15% of prompts.
Source Code Analysis
Analyzed cli.js extracted from npm pack @anthropic-ai/claude-code across versions 2.0.72 through 2.1.74.
Issue 1: No streaming timeout
The API call at approximately offset 2,553,870 in cli.js (v2.1.74):
client.beta.messages.stream({...params}, options)
There is no timeout parameter, no keepalive check, and no heartbeat detection. The Anthropic streaming API sends periodic ping events (documented as event: ping rather than as SSE comments), but the client does not monitor for their absence.
When the TCP connection silently dies (common on Windows, WiFi, VPN, or after laptop sleep), the Node.js HTTP client has no way to know the connection is dead. The AbortController signal is never triggered because no error event fires.
Evidence: Packet inspection by other reporters confirms the client is stuck waiting for SSE events that never arrive. Token count stays at 0. ESC + re-submit creates a new connection that works immediately.
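To make the heartbeat idea concrete, here is a minimal sketch of how a client could tell keepalive traffic apart from real payload in raw SSE text. It assumes only the generic SSE wire format (events separated by a blank line, lines starting with ':' are comments) and treats both comment-only blocks and events named ping as liveness signals. The function name summarizeSseLiveness and the sample payload are hypothetical; nothing here is taken from cli.js.

```javascript
// Hypothetical helper, not from cli.js: classify raw SSE text into
// keepalive signals (comments or ping events) and payload events.
function summarizeSseLiveness(rawText) {
  const summary = { keepalives: 0, payloadEvents: 0 };
  // SSE events are separated by a blank line; accept both \n and \r\n.
  const blocks = rawText.split(/\r?\n\r?\n/);
  for (const block of blocks) {
    const lines = block.split(/\r?\n/).filter((line) => line.length > 0);
    if (lines.length === 0) continue;
    // A line starting with ':' is an SSE comment, often used as a heartbeat.
    const onlyComments = lines.every((line) => line.startsWith(':'));
    const eventLine = lines.find((line) => line.startsWith('event:'));
    // Per the SSE format, an event without an explicit name is a "message".
    const eventName = eventLine ? eventLine.slice('event:'.length).trim() : 'message';
    if (onlyComments || eventName === 'ping') {
      summary.keepalives += 1;
    } else {
      summary.payloadEvents += 1;
    }
  }
  return summary;
}

const sample = [
  ': ping',
  '',
  'event: ping',
  'data: {"type":"ping"}',
  '',
  'event: content_block_delta',
  'data: {"type":"content_block_delta"}',
  '',
].join('\n');

console.log(summarizeSseLiveness(sample)); // { keepalives: 2, payloadEvents: 1 }
```

A connection that delivers neither kind of block for longer than the expected ping interval is a candidate for the dead-connection case described above.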
Issue 2: Queue auto-restart prevents full cancellation
The main processing loop (offset ~11,400,559 in v2.1.74):
n = async () => {
  if (M) return; // running guard
  M = true;
  // ... prepare input, call API, process response ...
}
After completion or abort, control reaches the finally block and the queue check that follows it (offset ~11,406,174):
finally {
  M = false;    // clear running guard
  W6.start();   // restart idle timer
}
if (c36()) {    // c36() = yY.length > 0 = queue not empty?
  n();          // YES → immediately restart with queued message!
  return;       // without returning control to user!
}
Historical analysis of npm packages confirms this pattern has existed since v2.1.50 (as queue.length > 0) and was refactored to c36() in v2.1.74.
Issue 3: JSONL writer race condition (related)
The session writer class LZq (offset ~10,549,000) has a non-atomic insertMessageChain() that writes assistant (tool_use) and user (tool_result) messages one at a time in a loop:
async insertMessageChain(A, q, K, Y, z) {
  return this.trackWrite(async () => {
    for (let H of A) {
      await this.appendEntry(M); // each message separately!
    }
  });
}
If the process is interrupted between writing tool_use and tool_result, the tool_use becomes orphaned. This is the root cause of issue #6836.
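The orphan condition can be checked mechanically. The sketch below scans JSONL text and reports every tool_use id that never received a matching tool_result. The entry shape (message.content as an array of parts carrying type, id, and tool_use_id) is an assumption about the session format, and findOrphanedToolUses is a hypothetical name; this is not the ccdiag implementation.

```javascript
// Hypothetical sketch: find tool_use entries that never received a tool_result.
// The entry shape (message.content[] with type / id / tool_use_id) is assumed.
function findOrphanedToolUses(jsonlText) {
  const pending = new Set();
  for (const line of jsonlText.split('\n')) {
    if (line.trim() === '') continue;
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // skip unparseable lines, e.g. a truncated final write
    }
    const content = entry && entry.message && entry.message.content;
    if (!Array.isArray(content)) continue;
    for (const part of content) {
      if (!part) continue;
      if (part.type === 'tool_use') pending.add(part.id);
      if (part.type === 'tool_result') pending.delete(part.tool_use_id);
    }
  }
  return [...pending]; // ids still waiting for a result
}

const session = [
  JSON.stringify({ type: 'assistant', message: { content: [{ type: 'tool_use', id: 'toolu_1' }] } }),
  JSON.stringify({ type: 'user', message: { content: [{ type: 'tool_result', tool_use_id: 'toolu_1' }] } }),
  JSON.stringify({ type: 'assistant', message: { content: [{ type: 'tool_use', id: 'toolu_2' }] } }),
].join('\n');

console.log(findOrphanedToolUses(session)); // [ 'toolu_2' ]
```

In this sample the second tool call was written but the process stopped before its result was appended, which is exactly the interruption window described above.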
Proposed Fixes
Fix 1: Streaming timeout (critical)
Add a client-side timeout that aborts and retries if no SSE events are received within N seconds:
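// Pseudocode: event names and abortController are illustrative
const STREAM_IDLE_TIMEOUT_MS = 30_000;
let lastEventTime = Date.now();
// Any event, including pings, counts as proof of life
stream.on('event', () => { lastEventTime = Date.now(); });
const watchdog = setInterval(() => {
  if (Date.now() - lastEventTime > STREAM_IDLE_TIMEOUT_MS) {
    clearInterval(watchdog);
    abortController.abort();
    // retry with new connection
  }
}, 5_000);
// Stop the watchdog once the stream finishes normally
stream.on('end', () => clearInterval(watchdog));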
The Anthropic API sends ping events periodically. Treating these as liveness signals would detect stale connections with few false positives, provided the idle timeout is comfortably longer than the ping interval.
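For anyone who wants to experiment with the timing logic in isolation, here is a small self-contained sketch of the same idea with an injectable clock. createIdleWatchdog and its option names are hypothetical, and the 30-second limit is just the value proposed above; the only real API used is the standard AbortController.

```javascript
// Hypothetical sketch of the idle watchdog; names and defaults are illustrative.
function createIdleWatchdog({ idleTimeoutMs, onIdle, now = Date.now }) {
  let lastEventTime = now();
  let fired = false;
  return {
    // Call on every received SSE event, including pings.
    touch() {
      lastEventTime = now();
    },
    // Call periodically; invokes onIdle at most once.
    check() {
      if (!fired && now() - lastEventTime > idleTimeoutMs) {
        fired = true;
        onIdle();
      }
      return fired;
    },
  };
}

// Wiring example with a fake clock so the timing is easy to follow.
let fakeTime = 0;
const controller = new AbortController();
const idleWatchdog = createIdleWatchdog({
  idleTimeoutMs: 30_000,
  onIdle: () => controller.abort(),
  now: () => fakeTime,
});

fakeTime = 20_000;
idleWatchdog.touch(); // an event arrived at t = 20 s
fakeTime = 45_000;
idleWatchdog.check(); // 25 s of silence: still within the limit
console.log(controller.signal.aborted); // false
fakeTime = 51_000;
idleWatchdog.check(); // 31 s of silence: abort the request
console.log(controller.signal.aborted); // true
```

In a real client, check() would be driven by setInterval and touch() by the stream's event handler, with the abort signal passed to the request so that a retry can start on a fresh connection.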
Fix 2: ESC should clear the queue
When the user presses ESC during a hang, the queue should be cleared (or the user should be asked):
// After abort, before checking queue:
if (userInitiatedAbort && c36()) {
  // Option A: Clear queue entirely
  clearQueue();
  return; // back to prompt
  // Option B: Ask user
  // "You have N queued messages. Clear queue? (y/n)"
}
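The difference between the current and the proposed behaviour can be modelled as a single decision made after each turn ends. The sketch below is a simplified model, not the actual cli.js logic: decideAfterTurn, userAborted, and clearQueueOnAbort are hypothetical names, and the queue is a plain array of prompt strings.

```javascript
// Hypothetical model of the post-turn decision; not the actual cli.js code.
function decideAfterTurn(queue, { userAborted, clearQueueOnAbort }) {
  if (userAborted && clearQueueOnAbort) {
    // Option A from above: drop everything that was queued.
    const dropped = queue.splice(0, queue.length);
    return { action: 'return-to-prompt', dropped };
  }
  if (queue.length > 0) {
    // Mirrors the existing pattern: a non-empty queue restarts processing.
    return { action: 'run-next', next: queue.shift() };
  }
  return { action: 'return-to-prompt', dropped: [] };
}

// Current behaviour: ESC aborts the hung turn, but the queued prompt starts anyway.
console.log(decideAfterTurn(['second prompt'], { userAborted: true, clearQueueOnAbort: false }));
// Proposed behaviour: ESC also drops the queue and returns control to the user.
console.log(decideAfterTurn(['second prompt'], { userAborted: true, clearQueueOnAbort: true }));
```

Returning the dropped prompts, rather than discarding them silently, would also leave room for Option B, where the user is asked before the queue is cleared.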
Fix 3: Atomic message chain writes
insertMessageChain() should serialize the entire chain as a single appendToFile() call:
async insertMessageChain(messages) {
  const serialized = messages.map(m => JSON.stringify(m)).join('\n') + '\n';
  await this.appendToFile(sessionFile, serialized);
}
Note: history.jsonl already uses proper-lockfile for file locking — the same approach should be applied to session JSONL files when multiple agents write concurrently.
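A runnable version of the single-append idea, using only Node's built-in fs module, might look like the sketch below. appendMessageChain and the temporary file path are hypothetical. One caveat: a single append call narrows the window in which a tool_use can land without its tool_result, but it is not a strict atomicity guarantee at the OS level, which is why file locking is still worth considering for concurrent writers.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical sketch: serialize the whole chain first, then append it with one call.
function appendMessageChain(sessionFile, messages) {
  if (messages.length === 0) return 0; // nothing to write, avoid a stray blank line
  const serialized = messages.map((m) => JSON.stringify(m)).join('\n') + '\n';
  fs.appendFileSync(sessionFile, serialized, 'utf8');
  return messages.length;
}

// Demo: write a tool_use / tool_result pair to a throwaway session file.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'chain-'));
const demoFile = path.join(demoDir, 'session.jsonl');
appendMessageChain(demoFile, [
  { type: 'assistant', message: { content: [{ type: 'tool_use', id: 'toolu_1' }] } },
  { type: 'user', message: { content: [{ type: 'tool_result', tool_use_id: 'toolu_1' }] } },
]);
console.log(fs.readFileSync(demoFile, 'utf8').trim().split('\n').length); // 2
```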
Related Issues
- .claude.json architectural issues (non-atomic writes, no separation of concerns)
Methodology
Analysis performed using:
- ccdiag: Custom Go CLI tool that parses JSONL session files, detects orphaned tool calls, analyzes timing, and scans multiple sessions
- Source analysis: cli.js extracted from npm packages across 12 versions (2.0.72 through 2.1.74), searched for queue/abort/streaming patterns
- Session data: 1,571 sessions, 148,444 tool calls, 8,007 orphaned