
Streaming idle timeout (~60s) kills agent/reasoning workloads — root cause analysis & proposed fixes #223

@kagura-agent

Description

Problem

When using copilot-api as a proxy for Claude Code or OpenClaw agent workloads, streaming requests consistently fail after ~60 seconds of idle time (no tokens emitted). This happens during extended thinking / reasoning phases where the model is computing internally but not yet producing output tokens.

Observed error:

```
HTTP error: { error: { message: "Timed out reading request body. Try again, or use a smaller request size.", code: "user_request_timeout" } }
--> POST /v1/messages?beta=true 408 61s
```

Root Cause Analysis

The timeout does not originate in copilot-api itself; it comes from GitHub Copilot's upstream infrastructure (load balancers / reverse proxies in front of api.githubcopilot.com). Idle timeouts for Azure and AWS load balancers and for Nginx reverse proxies commonly default to 60 seconds.

The chain:

Client → copilot-api (localhost) → api.githubcopilot.com → LLM inference
                                          ↑
                              Infrastructure idle timeout here (~60s)

This works fine for normal chat completions where tokens stream continuously. But LLM reasoning/thinking models (Claude with extended thinking, o1/o3 with chain-of-thought) can have 60+ second gaps between the request being accepted and the first token being emitted.

Impact on Agent Workloads

This is particularly severe for agent use cases (Claude Code, OpenClaw, Codex CLI, etc.):

  • Source code analysis: Large file reads → long thinking → timeout before first token
  • Complex reasoning: Multi-step planning tasks trigger extended thinking
  • Tool-heavy workflows: Agent accumulates large context → model needs more processing time
  • Measured: successful runs show dense event streams with no long gaps; failed runs consistently show a maximum inter-event gap of ~60.9 seconds, right at the infrastructure timeout boundary
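The gap measurement above can be reproduced with a small helper that tracks the largest delay between chunks of a stream (a hypothetical utility for diagnosis, not part of copilot-api):

```typescript
// Track the largest gap (in ms) between consecutive chunks of a stream.
// On a failing reasoning request, this surfaces the ~60s idle window
// that precedes the upstream disconnect.
async function maxStreamGapMs(stream: AsyncIterable<unknown>): Promise<number> {
  let last = Date.now();
  let maxGap = 0;
  for await (const _chunk of stream) {
    const now = Date.now();
    maxGap = Math.max(maxGap, now - last);
    last = now;
  }
  return maxGap;
}
```

Feeding it the SSE response body of a failing request should report a maximum gap at or just above the 60-second boundary.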

Proposed Solutions

1. Client-side retry on disconnect (copilot-api can do this)

When the upstream connection drops during streaming, detect the disconnect and automatically retry the request. The GitHub Copilot API is stateless per request, so retrying is safe as long as no tokens have been forwarded to the client yet.
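A minimal sketch of that retry logic, assuming a hypothetical `fetchStream` stand-in for the proxy's upstream call (copilot-api's real internals differ):

```typescript
// Wrap an upstream streaming call and retry it when the stream drops
// before any event has been forwarded. Retrying after events were
// forwarded would duplicate tokens already sent to the client, so we
// only retry on an empty stream.
async function* streamWithRetry(
  fetchStream: () => Promise<AsyncIterable<string>>,
  maxRetries = 2,
): AsyncIterable<string> {
  for (let attempt = 0; ; attempt++) {
    let forwarded = false;
    try {
      for await (const event of await fetchStream()) {
        forwarded = true;
        yield event;
      }
      return; // stream completed normally
    } catch (err) {
      if (forwarded || attempt >= maxRetries) throw err;
      // Nothing was sent downstream yet: safe to retry the request.
    }
  }
}
```

Since the ~60s disconnect typically happens before the first token, the "nothing forwarded yet" case is exactly the one this issue describes.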

2. Keepalive SSE comments during streaming

If copilot-api detects a long gap in upstream SSE events, it could inject empty SSE comments (: keepalive\n\n) downstream to keep the client connection alive. This doesn't help with the upstream timeout but prevents cascading timeouts in the client.
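A rough sketch of such keepalive injection (the `withKeepalive` wrapper is hypothetical, not existing copilot-api code); SSE comment frames start with ":" and are ignored by spec-compliant clients:

```typescript
// Forward upstream SSE frames, injecting a comment frame whenever the
// upstream has been idle longer than intervalMs.
async function* withKeepalive(
  upstream: AsyncIterable<string>,
  intervalMs = 15_000,
): AsyncIterable<string> {
  const it = upstream[Symbol.asyncIterator]();
  let pending = it.next(); // one in-flight read, kept across timeouts
  while (true) {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<"timeout">((resolve) => {
      timer = setTimeout(() => resolve("timeout"), intervalMs);
    });
    const result = await Promise.race([pending, timeout]);
    clearTimeout(timer);
    if (result === "timeout") {
      yield ": keepalive\n\n"; // SSE comment, invisible to clients
      continue; // keep waiting on the same pending upstream read
    }
    if (result.done) return;
    yield result.value;
    pending = it.next();
  }
}
```

Note the pending upstream read is reused across timeout rounds, so no event is dropped while keepalive frames are being emitted.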

3. Configurable timeout (as requested in #97)

Add --timeout <ms> flag to set the upstream request timeout. While this doesn't fix the infrastructure-level idle timeout, it gives users control and documents the limitation.
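A sketch of how such a flag's value could be wired into the upstream request, assuming Node.js 18+ `fetch`; the `withTimeout` helper and the flag wiring are illustrative, not an existing copilot-api option:

```typescript
// Attach an abort signal derived from a hypothetical --timeout <ms>
// value. Note AbortSignal.timeout caps *total* request time, not idle
// time; a per-chunk idle timeout would need a timer reset on each
// received chunk.
function withTimeout(init: RequestInit, timeoutMs: number): RequestInit {
  return { ...init, signal: AbortSignal.timeout(timeoutMs) };
}

// Hypothetical usage inside the proxy:
// const res = await fetch(UPSTREAM_URL, withTimeout({ method: "POST", body }, timeoutMs));
```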

4. Document the limitation

At minimum, document that reasoning-heavy workloads may hit this ~60s idle timeout, and suggest workarounds (smaller context, split tasks).

Environment

  • copilot-api: latest (via npx)
  • Runtime: Node.js v22 (not Bun)
  • Use case: OpenClaw agent → copilot-api → Claude extended thinking
  • OS: Linux x64
