Problem
When using copilot-api as a proxy for Claude Code or OpenClaw agent workloads, streaming requests consistently fail after ~60 seconds of idle time (no tokens emitted). This happens during extended thinking / reasoning phases where the model is computing internally but not yet producing output tokens.
Observed error:

```
HTTP error: { error: { message: "Timed out reading request body. Try again, or use a smaller request size.", code: "user_request_timeout" } }
--> POST /v1/messages?beta=true 408 61s
```
Also manifests as:
- `SocketError: other side closed` (#148)
- `Bun.serve` `idleTimeout` in Docker (docker deployment can sometimes also trigger bun idleTimeout, #153)
- `Failed to create chat completions` with no retry
Root Cause Analysis
The timeout is not in copilot-api itself — it comes from GitHub Copilot's upstream infrastructure (load balancers / reverse proxies at api.githubcopilot.com). The typical idle timeout for Azure/AWS ALBs and Nginx reverse proxies defaults to 60 seconds.
The chain:

```
Client → copilot-api (localhost) → api.githubcopilot.com → LLM inference
                                            ↑
                          Infrastructure idle timeout here (~60s)
```
This works fine for normal chat completions where tokens stream continuously. But LLM reasoning/thinking models (Claude with extended thinking, o1/o3 with chain-of-thought) can have 60+ second gaps between the request being accepted and the first token being emitted.
Impact on Agent Workloads
This is particularly severe for agent use cases (Claude Code, OpenClaw, Codex CLI, etc.):
- Source code analysis: Large file reads → long thinking → timeout before first token
- Complex reasoning: Multi-step planning tasks trigger extended thinking
- Tool-heavy workflows: Agent accumulates large context → model needs more processing time
- Measured: successful runs have dense event streams with no gaps; failed runs consistently show a maximum gap of ~60.9 seconds (precisely at the infrastructure timeout boundary)
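The gap measurement above can be reproduced by logging a timestamp per upstream SSE chunk and scanning for the largest inter-event gap. A minimal sketch (the `maxGapSeconds` helper and the timestamp values are illustrative, not part of copilot-api):

```typescript
// Sketch: find the maximum gap between consecutive stream events.
// Timestamps (ms since epoch) would come from logging each upstream
// SSE chunk; the values below are illustrative.
function maxGapSeconds(timestampsMs: number[]): number {
  let maxGap = 0;
  for (let i = 1; i < timestampsMs.length; i++) {
    maxGap = Math.max(maxGap, timestampsMs[i] - timestampsMs[i - 1]);
  }
  return maxGap / 1000;
}

// A failed run: a few early events, then silence until past the 60 s boundary.
const failedRun = [0, 200, 400, 61_300, 61_500];
console.log(maxGapSeconds(failedRun)); // 60.9
```

A healthy run produces a small maximum gap; a run killed by the idle timeout shows one gap right at the ~60 s boundary.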
Related Issues
- #97 — request for a configurable timeout
- #148 — `SocketError: other side closed`
- #153 — docker deployment can sometimes also trigger bun `idleTimeout`
Proposed Solutions
1. Client-side retry on disconnect (copilot-api can do this)
When the upstream connection drops during streaming, detect it and automatically retry the request. The GitHub Copilot API is stateless per-request, so retrying is safe.
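A retry wrapper along these lines could live in the proxy. This is a sketch, not the actual copilot-api code: `streamWithRetry` is a hypothetical name, and the injectable `doFetch` parameter exists only to make the logic testable.

```typescript
// Sketch: retry the upstream request when the connection fails or the
// server rejects it. Safe because the Copilot API is stateless per request.
type FetchFn = (url: string, init?: RequestInit) => Promise<Response>;

async function streamWithRetry(
  doFetch: FetchFn,
  url: string,
  init: RequestInit = {},
  maxRetries = 2,
): Promise<Response> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      const res = await doFetch(url, init);
      if (res.ok) return res;
      lastError = new Error(`upstream HTTP ${res.status}`);
    } catch (err) {
      // e.g. undici's "SocketError: other side closed"
      lastError = err;
    }
  }
  throw lastError;
}
```

Note the caveat: this covers failures at connect time. A drop mid-stream surfaces later, while reading the response body, so full coverage would also need the body consumer to detect the broken read and re-issue the request.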
2. Keepalive SSE comments during streaming
If copilot-api detects a long gap in upstream SSE events, it could inject comment-only SSE lines (`: keepalive\n\n`) downstream to keep the client connection alive. This doesn't help with the upstream timeout, but it prevents cascading timeouts in the client.
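One way to sketch this (a hypothetical helper, not the current copilot-api implementation) is an identity `TransformStream` that passes upstream bytes through unchanged while a timer emits an SSE comment on a fixed interval. SSE clients ignore lines starting with `:`, so the comments only reset idle timers:

```typescript
// Sketch: pipe upstream SSE bytes through unchanged, and emit a
// ": keepalive" comment every `intervalMs` to keep the downstream
// connection from idling out. The 15 s default is an assumption.
function keepaliveStream(intervalMs = 15_000): TransformStream<Uint8Array, Uint8Array> {
  let timer: ReturnType<typeof setInterval> | undefined;
  const encoder = new TextEncoder();
  return new TransformStream({
    start(controller) {
      timer = setInterval(() => {
        controller.enqueue(encoder.encode(": keepalive\n\n"));
      }, intervalMs);
    },
    transform(chunk, controller) {
      controller.enqueue(chunk); // pass upstream events through untouched
    },
    flush() {
      if (timer !== undefined) clearInterval(timer);
    },
  });
}
```

Usage would be `upstreamBody.pipeThrough(keepaliveStream())` before forwarding to the client. A production version would also need to stop the timer if the downstream aborts, since enqueueing into a closed stream throws.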
3. Configurable timeout (as requested in #97)
Add a `--timeout <ms>` flag to set the upstream request timeout. While this doesn't fix the infrastructure-level idle timeout, it gives users control and documents the limitation.
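Wiring such a flag could look like the sketch below. The flag name `--timeout` follows #97; the parsing and the 120 s fallback are assumptions, not copilot-api's actual CLI code. Note that `AbortSignal.timeout` bounds total request time, not idle time, which matches the caveat above.

```typescript
// Sketch: read a --timeout <ms> flag from argv. Parsing is
// illustrative; copilot-api's real CLI uses its own option handling.
function parseTimeoutMs(argv: string[], fallbackMs = 120_000): number {
  const i = argv.indexOf("--timeout");
  if (i !== -1 && argv[i + 1] !== undefined) {
    const ms = Number(argv[i + 1]);
    if (Number.isFinite(ms) && ms > 0) return ms;
  }
  return fallbackMs;
}

// Hypothetical usage: abort the upstream fetch after the configured time
// (AbortSignal.timeout is available in Node 18+).
// const signal = AbortSignal.timeout(parseTimeoutMs(process.argv.slice(2)));
// await fetch("https://api.githubcopilot.com/v1/messages", { signal });
```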
4. Document the limitation
At minimum, document that reasoning-heavy workloads may hit this ~60s idle timeout, and suggest workarounds (smaller context, split tasks).
Environment
- copilot-api: latest (via npx)
- Runtime: Node.js v22 (not Bun)
- Use case: OpenClaw agent → copilot-api → Claude extended thinking
- OS: Linux x64