
Streaming idle timeout (~60s) kills agent/reasoning workloads — root cause analysis & proposed fixes #223

@kagura-agent

Description

Problem

When using copilot-api as a proxy for Claude Code or OpenClaw agent workloads, streaming requests consistently fail after ~60 seconds of idle time (no tokens emitted). This happens during extended thinking / reasoning phases where the model is computing internally but not yet producing output tokens.

Observed error:

```
HTTP error: { error: { message: "Timed out reading request body. Try again, or use a smaller request size.", code: "user_request_timeout" } }
--> POST /v1/messages?beta=true 408 61s
```

Root Cause Analysis

The timeout does not originate in copilot-api itself; it comes from GitHub Copilot's upstream infrastructure (load balancers / reverse proxies in front of api.githubcopilot.com). Idle timeouts for Azure and AWS load balancers and for Nginx reverse proxies commonly default to 60 seconds.

The chain:

Client → copilot-api (localhost) → api.githubcopilot.com → LLM inference
                                          ↑
                              Infrastructure idle timeout here (~60s)

This works fine for normal chat completions where tokens stream continuously. But LLM reasoning/thinking models (Claude with extended thinking, o1/o3 with chain-of-thought) can have 60+ second gaps between the request being accepted and the first token being emitted.

Impact on Agent Workloads

This is particularly severe for agent use cases (Claude Code, OpenClaw, Codex CLI, etc.):

  • Source code analysis: Large file reads → long thinking → timeout before first token
  • Complex reasoning: Multi-step planning tasks trigger extended thinking
  • Tool-heavy workflows: Agent accumulates large context → model needs more processing time
  • Measured: successful runs show dense event streams with no long gaps; failed runs consistently show a maximum inter-event gap of ~60.9 seconds, right at the infrastructure timeout boundary
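The gap measurement above can be reproduced with a small helper that tracks the largest delay between chunks of a stream (a hypothetical utility for diagnosis, not part of copilot-api):

```typescript
// Track the largest gap (in ms) between consecutive chunks of a stream.
// On a failing reasoning request, this surfaces the ~60s idle window
// that precedes the upstream disconnect.
async function maxStreamGapMs(stream: AsyncIterable<unknown>): Promise<number> {
  let last = Date.now();
  let maxGap = 0;
  for await (const _chunk of stream) {
    const now = Date.now();
    maxGap = Math.max(maxGap, now - last);
    last = now;
  }
  return maxGap;
}
```

Feeding it the SSE response body of a failing request should report a maximum gap at or just above the 60-second boundary.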

Proposed Solutions

1. Client-side retry on disconnect (copilot-api can do this)

When the upstream connection drops during streaming, detect the disconnect and automatically retry the request. The GitHub Copilot API is stateless per request, so retrying is safe as long as no tokens have been forwarded to the client yet.
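A minimal sketch of that retry logic, assuming a hypothetical `fetchStream` stand-in for the proxy's upstream call (copilot-api's real internals differ):

```typescript
// Wrap an upstream streaming call and retry it when the stream drops
// before any event has been forwarded. Retrying after events were
// forwarded would duplicate tokens already sent to the client, so we
// only retry on an empty stream.
async function* streamWithRetry(
  fetchStream: () => Promise<AsyncIterable<string>>,
  maxRetries = 2,
): AsyncIterable<string> {
  for (let attempt = 0; ; attempt++) {
    let forwarded = false;
    try {
      for await (const event of await fetchStream()) {
        forwarded = true;
        yield event;
      }
      return; // stream completed normally
    } catch (err) {
      if (forwarded || attempt >= maxRetries) throw err;
      // Nothing was sent downstream yet: safe to retry the request.
    }
  }
}
```

Since the ~60s disconnect typically happens before the first token, the "nothing forwarded yet" case is exactly the one this issue describes.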

2. Keepalive SSE comments during streaming

If copilot-api detects a long gap in upstream SSE events, it could inject empty SSE comments (: keepalive\n\n) downstream to keep the client connection alive. This doesn't help with the upstream timeout but prevents cascading timeouts in the client.
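A rough sketch of such keepalive injection (the `withKeepalive` wrapper is hypothetical, not existing copilot-api code); SSE comment frames start with ":" and are ignored by spec-compliant clients:

```typescript
// Forward upstream SSE frames, injecting a comment frame whenever the
// upstream has been idle longer than intervalMs.
async function* withKeepalive(
  upstream: AsyncIterable<string>,
  intervalMs = 15_000,
): AsyncIterable<string> {
  const it = upstream[Symbol.asyncIterator]();
  let pending = it.next(); // one in-flight read, kept across timeouts
  while (true) {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<"timeout">((resolve) => {
      timer = setTimeout(() => resolve("timeout"), intervalMs);
    });
    const result = await Promise.race([pending, timeout]);
    clearTimeout(timer);
    if (result === "timeout") {
      yield ": keepalive\n\n"; // SSE comment, invisible to clients
      continue; // keep waiting on the same pending upstream read
    }
    if (result.done) return;
    yield result.value;
    pending = it.next();
  }
}
```

Note the pending upstream read is reused across timeout rounds, so no event is dropped while keepalive frames are being emitted.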

3. Configurable timeout (as requested in #97)

Add --timeout <ms> flag to set the upstream request timeout. While this doesn't fix the infrastructure-level idle timeout, it gives users control and documents the limitation.
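A sketch of how such a flag's value could be wired into the upstream request, assuming Node.js 18+ `fetch`; the `withTimeout` helper and the flag wiring are illustrative, not an existing copilot-api option:

```typescript
// Attach an abort signal derived from a hypothetical --timeout <ms>
// value. Note AbortSignal.timeout caps *total* request time, not idle
// time; a per-chunk idle timeout would need a timer reset on each
// received chunk.
function withTimeout(init: RequestInit, timeoutMs: number): RequestInit {
  return { ...init, signal: AbortSignal.timeout(timeoutMs) };
}

// Hypothetical usage inside the proxy:
// const res = await fetch(UPSTREAM_URL, withTimeout({ method: "POST", body }, timeoutMs));
```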

4. Document the limitation

At minimum, document that reasoning-heavy workloads may hit this ~60s idle timeout, and suggest workarounds (smaller context, split tasks).

Environment

  • copilot-api: latest (via npx)
  • Runtime: Node.js v22 (not Bun)
  • Use case: OpenClaw agent → copilot-api → Claude extended thinking
  • OS: Linux x64
