Problem
The SDK currently opens a new HTTP connection for each API request. Each new connection requires a TCP handshake (~50-100 ms) and a TLS negotiation (~100-200 ms), adding roughly 150-300 ms of overhead per request.
For applications that make several sequential API calls (e.g., a chat call followed by an embed call, or a series of embed calls), this overhead is paid on every call and accumulates.
Proposed Solution
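Configure httpx.Limits on the default httpx.Client and httpx.AsyncClient instances to enable connection reuse:

    import httpx

    # Keep up to 20 idle connections open for reuse, allow at most 100
    # connections in total, and close connections left idle for 30 seconds.
    limits = httpx.Limits(
        max_keepalive_connections=20,
        max_connections=100,
        keepalive_expiry=30.0,
    )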
This is a ~16-line change across the sync and async clients.
Expected Impact
- 15-30% reduction in latency for subsequent API calls
- Reduced server load from fewer connection establishments
- More predictable response times
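A back-of-envelope estimate of the connection-setup time saved, using the 150-300 ms per-connection range from the Problem section, is sketched below. The model is idealized and assumes the following: all calls go to one host, the gaps between calls stay shorter than `keepalive_expiry`, and server processing time is ignored. It does not derive the 15-30% figure, which also depends on total request latency. `handshake_overhead_ms` is a hypothetical helper.

```python
def handshake_overhead_ms(n_calls: int, per_connection_ms: float, pooled: bool) -> float:
    """Idealized TCP + TLS setup cost for n sequential calls to one host."""
    if n_calls <= 0:
        return 0.0
    # Without reuse, every call opens a new connection and pays the setup cost.
    # With reuse (and no idle expiry between calls), only the first call pays it.
    connections_opened = 1 if pooled else n_calls
    return connections_opened * per_connection_ms


if __name__ == "__main__":
    for per_conn in (150.0, 300.0):
        cold = handshake_overhead_ms(5, per_conn, pooled=False)
        warm = handshake_overhead_ms(5, per_conn, pooled=True)
        print(f"{per_conn} ms/conn: {cold} ms vs {warm} ms, saved {cold - warm} ms")
```

Under these assumptions, five sequential calls would save 600-1200 ms of setup time in total.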
Context
We use the Cohere SDK at Oracle for workloads involving multiple sequential API calls. Connection pooling is a standard optimization in HTTP client libraries, and httpx supports it natively.
Implementation available in PR #697.