Commit a81f9ee

feat: Add AI response timeout configuration and SSE stream error handling
1 parent eba9e2d commit a81f9ee

7 files changed: 158 additions & 40 deletions


docs/operations_configuration.md

Lines changed: 33 additions & 0 deletions
````diff
@@ -235,6 +235,7 @@ Choose **one** method (mutually exclusive):
 | [`TOKENS_ESTIMATION_DEFAULT_ENCODING`](#tokens-encoding) | `o200k_base` | Tiktoken encoding algorithm: `o200k_base` (GPT-4o+), `cl100k_base` (GPT-4), or `p50k_base` |
 | [`DEFAULT_MODEL_PARAMS`](#default-model-params) | `{}` | JSON object with per-model default inference parameters (temperature, max_tokens, etc.) |
 | [`MODEL_CACHE_SECONDS`](#model-cache-seconds) | `900` | Model list cache lifetime in seconds before lazy refresh (default: 15 minutes) |
+| [`AI_RESPONSE_TIMEOUT`](#ai-response-timeout) | `600` | Maximum seconds to wait for a model to complete a response (default: 10 minutes) |
 | [`DROP_UNSUPPORTED_SYSTEM_PROMPT`](#drop-unsupported-system-prompt) | `true` | Drop system prompts for unsupported models; when `false`, return error instead |
 | [`ANTHROPIC_BETA_FILTER`](#anthropic-beta-filter) | `true` | Enable filtering of unsupported `anthropic_beta` flags for Claude models |
 | [`ANTHROPIC_BETA_ALLOWLIST`](#anthropic-beta-allowlist) | `(empty)` | Additional `anthropic_beta` flags to allow beyond built-in Bedrock defaults |
@@ -2895,6 +2896,38 @@ export MODEL_CACHE_SECONDS=3600
 - **Rate Limits**: Very frequent refreshes in high-traffic deployments may approach API rate limits, though parallel execution doesn't increase per-region request rate
 - **Multi-Region**: Refresh latency is determined by the slowest responding region, not the total number of regions, thanks to parallel execution
 
+#### `AI_RESPONSE_TIMEOUT` { #ai-response-timeout }
+
+:octicons-package-24: **Purpose**
+: Maximum time in seconds to wait for an AI model to complete a response
+
+:octicons-database-24: **Type**
+: Integer (seconds, must be greater than 0)
+
+:octicons-gear-24: **Default**
+: `600` (10 minutes)
+
+:octicons-workflow-24: **Behavior**
+: Applies to both streaming and non-streaming requests. The timer starts from the moment the model begins generating and covers the full duration until the last token is received. If the model does not complete within this limit, the connection is closed and the request fails with a timeout error
+
+```bash
+# Default (10 minutes) - suitable for extended thinking models
+export AI_RESPONSE_TIMEOUT=600
+
+# Shorter timeout for standard models (2 minutes)
+export AI_RESPONSE_TIMEOUT=120
+
+# Longer timeout for very long documents or high reasoning budgets (15 minutes)
+export AI_RESPONSE_TIMEOUT=900
+```
+
+!!! tip "When to Adjust"
+    - **Increase** if you see timeout errors with models that use extended thinking/reasoning, large document analysis, or high token budgets
+    - **Decrease** to fail fast and free resources if your workload only uses standard models where long waits indicate a problem
+
+!!! info "Extended Thinking Models"
+    Models with extended reasoning capabilities (such as Claude with `thinking` enabled or high `reasoning_effort`) may spend significant time generating internal reasoning steps before producing output. The default of 600 seconds accommodates these use cases. Standard models without extended thinking typically respond within 60 seconds.
+
 ---
 
 ## Default Model Parameters
````

stdapi/aws.py

Lines changed: 1 addition & 0 deletions
```diff
@@ -28,6 +28,7 @@
     retries=_RETRIES,
     max_pool_connections=_MAX_POOL_CONNECTIONS,
     parameter_validation=False,
+    read_timeout=SETTINGS.ai_response_timeout,
 )
 
 getLogger("aiobotocore").setLevel("CRITICAL")
```
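With `read_timeout` wired through, a stalled Bedrock read now fails instead of hanging indefinitely. The semantics can be sketched with stdlib asyncio (a toy stand-in; botocore's actual `read_timeout` applies at the socket-read level):

```python
import asyncio


async def slow_model(delay: float) -> str:
    # Stand-in for a model that takes `delay` seconds to finish responding.
    await asyncio.sleep(delay)
    return "done"


async def main() -> str:
    # A response slower than the limit raises a timeout, analogous to how
    # botocore's read_timeout aborts a stalled read of the Bedrock response.
    try:
        return await asyncio.wait_for(slow_model(0.2), timeout=0.05)
    except asyncio.TimeoutError:
        return "timeout"


result = asyncio.run(main())
print(result)  # timeout
```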

stdapi/aws_bedrock.py

Lines changed: 35 additions & 0 deletions
```diff
@@ -366,6 +366,41 @@ def get_extra_model_parameters(
     return params
 
 
+#: AWS error codes to HTTP status + error type mapping
+AWS_ERROR_MAP: dict[str, tuple[int, str]] = {
+    **dict.fromkeys(
+        {
+            "ThrottlingException",
+            "TooManyRequestsException",
+            "ServiceQuotaExceededException",
+        },
+        (429, "rate_limit_error"),
+    ),
+    **dict.fromkeys({"AccessDeniedException"}, (403, "permission_error")),
+    **dict.fromkeys(
+        {
+            "UnrecognizedClientException",
+            "InvalidSignatureException",
+            "ExpiredTokenException",
+        },
+        (401, "authentication_error"),
+    ),
+    **dict.fromkeys({"ResourceNotFoundException"}, (404, "not_found_error")),
+    **dict.fromkeys(
+        {"ValidationException", "BadRequestException"}, (400, "invalid_request_error")
+    ),
+    **dict.fromkeys(
+        {
+            "ServiceUnavailableException",
+            "InternalServerException",
+            "ServiceFailureException",
+            "ReadTimeoutError",
+        },
+        (503, "server_error"),
+    ),
+}
+
+
 @contextmanager
 def handle_bedrock_client_error() -> Generator[None]:
     """Context manager to translate Bedrock client errors to appropriate HTTP 4XX/5XX when possible.
```
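The `dict.fromkeys` fan-in used by `AWS_ERROR_MAP` collapses several AWS error codes onto one `(status, error_type)` pair; a trimmed-down sketch of the pattern (subset of codes, hypothetical `ERROR_MAP` name):

```python
# Several AWS error codes map to the same HTTP status and error type.
ERROR_MAP: dict[str, tuple[int, str]] = {
    **dict.fromkeys(
        {"ThrottlingException", "TooManyRequestsException"},
        (429, "rate_limit_error"),
    ),
    **dict.fromkeys({"AccessDeniedException"}, (403, "permission_error")),
}

# Known codes resolve to their mapped pair.
status, err_type = ERROR_MAP.get("ThrottlingException", (502, "server_error"))
print(status, err_type)  # 429 rate_limit_error

# Unknown codes fall back to a 502 server_error, as in the handler.
unknown = ERROR_MAP.get("SomethingElse", (502, "server_error"))
print(unknown)  # (502, 'server_error')
```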

stdapi/config.py

Lines changed: 18 additions & 0 deletions
```diff
@@ -671,6 +671,24 @@ class _Settings(BaseSettings):
         ),
     )
 
+    ai_response_timeout: int = Field(
+        default=600,
+        gt=0,
+        description=(
+            "Maximum time in seconds to wait for an AI model to complete a response. "
+            "This applies to both streaming and non-streaming requests, from the moment "
+            "the model starts generating until the last token is received.\n\n"
+            "The default of 600 seconds (10 minutes) accommodates models with extended "
+            "reasoning or thinking capabilities, which may take longer to generate "
+            "complex responses. For standard models without extended thinking, responses "
+            "typically complete well within 60 seconds.\n\n"
+            "Increase this value if you experience timeout errors with long-running "
+            "requests (e.g., large document analysis, complex reasoning tasks). "
+            "Decrease it to fail fast on unexpectedly slow responses.\n\n"
+            "Example: 300 (5 minutes), 600 (10 minutes, default), 900 (15 minutes)"
+        ),
+    )
+
     model_cache_seconds: int = Field(
         default=900,
         description=(
```
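The `gt=0` constraint can be exercised with a plain pydantic model (a sketch assuming pydantic is installed; the project's real class is `_Settings(BaseSettings)`):

```python
from pydantic import BaseModel, Field, ValidationError


class Settings(BaseModel):
    # Same constraint as the real field: strictly positive seconds.
    ai_response_timeout: int = Field(default=600, gt=0)


# Omitting the field uses the default.
defaulted = Settings().ai_response_timeout
print(defaulted)  # 600

# Zero or negative values are rejected at construction time.
try:
    Settings(ai_response_timeout=0)
    rejected = False
except ValidationError:
    rejected = True
print(rejected)  # True
```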

stdapi/main.py

Lines changed: 2 additions & 35 deletions
```diff
@@ -20,6 +20,7 @@
 from stdapi.auth import initialize_authentication
 from stdapi.aws import AWSConnectionManager, initialize_aws_account_info
 from stdapi.aws_bedrock import (
+    AWS_ERROR_MAP,
     set_guardrail_configuration,
     set_performance_configuration,
 )
@@ -311,40 +312,6 @@ async def handle_validation_exception(
     )
 
 
-#: AWS error codes to OpenAI error codes
-_AWS_ERROR_MAP: dict[str, tuple[int, str]] = {
-    **dict.fromkeys(
-        {
-            "ThrottlingException",
-            "TooManyRequestsException",
-            "ServiceQuotaExceededException",
-        },
-        (429, "rate_limit_error"),
-    ),
-    **dict.fromkeys({"AccessDeniedException"}, (403, "permission_error")),
-    **dict.fromkeys(
-        {
-            "UnrecognizedClientException",
-            "InvalidSignatureException",
-            "ExpiredTokenException",
-        },
-        (401, "authentication_error"),
-    ),
-    **dict.fromkeys({"ResourceNotFoundException"}, (404, "not_found_error")),
-    **dict.fromkeys(
-        {"ValidationException", "BadRequestException"}, (400, "invalid_request_error")
-    ),
-    **dict.fromkeys(
-        {
-            "ServiceUnavailableException",
-            "InternalServerException",
-            "ServiceFailureException",
-        },
-        (503, "server_error"),
-    ),
-}
-
-
 @app.exception_handler(ClientError)
 async def handle_botocore_client_error(
     request: Request, exc: ClientError
@@ -362,7 +329,7 @@ async def handle_botocore_client_error(
     """
     error = exc.response["Error"]
     aws_code = error["Code"]
-    status, err_type = _AWS_ERROR_MAP.get(aws_code, (502, "server_error"))
+    status, err_type = AWS_ERROR_MAP.get(aws_code, (502, "server_error"))
     log_error_details(error["Message"], status=status)
     return JSONResponse(
         *format_http_error(
```

stdapi/models/chat/_default.py

Lines changed: 3 additions & 3 deletions
```diff
@@ -25,7 +25,7 @@
 from stdapi.models.chat._adapters import _openai_chat_completion as openai_adapter
 from stdapi.monitoring import (
     REQUEST_HEADERS,
-    log_request_stream_event,
+    log_request_sse_stream_event,
     log_response_params,
 )
 from stdapi.types.anthropic_messages import ToolChoiceToolParam
@@ -158,7 +158,7 @@ async def create_completion(
     )
     if request.stream:
         return EventSourceResponse(
-            await log_request_stream_event(
+            log_request_sse_stream_event(
                 openai_adapter.format_stream(
                     completion_id,
                     created,
@@ -267,7 +267,7 @@ async def create_message(
         await bedrock_runtime.converse_stream(**bedrock_request)
     )["stream"]
     return EventSourceResponse(
-        await log_request_stream_event(
+        log_request_sse_stream_event(
             anthropic_adapter.format_stream(
                 message_id, request.model, bedrock_stream, forced_tool
             )
```
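The dropped `await` follows from the new function's shape: `log_request_stream_event` is a coroutine that returns an async generator, while `log_request_sse_stream_event` is itself an async generator. A toy illustration of the calling difference (hypothetical functions, not the project's):

```python
import asyncio
from collections.abc import AsyncGenerator


async def returns_stream():
    # Old shape: a coroutine that *returns* an async generator,
    # so callers must write `await returns_stream()`.
    async def _gen() -> AsyncGenerator[int, None]:
        yield 1
        yield 2

    return _gen()


async def is_stream() -> AsyncGenerator[int, None]:
    # New shape: an async generator function; calling it already
    # produces an async iterable, so no `await` is needed.
    yield 1
    yield 2


async def main() -> tuple[list[int], list[int]]:
    old = [x async for x in await returns_stream()]  # await required
    new = [x async for x in is_stream()]  # no await
    return old, new


old, new = asyncio.run(main())
print(old, new)  # [1, 2] [1, 2]
```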

stdapi/monitoring.py

Lines changed: 66 additions & 2 deletions
```diff
@@ -6,17 +6,22 @@
 from traceback import format_exception
 from typing import TYPE_CHECKING, Any, Literal, NotRequired, TypedDict, TypeVar
 
+from botocore.exceptions import ClientError
+from fastapi import Request  # noqa: TC002
 from pydantic import AwareDatetime, BaseModel, JsonValue
+from sse_starlette import JSONServerSentEvent
 
+from stdapi.api_errors import ApiError
+from stdapi.api_providers import format_http_error
+from stdapi.aws_bedrock import AWS_ERROR_MAP
 from stdapi.config import SETTINGS, LogLevel
 from stdapi.metering import SERVER_FULL_VERSION
 from stdapi.server import SERVER_NAME
-from stdapi.utils import stdout_write, webuuid
+from stdapi.utils import hide_security_details, stdout_write, webuuid
 
 if TYPE_CHECKING:
     from collections.abc import AsyncGenerator, Generator
 
-    from fastapi import Request
     from pydantic.main import IncEx
     from starlette.datastructures import Headers
     from types_aiobotocore_meteringmarketplace.type_defs import (
@@ -99,6 +104,9 @@ class EventLog(TypedDict):
 #: Request HTTP headers
 REQUEST_HEADERS: ContextVar[Headers] = ContextVar("request_headers")
 
+#: HTTP request object
+REQUEST: ContextVar[Request] = ContextVar("request")
+
 #: Paths to ignore in logging
 LOGGING_PATHS_IGNORE = {
     "/",
@@ -162,6 +170,7 @@ def log_request_event(request: Request) -> Generator[EventLog]:
     REQUEST_ID.set(request_id)
     request_time = SETTINGS.now()
     REQUEST_TIME.set(request_time)
+    REQUEST.set(request)
     log = EventLog(
         type="request",
         level="info",
@@ -408,3 +417,58 @@ async def log_request_stream_event[T](stream: AsyncGenerator[T]) -> AsyncGenerat
         Items from the input asynchronous generator in their modified or original form.
     """
     return _rebuild_and_log_stream(await stream.__anext__(), stream)
+
+
+async def log_request_sse_stream_event(
+    stream: AsyncGenerator[JSONServerSentEvent],
+) -> AsyncGenerator[JSONServerSentEvent]:
+    """Log, monitor, and error-guard an SSE stream for use with ``EventSourceResponse``.
+
+    Combines :func:`log_request_stream_event` and an SSE error boundary into a
+    single step. After the HTTP response headers are sent, any exception that
+    escapes the underlying generator cannot be turned into an HTTP error response
+    (Starlette raises ``RuntimeError: Caught handled exception, but response
+    already started``). This wrapper catches such exceptions, logs them via
+    :func:`log_error_details`, and yields a terminal ``error`` SSE event
+    formatted for the matched API provider so that ``EventSourceResponse`` can
+    close the connection cleanly.
+
+    Args:
+        stream: Raw SSE async generator (e.g. from an adapter's ``format_stream``).
+
+    Yields:
+        Items from ``stream`` (after monitoring setup), followed by a provider-
+        formatted ``error`` SSE event on failure.
+    """
+    try:
+        async for chunk in _rebuild_and_log_stream(await stream.__anext__(), stream):
+            yield chunk
+    except ApiError as exc:
+        status = exc.status
+        log_error_details(exc.args[0], status=status)
+        yield JSONServerSentEvent(
+            data=format_http_error(
+                REQUEST.get(),
+                status,
+                hide_security_details(status, exc.args[0]),
+                exc.param,
+                exc.code,
+            )[0],
+            event="error",
+        )
+    except ClientError as exc:
+        error = exc.response["Error"]
+        status = AWS_ERROR_MAP.get(error["Code"], (502, "server_error"))[0]
+        log_error_details(error["Message"], status=status)
+        yield JSONServerSentEvent(
+            data=format_http_error(
+                REQUEST.get(), status, hide_security_details(status, error["Message"])
+            )[0],
+            event="error",
+        )
+    except Exception as exc:  # noqa: BLE001
+        log_error_details("\n".join(format_exception(exc)), level="critical")
+        yield JSONServerSentEvent(
+            data=format_http_error(REQUEST.get(), 500, "Internal Server Error")[0],
+            event="error",
+        )
```
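The error-boundary pattern in `log_request_sse_stream_event` generalizes: wrap the stream and convert any escaping exception into a terminal error event, since no HTTP error response is possible once headers have been sent. A simplified sketch with plain dicts in place of SSE events (the real function also picks provider-specific formatting and status codes):

```python
import asyncio
from collections.abc import AsyncGenerator


async def error_boundary(
    stream: AsyncGenerator[dict, None],
) -> AsyncGenerator[dict, None]:
    # Pass chunks through; if the stream blows up mid-response, emit a
    # terminal error event instead of letting the exception escape after
    # the response headers have already been sent.
    try:
        async for chunk in stream:
            yield chunk
    except Exception as exc:
        yield {"event": "error", "data": str(exc)}


async def failing_stream() -> AsyncGenerator[dict, None]:
    yield {"event": "message", "data": "first chunk"}
    raise RuntimeError("model backend dropped the connection")


async def main() -> list[dict]:
    return [chunk async for chunk in error_boundary(failing_stream())]


events = asyncio.run(main())
print(events[-1]["event"])  # error
```

The client still receives every chunk produced before the failure, followed by one well-formed `error` event, so the connection closes cleanly instead of being cut mid-stream.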
