fix(passthrough): swallow flush replay errors; map Anthropic overloaded_error to 529 (#29187) by Anai-Guo · Pull Request #29205 · BerriAI/litellm

Anai-Guo · 2026-05-28T19:20:57Z

Summary

Two small, scoped changes that together turn the Bedrock-passthrough
"client sees Internal server error, no signal to retry" failure into a
proper 529 surfacing path, and stop the
`Task exception was never retrieved` log spam during Anthropic
overloads.

1. `litellm_core_utils/litellm_logging.py` — wrap flush replay in `try/except`

`flush_passthrough_collected_chunks` and its async sibling are scheduled
via `asyncio.create_task` in the `finally` block from PR #26719 (v1.84.0
onwards). They replay the buffered stream through the provider config
purely for spend-tracking / success-logging; the client HTTP response is
already fully closed by the time they run.

If the buffered stream contains a mid-stream error event (Anthropic's
`overloaded_error` is the visible one for Claude Code via Bedrock,
covered in the issue), the provider config raises during chunk replay,
the unhandled exception escapes the async task, and the user sees:

Task exception was never retrieved
future: <Task ... exception=AnthropicError('Overloaded')>

…while the real surfacing path (the actual HTTP response) is already
gone. Wrapping the replay in `try/except` logs the exception as a warning, skips
the success handler (no spend to track for a failed call anyway), and
lets the asyncio task complete cleanly. This is symmetric across the
sync (`flush_passthrough_collected_chunks`) and async
(`async_flush_passthrough_collected_chunks`) paths.
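
A minimal, self-contained sketch of the pattern described above. It does not reproduce the actual LiteLLM code: `replay_chunks`, `flush_collected_chunks`, and `ReplayError` are hypothetical stand-ins for the replay helper, the flush method, and the provider error. It only shows that a replay failure inside a task created with `asyncio.create_task` is logged and swallowed, while a successful replay still reaches the success callback.

```python
import asyncio
import logging

logger = logging.getLogger("passthrough_flush_sketch")


class ReplayError(Exception):
    """Hypothetical stand-in for the provider error raised during chunk replay."""


def replay_chunks(chunks):
    """Hypothetical replay helper: joins text chunks, raises on an error event."""
    parts = []
    for chunk in chunks:
        if chunk.get("type") == "error":
            raise ReplayError(chunk["error"]["message"])
        parts.append(chunk.get("text", ""))
    return "".join(parts)


async def flush_collected_chunks(chunks, on_success):
    """Replay for logging only; a replay failure must not escape the task."""
    try:
        response = replay_chunks(chunks)
    except Exception as exc:
        # The client response is already closed, so only log and skip success logging.
        logger.warning("flush replay failed, skipping success logging: %s", exc)
        return None
    on_success(response)
    return response


async def main():
    logged = []
    good = [{"text": "hel"}, {"text": "lo"}]
    bad = [
        {"text": "hi"},
        {"type": "error", "error": {"type": "overloaded_error", "message": "Overloaded"}},
    ]
    tasks = [
        asyncio.create_task(flush_collected_chunks(good, logged.append)),
        asyncio.create_task(flush_collected_chunks(bad, logged.append)),
    ]
    results = await asyncio.gather(*tasks)
    return results, logged


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In this sketch the failing task resolves to `None` instead of carrying an unretrieved exception, which is the behavior the change aims for.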

2. `llms/anthropic/chat/handler.py` — pick status from `error.type`

The `type == "error"` branch of `ModelResponseIterator.chunk_parser`
hard-coded `status_code=500` for every in-stream error event, with a
comment noting Anthropic doesn't return an HTTP status in the chunk.

Anthropic's public convention is 529 for `overloaded_error` (their
own SDK / docs use this code for the "API temporarily overloaded" case).
Mapping `error.type == "overloaded_error"` to 529 lets retry middleware
and observers distinguish a transient/retryable overload from a generic
500 server error. Any other `error.type` keeps the existing 500 default.
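
The mapping can be sketched as a small lookup, assuming the in-stream error event has the shape `{"type": "error", "error": {"type": ..., "message": ...}}`. The function and constant names below are hypothetical and are not the names used in `handler.py`.

```python
from typing import Any, Dict

# Default kept for every in-stream error type without an explicit mapping.
DEFAULT_STREAM_ERROR_STATUS = 500

# Only the overload case is mapped, matching the scope of this change.
STREAM_ERROR_STATUS_BY_TYPE = {"overloaded_error": 529}


def status_code_for_stream_error(chunk: Dict[str, Any]) -> int:
    """Pick an HTTP-style status for an in-stream error event (hypothetical helper)."""
    error = chunk.get("error")
    if not isinstance(error, dict):
        return DEFAULT_STREAM_ERROR_STATUS
    return STREAM_ERROR_STATUS_BY_TYPE.get(error.get("type"), DEFAULT_STREAM_ERROR_STATUS)
```

A table keeps the 500 default explicit and makes it cheap to add further types later, though this PR maps only `overloaded_error`.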

Test plan

- Static reasoning against the failing trace in #29187 (Bedrock passthrough requests appear to hang then fail with "Internal server error" when Anthropic is overloaded):
  - the `AnthropicError('Overloaded')` raised at
    `anthropic/chat/handler.py:851` is now caught at the
    `litellm_logging.py:1996` / `2010` callsites, so there is no more
    `Task exception was never retrieved`.
  - the same code path, when surfaced via the normal (non-passthrough)
    streaming path, now propagates as a 529 rather than a 500, so clients
    can choose to retry.
- `python -m py_compile` on both files.
- Behavioral non-regression: when `complete_streaming_response` is `None` (existing path) or replay succeeds, control flow is
  unchanged; `success_handler` / `async_success_handler` is still
  invoked exactly as before.
- Black-style formatting preserved (multi-line tuple-assign matches
  neighboring style in the file).

Out of scope

The related `anthropic_messages` route mid-stream fallback (#24004, "[Bug]: Mid-stream fallback not supported for anthropic_messages route type")
and `/v1/messages` `async_sse_wrapper` error handling (#24609, "[Bug]: No Error Handling in /v1/messages Path") are
separate surfacing paths; this PR only addresses the Bedrock
passthrough flush path called out in #29187 (root + secondary
finding).

🤖 Generated with Claude Code


greptile-apps · 2026-05-28T19:23:05Z

Greptile Summary

This PR makes two targeted fixes to the Bedrock passthrough + Anthropic overload error path: it wraps the post-stream flush (used only for spend-tracking) in a try/except so that in-stream error events no longer escape as unhandled asyncio task exceptions, and it maps the overloaded_error in-stream type to HTTP 529 instead of the previous hard-coded 500.

- `litellm_logging.py`: Both `flush_passthrough_collected_chunks` and `async_flush_passthrough_collected_chunks` now catch any exception raised by `_flush_passthrough_collected_chunks_helper`, emit a WARNING-level log, and return early, correctly preventing `Task exception was never retrieved` log spam without affecting the already-closed client response.
- `handler.py`: `ModelResponseIterator.chunk_parser` now checks `error.type` and assigns `status_code=529` for `overloaded_error`, falling back to 500 for all other types, enabling retry middleware to distinguish transient overloads from generic server errors.

Confidence Score: 4/5

Both changes are narrow and well-scoped; the flush wrapping adds no new risk to the client response path, and the 529 mapping is straightforwardly additive.

The flush try/except silently skips failure-handler invocation when an error is caught, so overloaded passthrough calls will not appear in cost or alerting callbacks. This is a present observability gap, but it does not affect correctness of the client-facing response, which is already closed before the flush runs.

litellm/litellm_core_utils/litellm_logging.py — the exception branch in both flush methods returns without firing any failure handler.

Important Files Changed

| Filename | Overview |
| --- | --- |
| litellm/litellm_core_utils/litellm_logging.py | Wraps `_flush_passthrough_collected_chunks_helper` in try/except for both sync and async flush paths; on error, logs a warning and returns early instead of propagating the exception as an unhandled asyncio task exception. |
| litellm/llms/anthropic/chat/handler.py | Maps `overloaded_error` in-stream error type to HTTP 529 status instead of the former hard-coded 500, allowing retry middleware to distinguish transient overload from generic server errors. |

Comments Outside Diff (1)

litellm/litellm_core_utils/litellm_logging.py, lines 2019-2044

No failure handler called on flush error — failed passthrough calls go unobserved

When _flush_passthrough_collected_chunks_helper raises (e.g. on overloaded_error), the new code logs a warning and returns without calling self.failure_handler(...). The sync path has the same gap. This means overloaded / errored passthrough calls will never appear in cost-tracking or alerting dashboards as failures — they are silently dropped from observability. If a caller relies on failure callbacks (e.g. for budget enforcement or alert thresholds) those callbacks are never fired for this code path.

Consider whether self.failure_handler(e) / await self.async_failure_handler(e) should be invoked inside the except block after the warning log, mirroring how non-passthrough streaming errors are handled.
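
One possible shape of that suggestion, as a sketch only. `FlushSketch` and its callbacks are hypothetical, and the handler signatures are simplified placeholders rather than LiteLLM's real `failure_handler` / `async_failure_handler` signatures. The failure callback is itself guarded so that a broken callback cannot reintroduce the unhandled-exception problem.

```python
import logging
from typing import Any, Callable, List

logger = logging.getLogger("flush_failure_sketch")


class FlushSketch:
    """Hypothetical flush wrapper that reports replay failures to a callback."""

    def __init__(
        self,
        replay: Callable[[List[Any]], Any],
        on_success: Callable[[Any], None],
        on_failure: Callable[[Exception], None],
    ) -> None:
        self._replay = replay
        self._on_success = on_success
        self._on_failure = on_failure

    def flush(self, chunks: List[Any]) -> bool:
        """Return True if success logging ran, False if the replay failed."""
        try:
            response = self._replay(chunks)
        except Exception as exc:
            logger.warning("flush replay failed: %s", exc)
            self._notify_failure(exc)
            return False
        self._on_success(response)
        return True

    def _notify_failure(self, exc: Exception) -> None:
        # Guard the callback: a failing observer must not escape the flush.
        try:
            self._on_failure(exc)
        except Exception:
            logger.exception("failure callback raised during flush")
```

Whether a failed replay should count as a failed call for budget or alerting purposes is a product decision this sketch does not settle.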


codecov · 2026-05-28T19:25:33Z

Codecov Report

❌ Patch coverage is 0% with 12 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| litellm/litellm_core_utils/litellm_logging.py | 0.00% | 10 Missing ⚠️ |
| litellm/llms/anthropic/chat/handler.py | 0.00% | 2 Missing ⚠️ |



fix(passthrough): swallow flush replay errors and map Anthropic overl…

cf6252c

…oaded_error to 529 (BerriAI#29187) Signed-off-by: Tai An <antai12232931@outlook.com>

Santazuki mentioned this pull request Jun 9, 2026

feat(exceptions): Add protocol-level error category normalization #30031 (Open)

Anai-Guo wants to merge 1 commit into BerriAI:litellm_internal_staging from Anai-Guo:fix/bedrock-passthrough-overloaded-flush
