fix(passthrough): swallow flush replay errors; map Anthropic overloaded_error to 529 (#29187) #29205

Open
Anai-Guo wants to merge 1 commit into BerriAI:litellm_internal_staging from Anai-Guo:fix/bedrock-passthrough-overloaded-flush

@Anai-Guo
Summary

Closes #29187.

Two small, scoped changes that together turn the Bedrock-passthrough
"client sees Internal server error, no signal to retry" failure into a
proper 529 surfacing path, and stop the "Task exception was never
retrieved" log spam during Anthropic overloads.

1. litellm_core_utils/litellm_logging.py — wrap flush replay in try/except

flush_passthrough_collected_chunks and its async sibling are scheduled
via asyncio.create_task in the finally block from PR #26719 (v1.84.0
onwards). They replay the buffered stream through the provider config
purely for spend-tracking / success-logging — the client HTTP response is
already fully closed by the time we get here.

If the buffered stream contains a mid-stream error event (Anthropic's
overloaded_error is the visible one for Claude Code via Bedrock,
covered in the issue), the provider config raises during chunk replay,
the unhandled exception escapes the async task, and the user sees:

Task exception was never retrieved
future: <Task ... exception=AnthropicError('Overloaded')>

…while the real surfacing path (the actual HTTP response) is already
gone. Wrapping the replay in try/except logs it as a warning, skips
the success handler (no spend to track for a failed call anyway), and
lets the asyncio task complete cleanly. This is symmetric across the
sync (flush_passthrough_collected_chunks) and async
(async_flush_passthrough_collected_chunks) paths.
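
A minimal, self-contained sketch of the pattern described in change 1, not the code in this PR. The names `replay_chunks`, `safe_flush`, and `ReplayError` are hypothetical stand-ins; the chunk dictionaries only assume the `type == "error"` event shape mentioned above.

```python
import asyncio
import logging

logger = logging.getLogger("passthrough_flush_sketch")


class ReplayError(Exception):
    """Hypothetical stand-in for the provider error raised during replay."""


async def replay_chunks(chunks):
    # Hypothetical replay helper: raises on the first in-stream error event.
    for chunk in chunks:
        if chunk.get("type") == "error":
            raise ReplayError(chunk["error"]["message"])
    return len(chunks)


async def safe_flush(chunks, on_success):
    """Replay buffered chunks for logging without letting errors escape the task."""
    try:
        replayed = await replay_chunks(chunks)
    except Exception as exc:
        # The client response is already closed, so log and skip success handling.
        logger.warning("passthrough flush replay failed: %s", exc)
        return False
    on_success(replayed)
    return True


async def main():
    successes = []
    good = [{"type": "message_start"}, {"type": "message_stop"}]
    bad = [
        {"type": "message_start"},
        {"type": "error", "error": {"type": "overloaded_error", "message": "Overloaded"}},
    ]
    # Scheduled as background tasks, mirroring the asyncio.create_task usage above.
    good_task = asyncio.create_task(safe_flush(good, successes.append))
    bad_task = asyncio.create_task(safe_flush(bad, successes.append))
    return await good_task, await bad_task, successes
```

Because `safe_flush` returns instead of raising, the failing task finishes normally and asyncio has no unretrieved exception to report.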

2. llms/anthropic/chat/handler.py — pick status from error.type

ModelResponseIterator.chunk_parser's type == "error" branch
hard-coded status_code=500 for every in-stream error event, with a
comment noting Anthropic doesn't return an HTTP status in the chunk.

Anthropic's public convention is 529 for overloaded_error (their
own SDK / docs use this code for the "API temporarily overloaded" case).
Mapping error.type == "overloaded_error" → 529 lets retry middleware
and observers distinguish a transient/retryable overload from a generic
500 server error. Any other error.type keeps the existing 500 default.
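
The mapping in change 2 can be sketched as a small lookup. This is an illustration under stated assumptions, not the diff itself: `status_for_stream_error` and the two constants are hypothetical names, and the chunk is assumed to be a parsed in-stream event of the form `{"type": "error", "error": {"type": ..., "message": ...}}`.

```python
# Hypothetical names; only the 529/500 values come from the description above.
DEFAULT_STREAM_ERROR_STATUS = 500
STREAM_ERROR_STATUS_BY_TYPE = {"overloaded_error": 529}


def status_for_stream_error(chunk: dict) -> int:
    """Pick an HTTP-style status code for an in-stream error event."""
    error = chunk.get("error") or {}
    error_type = error.get("type")
    # Unknown or missing error types keep the existing 500 default.
    return STREAM_ERROR_STATUS_BY_TYPE.get(error_type, DEFAULT_STREAM_ERROR_STATUS)
```

Keeping the mapping in a dictionary means further error types could be added later without touching the fallback.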

Test plan

Out of scope


🤖 Generated with Claude Code

…oaded_error to 529 (BerriAI#29187)

Signed-off-by: Tai An <antai12232931@outlook.com>
@greptile-apps

greptile-apps Bot commented May 28, 2026

Greptile Summary

This PR makes two targeted fixes to the Bedrock passthrough + Anthropic overload error path: it wraps the post-stream flush (used only for spend-tracking) in a try/except so that in-stream error events no longer escape as unhandled asyncio task exceptions, and it maps the overloaded_error in-stream type to HTTP 529 instead of the previous hard-coded 500.

  • litellm_logging.py: Both flush_passthrough_collected_chunks and async_flush_passthrough_collected_chunks now catch any exception raised by _flush_passthrough_collected_chunks_helper, emit a WARNING-level log, and return early — correctly preventing Task exception was never retrieved log spam without affecting the already-closed client response.
  • handler.py: ModelResponseIterator.chunk_parser now checks error.type and assigns status_code=529 for overloaded_error, falling back to 500 for all other types — enabling retry middleware to distinguish transient overloads from generic server errors.

Confidence Score: 4/5

Both changes are narrow and well-scoped; the flush wrapping adds no new risk to the client response path, and the 529 mapping is straightforwardly additive.

The flush try/except silently skips failure-handler invocation when an error is caught, so overloaded passthrough calls will not appear in cost or alerting callbacks. This is a present observability gap, but it does not affect correctness of the client-facing response, which is already closed before the flush runs.

litellm/litellm_core_utils/litellm_logging.py — the exception branch in both flush methods returns without firing any failure handler.

Important Files Changed

| Filename | Overview |
| --- | --- |
| litellm/litellm_core_utils/litellm_logging.py | Wraps _flush_passthrough_collected_chunks_helper in try/except for both sync and async flush paths; on error, logs a warning and returns early instead of propagating the exception as an unhandled asyncio task exception. |
| litellm/llms/anthropic/chat/handler.py | Maps overloaded_error in-stream error type to HTTP 529 status instead of the former hard-coded 500, allowing retry middleware to distinguish transient overload from generic server errors. |

Comments Outside Diff (1)

  1. litellm/litellm_core_utils/litellm_logging.py, lines 2019-2044

    P2 No failure handler called on flush error — failed passthrough calls go unobserved

    When _flush_passthrough_collected_chunks_helper raises (e.g. on overloaded_error), the new code logs a warning and returns without calling self.failure_handler(...). The sync path has the same gap. This means overloaded / errored passthrough calls will never appear in cost-tracking or alerting dashboards as failures — they are silently dropped from observability. If a caller relies on failure callbacks (e.g. for budget enforcement or alert thresholds) those callbacks are never fired for this code path.

    Consider whether self.failure_handler(e) / await self.async_failure_handler(e) should be invoked inside the except block after the warning log, mirroring how non-passthrough streaming errors are handled.
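
A rough sketch of what that suggestion could look like, assuming a failure callback that takes only the exception. `flush_with_failure_callback`, `overloaded_replay`, and the callback parameters are hypothetical and do not reflect the real handler signatures in litellm_logging.py.

```python
import logging
from typing import Callable

logger = logging.getLogger("passthrough_flush_failure_sketch")


def overloaded_replay() -> int:
    # Hypothetical replay that fails the way an overloaded stream would.
    raise RuntimeError("Overloaded")


def flush_with_failure_callback(
    replay: Callable[[], int],
    on_success: Callable[[int], None],
    on_failure: Callable[[Exception], None],
) -> None:
    """Replay for logging; on error, warn and notify the failure callback."""
    try:
        result = replay()
    except Exception as exc:
        logger.warning("passthrough flush replay failed: %s", exc)
        try:
            on_failure(exc)
        except Exception:
            # A broken callback must not reintroduce the unhandled-task problem.
            logger.exception("failure callback raised during flush")
        return
    on_success(result)
```

The inner try/except keeps the flush from raising even if the failure callback itself misbehaves, which preserves the goal of the original change.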

Reviews (1): Last reviewed commit: "fix(passthrough): swallow flush replay e..."

@codecov

codecov Bot commented May 28, 2026

Codecov Report

❌ Patch coverage is 0% with 12 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| litellm/litellm_core_utils/litellm_logging.py | 0.00% | 10 Missing ⚠️ |
| litellm/llms/anthropic/chat/handler.py | 0.00% | 2 Missing ⚠️ |

lukaso added a commit to lukaso/released that referenced this pull request Jun 4, 2026
The bare <a> in the "Not merged yet" copy on the unmerged-PR page
("BerriAI/litellm#29205 hasn't been merged…") had no explicit color and
fell back to the browser default #0000EE on the #111111 surface — 2:1
contrast, well below WCAG AA's 4.5:1 floor. axe-core caught it on the
unmerged-PR page once we added coverage there.

Adds .answer-date a styling matching the established pattern
(.answer-meta a / .pr-banner a / .sec-label a): --text at rest with a
muted underline, --accent on hover.

Also:
- a11y-contrast.test.ts: new case covering the open-PR page (regression
  gate for the dark-blue-link bug shape).
- a11y-full-audit.test.ts: gated re-runnable sweep that runs the full
  WCAG 2.1 AA ruleset across every distinct page state. Opt-in via
  A11Y_AUDIT=1 — review tool, not a per-PR gate.
- validate.sh: now runs the chromium contrast suite pre-push when
  playwright's chromium is installed; gracefully skips with a one-liner
  install hint otherwise. CI's a11y job remains the authoritative gate.
  Matches the shellcheck/actionlint/osv tier.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>