fix(anthropic): preserve server_tool_use and web_search_tool_result in multi-turn conversations by Chesars · Pull Request #17746 · BerriAI/litellm

Chesars · 2025-12-10T01:35:09Z

Relevant issues

Fixes #17737

Pre-Submission checklist

I have Added testing in the tests/litellm/ directory
My PR passes all unit tests on make test-unit
My PR's scope is as isolated as possible, it only solves 1 specific problem

Type

🐛 Bug Fix

Changes

Bug 1: `web_search_tool_result` is dropped

When Anthropic returns web search results, LiteLLM ignored the `web_search_tool_result` content block.

Example request

response = litellm.completion(
    model="anthropic/claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Search the web for X and use my calculator"}],
    tools=[
        {"type": "web_search_20250305"},  # Anthropic's built-in web search
        {"type": "function", "function": {"name": "calculator", ...}}
    ]
)

Anthropic returns:

{"type": "web_search_tool_result", "tool_use_id": "srvtoolu_01ABC", "content": [...]}

LiteLLM returned to the user: nothing. The search results were lost.

Fix: Extract web_search_tool_result and include it in provider_specific_fields.web_search_results.
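
To illustrate the extraction step, here is a minimal sketch that separates `web_search_tool_result` blocks from an Anthropic-style content list and stores them under `provider_specific_fields`. The helper name `split_web_search_results` and the sample content are hypothetical; this is not the code in transformation.py.

```python
from typing import Any


def split_web_search_results(
    content_blocks: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Separate web_search_tool_result blocks from all other content blocks."""
    results = [b for b in content_blocks if b.get("type") == "web_search_tool_result"]
    others = [b for b in content_blocks if b.get("type") != "web_search_tool_result"]
    return results, others


# Hypothetical Anthropic response content (shapes follow the PR description).
content = [
    {"type": "server_tool_use", "id": "srvtoolu_01ABC", "name": "web_search", "input": {"query": "X"}},
    {"type": "web_search_tool_result", "tool_use_id": "srvtoolu_01ABC", "content": []},
    {"type": "text", "text": "Here is what I found."},
]

results, rest = split_web_search_results(content)

# Keep the results so the caller can send them back on the next turn.
provider_specific_fields = {"web_search_results": results}
```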

Bug 2: `server_tool_use` reconstructed as `tool_use`

When the user sends messages back for a multi-turn conversation, LiteLLM was converting server-side tool calls to regular tool calls.

User sends to LiteLLM:

{"tool_calls": [{"id": "srvtoolu_01ABC", "function": {"name": "web_search", ...}}]}

LiteLLM sent to Anthropic (before fix):

{"type": "tool_use", "id": "srvtoolu_01ABC", "name": "web_search", ...}

❌ Anthropic requires tool_result for every tool_use, but the user can't provide one for server-executed tools.

LiteLLM sends to Anthropic (after fix):

{"type": "server_tool_use", "id": "srvtoolu_01ABC", "name": "web_search", ...}
{"type": "web_search_tool_result", "tool_use_id": "srvtoolu_01ABC", "content": [...]}

✅ Correct block types, so Anthropic does not expect a tool_result for the server-executed tool.

Fix: Detect the srvtoolu_ prefix and reconstruct the call as server_tool_use followed by its web_search_tool_result.
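
A minimal sketch of the reconstruction idea, assuming an OpenAI-style tool call dict and a list of stored web search results. The function `tool_call_to_anthropic_blocks` is a hypothetical illustration, not the implementation in factory.py.

```python
import json
from typing import Any

SERVER_TOOL_PREFIX = "srvtoolu_"  # id prefix Anthropic uses for server-executed tools


def tool_call_to_anthropic_blocks(
    tool_call: dict[str, Any],
    web_search_results: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Convert one OpenAI-style tool call into Anthropic content blocks."""
    tool_id = tool_call["id"]
    name = tool_call["function"]["name"]
    arguments = json.loads(tool_call["function"].get("arguments") or "{}")

    if not tool_id.startswith(SERVER_TOOL_PREFIX):
        # Client-side tool: the caller answers it with a tool_result later.
        return [{"type": "tool_use", "id": tool_id, "name": name, "input": arguments}]

    # Server-side tool: emit server_tool_use followed by its stored result(s).
    blocks = [{"type": "server_tool_use", "id": tool_id, "name": name, "input": arguments}]
    blocks.extend(r for r in web_search_results if r.get("tool_use_id") == tool_id)
    return blocks


stored_results = [{"type": "web_search_tool_result", "tool_use_id": "srvtoolu_01ABC", "content": []}]
server_call = {"id": "srvtoolu_01ABC", "function": {"name": "web_search", "arguments": '{"query": "X"}'}}
client_call = {"id": "toolu_01XYZ", "function": {"name": "calculator", "arguments": '{"a": 1, "b": 2}'}}

server_blocks = tool_call_to_anthropic_blocks(server_call, stored_results)
client_blocks = tool_call_to_anthropic_blocks(client_call, stored_results)
```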

Files changed

litellm/llms/anthropic/chat/transformation.py - Extract web_search_tool_result
litellm/litellm_core_utils/prompt_templates/factory.py - Reconstruct server_tool_use
tests/test_litellm/llms/anthropic/chat/test_anthropic_chat_transformation.py - 3 new tests
tests/llm_translation/test_prompt_factory.py - 5 new tests

…n multi-turn conversations

- Extract web_search_tool_result blocks in extract_response_content()
- Store web_search_results in provider_specific_fields for round-trip
- Detect srvtoolu_ prefix to reconstruct as server_tool_use (not tool_use)
- Add corresponding web_search_tool_result after server_tool_use blocks

This ensures multi-turn conversations with Anthropic web search + custom tools work correctly without Anthropic expecting tool_result for server-side tool executions.

ghost · 2025-12-10T06:47:30Z

@Chesars can you please ensure this covers streaming too?

KeremTurgutlu · 2025-12-10T14:32:36Z

@krrishdholakia @Chesars streaming is likely failing because of #17254

Here is an example:

❯  sudo ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY uv run test_litellm_websearch_fix_async.py

Reading inline script metadata from `test_litellm_websearch_fix_async.py`
 Updated https://github.com/BerriAI/litellm.git (0d2f8ce93)

=== First API call (async + streaming) ===
Based on the search results, I found the average weights for male elephants:

- **Male African elephants**: 5,000 kg
- **Male Asian elephants**: 3,600 kg
Tool calls: ['web_search', 'add_numbers']

Executing add_numbers(5000, 3600) = 8600

=== Second API call (async + streaming continuation) ===

Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm._turn_on_debug()'.

FAILED: litellm.BadRequestError: AnthropicException - Extra data: line 1 column 60 (char 59)
Received Messages=[{'role': 'user', 'content': 'Search the web for the avg weight in kgs of male African and Asian elephants. Then add the two. Be concise.'}, {'content': 'Based on the search results, I found the average weights for male elephants:\n\n- **Male African elephants**: 5,000 kg\n- **Male Asian elephants**: 3,600 kg', 'role': 'assistant', 'tool_calls': [{'function': {'arguments': '{"query": "average weight male African Asian elephants kg"}{}{}', 'name': 'web_search'}, 'id': 'srvtoolu_01Pupo3WHJ7g8JZF354UKcoE', 'type': 'function'}, {'function': {'arguments': '{"a": 5000, "b": 3600}', 'name': 'add_numbers'}, 'id': 'toolu_01RRf2pCDB7TBUfJGv2oPLGk', 'type': 'function'}]}, {'role': 'tool', 'tool_call_id': 'toolu_01RRf2pCDB7TBUfJGv2oPLGk', 'content': '8600'}]

You can see `'arguments': '{"query": "average weight male African Asian elephants kg"}{}{}'`, which has extra trailing `{}` objects; these cause the JSON decoding error.

KeremTurgutlu · 2025-12-10T14:38:39Z

I can confirm that this is the only issue. For example, post-processing tool_calls like this before the next call works:

# Convert streamed chunks to complete message
full_response = litellm.stream_chunk_builder(chunks)
content = full_response.choices[0].message.content
tool_calls = full_response.choices[0].message.tool_calls or []
for tc in tool_calls:
    if tc.function.arguments:
        tc.function.arguments = tc.function.arguments.replace('{}', '')

❯  sudo ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY uv run test_litellm_websearch_fix_async.py

Password:
Reading inline script metadata from `test_litellm_websearch_fix_async.py`
 Updated https://github.com/BerriAI/litellm.git (0d2f8ce93)

=== First API call (async + streaming) ===
Based on the search results, I found information about male elephant weights:

- Male African savanna elephants average 5,000 kg (11,000 pounds)
- Male Asian elephants weigh on average about 3,600 kg (7,900 pounds)

Now let me add these weights:
Tool calls: ['web_search', 'add_numbers']

Executing add_numbers(5000, 3600) = 8600

=== Second API call (async + streaming continuation) ===
Based on the search results:

- Male African elephants average 5,000 kg
- Male Asian elephants weigh on average about 3,600 kg

**Total: 5,000 + 3,600 = 8,600 kg**
SUCCESS!

==================================================
Test result: PASSED
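
The `replace('{}', '')` workaround above would also strip an empty object that is legitimately nested inside the arguments. A more conservative variant, sketched here with the standard library's `json.JSONDecoder.raw_decode`, decodes each concatenated JSON value in turn and merges the dict values. The helper name `parse_concatenated_arguments` is hypothetical, and merging is one possible design choice rather than LiteLLM's behavior.

```python
import json
from typing import Any


def parse_concatenated_arguments(raw: str) -> dict[str, Any]:
    """Decode back-to-back JSON objects such as '{"a": 1}{}{}' into one dict."""
    decoder = json.JSONDecoder()
    raw = raw.strip()
    merged: dict[str, Any] = {}
    idx = 0
    while idx < len(raw):
        # raw_decode returns the decoded value and the index where it ended.
        obj, idx = decoder.raw_decode(raw, idx)
        if isinstance(obj, dict):
            merged.update(obj)
        # raw_decode does not skip whitespace, so advance past it manually.
        while idx < len(raw) and raw[idx].isspace():
            idx += 1
    return merged


streamed = '{"query": "average weight male African Asian elephants kg"}{}{}'
arguments = parse_concatenated_arguments(streamed)

# A nested empty object survives, unlike with str.replace('{}', '').
nested = parse_concatenated_arguments('{"filters": {}}{}')
```

The cleaned value can then be written back with `tc.function.arguments = json.dumps(arguments)` before the next call.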

…tions (#18142)

Fixes #18137

Similar to the fix for web_search_tool_result (#17746, #17798), this PR preserves web_fetch_tool_result blocks in multi-turn conversations.

Changes:
- Add handling for web_fetch_tool_result in transformation.py (non-streaming)
- Add capture of web_fetch_tool_result in handler.py (streaming)
- Fix streaming tool arguments bug where empty input {} was prepended to actual arguments by using empty string instead of str({})
- Add unit tests for web_fetch_tool_result handling
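
To see why seeding the argument buffer with `str({})` breaks JSON parsing, consider this small sketch. It assumes the streamed argument fragments are simply concatenated onto an initial value; `accumulate_arguments` is a hypothetical stand-in for the streaming handler, not its actual code.

```python
import json


def accumulate_arguments(initial: str, deltas: list[str]) -> str:
    """Concatenate streamed JSON fragments onto an initial buffer value."""
    return initial + "".join(deltas)


# Hypothetical partial-JSON fragments as they might arrive in a stream.
deltas = ['{"a": 5000,', ' "b": 3600}']

buggy = accumulate_arguments(str({}), deltas)  # str({}) is the two-character string "{}"
fixed = accumulate_arguments("", deltas)       # empty string leaves only the real arguments

parsed = json.loads(fixed)
```

Here `buggy` holds two JSON values back to back, so `json.loads(buggy)` raises `json.JSONDecodeError` ("Extra data"), the same class of error shown in the log earlier in this thread.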

…om generic OpenAI clients (#30486)

* fix(anthropic): drop orphaned server_tool_use on multi-turn replay from generic OpenAI clients

When an Anthropic server-side tool (web_search, id `srvtoolu_...`) is used, its result is carried in `provider_specific_fields.web_search_results`. PRs #17746 / #17798 restore it for callers that round-trip provider_specific_fields.

A generic OpenAI client that does NOT preserve provider_specific_fields (e.g. Open WebUI talking to a Vertex/Anthropic model over /chat/completions) drops it on replay and instead sends back an assistant `tool_call` + a `tool` message both keyed to the `srvtoolu_` id. The transform then produced a bare `server_tool_use` (with no following *_tool_result) plus a user `tool_result` for the same id. Both are invalid, so the next turn 400s:

messages.N.content.0: unexpected `tool_use_id` found in `tool_result` blocks: srvtoolu_... Each `tool_result` block must have a corresponding `tool_use` block in the previous message.

This is the commonly-reported vertex_ai symptom where Gemini works but Claude 400s on the 2nd turn of a web-search chat.

Fix (litellm/litellm_core_utils/prompt_templates/factory.py):
- convert_to_anthropic_tool_invoke: only emit a server_tool_use when its matching *_tool_result is available to pair with it; otherwise skip it (a bare server_tool_use is itself rejected).
- anthropic_messages_pt: drop a replayed `tool`/`function` message whose tool_call_id starts with `srvtoolu_` (a server-executed tool produces no client result; a user tool_result for it is invalid).

The existing reconstruction path (provider_specific_fields present, e.g. the litellm SDK) is unchanged, as is regular client tool_use/tool_result.

Tests (tests/llm_translation/test_prompt_factory.py):
- update test_convert_to_anthropic_tool_invoke_server_tool -> test_convert_to_anthropic_tool_invoke_server_tool_without_result_is_dropped
- add test_anthropic_messages_pt_generic_client_drops_orphan_server_tool

Follow-up to #17746 / #17798; addresses the generic-client (no provider_specific_fields) case of #17737.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(anthropic): cover the srvtoolu_ round-trip fix in the test_litellm unit suite

The regression tests added in tests/llm_translation/test_prompt_factory.py aren't run by the coverage CI job (it runs tests/test_litellm), so the new factory.py branches showed as uncovered (codecov patch coverage). Add equivalent focused tests in the unit suite so both new branches are exercised there:
- convert_to_anthropic_tool_invoke drops a srvtoolu_ server_tool_use when no matching *_tool_result is available.
- anthropic_messages_pt drops the orphaned srvtoolu_ tool message a generic OpenAI client replays.

Refs #17737

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(anthropic): cover the server_tool_use + result valid-pair path in unit suite

Covers the remaining patch-coverage lines codecov flagged: convert_to_anthropic_tool_invoke emitting server_tool_use followed by its web_search_tool_result when the matching result is present (the litellm-SDK round-trip path).

Refs #17737

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* style(anthropic): flatten srvtoolu_ tool-message guard to a negated if

Addresses the Greptile style nit: replace the if-pass/else with a single negated `if not (...)` guard around the tool_result append. Behavior unchanged.

Refs #17737

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
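
The orphan-dropping rule described in this commit can be sketched as a pre-processing pass over OpenAI-format messages. The function `drop_orphaned_server_tool_messages` is hypothetical and works on plain dicts; the actual change lives in `convert_to_anthropic_tool_invoke` and `anthropic_messages_pt` and may differ in detail.

```python
from typing import Any

SERVER_TOOL_PREFIX = "srvtoolu_"


def drop_orphaned_server_tool_messages(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Remove srvtoolu_ tool calls and tool messages that have no stored server result."""
    cleaned: list[dict[str, Any]] = []
    for message in messages:
        role = message.get("role")

        # A server-executed tool never has a client-side result message.
        if role in ("tool", "function") and str(message.get("tool_call_id", "")).startswith(SERVER_TOOL_PREFIX):
            continue

        if role == "assistant" and message.get("tool_calls"):
            fields = message.get("provider_specific_fields") or {}
            result_ids = {r.get("tool_use_id") for r in fields.get("web_search_results") or []}
            # Keep client tool calls, and server tool calls only when their result is present.
            kept = [
                tc for tc in message["tool_calls"]
                if not tc["id"].startswith(SERVER_TOOL_PREFIX) or tc["id"] in result_ids
            ]
            message = {**message, "tool_calls": kept}

        cleaned.append(message)
    return cleaned


# What a generic OpenAI client might replay (no provider_specific_fields).
replayed = [
    {"role": "user", "content": "Search, then add the numbers."},
    {"role": "assistant", "content": "", "tool_calls": [
        {"id": "srvtoolu_01ABC", "type": "function", "function": {"name": "web_search", "arguments": "{}"}},
        {"id": "toolu_01XYZ", "type": "function", "function": {"name": "add_numbers", "arguments": "{}"}},
    ]},
    {"role": "tool", "tool_call_id": "srvtoolu_01ABC", "content": "search output"},
    {"role": "tool", "tool_call_id": "toolu_01XYZ", "content": "8600"},
]

cleaned = drop_orphaned_server_tool_messages(replayed)
```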

@Sameerlite

* fix(proxy): allow non-admin virtual keys to call GA Realtime WebRTC HTTP routes (#30089) * fix(proxy): allow non-admin virtual keys to call GA Realtime WebRTC HTTP routes Add the realtime WebRTC HTTP sub-routes (/realtime/client_secrets, /realtime/calls and their /v1 + /openai/v1 variants) to LiteLLMRoutes.openai_routes so is_llm_api_route() classifies them as LLM API routes. Without this, non-admin virtual keys received 401 'Only proxy admin can be used to generate, delete, update info for new keys/users/teams' when calling these endpoints. Fixes #29923 * fix(proxy): validate session.model for realtime routes in model-access check The GA Realtime WebRTC HTTP routes resolve the effective model from the nested session.model (falling back to the top-level model), but the auth layer's get_model_from_request() only extracted the top-level model. A model-restricted virtual key could therefore place a disallowed model in session.model, leave the top-level model unset, and skip can_key_call_model() entirely - obtaining an ephemeral token for a model it is not allowed to use. Extract session.model for the realtime client_secrets/calls routes so the model-access check runs against the model the request will actually use. Legitimate callers are unaffected; their permitted model still validates. Relates to #29923 * fix(proxy): classify realtime transcription_sessions routes as LLM API routes Add the GA Realtime WebRTC transcription_sessions HTTP routes to openai_routes so is_llm_api_route() returns True for them, matching the client_secrets and calls routes already fixed. These endpoints are registered with user_api_key_auth in realtime_endpoints/endpoints.py, so without this a non-admin virtual key calling POST /v1/realtime/transcription_sessions would hit the admin-only 401 branch. Extends the regression test parametrization accordingly. 
--------- Co-authored-by: habonlaci <4699494+habonlaci@users.noreply.github.com> * feat(proxy): surface max_input_tokens/max_output_tokens on /v1/models (#30272) * feat(proxy): surface max_input_tokens/max_output_tokens on /v1/models * fix(proxy): degrade /v1/models gracefully when model-group lookup fails --------- Co-authored-by: Sameer Kankute <sameer@berri.ai> * fix: sort tiered token-cost thresholds numerically (#30375) * fix: sort tiered token-cost thresholds numerically _get_token_base_cost iterated input_cost_per_token_above_<N>_tokens keys with a lexicographic sort, so for tiers whose thresholds have different digit lengths (e.g. 90k vs 128k) a request crossing both was billed at the lower tier that sorted first. Sort by the parsed numeric threshold instead, so the highest tier the request actually crosses is applied. * refactor: reuse _parse_above_token_threshold for inline threshold parse --------- Co-authored-by: Eric (GabiDevFamily) <271972409+santino18727-debug@users.noreply.github.com> * fix(openai): preserve cache_control for openai-compatible custom endpoints (#30387) * fix(openai): preserve cache_control for openai-compatible custom endpoints * fix(openai): use parsed hostname to detect real OpenAI for cache_control preservation * fix(proxy): drain all daily-spend batches per flush cycle (#30281) (#30505) * fix(types): prevent internal parallel_request_limiter fields from leaking to upstream providers (#30545) * fix(types): add internal parallel_request_limiter fields to all_litellm_params to prevent forwarding to upstream providers * test(types): add regression test for internal rate-limit fields in all_litellm_params * fix(init): add bool type annotation to suppress_debug_info (#30531) Module-level `suppress_debug_info = False` had no annotation, so strict type checkers (e.g. ty) infer it as `Literal[False]`. Reassigning it to `True` (as done in proxy_server.py and router.py) then fails with an invalid-assignment error. 
Annotate it as `bool` to match every other flag in this module. * fix: coalesce null aggregates in update_metrics for no-spend keys (#29945) * feat(team_endpoints): add query parameter `key_limit` to `/team/info` endpoint (#30006) * feat(team_endpoints): Add query parameter key_limit to /team/info * feat(team_endpoints): update schema.d.ts to include the new query parameter * feat(team_endpoints): add tests for limitting key count in /team/info response * feat(team_endpoints): Apply suggestions from greptile * Set greater-than constraint on key-limit * Fix type * fix(router): release aiohttp connection when stream iteration ends abnormally (#30271) * fix(router): release aiohttp connection when stream iteration ends abnormally A streaming response that terminates with a mid-stream read timeout, a task cancellation (client disconnect), or GeneratorExit never closed the underlying aiohttp ClientResponse. aiohttp only auto-releases the connector slot at body EOF, so each abnormally terminated stream permanently leaked one slot from the shared TCPConnector pool. During a backend traffic spike the pool drains; once exhausted every subsequent request to that host waits for a slot, times out and surfaces as a 408, indefinitely, even after the backend recovers. Only a proxy restart cleared the in-memory sessions, which matched the reported symptom of a router stuck returning 408 for a healthy vLLM backend. Close the response in a finally clause when iteration ends. On a fully read response the connection was already released at EOF and close() is a no-op, so keep-alive reuse for normal requests is unchanged. Fixes #30192 * test(aiohttp): cover GeneratorExit path with a mock instead of a live socket The previous slot-release test started a real aiohttp TCP server, which can flake in offline CI and does not exercise this fix's code path directly. 
Replace it with a dependency-injected mock that closes the stream generator (GeneratorExit) and asserts the response is closed, covering the third abnormal-exit path the finally block handles * feat(proxy): serve Anthropic-native /v1/models for Claude Code gateway discovery (#30273) * feat(proxy): serve Anthropic-native /v1/models for Claude Code gateway discovery * refactor(proxy): move Anthropic model-list formatter into llms/anthropic/common_utils * fix(proxy): make model_list request param optional for direct callers * feat(dashscope): add Responses API support (#30286) * feat(dashscope): add Responses API support DashScope's OpenAI-compatible endpoint serves /responses, so register a DashScopeResponsesAPIConfig that routes dashscope/* responses calls to {api_base}/responses without rewriting the upstream model id, instead of falling back to the chat-completions -> responses emulation pipeline. Closes #29780 * feat(dashscope): mark responses API as not supporting native websocket Matches the hosted_vllm/perplexity/openrouter responses configs, which all override supports_native_websocket() to False since the OpenAI-compatible endpoint has no native wss:// responses transport. --------- Co-authored-by: Sameer Kankute <sameer@berri.ai> * fix(spend-logs): preserve error_message on ProxyException failures (#30381) * fix(spend-logs): preserve error_message on ProxyException failures `StandardLoggingPayloadSetup.get_error_information` used `str(original_exception)` to populate the human-readable error message stored in `spend_logs.metadata.error_information.error_message`. `ProxyException` (litellm/proxy/_types.py:3453) sets `self.message` in its constructor but does NOT call `super().__init__(message)` and does NOT define `__str__`. As a result, `str(ProxyException(...))` returns the empty string, and every auth/budget/quota rejection was landing in spend_logs with `error_message=""` despite a fully populated traceback. 
Operator impact: dashboard "LLM Failure" rows became untriageable — the only way to tell a 401 from a 429 was to manually unpack the traceback JSON via psql. Burst failure patterns (e.g. a UI session polling with a stale token) produced 20-30 indistinguishable `error_code=401` rows per second. Fix: prefer the `.message` attribute (set by ProxyException and every litellm.exceptions.* class) over `str(exc)`. The `str(exc)` fallback is retained for non-litellm exception types, preserving prior behavior. Test plan: - 2 new unit tests in tests/test_litellm/litellm_core_utils/ test_litellm_logging.py: * test_get_error_information_prefers_message_attribute_over_str * test_get_error_information_falls_back_to_str_when_no_message_attr - Existing test_get_error_information_error_code_priority still passes - End-to-end verified: bad-key 401 now stores full "Authentication Error, Invalid proxy server token passed..." message in spend_logs.metadata.error_information.error_message * fix(spend-logs): preserve explicit empty .message + drop dead reference Greptile P2 on #30381. The truthiness check `if message_attr:` silently skipped an explicit empty-string `.message` and fell through to `str(original_exception)`. For ProxyException-shaped objects both produce empty, so the bug was latent; for other exception types it would inject a different string into error_information.error_message and corrupt the signal. Use `is not None` so an empty string survives verbatim. Also drop the stale `See e2e/cases/11.` comment reference — that path does not exist anywhere in the repo and confuses future readers. Regression test added: an exception with `.message=""` and a non-empty `super().__init__()` arg must yield error_message == "". 
* ci: retrigger workflows after base branch change to litellm_internal_staging * fix(anthropic): strip LiteLLM-injected total_tokens from /v1/messages response (#30382) * fix(anthropic): strip LiteLLM-injected total_tokens from /v1/messages response The non-streaming /v1/messages response carries a LiteLLM-injected usage.total_tokens = input_tokens + output_tokens that is not part of the Anthropic API spec. This caused three problems: 1. Shape divergence with streaming on the same endpoint. message_delta.usage in the SSE path never carries total_tokens. Clients parsing both paths get two different schemas from one endpoint. 2. Shape divergence with upstream. Direct calls to https://api.anthropic.com/v1/messages return no total_tokens field, so clients using the official Anthropic SDK couldn't rely on it, and clients that did rely on the LiteLLM-injected one broke when bypassing the proxy. 3. Numerical misuse. total = input + output undercounts when cache_read_input_tokens and cache_creation_input_tokens are non-zero, because cache tokens are reported in their own fields. A 100k-token cached prompt with 1 non-cache input token + 200 output tokens reports total_tokens = 201, off by ~99.8% from any reasonable definition of "total." Fix: add _strip_total_tokens_from_anthropic_response in litellm/proxy/anthropic_endpoints/endpoints.py and invoke it in the success path of anthropic_response right before returning. Only mutates dict-shaped responses; streaming (which already lacks the field) is left untouched. spend_logs / Prometheus continue to compute total_tokens internally for billing — this fix only strips the field from the wire response. Scope: only the Anthropic passthrough endpoint /v1/messages. The OpenAI-shape /v1/chat/completions is unaffected. 
* fix(anthropic): gate total_tokens strip behind flag + handle Pydantic .usage Two P1 greptile threads on #30382: P1 — **Backwards-incompatible removal without a feature flag** Stripping `usage.total_tokens` unconditionally breaks any client currently reading the LiteLLM-shaped non-streaming /v1/messages response. Per the codebase's policy (mirrors #30418), gate behind a new flag. - `litellm.strip_anthropic_total_tokens: bool = False` (default — backward-compat: clients keep seeing total_tokens). - Env override: `LITELLM_STRIP_ANTHROPIC_TOTAL_TOKENS=true`. - Docstring: planned to flip to True in a future major release; opt in early. P1 — **Silent no-op if `result` is a Pydantic model** `base_process_llm_request` may return a Pydantic-style object whose `.usage` is a plain dict (the most common shape — e.g. objects wrapping raw upstream JSON). The original `isinstance(response, dict)` guard skipped strip on those, so `total_tokens` would still hit the wire. Helper now also reads `getattr(response, "usage", None)` and strips when that's a dict. Strongly-typed Pydantic `Usage` sub-models with required `total_tokens` fields are still skipped — those impose type constraints the helper doesn't try to subvert. Tests: - `test_strips_total_tokens_on_pydantic_model_with_dict_usage` - `test_flag_defaults_off` 8/8 pass locally. * fix(anthropic): drop env var for strip flag (docs CI) Mirrors #30418's pattern (`expose_router_debug_in_errors: bool = True`, no `os.getenv`). 
The `LITELLM_STRIP_ANTHROPIC_TOTAL_TOKENS` env var introduced in the prior commit was flagged by `tests/documentation_tests/test_env_keys.py` because the documentation file `docs/my-website/docs/proxy/config_settings.md` lives in `BerriAI/litellm-docs` (separate repo) and registering a new env key requires a parallel docs PR — a friction we avoid here by exposing the flag only as a Python attribute + `litellm_settings` config key, both of which load through the existing proxy config plumbing without needing the env-var registry to be updated. No semantic change: default still False, behavior identical when set via `litellm.strip_anthropic_total_tokens = True` or `litellm_settings.strip_anthropic_total_tokens: true` in config.yaml. Verified locally: env scan no longer surfaces the key; 8/8 tests pass. * ci: retrigger workflows after base branch change to litellm_internal_staging * fix(pricing): correct swapped input/output token costs for command-r7b-12-2024 (#30413) * fix(pricing): correct swapped input/output token costs for command-r7b-12-2024 * test: resolve model prices JSON relative to test file for pip installs * fix(exception-mapping): map Gemini upstream-error body code 429 to RateLimitError (#30417) * fix(exception-mapping): map Gemini upstream-error body code 429 to RateLimitError Some Gemini-compatible gateways (e.g. new-api) wrap a 429 rate-limit signal from upstream inside an HTTP 500/503 envelope, with the real code only surfaced in the JSON body: {"error":{"message":"...high demand...","type":"upstream_error", "param":"","code":429}} Previously LiteLLM only looked at the HTTP status and mapped this to InternalServerError, which Router treats as non-retryable for many configs — so users got hard 500s instead of fallback/retry. Now the Gemini/Vertex exception mapper parses error.code from the body and routes code 429 to RateLimitError before falling through to the HTTP-status branches. Other body codes fall through unchanged. 
Tests cover: - new-api gateway's `code:429` payload now maps to RateLimitError - Genuine 500-body responses stay InternalServerError - Non-JSON body strings fall through to status-code mapping unchanged * fix(exception-mapping): scope body-code 429 promotion to 5xx envelopes Addresses greptile P1/P2 + @Sameerlite's review on #30417. The new elif branch was firing for any HTTP status, so a gateway response of HTTP 400 with body {"error":{"code":429,...}} would be incorrectly promoted to RateLimitError (retryable) instead of falling through to BadRequestError. Same trap for 401 -> AuthenticationError. Scoped the body-code 429 check to `500 <= status_code < 600` — covers 500/502/503/504 (gateways wrapping upstream 429 in any 5xx envelope) without inviting the 4xx misclassification. Tests: parametrized table now covers 5xx (500/502/503), 4xx (400/401), and the existing fall-through cases, asserting each maps to the exception type that matches the HTTP status code. 50/50 pass locally. * ci: retrigger workflows after base branch change to litellm_internal_staging * feat(router): add expose_router_debug_in_errors flag (default True) to redact internal model_group/fallback names (#30418) * feat(router)!: redact internal model_group/fallback names from exception messages The Router was unconditionally appending internal config names onto exception.message: - "Received Model Group=..." - "Available Model Group Fallbacks=..." - "No fallback model group found... Fallbacks={...}" - "context_window_fallbacks={...}" - Deployment-timeout messages including model_group - Fallback failure detail listing fallback chain ProxyException forwards .message verbatim to clients, so gateways were leaking their model_name / fallback wiring in every failed call. Fix: gate all five mutation sites on a new `litellm.expose_router_debug_in_errors` flag (default False). Set to True to restore upstream debug behavior for local debugging. 
Why: matches the redaction posture this codebase already has for upstream model identifiers (cf. _litellm_returned_model_name) and removes the last common error-path leak of internal model_group names. Breaking change marker (!): if anything parses "Received Model Group=" out of client error messages, flip the flag on or migrate to the x-litellm-* response headers instead. Tests: 7 cases covering each of the 5 redaction sites + the flag-on inverse path, plus a "default off" sanity check. * test(router): cover sites 1 + 3 of expose_router_debug_in_errors gate Addresses Greptile / codecov feedback on #30418: patch coverage was 55.6% with 4 lines uncovered in litellm/router.py. The existing tests exercised sites 2 (ContextWindowExceededError), 4 (no-fallback-found), and 5 (Received Model Group) — both default and flag-on. Sites 1 and 3 were declared in the PR description as covered by "site 5 also fires" but the gate body lines for each (the `e.message +=` inside the `if litellm.expose_router_debug_in_errors:` branch) only execute when the flag is on AND the specific exception path is taken, which neither existing test triggered. Added 4 new tests (default + flag-on × 2 sites): - test_default_does_not_leak_deployment_timeout_debug - test_flag_on_leaks_deployment_timeout_debug - test_default_does_not_leak_content_policy_fallback_hint - test_flag_on_leaks_content_policy_fallback_hint Trigger details: - Site 1 (litellm.Timeout in _acompletion) is reached via the Router-supported `mock_timeout=True` + `timeout=0.001` kwargs on `acompletion(...)`. Cannot embed a Timeout instance in model_list because Router.__init__ deep-copies it and Timeout.__reduce__ does not preserve the required positional args. - Site 3 (ContentPolicyViolationError without content_policy_fallbacks set, in async_function_with_fallbacks_common_utils) is reached by passing a `mock_response=litellm.ContentPolicyViolationError(...)` instance via the call-site kwarg — same deepcopy-avoidance reason. 
11/11 tests pass locally. Patch coverage on litellm/router.py for this PR's diff should now be 100%. * chore(router): flip expose_router_debug_in_errors default to True Addresses @Sameerlite's review on #30418 — maintain backward compat on the wire. Redact becomes opt-in via setting the flag to False; the historical behavior (leak internal model_group / fallback wiring through exception messages) is preserved as the default. - litellm/__init__.py: default flipped to True, docstring rewritten with deprecation note pointing at a future flip to False (redact by default) in a major release. - tests/test_litellm/test_router_exception_redaction.py: fixture resets to True (was False); the "off" tests now explicitly set False; the "default_leaks_*" tests rely on the fixture default. test_flag_defaults_off -> test_flag_defaults_on. - No router.py change needed; the gate keys off the same flag, only the default changes. - PR title no longer needs the breaking-change `!` marker — no client sees a behavior change at default settings. 11/11 pass locally. * ci: retrigger workflows after base branch change to litellm_internal_staging * feat(guardrails): integrate Repelloai Argus guardrail (#30465) * feat(guardrails): add RepelloAI Argus guardrail integration (#1) * feat(guardrails): add RepelloAI Argus guardrail integration Add a new guardrail hook backed by RepelloAI Argus, with dashboard-managed asset policies enforced via an asset_id and X-API-Key auth. 
* fix(guardrails): harden RepelloAI Argus guardrail - scan streaming responses on output (was bypassing the guardrail) - log blocked verdicts as guardrail_intervened instead of success - treat auth/config errors (401/403/404/422) as misconfiguration that always blocks, not a fail-open-able unreachable error - default unreachable_fallback to fail_closed and read it directly; block on unknown/malformed verdicts so an API change can't silently disable enforcement - type unreachable_fallback as a Literal, drop the duplicate config model, expose unreachable_fallback in the config schema, and stop leaking the raw provider response / exception strings to the client * fix(guardrails): address RepelloAI Argus review feedback - support ARGUS_API_KEY (with REPELLOAI_API_KEY fallback) - make asset_id required in the config model - normalize unreachable_fallback so only fail_open opens; block on 400 misconfig - correct the shared unreachable_fallback field description * docs(guardrails): add RepelloAI Argus docs page and dashboard listing - add docs page covering config, env vars, modes, verdicts, failure semantics - list RepelloAI Argus in the Guardrail Garden with provider/logo mappings - add a regression test for the provider logo and display-name resolution * fix(guardrails): keep RepelloAI asset_id optional in config model A required asset_id leaked onto the shared LitellmParams (which inherits RepelloAIGuardrailConfigModel), breaking validation for every other guardrail. Keep it optional like sibling models; the guardrail __init__ still raises when asset_id is missing, which is the real enforcement. * Add comment for last user turn scanning * feat(guardrails): harden repelloai scanning * feat(guardrails): expand repelloai scanning to include tool definitions Add extraction of tool definitions and tool call arguments to the RepelloAI guardrail scanning. Improves detection coverage by including function schemas and parameters in the prompt sent to the guardrail service. 
Also captures detailed error responses in logs and adds guardrail header to streaming responses. * refactor(guardrails): fix and harden repelloai schema text extraction - Fix duplicate text in _iter_schema_text: previously all dict values were re-queued onto the stack even after scalar/list keys were already extracted explicitly, causing names/descriptions to appear twice in the scanned prompt - Extract schema key frozensets to module-level constants so they are not reconstructed on every call - Change _iter_schema_text from @classmethod to @staticmethod (cls unused) - Narrow _call_analyze stage param from str to Literal["prompt", "response"] - Add HttpxResponse type annotation to _raise_for_config_error - Add LLMResponseTypes annotation to async_post_call_success_hook response param * fix(guardrails): resolve pyright type errors in repelloai guardrail - Narrow async_handler.post return from Response|None to Response with explicit None guard before calling raise_for_status/json - Fix list comprehension returning str|None by switching to explicit loop with isinstance guard so pyright tracks the narrowing - Cast model_dump() result to Dict since hasattr does not narrow object type in pyright * fix(guardrails/repello): include Responses API instructions field in prompt scan The /v1/responses top-level `instructions` field was not included in _extract_prompt_text, allowing a caller to bypass guardrail policy checks by putting blocked content in `instructions` while keeping `input` benign. * feat: add api_key to config model and read prompt from data dict * fix(guardrails/repello): plug input_text and tool-call response bypass gaps Responses API input content parts with type 'input_text' were silently dropped by build_inspection_messages (which only handles type='text'), allowing callers to send blocked content via that path without triggering the pre-call scan. 
Fix: add _extract_input_text_parts to RepelloAIGuardrail and call it when walking the Responses API input messages.

Post-call scanning skipped responses whose choices contained only tool_calls or function_call (message.content=None), letting models put blocked output in function arguments undetected. Fix: _extract_chat_completion_text now calls _extract_tool_call_args_from_message on each choice message.

Also replace typing.Dict/List with builtin dict/list to clear TID251 strict ruff violations introduced by this file.

* fix(guardrails/repello): scan Responses API function_call output arguments

Output items with type 'function_call' in a /v1/responses response were skipped by _extract_responses_api_text; only 'message' items were walked. A model could return blocked content in function_call.arguments undetected. Now extract arguments from function_call output items before scanning.

* fix(anthropic): drop orphaned server_tool_use on multi-turn replay from generic OpenAI clients (#30486)

* fix(anthropic): drop orphaned server_tool_use on multi-turn replay from generic OpenAI clients

When an Anthropic server-side tool (web_search, id `srvtoolu_...`) is used, its result is carried in `provider_specific_fields.web_search_results`; PRs #17746 / #17798 restore it for callers that round-trip provider_specific_fields. A generic OpenAI client that does NOT preserve provider_specific_fields (e.g. Open WebUI talking to a Vertex/Anthropic model over /chat/completions) drops it on replay and instead sends back an assistant `tool_call` + a `tool` message both keyed to the `srvtoolu_` id. The transform then produced a bare `server_tool_use` (with no following *_tool_result) plus a user `tool_result` for the same id. Both are invalid, so the next turn 400s:

    messages.N.content.0: unexpected `tool_use_id` found in `tool_result` blocks: srvtoolu_... Each `tool_result` block must have a corresponding `tool_use` block in the previous message.

This is the commonly-reported vertex_ai symptom where Gemini works but Claude 400s on the 2nd turn of a web-search chat.

Fix (litellm/litellm_core_utils/prompt_templates/factory.py):
- convert_to_anthropic_tool_invoke: only emit a server_tool_use when its matching *_tool_result is available to pair with it; otherwise skip it (a bare server_tool_use is itself rejected).
- anthropic_messages_pt: drop a replayed `tool`/`function` message whose tool_call_id starts with `srvtoolu_` (a server-executed tool produces no client result; a user tool_result for it is invalid).

The existing reconstruction path (provider_specific_fields present, e.g. the litellm SDK) is unchanged, as is regular client tool_use/tool_result.

Tests (tests/llm_translation/test_prompt_factory.py):
- update test_convert_to_anthropic_tool_invoke_server_tool -> test_convert_to_anthropic_tool_invoke_server_tool_without_result_is_dropped
- add test_anthropic_messages_pt_generic_client_drops_orphan_server_tool

Follow-up to #17746 / #17798; addresses the generic-client (no provider_specific_fields) case of #17737.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(anthropic): cover the srvtoolu_ round-trip fix in the test_litellm unit suite

The regression tests added in tests/llm_translation/test_prompt_factory.py aren't run by the coverage CI job (it runs tests/test_litellm), so the new factory.py branches showed as uncovered (codecov patch coverage). Add equivalent focused tests in the unit suite so both new branches are exercised there:
- convert_to_anthropic_tool_invoke drops a srvtoolu_ server_tool_use when no matching *_tool_result is available.
- anthropic_messages_pt drops the orphaned srvtoolu_ tool message a generic OpenAI client replays.
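The two rules in the fix above can be sketched in isolation. This is a simplified illustration over OpenAI-style message dicts, not the code in factory.py: `drop_orphaned_server_tools` and `results_by_id` are hypothetical names, and the real transform emits Anthropic content blocks rather than filtering the history.

```python
from typing import Any

SERVER_TOOL_PREFIX = "srvtoolu_"  # id prefix of Anthropic server-executed tools


def drop_orphaned_server_tools(
    messages: list[dict[str, Any]],
    results_by_id: dict[str, Any],
) -> list[dict[str, Any]]:
    """Hypothetical helper: clean an OpenAI-style history before translation.

    results_by_id maps a srvtoolu_ id to its preserved *_tool_result block.
    It is empty when the client did not round-trip provider_specific_fields.
    """
    cleaned: list[dict[str, Any]] = []
    for msg in messages:
        role = msg.get("role")
        tool_call_id = str(msg.get("tool_call_id", ""))
        # A replayed tool/function message for a server tool is always invalid.
        if role in ("tool", "function") and tool_call_id.startswith(SERVER_TOOL_PREFIX):
            continue
        if role == "assistant" and msg.get("tool_calls"):
            # Keep a server tool call only when its result can be paired with it.
            kept = [
                call
                for call in msg["tool_calls"]
                if not str(call.get("id", "")).startswith(SERVER_TOOL_PREFIX)
                or call["id"] in results_by_id
            ]
            msg = {**msg, "tool_calls": kept}
        cleaned.append(msg)
    return cleaned


# History as a generic OpenAI client might replay it (illustrative values).
history = [
    {"role": "user", "content": "Search the web for X"},
    {
        "role": "assistant",
        "content": "Here is what I found.",
        "tool_calls": [
            {
                "id": "srvtoolu_01ABC",
                "type": "function",
                "function": {"name": "web_search", "arguments": "{}"},
            }
        ],
    },
    {"role": "tool", "tool_call_id": "srvtoolu_01ABC", "content": "..."},
    {"role": "user", "content": "Tell me more"},
]

without_result = drop_orphaned_server_tools(history, results_by_id={})
with_result = drop_orphaned_server_tools(
    history, results_by_id={"srvtoolu_01ABC": {"type": "web_search_tool_result"}}
)
```

In the first call the server tool call and its `tool` message both disappear; in the second the call survives because a result is available to pair with it, while the client-side `tool` message is still dropped.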
Refs #17737

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(anthropic): cover the server_tool_use + result valid-pair path in unit suite

Covers the remaining patch-coverage lines codecov flagged: convert_to_anthropic_tool_invoke emitting server_tool_use followed by its web_search_tool_result when the matching result is present (the litellm-SDK round-trip path).

Refs #17737

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* style(anthropic): flatten srvtoolu_ tool-message guard to a negated if

Addresses the Greptile style nit: replace the if-pass/else with a single negated `if not (...)` guard around the tool_result append. Behavior unchanged.

Refs #17737

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(proxy): require premium only when enabling premium metadata fields (#30285) (#30506)

Co-authored-by: Sameer Kankute <sameer@berri.ai>

* fix(perplexity): stop double-billing reasoning tokens in manual cost fallback (#30488)

* fix(perplexity): stop double-billing reasoning tokens in manual cost fallback

When perplexity_cost_per_token cannot use the API-provided usage.cost.total_cost short-circuit and falls back to manual calculation, it multiplies the full usage.completion_tokens by output_cost_per_token and then adds reasoning_tokens * output_cost_per_reasoning_token on top. Per the OpenAI/Perplexity usage convention codified for the central path in PR #18607, completion_tokens already INCLUDES reasoning_tokens, so the manual fallback double-bills reasoning at both the output and reasoning rate.

Concrete impact on perplexity/sonar-deep-research (input 2e-6, output 8e-6, reasoning 3e-6): for the exact usage shape exercised by the live response fixture in tests/llm_translation/test_perplexity_reasoning.py (prompt_tokens=9, completion_tokens=20, reasoning_tokens=15) the current code charges 0.000223 vs the convention-correct 0.000103, a 2.165x overcharge. The bug is reachable whenever Perplexity omits the cost object (streaming chunks, fixture-driven paths, older API versions).

Subtracts reasoning_tokens (clamped at zero) from completion_tokens before applying the output rate, mirroring how dashscope/cost_calculator.py and the central generic_cost_per_token already handle it. Preserves the existing fallback behaviour when output_cost_per_reasoning_token is unset (all completion_tokens stay at the output rate).

Existing tests in tests/test_litellm/llms/perplexity/test_perplexity_cost_calculator.py asserted the buggy math and are updated to the convention-correct math. Adds a focused regression test using the exact usage shape from the live response fixture so this class of bug cannot be silently reintroduced.

* style(perplexity): drop redundant type annotation on else branch to satisfy mypy

mypy [no-redef] flagged 'completion_cost' as declared in both if and else arms; keeping the annotation only on the first declaration matches existing patterns in this file.

* fix(perplexity): update integration test expected costs for non-double-billed math

Three tests in test_perplexity_integration.py asserted the old buggy expectation that reasoning_tokens are billed in addition to the full completion_tokens count. After the fix in cost_per_token, reasoning_tokens are billed at the reasoning rate and the remaining (completion_tokens - reasoning_tokens) at the standard output rate, matching OpenAI/Perplexity convention (PR #18607).

Updates: test_end_to_end_cost_calculation_with_transformation, test_main_cost_calculator_integration, test_high_volume_cost_calculation.
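The figures quoted for perplexity/sonar-deep-research can be reproduced with a few lines of arithmetic. The sketch below restates the corrected fallback as described in the commit message; `manual_cost` is a hypothetical stand-in, not the signature of perplexity_cost_per_token.

```python
from typing import Optional


def manual_cost(
    prompt_tokens: int,
    completion_tokens: int,
    reasoning_tokens: int,
    input_rate: float,
    output_rate: float,
    reasoning_rate: Optional[float] = None,
) -> float:
    """Hypothetical restatement of the corrected manual fallback math."""
    cost = prompt_tokens * input_rate
    if reasoning_rate is None:
        # No separate reasoning rate: every completion token bills at the output rate.
        return cost + completion_tokens * output_rate
    # completion_tokens already includes reasoning_tokens, so subtract them
    # (clamped at zero) before applying the output rate.
    non_reasoning = max(completion_tokens - reasoning_tokens, 0)
    return cost + non_reasoning * output_rate + reasoning_tokens * reasoning_rate


# Rates and usage shape quoted in the commit message.
fixed = manual_cost(9, 20, 15, 2e-6, 8e-6, 3e-6)
buggy = 9 * 2e-6 + 20 * 8e-6 + 15 * 3e-6  # old math: reasoning billed twice

print(f"fixed={fixed:.6f} buggy={buggy:.6f} ratio={buggy / fixed:.3f}")
# prints: fixed=0.000103 buggy=0.000223 ratio=2.165
```

The 15 reasoning tokens were previously charged once at the output rate (15 × 8e-6 = 0.000120) and again at the reasoning rate, which accounts for the whole difference between the two totals.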
The high-volume sanity threshold drops to 0.25 to reflect the corrected total. * fix(ui): use dynamic proxy base URL in MCP usage examples (#30487) Replace hardcoded http://localhost:4000 with getProxyBaseUrl() in the MCP server usage example and copy-to-clipboard snippet so the generated configuration works for non-local deployments. Fixes #30466 * feat: add missing UK PII entity types to Presidio guardrail (#30537) * feat: add missing UK PII entity types to Presidio guardrail Add UK_PASSPORT, UK_POSTCODE, and UK_VEHICLE_REGISTRATION to PiiEntityType enum and PII_ENTITY_CATEGORIES_MAP. These entity types are supported by Microsoft Presidio but were missing from litellm's type definitions, preventing users from configuring UK-specific PII detection. * test: remove fragile hardcoded entity count test Remove test_uk_category_entity_count which hardcodes len() == 5. The test_uk_entities_match_presidio_recognizers test already verifies exact set equality, making the count test redundant and fragile to future Presidio additions. * style: apply Black formatting to match CI requirements * fix: route volcengine (Doubao) tiered-pricing models to the tiered cost handler (#30357) Volcengine (Doubao) models define `tiered_pricing` but no flat per-token cost, so cost_per_token fell through to generic_cost_per_token (which only reads flat costs) and tracked them at $0 Route custom_llm_provider == "volcengine" to the shared tiered-pricing handler in litellm/llms/dashscope/cost_calculator.py, which already computes graduated tier costs. 
Make that handler provider-agnostic by adding a custom_llm_provider argument (default "dashscope" preserves existing behavior) so get_model_info resolves the correct model map entry Fixes #30346 * feat(mcp): make MCP gateway name and description configurable via env vars (#30473) * feat(mcp): make MCP gateway name and description configurable via env vars * Rename function _restore_env to _apply_env * docs(mcp): document import-time capture of env-backed identity constants Address Greptile review feedback: clarify that LITELLM_MCP_SERVER_NAME and LITELLM_MCP_SERVER_DESCRIPTION are read once at import and require a module reload to observe env changes after import. Generated with AI assistance Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Yevhen Luhovtsov <yevhen.luhovtsov@intapp.com> Co-authored-by: Claude <noreply@anthropic.com> * fix(mcp): preserve native tools in semantic filter hook (#26650) * fix(mcp): preserve native tools in semantic filter hook The SemanticToolFilterHook.async_pre_call_hook passed ALL tools (MCP + native) to filter_tools(), which only knows MCP-registered tool names. Native tools silently failed the name match in _get_tools_by_names() and were dropped from the request. Fix: partition tools into native and MCP-registered before filtering. Run the semantic filter only on MCP tools, then merge native tools back unconditionally. 
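The partition-then-merge idea in the semantic filter fix can be sketched as follows. This is an illustration only: `filter_preserving_native`, the `is_mcp_tool` predicate, and the `semantic_filter` callable are hypothetical stand-ins for the hook's internals, and this simple version places native tools first rather than preserving request order.

```python
from typing import Any, Callable

Tool = dict[str, Any]


def filter_preserving_native(
    tools: list[Tool],
    is_mcp_tool: Callable[[Tool], bool],
    semantic_filter: Callable[[list[Tool]], list[Tool]],
) -> list[Tool]:
    """Hypothetical sketch: filter only MCP tools, always keep native ones."""
    native: list[Tool] = []
    mcp: list[Tool] = []
    # Single pass: each tool is classified exactly once.
    for tool in tools:
        (mcp if is_mcp_tool(tool) else native).append(tool)
    # Native tools never reach the filter, so they cannot be dropped by it.
    return native + semantic_filter(mcp)


# Illustrative data: one native tool and two MCP-registered tools.
registered_mcp_names = {"github-search", "jira-create"}
request_tools = [
    {"type": "function", "function": {"name": "github-search"}},
    {"type": "function", "function": {"name": "calculator"}},
    {"type": "function", "function": {"name": "jira-create"}},
]

result = filter_preserving_native(
    request_tools,
    is_mcp_tool=lambda t: t["function"]["name"] in registered_mcp_names,
    semantic_filter=lambda mcp_tools: mcp_tools[:1],  # stand-in for semantic ranking
)
result_names = [t["function"]["name"] for t in result]
```

Here `calculator` survives even though the stand-in filter keeps only one MCP tool, which is the behavior the fix is after.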
Changes: - Robust _is_mcp_tool() using shape-based detection for OpenAI-format dicts, safe regardless of future _extract_tool_info changes - Single-pass partition loop (no double _is_mcp_tool calls) - Preserve native tools in MCP expansion path (mixed requests) - Track MCP expansion to prevent expanded tools bypassing filtering - filter_stats reports MCP-only counts for accurate metrics - Extracted _emit_filter_metadata() helper - Skip spurious filter headers for all-native tool requests Closes #26212 * remove stale docstring note referencing tools_expanded_from_mcp * fix: handle Responses API name collision and preserve tool ordering - Classify Responses API tools ({type: 'function', name: '...'}) as native to prevent name collisions with MCP canonical names - Preserve original request tool ordering using id()-based merge instead of naive native+mcp concatenation - Add 2 regression tests: name collision and ordering preservation * style: apply black formatting * fix(mcp): harden semantic filter — preserve all native tool formats, safe metadata access, graceful expansion failure, name-based merge * lint: suppress PLR0915 on async_pre_call_hook (matches codebase convention) * ci: retrigger checks after rebase onto litellm_internal_staging * feat(fireworks): sync Fireworks AI model registry with current platform catalog (#30616) Adds 12 new Fireworks serverless models and updates 3 existing entries in model_prices_and_context_window.json and its bundled backup to match the current Fireworks platform model list. New direct models: glm-5p2, qwen3p7-plus, minimax-m3, minimax-m2p7, kimi-k2p7-code, kimi-k2p6, deepseek-v4-pro, deepseek-v4-flash. New router endpoints: glm-5p1-fast, kimi-k2p6-fast, kimi-k2p7-code-fast. 
Updated: glm-5p1, gpt-oss-120b, and gpt-oss-20b now carry correct output token caps, cache-read pricing, and explicit capability flags max_tokens is set equal to max_output_tokens (not the full context window) for models whose generation cap is below their context window. This avoids the shared input+output budget path in get_modified_max_tokens, which would otherwise let callers request output sizes the model cannot produce. The same fix corrects the pre-existing glm-5p1, gpt-oss-120b, and gpt-oss-20b entries that had max_tokens equal to the full context window Short-form aliases (fireworks_ai/<model>) are added for every direct accounts/fireworks/models/ entry so cost attribution works for callers using bare model names. Router endpoints get short-form aliases too, and transform_request now routes bare names ending in -fast to the accounts/fireworks/routers/ path instead of defaulting every bare name to models/. This keeps the kimi-k2p6-fast router from being misrouted to the nonexistent models/kimi-k2p6-fast endpoint kimi-k2p6-turbo is intentionally excluded; kimi-k2p6-fast is its replacement. Context windows for deepseek-v4 and kimi models use the power-of-two values (1048576 and 262144) published on the Fireworks model pages, matching the convention already used by existing entries Two regression tests in test_utils.py assert the exact per-token costs, token limits, capability flags, and short-form-to-long-form equality for all 15 models against both the main and backup cost maps. 
Two routing tests in test_fireworks_ai_chat_transformation.py verify bare -fast names route to routers/ and bare direct-model names route to models/ * fix(bedrock): handle role:"system" inside the messages array on /v1/messages (#29698) (#30443) * feat(anthropic): hoist leading in-array system to top-level (helper) * test(anthropic): cover _system_content_to_blocks edge cases; deepcopy cache_control * test(anthropic): mid-conversation system normalization cases * feat: add supports_mid_conversation_system flag to Claude Opus 4.8 Add supports_mid_conversation_system: true to all 9 claude-opus-4-8 cost-map entries (Anthropic-native, Bedrock, Vertex, Azure AI) in both the root cost map and the bundled package backup, since the runtime helper and tests read the backup in local/offline mode. Pin the mid-system passthrough regression test to the local cost map via the existing local_model_cost_map fixture so it reads the branch-local flag rather than the network-fetched main copy. * fix(bedrock): normalize in-array system in /v1/messages handler (#29698) Wire normalize_system_messages_for_anthropic into anthropic_messages_handler so all Bedrock /v1/messages paths (Invoke / Mantle / ClaudePlatform / Converse-bridge) hoist leading in-array system entries (and demote mid-conversation ones on models lacking supports_mid_conversation_system) into the top-level system field. The normalized messages/system are written back into the local_vars snapshot the base_llm branch reads from, otherwise the Invoke/Mantle fix would silently no-op. Also fix the helper to resolve supports_mid_conversation_system through the prefix-aware AnthropicModelInfo._supports_model_capability resolver. The raw _supports_factory could not see the flag once get_llm_provider left the invoke/ prefix on the model id, which would have wrongly demoted mid-conversation system on a Bedrock invoke opus-4-8 path. 
* fix(bedrock): resolve mid-conversation-system flag through mantle/invoke/converse route prefixes; drop unused param * fix(types): widen system param to Union[str, List] for hoisted system blocks * refactor(bedrock): drop dead local_vars messages writeback * fix(bedrock/converse): translate in-array system in anthropic->openai adapter (#29698) * fix(bedrock/converse): preserve cache_control on in-array system; test drop-empty * fix(bedrock/converse): rename colliding local to satisfy mypy; test handler system-merge branches * fix(types): register supports_mid_conversation_system in model-info schema The cost-map JSON-schema validation test (test_aaamodel_prices_and_context_window_json_is_valid) rejects unknown properties, so adding supports_mid_conversation_system to the opus-4-8 cost-map entries failed CI with 'Additional properties are not allowed'. Register the flag in the INTENDED_SCHEMA allow-list and in the ProviderSpecificModelInfo TypedDict so it is a typed, first-class capability flag alongside its peers (supports_output_config, etc.). --------- Co-authored-by: Sameer Kankute <sameer@berri.ai> * fix(bedrock/agentcore): optionally forward multimodal content blocks in InvokeAgentRuntime payload (#28885) * fix(bedrock/agentcore): optionally forward multimodal content blocks in InvokeAgentRuntime payload By default the agentcore provider flattens the last message to a text-only {"prompt": "..."} payload via convert_content_list_to_str, silently dropping OpenAI multimodal blocks (image_url, file, input_audio, ...). This adds an opt-in `forward_multimodal_content` litellm param. When truthy and the last message's content is a list containing a non-text block, the original OpenAI content list is forwarded verbatim under a new "content" field so an attachment-aware AgentCore agent can read it. Default off keeps the payload byte-identical to the legacy {"prompt": "..."} shape — existing agents are unaffected. 
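A minimal sketch of the opt-in payload shape described above. `build_agentcore_payload` is a hypothetical function, the text flattening is a simplified stand-in for convert_content_list_to_str, and keeping `prompt` alongside the new `content` field is an assumption drawn from the commit's description of `content` as an added field.

```python
from typing import Any


def build_agentcore_payload(
    messages: list[dict[str, Any]],
    forward_multimodal_content: bool = False,
) -> dict[str, Any]:
    """Hypothetical sketch of the legacy and opt-in payload shapes."""
    last_content = messages[-1].get("content", "")
    if isinstance(last_content, list):
        # Simplified flattening: concatenate the text blocks only.
        prompt = "".join(
            block.get("text", "")
            for block in last_content
            if isinstance(block, dict) and block.get("type") == "text"
        )
    else:
        prompt = str(last_content)

    payload: dict[str, Any] = {"prompt": prompt}

    has_non_text_block = isinstance(last_content, list) and any(
        isinstance(block, dict) and block.get("type") != "text"
        for block in last_content
    )
    if forward_multimodal_content and has_non_text_block:
        # Forward the OpenAI content list so an attachment-aware agent can read it.
        payload["content"] = list(last_content)
    return payload


# Illustrative multimodal message.
example_messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image"},
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
        ],
    }
]

legacy_payload = build_agentcore_payload(example_messages)
forwarded_payload = build_agentcore_payload(
    example_messages, forward_multimodal_content=True
)
```

With the flag off, the payload contains only the `prompt` key, matching the legacy shape; with it on, the image block reaches the agent under `content`.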
The flag is read from optional_params (where other AgentCore params land) with a litellm_params fallback, and accepts a bool or a config/env string ('true', '1', ...).

AgentCore Runtime is schemaless on the agent side: the agent's @app.entrypoint parses arbitrary JSON up to 100 MB (per https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/runtime-invoke-agent.html), so this is a purely upstream change; no AgentCore-side schema is asserted.

* fix(bedrock/agentcore): shallow-copy forwarded multimodal content list

Address review feedback (Sameerlite): payload["content"] = last_content aliased the caller's mutable messages[-1]["content"] list. Harmless today because the payload is JSON-serialized immediately, but a latent footgun if a future caller mutates the returned payload before serialization. Forward list(last_content) so the payload owns its own list.

Block dicts stay shared on purpose: a deep copy would clone potentially large base64 media on the request hot path, and the flagged risk was the shared list, not the blocks.

Update the passthrough tests to assert equality + distinct identity, and add a regression test that mutating the payload list can't leak back into the original message content.

* Revert "fix(mcp): preserve native tools in semantic filter hook (#26650)"

This reverts commit 438c825.

* Revert "feat(guardrails): integrate Repelloai Argus guardrail (#30465)"

This reverts commit 54da785.

* Revert "feat(dashscope): add Responses API support (#30286)"

This reverts commit 6766256.

* Revert "fix(bedrock): handle role:"system" inside the messages array on /v1/messages (#29698) (#30443)"

This reverts commit b8a8083.

* Revert "fix(anthropic): drop orphaned server_tool_use on multi-turn replay from generic OpenAI clients (#30486)"

This reverts commit 6e9c0b0.

* Revert "fix: route volcengine (Doubao) tiered-pricing models to the tiered cost handler (#30357)"

This reverts commit 172e302.

* Revert "feat(proxy): serve Anthropic-native /v1/models for Claude Code gateway discovery (#30273)"

This reverts commit 4e31885.

* fix: pass key_limit=None in team_member_update and patch model_cost in pricing test

team_member_update called team_info without key_limit, so the fastapi.Query default object (not None) was passed through to get_data, which failed when serializing it. Pass key_limit=None explicitly to avoid this.

test_get_model_info_costs patched litellm.model_cost from the local backup so the assertion holds before the PR is merged and the remote main URL is updated.

* fix(security): validate resolved model in /realtime/client_secrets for non-transcription sessions (#30710)

Omitting both model and session.model caused the endpoint to default to gpt-4o-realtime-preview without running can_key_call_resolved_model, so any key could access that model regardless of its allowed-model list. The transcription path already called can_key_call_resolved_model; this adds the same call for the realtime path before returning.
* fix(lint): fix F821 undefined model_info and F841 unused metadata in create_model_info_response

* fix: black formatting and stub get_model_group_info in third team translation test

* fix: reformat utils.py with black 26.3.1 to match CI

* fix: replace Optional[X] with X | None to satisfy UP045 ruff strict gate

---------

Co-authored-by: Habon Laszlo <habonlaci@users.noreply.github.com>
Co-authored-by: habonlaci <4699494+habonlaci@users.noreply.github.com>
Co-authored-by: Armaan Sandhu <74664101+Ar-maan05@users.noreply.github.com>
Co-authored-by: santino18727-debug <santino18727@gmail.com>
Co-authored-by: Eric (GabiDevFamily) <271972409+santino18727-debug@users.noreply.github.com>
Co-authored-by: Nitish Agarwal <1592163+nitishagar@users.noreply.github.com>
Co-authored-by: jho1-godaddy <171078705+jho1-godaddy@users.noreply.github.com>
Co-authored-by: 安妮的心动录 <74543653+anneheartrecord@users.noreply.github.com>
Co-authored-by: Harshith Gujjeti <153299927+Harshxth@users.noreply.github.com>
Co-authored-by: Tomoya Tabuchi <t@tomoyat1.com>
Co-authored-by: Vedant Agarwal <43557509+Vedant-Agarwal@users.noreply.github.com>
Co-authored-by: Prathamesh Jadhav <55660103+lollinng@users.noreply.github.com>
Co-authored-by: songkuan-zheng <252822057+songkuan-zheng@users.noreply.github.com>
Co-authored-by: Kropiunig <48442031+Kropiunig@users.noreply.github.com>
Co-authored-by: Lavish Bansal <lavish.bansal619@gmail.com>
Co-authored-by: Shane Emmons <27679+semmons99@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Anuj ojha <ojhaanuj224@gmail.com>
Co-authored-by: Nahrin <nahrin@nahrinoda.com>
Co-authored-by: Nbouyaa <67773915+FadelT@users.noreply.github.com>
Co-authored-by: Vineeth Sai <vineethsai4444@gmail.com>
Co-authored-by: Eugene Lugovtsov <34510252+EugeneLugovtsov@users.noreply.github.com>
Co-authored-by: Yevhen Luhovtsov <yevhen.luhovtsov@intapp.com>
Co-authored-by: Ayush Shekhar <106994833+ayushh0110@users.noreply.github.com>
Co-authored-by: Ahmad Shahzad <107808273+shzdehmd@users.noreply.github.com>
Co-authored-by: Kent <72616338+kingdoooo@users.noreply.github.com>
Co-authored-by: Jón Levy <levy@apro.is>

vercel Bot deployed to Preview December 10, 2025 01:36 View deployment

ghost merged commit 01dec55 into BerriAI:main Dec 10, 2025
4 of 7 checks passed

KeremTurgutlu mentioned this pull request Dec 10, 2025

upgrade litellm>1.80.5 and fixes AnswerDotAI/lisette#66

Closed

Chesars deleted the fix/anthropic-server-tool-use-multi-turn branch December 10, 2025 23:53

KeremTurgutlu mentioned this pull request Dec 15, 2025

update litellm 1.80.10 AnswerDotAI/lisette#69

Merged

jhylands mentioned this pull request Dec 17, 2025

[Bug]: web_fetch context breaks multi-turn conversation #18137

Closed

Chesars mentioned this pull request Dec 17, 2025

fix(anthropic): preserve web_fetch_tool_result in multi-turn conversations #18142

Merged

3 tasks

semmons99 mentioned this pull request Jun 15, 2026

fix(anthropic): drop orphaned server_tool_use on multi-turn replay from generic OpenAI clients #30486

Merged

4 tasks

This pull request was closed.

1 commit merged into BerriAI:main from Chesars:fix/anthropic-server-tool-use-multi-turn

ghost commented Dec 10, 2025

KeremTurgutlu commented Dec 10, 2025

KeremTurgutlu commented Dec 10, 2025

2 participants
