Litellm websocket improvements by Sameerlite · Pull Request #29563 · BerriAI/litellm

Sameerlite · 2026-06-03T06:00:26Z


Type

🆕 New Feature
🐛 Bug Fix

Changes

Note

Medium Risk
Touches proxy auth (model allowlist after first frame), WebSocket request handling, and spend/router accounting; behavior changes for clients without ?model= but is covered by new tests.

Overview
Aligns the proxy Responses API WebSocket flow with OpenAI-style clients that omit ?model= and send the model on the first response.create frame (e.g. Codex).

Model resolution and first frame: ?model= is now optional. When it is missing, the server reads the first frame (30s timeout), validates JSON and type: response.create, extracts model from flat or nested response payloads, and returns structured invalid_request_error frames instead of silently dropping bad first messages.
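A minimal sketch of the first-frame validation described above, assuming a plain text frame. The helper names (`parse_first_frame`, `extract_model`, `error_frame`), the error messages, and the exact layout of the error frame are illustrative assumptions, not the PR's actual code:

```python
import json
from typing import Any, Optional, Tuple


def error_frame(message: str) -> dict:
    # Hypothetical error shape; the real frame layout in the proxy may differ.
    return {
        "type": "error",
        "error": {"type": "invalid_request_error", "message": message},
    }


def extract_model(event: dict) -> Optional[str]:
    """Return the model from a flat or nested response.create payload."""
    model = event.get("model")  # flat: {"model": "..."}
    if not model:
        response = event.get("response")  # nested: {"response": {"model": "..."}}
        if isinstance(response, dict):
            model = response.get("model")
    return model if isinstance(model, str) and model else None


def parse_first_frame(raw: str) -> Tuple[Optional[str], Optional[dict]]:
    """Return (model, None) on success or (None, error_frame) on failure."""
    try:
        event: Any = json.loads(raw)
    except json.JSONDecodeError:
        return None, error_frame("first frame is not valid JSON")
    if not isinstance(event, dict):
        # Covers arrays, strings, and numbers that are valid JSON but not objects.
        return None, error_frame("first frame must be a JSON object")
    if event.get("type") != "response.create":
        return None, error_frame("first frame must have type response.create")
    model = extract_model(event)
    if model is None:
        return None, error_frame("model is required when ?model= is omitted")
    return model, None
```

Returning the error frame instead of raising keeps the caller in control of whether to send the frame and close the socket.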

Auth and plumbing: After the model is known from that frame, _enforce_responses_ws_first_frame_model_auth runs key allowlist and centralized checks. The consumed frame is passed as first_message through the HTTP handler, native proxy streaming, and managed WS handler so it is forwarded/processed once and not read again from the socket.
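The "processed once and not read again" behavior can be sketched as an iterator that replays the already-consumed frame before reading from the socket. `FakeSocket` and `iter_frames` are hypothetical stand-ins for illustration; only the `first_message` parameter name comes from the PR description:

```python
import asyncio
from typing import AsyncIterator, List, Optional


class FakeSocket:
    """Stand-in for a WebSocket; returns queued text frames in order."""

    def __init__(self, frames: List[str]) -> None:
        self._frames = list(frames)

    async def receive_text(self) -> str:
        if not self._frames:
            raise ConnectionError("socket closed")
        return self._frames.pop(0)


async def iter_frames(
    socket: FakeSocket, first_message: Optional[str] = None
) -> AsyncIterator[str]:
    # Replay the frame that was consumed during model resolution exactly once.
    if first_message is not None:
        yield first_message
    # Then continue with whatever is still unread on the socket.
    while True:
        try:
            frame = await socket.receive_text()
        except ConnectionError:
            return
        yield frame


async def demo() -> List[str]:
    socket = FakeSocket(["frame-1", "frame-2", "frame-3"])
    first = await socket.receive_text()  # consumed to learn the model
    return [frame async for frame in iter_frames(socket, first_message=first)]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```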

Managed WS routing: custom_llm_provider is only forced when the per-event model resolves to the same provider as the connection model (via _same_provider), so cross-provider overrides in a frame are not pinned to the wrong backend.
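A rough sketch of the provider gate, assuming a simplified resolver that reads the provider from a `provider/model` prefix. The PR compares providers resolved by litellm's own lookup; `resolve_provider`, `same_provider`, and `build_call_kwargs` here are illustrative names only:

```python
from typing import Optional


def resolve_provider(model: str) -> Optional[str]:
    """Hypothetical resolver: treat a 'provider/' prefix as the provider."""
    provider, sep, _ = model.partition("/")
    return provider if sep and provider else None


def same_provider(event_model: str, connection_provider: Optional[str]) -> bool:
    # Unresolvable event models (e.g. bare aliases) never match.
    event_provider = resolve_provider(event_model)
    return event_provider is not None and event_provider == connection_provider


def build_call_kwargs(event_model: str, connection_provider: Optional[str]) -> dict:
    kwargs = {"model": event_model}
    # Only pin the provider when the per-event model stays on the same provider.
    if same_provider(event_model, connection_provider):
        kwargs["custom_llm_provider"] = connection_provider
    return kwargs
```

With this shape, a same-provider variant inherits the connection provider, while a cross-provider model is left for normal resolution.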

Spend / router: Session wrappers _aresponses_websocket and _arealtime are skipped in proxy cost tracking, router TPM/RPM updates, and budget limiter when there is no standard_logging_object, since per-turn costs are logged on inner calls.

Router alias fix: _ageneric_api_call_with_fallbacks_helper sets model to the deployment’s real model name so router aliases do not overwrite the resolved deployment model in kwargs.
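The alias fix relies on Python dict literal ordering: when a key appears more than once, the last value wins. A small illustration with hypothetical function names and example model strings:

```python
def build_response_kwargs_buggy(kwargs: dict, deployment_model: str) -> dict:
    # An alias leaking through kwargs overwrites the deployment model.
    return {"model": deployment_model, **kwargs}


def build_response_kwargs(kwargs: dict, deployment_model: str) -> dict:
    # Placing "model" last guarantees the deployment model wins.
    return {**kwargs, "model": deployment_model}


leaked = {"model": "my-router-alias", "stream": True}
print(build_response_kwargs_buggy(leaked, "openai/gpt-4o")["model"])
print(build_response_kwargs(leaked, "openai/gpt-4o")["model"])
```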

Reviewed by Cursor Bugbot for commit 348853e. Bugbot is set up for automated code reviews on this repo.

@client

The @client decorator on _aresponses_websocket fires async_success_handler with result=None after the session ends. This triggered cost tracking errors because standard_logging_object is never built for None results. Per-turn costs are correctly tracked by individual litellm.aresponses calls inside the session. The outer session-level logging obj should not attempt cost tracking. Fix: skip _aresponses_websocket and _arealtime call types in deployment_callback_on_success, RouterBudgetLimiting.async_log_success_event, and _PROXY_track_cost_callback.
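The skip guard might look roughly like the following. The two call type strings come from the description above; the function names, the `response_cost` key lookup, and the callback shape are assumptions for illustration:

```python
from typing import Optional

# Session-level wrappers whose per-turn costs are logged by inner calls.
SESSION_WRAPPER_CALL_TYPES = {"_aresponses_websocket", "_arealtime"}


def should_skip_cost_tracking(
    call_type: Optional[str], standard_logging_object: Optional[dict]
) -> bool:
    """Skip only when a session wrapper finished without a logging payload."""
    return call_type in SESSION_WRAPPER_CALL_TYPES and standard_logging_object is None


def track_cost(call_type: Optional[str], standard_logging_object: Optional[dict]) -> Optional[float]:
    if should_skip_cost_tracking(call_type, standard_logging_object):
        return None  # nothing to record for the outer session
    if standard_logging_object is None:
        raise ValueError("standard_logging_object is required for cost tracking")
    return float(standard_logging_object.get("response_cost", 0.0))
```

Other call types with a missing payload still raise, so the guard does not hide unrelated logging problems.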

codecov · 2026-06-03T06:03:34Z

Codecov Report

❌ Patch coverage is 89.13043% with 10 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| litellm/proxy/response_api_endpoints/endpoints.py | 91.37% | 5 Missing ⚠️ |
| litellm/responses/streaming_iterator.py | 86.20% | 4 Missing ⚠️ |
| litellm/router.py | 50.00% | 1 Missing ⚠️ |


greptile-apps · 2026-06-03T06:06:00Z

Greptile Summary

This PR aligns the Responses API WebSocket endpoint with OpenAI-style clients (e.g. Codex) that omit ?model= at connection time and send the model inside the first response.create frame. user_api_key_auth_websocket already runs full auth for URL-supplied models; a new _enforce_responses_ws_first_frame_model_auth call mirrors those checks for first-frame models, keeping coverage symmetric.

?model= is now optional; _read_ws_model_from_first_frame reads, validates, and extracts the model with a 30-second timeout and structured invalid_request_error frames on failure.
_same_provider gates custom_llm_provider injection per event — improves same-provider variant routing while leaving cross-provider overrides to litellm's own resolution.
WS session wrappers (_aresponses_websocket, _arealtime) are skipped in all three cost-tracking hooks to avoid duplicate spend accounting; per-turn costs are recorded on inner calls.
_ageneric_api_call_with_fallbacks_helper now places "model": model_name last in response_kwargs, so an alias leaking through **kwargs can no longer overwrite the deployment's resolved model.
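The 30-second first-frame timeout mentioned in the summary can be sketched with `asyncio.wait_for`. The `read_first_frame` helper and its `receive` callable are hypothetical; in the proxy the read would come from the accepted WebSocket:

```python
import asyncio
from typing import Awaitable, Callable, Optional

FIRST_FRAME_TIMEOUT_SECONDS = 30.0  # value taken from the PR description


async def read_first_frame(
    receive: Callable[[], Awaitable[str]],
    timeout: float = FIRST_FRAME_TIMEOUT_SECONDS,
) -> Optional[str]:
    """Return the first frame, or None if the client stays silent too long."""
    try:
        return await asyncio.wait_for(receive(), timeout=timeout)
    except asyncio.TimeoutError:
        # Caller can now send an error frame and close the connection
        # instead of holding the socket open indefinitely.
        return None
```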

Confidence Score: 4/5

Safe to merge with attention to the double-auth-check on the first-frame path and the router-alias gap in provider injection.

The core WebSocket auth flow is correct: URL-model connections get model-allowlist checks through the existing user_api_key_auth_websocket path, and first-frame connections get equivalent checks via _enforce_responses_ws_first_frame_model_auth. The cost-tracking and router-alias fixes are straightforward and well-tested. The _same_provider logic is a genuine improvement over the previous all-or-nothing guard, but router aliases sent as per-event models are treated as unresolvable (same as before). The first-frame auth function also double-invokes _enforce_key_and_fallback_model_access and _run_centralized_common_checks relative to common_processing_pre_call_logic, adding latency on first-frame connections.

litellm/proxy/response_api_endpoints/endpoints.py (double auth call on first-frame path) and litellm/responses/streaming_iterator.py (_same_provider alias behaviour)

Important Files Changed

Filename	Overview
litellm/proxy/response_api_endpoints/endpoints.py	Makes ?model= optional; adds _read_ws_model_from_first_frame, _extract_model_from_first_ws_event, and _enforce_responses_ws_first_frame_model_auth. Model-specific auth (allowlist + centralized checks) only runs when model comes from the first frame, not from the URL query param — an intentional asymmetry that warrants attention.
litellm/responses/streaming_iterator.py	Adds first_message forwarding through both streaming paths; introduces _same_provider and _connection_provider to gate custom_llm_provider injection. The _same_provider logic returns False for unresolvable event models (e.g. router aliases), preventing provider inheritance that the old code also didn't supply — net behavior is the same for that case.
litellm/router.py	Two changes: explicit "model": model_name at end of response_kwargs so alias from kwargs can't overwrite deployment model; early return in _update_tpm_rpm_redis_async for WS session wrapper call types. Both changes are correct and well-targeted.
litellm/proxy/hooks/proxy_track_cost_callback.py	Adds skip guard for _aresponses_websocket and _arealtime call types when standard_logging_object is None; per-turn costs are tracked on inner calls. Simple and correct.
litellm/router_strategy/budget_limiter.py	Same skip guard as proxy_track_cost_callback.py for WS session wrappers. Simple one-liner, correctly placed before payload extraction.
litellm/llms/custom_httpx/llm_http_handler.py	Threads first_message through aresponses_websocket_handler and ResponsesWebSocketStreaming constructors; plumbing-only change, no logic changes.
tests/test_litellm/proxy/response_api_endpoints/test_endpoints.py	Adds extensive mock-only tests for first-frame model extraction, validation, auth enforcement, cost-tracking skips, and _same_provider logic. All tests use mocks with no real network calls. Coverage is thorough for the happy paths.
tests/test_litellm/test_router.py	Adds test_ageneric_api_call_deployment_model_overrides_alias with inject_alias_into_kwargs side_effect to replicate the real call-path alias leak — an improvement over what was flagged in the previous review thread.

Reviews (12): Last reviewed commit: "fix(responses-ws): fall back to explicit..."

veria-ai · 2026-06-03T06:17:22Z

PR overview

All previously flagged issues have been addressed. No open security concerns remain on this pull request.

Security review

No open security issues remain on this pull request.

Fixed/addressed: 1 · PR risk: 0/10

Fix JSON injection: use json.dumps instead of f-string interpolation for model name in WS body. Add 30s timeout for first WS frame to prevent unbounded connection resource tie-up. Restore per-event model override in streaming_iterator; fall back to connection-level model when event omits it. Strengthen regression test: inject alias into kwargs via _update_kwargs_with_deployment mock so the test would fail on un-fixed code.
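To illustrate why `json.dumps` matters here, the sketch below contrasts f-string interpolation with proper serialization. The body shape and function names are hypothetical; only the technique (serialize instead of interpolate) is taken from the commit message:

```python
import json


def build_ws_body_unsafe(model: str) -> str:
    # A quote inside the model name can break out of the string
    # and add extra keys to the JSON object.
    return f'{{"type": "response.create", "model": "{model}"}}'


def build_ws_body(model: str) -> str:
    # json.dumps escapes quotes and control characters in the value.
    return json.dumps({"type": "response.create", "model": model})


malicious = 'gpt-4o", "injected": "yes'
print(json.loads(build_ws_body_unsafe(malicious)))  # contains an "injected" key
print(json.loads(build_ws_body(malicious)))         # model is kept as one string
```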

Sameerlite · 2026-06-03T09:57:51Z

@greptileai

…tion When ?model= is omitted, the first WS frame can carry the model in either flat format (first_event["model"]) or nested format (first_event["response"]["model"]). The flat-only check would silently reject clients using the nested wire format. Mirrors the same two-format logic in _build_base_call_kwargs.

Sameerlite · 2026-06-03T10:10:53Z

@greptileai

…del overrides If a client sends a different model per response.create turn, litellm needs to re-resolve the provider from that model string. Forcing the connection-level custom_llm_provider would silently route the request to the wrong backend. Only inject custom_llm_provider when the per-event model matches the connection-level model.

Sameerlite · 2026-06-03T10:23:14Z

@greptileai

Pull the flat/nested model extraction into _extract_model_from_first_ws_event so tests import and exercise the real function rather than a copy.

Sameerlite · 2026-06-03T10:35:29Z

@greptileai

The model == self.model guard was too strict: same-provider model variants (e.g., vertex_ai/gemini-2.0 -> vertex_ai/gemini-1.5 on one connection) would lose custom_llm_provider, breaking routing when a custom api_base is in use. Compare the provider extracted by get_llm_provider instead, so same-provider variants still inherit the connection-level provider while cross-provider overrides let litellm re-resolve.

Sameerlite · 2026-06-03T10:47:09Z

@greptileai

…ny statements)

Sameerlite · 2026-06-03T13:01:46Z

@greptileai

Sameerlite · 2026-06-03T13:07:16Z

bugbot run

cursor

Cursor Bugbot has reviewed your changes using high effort and found 3 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for all 3 issues found in the latest run.

✅ Fixed: Consumed first frame dropped silently
- The first frame is now rejected with a WebSocket error unless it is a response.create JSON object, so it is never silently consumed.
✅ Fixed: Auth skips model allowlist
- When the model is resolved from the first frame, the proxy now reruns key and common model authorization checks before routing.
✅ Fixed: Non-object JSON crashes handler
- The first-frame parser now validates the decoded payload type and handles non-object JSON as a structured invalid request.


CLAassistant · 2026-06-03T13:21:55Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 3 committers have signed the CLA.

✅ Sameerlite
✅ mateo-berri
❌ cursoragent

Sameerlite · 2026-06-03T13:25:47Z

bugbot run

cursor

✅ Bugbot reviewed your changes and found no new issues!


Sameerlite · 2026-06-03T13:30:40Z

@greptileai

Distinguish client disconnects from server errors when reading the responses WebSocket first frame, make the cost-tracking skip log message accurate for session wrappers (which do carry a model), and resolve the connection-level provider once per session instead of on every response.create event.

mateo-berri · 2026-06-03T16:06:50Z

@greptileai

…njection Adds regression tests for the still-uncovered responses WebSocket paths: the timeout, invalid-JSON and missing-model branches of _read_ws_model_from_first_frame, plus the provider comparison in ManagedResponsesWebSocketHandler._same_provider and _inject_credentials (same-provider model variants keep the connection provider; cross-provider models re-resolve).

mateo-berri · 2026-06-03T16:14:29Z

@greptileai

…ements

mateo-berri · 2026-06-03T17:32:32Z

@greptileai

…nection model is unresolvable When a WebSocket session is opened with a custom deployment alias that litellm cannot resolve to a provider, _connection_provider was None, so _same_provider returned False for every resolvable per-event model and the connection-level custom_llm_provider was dropped. Use the explicitly-set custom_llm_provider as the connection provider in that case so same-provider per-event models still inherit it while genuinely cross-provider models continue to re-resolve.
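A hedged sketch of the fallback described in this commit message, again using a simplified prefix-based resolver in place of litellm's real provider lookup. `resolve_provider` and `connection_provider` are illustrative names, and the precedence (resolved provider first, explicit `custom_llm_provider` second) is an assumption based on the wording above:

```python
from typing import Optional


def resolve_provider(model: str) -> Optional[str]:
    """Hypothetical resolver: treat a 'provider/' prefix as the provider."""
    provider, sep, _ = model.partition("/")
    return provider if sep and provider else None


def connection_provider(
    connection_model: str, custom_llm_provider: Optional[str]
) -> Optional[str]:
    # Prefer the provider resolved from the connection model; if the model is
    # an unresolvable alias, fall back to the explicitly configured provider.
    return resolve_provider(connection_model) or custom_llm_provider


# An alias-opened session still ends up with a usable connection provider.
print(connection_provider("my-deployment-alias", "vertex_ai"))
```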

mateo-berri · 2026-06-03T17:51:07Z

@greptileai

mateo-berri

LGTM; thanks!

krrish-berri-2 · 2026-06-04T00:41:13Z

i'm confused - does this add wss:// /v1/responses to litellm? codex is complaining this endpoint doesn't exist

krrish-berri-2 · 2026-06-04T00:41:28Z

this is causing a 4.5s startup latency when launching codex for a session via litellm

Sameerlite · 2026-06-04T03:28:43Z

> i'm confused - does this add wss:// /v1/responses to litellm? codex is complaining this endpoint doesn't exist

This was designed the same way as our realtime WebSocket support: the model had to be present as a query param at connection time, and we used it to connect to the correct provider. Codex, following OpenAI's flow, connects first and only then sends the model, so it was not working. I updated the code to accept the WebSocket request first and, once the model is sent, use it for routing and for starting the actual WebSocket connection with the provider.

> this is causing a 4.5s startup latency when launching codex for a session via litellm

Not able to reproduce:
https://www.loom.com/share/6541b419e87d43ba815d01c67d5c39c9

…9.0) (#93) This PR contains the following updates: | Package | Update | Change | |---|---|---| | [ghcr.io/berriai/litellm](https://images.chainguard.dev/directory/image/wolfi-base/overview) ([source](https://github.com/BerriAI/litellm)) | minor | `v1.88.1` → `v1.89.0` | --- ### Release Notes <details> <summary>BerriAI/litellm (ghcr.io/berriai/litellm)</summary> ### [`v1.89.0`](https://github.com/BerriAI/litellm/releases/tag/v1.89.0) [Compare Source](https://github.com/BerriAI/litellm/compare/v1.89.0...v1.89.0) ##### Verify Docker Image Signature All LiteLLM Docker images are signed with [cosign](https://docs.sigstore.dev/cosign/overview/). Every release is signed with the same key introduced in [commit `0112e53`](https://github.com/BerriAI/litellm/commit/0112e53046018d726492c814b3644b7d376029d0). **Verify using the pinned commit hash (recommended):** A commit hash is cryptographically immutable, so this is the strongest way to ensure you are using the original signing key: ```bash cosign verify \ --key https://raw.githubusercontent.com/BerriAI/litellm/0112e53046018d726492c814b3644b7d376029d0/cosign.pub \ ghcr.io/berriai/litellm:v1.89.0 ``` **Verify using the release tag (convenience):** Tags are protected in this repository and resolve to the same key. 
This option is easier to read but relies on tag protection rules: ```bash cosign verify \ --key https://raw.githubusercontent.com/BerriAI/litellm/v1.89.0/cosign.pub \ ghcr.io/berriai/litellm:v1.89.0 ``` Expected output: ``` The following checks were performed on each of these signatures: - The cosign claims were validated - The signatures were verified against the specified public key ``` *** ##### What's Changed - test(responses): bump deprecated gemini-3-pro-preview to gemini-3.1-pro-preview by [@mateo-berri](https://github.com/mateo-berri) in [#29433](https://github.com/BerriAI/litellm/pull/29433) - fix: map mistral/ministral-8b-latest in model price map by [@mateo-berri](https://github.com/mateo-berri) in [#29453](https://github.com/BerriAI/litellm/pull/29453) - fix(datadog): split oversized batches on 413 instead of re-queueing forever by [@yassin-berriai](https://github.com/yassin-berriai) in [#29444](https://github.com/BerriAI/litellm/pull/29444) - feat(otel): allowlist team\_metadata sub-keys promoted to baggage by [@yassin-berriai](https://github.com/yassin-berriai) in [#29442](https://github.com/BerriAI/litellm/pull/29442) - fix: stop use\_chat\_completions\_api flag from leaking into provider request body by [@mateo-berri](https://github.com/mateo-berri) in [#29447](https://github.com/BerriAI/litellm/pull/29447) - fix(anthropic, fireworks): inline legacy $ref defs in tool schemas by [@milan-berri](https://github.com/milan-berri) in [#28646](https://github.com/BerriAI/litellm/pull/28646) - fix(proxy): omit OpenAI \[DONE] on google-genai streamGenerateContent by [@Sameerlite](https://github.com/Sameerlite) in [#29426](https://github.com/BerriAI/litellm/pull/29426) - ci(release): create stable/X.Y.x line branch on X.Y.0 tags by [@yuneng-berri](https://github.com/yuneng-berri) in [#29457](https://github.com/BerriAI/litellm/pull/29457) - fix(vector-stores): support engines URL for Vertex AI Search by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) 
in [#27885](https://github.com/BerriAI/litellm/pull/27885) - fix(ui): render caller-supplied filter options in caller order by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29462](https://github.com/BerriAI/litellm/pull/29462) - fix(batches): skip unnecessary batch input file reads by [@Sameerlite](https://github.com/Sameerlite) in [#29114](https://github.com/BerriAI/litellm/pull/29114) - docs(agents): clarify when to create new test files by [@Sameerlite](https://github.com/Sameerlite) in [#29472](https://github.com/BerriAI/litellm/pull/29472) - Litellm OSS Staging by [@Sameerlite](https://github.com/Sameerlite) in [#29161](https://github.com/BerriAI/litellm/pull/29161) - fix(mcp): clear allowed\_tools and tool overrides on MCP server edit by [@Sameerlite](https://github.com/Sameerlite) in [#29411](https://github.com/BerriAI/litellm/pull/29411) - Litellm OSS Staging 010626 by [@Sameerlite](https://github.com/Sameerlite) in [#29422](https://github.com/BerriAI/litellm/pull/29422) - fix(ci): make CircleCI rerun-failed-tests collect tests when 2+ test files fail by [@mateo-berri](https://github.com/mateo-berri) in [#29475](https://github.com/BerriAI/litellm/pull/29475) - feat(a2a): watsonx Orchestrate agent provider by [@Sameerlite](https://github.com/Sameerlite) in [#29410](https://github.com/BerriAI/litellm/pull/29410) - fix(azure\_ai): strip tool-level extra fields on 400 and retry by [@Sameerlite](https://github.com/Sameerlite) in [#29479](https://github.com/BerriAI/litellm/pull/29479) - fix(docs): remove fixed dimensions from README hero image by [@mateo-berri](https://github.com/mateo-berri) in [#29496](https://github.com/BerriAI/litellm/pull/29496) - Litellm oss staging by [@Sameerlite](https://github.com/Sameerlite) in [#29492](https://github.com/BerriAI/litellm/pull/29492) - fix: small CLAUDE.md nits by [@mateo-berri](https://github.com/mateo-berri) in [#29504](https://github.com/BerriAI/litellm/pull/29504) - Add MCP semantic conventions to 
otelv2 by [@yassin-berriai](https://github.com/yassin-berriai) in [#29468](https://github.com/BerriAI/litellm/pull/29468) - fix(passthrough): emit otel guardrail span when a guardrail blocks by [@yassin-berriai](https://github.com/yassin-berriai) in [#29470](https://github.com/BerriAI/litellm/pull/29470) - fix(proxy): strip NUL bytes from spend log payloads to prevent PostgreSQL 22P05 by [@milan-berri](https://github.com/milan-berri) in [#29515](https://github.com/BerriAI/litellm/pull/29515) - \[internal copy of [#28008](https://github.com/BerriAI/litellm/issues/28008)] Support MCP OAuth passthrough and issuer-scoped JWT auth by [@mateo-berri](https://github.com/mateo-berri) in [#28356](https://github.com/BerriAI/litellm/pull/28356) - feat(vector-stores): forward per-request params to Vertex AI Search by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29459](https://github.com/BerriAI/litellm/pull/29459) - feat(proxy): add per-MCP-server RPM rate limiting for keys and teams by [@Sameerlite](https://github.com/Sameerlite) in [#29482](https://github.com/BerriAI/litellm/pull/29482) - fix(tests): drop module-level test calls that break local\_testing collection by [@mateo-berri](https://github.com/mateo-berri) in [#29520](https://github.com/BerriAI/litellm/pull/29520) - feat(agents): add LangFlow agent provider with A2A session bridging by [@Sameerlite](https://github.com/Sameerlite) in [#28963](https://github.com/BerriAI/litellm/pull/28963) - fix(ui/agents): make A2A skill tags enterable and validated by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29512](https://github.com/BerriAI/litellm/pull/29512) - \[internal copy of [#29232](https://github.com/BerriAI/litellm/issues/29232)] feat: route future Claude models to Anthropic provider via pattern matching by [@mateo-berri](https://github.com/mateo-berri) in [#29239](https://github.com/BerriAI/litellm/pull/29239) - fix(tests): drop import-time completion call in test\_register\_model 
by [@mateo-berri](https://github.com/mateo-berri) in [#29521](https://github.com/BerriAI/litellm/pull/29521) - test: stabilize batch VCR coverage and stop live upload/network leaks by [@mateo-berri](https://github.com/mateo-berri) in [#29477](https://github.com/BerriAI/litellm/pull/29477) - \[internal copy of [#29003](https://github.com/BerriAI/litellm/issues/29003)] fix(vertex\_ai): use user-supplied api\_base as is for Model Garden OpenAI-compat path by [@mateo-berri](https://github.com/mateo-berri) in [#29530](https://github.com/BerriAI/litellm/pull/29530) - feat(proxy): native /health/drain preStop hook for graceful shutdown by [@yassin-berriai](https://github.com/yassin-berriai) in [#29439](https://github.com/BerriAI/litellm/pull/29439) - fix(auth): preserve 401 status for expired JWTs in OTel traces by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29510](https://github.com/BerriAI/litellm/pull/29510) - fix(otel): capture 401 error details in management endpoint spans by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29535](https://github.com/BerriAI/litellm/pull/29535) - test(proxy/utils): pin bottom-of-file helper behavior by [@yuneng-berri](https://github.com/yuneng-berri) in [#29509](https://github.com/BerriAI/litellm/pull/29509) - test(proxy/utils): pin PrismaClient and spend-update behavior by [@yuneng-berri](https://github.com/yuneng-berri) in [#29488](https://github.com/BerriAI/litellm/pull/29488) - test(proxy/utils): pin ProxyLogging behavior by [@yuneng-berri](https://github.com/yuneng-berri) in [#29485](https://github.com/BerriAI/litellm/pull/29485) - fix: missing span for guardrail passthrough by [@yassin-berriai](https://github.com/yassin-berriai) in [#29552](https://github.com/BerriAI/litellm/pull/29552) - fix(auth): let internal users view search tools by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29542](https://github.com/BerriAI/litellm/pull/29542) - fix: missing mcp otel attributes by 
[@yassin-berriai](https://github.com/yassin-berriai) in [#29554](https://github.com/BerriAI/litellm/pull/29554) - fix(proxy): resolve managed video model ids for auth by [@shivamrawat1](https://github.com/shivamrawat1) in [#29545](https://github.com/BerriAI/litellm/pull/29545) - fix(key\_generate): allow team members to create keys on org-scoped teams by [@milan-berri](https://github.com/milan-berri) in [#29310](https://github.com/BerriAI/litellm/pull/29310) - test(pass-through): move Gemini pass-through tests to gemini-3.1-flash-lite by [@mateo-berri](https://github.com/mateo-berri) in [#29595](https://github.com/BerriAI/litellm/pull/29595) - Litellm oss staging 030626 by [@Sameerlite](https://github.com/Sameerlite) in [#29578](https://github.com/BerriAI/litellm/pull/29578) - Fix : a2a bugs 030626 by [@Sameerlite](https://github.com/Sameerlite) in [#29566](https://github.com/BerriAI/litellm/pull/29566) - \[internal copy of [#29533](https://github.com/BerriAI/litellm/issues/29533)] fix(anthropic/adapter): emit thinking block for reasoning\_content-only streaming chunks by [@mateo-berri](https://github.com/mateo-berri) in [#29600](https://github.com/BerriAI/litellm/pull/29600) - ci: reproduce default-Windows wheel install to guard MAX\_PATH by [@yuneng-berri](https://github.com/yuneng-berri) in [#29597](https://github.com/BerriAI/litellm/pull/29597) - fix(vertex): strip output\_config.effort for Vertex Claude models that reject it (Haiku 4.5) by [@mateo-berri](https://github.com/mateo-berri) in [#29585](https://github.com/BerriAI/litellm/pull/29585) - Litellm websocket improvements by [@Sameerlite](https://github.com/Sameerlite) in [#29563](https://github.com/BerriAI/litellm/pull/29563) - feat(arize/phoenix): OpenInference rendering parity — tool\_calls, cost, passthrough I/O, session/user, multimodal, cache tokens by [@milan-berri](https://github.com/milan-berri) in [#28800](https://github.com/BerriAI/litellm/pull/28800) - \[internal copy of 
[#29550](https://github.com/BerriAI/litellm/issues/29550)] fix: passthrough endpoints duplicate logs by [@mateo-berri](https://github.com/mateo-berri) in [#29598](https://github.com/BerriAI/litellm/pull/29598) - fix(ci): keep coverage rename green when a parallel node runs no tests by [@mateo-berri](https://github.com/mateo-berri) in [#29608](https://github.com/BerriAI/litellm/pull/29608) - test(vcr): close out the remaining VCR live-call leaks by [@mateo-berri](https://github.com/mateo-berri) in [#29603](https://github.com/BerriAI/litellm/pull/29603) - fix(key\_generate): exempt UI/CLI session tokens from the budget ceiling for team keys by [@yuneng-berri](https://github.com/yuneng-berri) in [#29612](https://github.com/BerriAI/litellm/pull/29612) - fix(realtime): allow null transcripts in stream logging payloads by [@milan-berri](https://github.com/milan-berri) in [#29625](https://github.com/BerriAI/litellm/pull/29625) - build(ui): migrate eslint to flat config + bump eslint-config-next to 16 by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29626](https://github.com/BerriAI/litellm/pull/29626) - fix(key\_generate): scope session-token team-key budget exemption to caller-supplied team\_id by [@yuneng-berri](https://github.com/yuneng-berri) in [#29641](https://github.com/BerriAI/litellm/pull/29641) - fix(proxy): disable proxy buffering on streaming SSE responses by [@mateo-berri](https://github.com/mateo-berri) in [#29557](https://github.com/BerriAI/litellm/pull/29557) - fix(mcp): gate /public/mcp\_hub strictly on litellm.public\_mcp\_servers by [@michelligabriele](https://github.com/michelligabriele) in [#27764](https://github.com/BerriAI/litellm/pull/27764) - ci(ui): frontend-lint job enforcing prettier + eslint on changed files by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29633](https://github.com/BerriAI/litellm/pull/29633) - fix(gemini): googleSearch + server-side tools and googleMaps JSON schema by 
[@Sameerlite](https://github.com/Sameerlite) in [#29582](https://github.com/BerriAI/litellm/pull/29582) - fix(proxy): passthrough 404 when SERVER\_ROOT\_PATH is set by [@Sameerlite](https://github.com/Sameerlite) in [#29658](https://github.com/BerriAI/litellm/pull/29658) - fix(gemini-realtime): use GA event names for Pipecat 1.3.x compatibility by [@Sameerlite](https://github.com/Sameerlite) in [#29662](https://github.com/BerriAI/litellm/pull/29662) - Litellm oss staging 040626 by [@Sameerlite](https://github.com/Sameerlite) in [#29671](https://github.com/BerriAI/litellm/pull/29671) - style(ui): prettier formatting pass over the dashboard by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29622](https://github.com/BerriAI/litellm/pull/29622) - chore: ignore prettier dashboard reformat in git blame by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29695](https://github.com/BerriAI/litellm/pull/29695) - fix(helm): Enable Backend Deployment to mount Gateway config.yaml by [@tin-berri](https://github.com/tin-berri) in [#29605](https://github.com/BerriAI/litellm/pull/29605) - \[internal copy of [#29277](https://github.com/BerriAI/litellm/issues/29277)] fix(proxy): add default=None to LiteLLM\_TeamMembership.litellm\_budget\_table by [@mateo-berri](https://github.com/mateo-berri) in [#29684](https://github.com/BerriAI/litellm/pull/29684) - test: make custom\_tokenizer proxy tests hermetic by [@yuneng-berri](https://github.com/yuneng-berri) in [#29643](https://github.com/BerriAI/litellm/pull/29643) - test(proxy): stop running real-DB tests in GitHub Actions unit jobs by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29700](https://github.com/BerriAI/litellm/pull/29700) - chore(ui): remove the bare-fetch lint rule by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29712](https://github.com/BerriAI/litellm/pull/29712) - Litellm jwt mapping virtualkeys by [@shivamrawat1](https://github.com/shivamrawat1) in 
[#28510](https://github.com/BerriAI/litellm/pull/28510)
- refactor(ui): shared HTTP client + location-pinned fetch() lint rule by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29723](https://github.com/BerriAI/litellm/pull/29723)
- fix(proxy): stop team BYOK model name corruption on model edit by [@yuneng-berri](https://github.com/yuneng-berri) in [#29731](https://github.com/BerriAI/litellm/pull/29731)
- \[internal copy of [#29511](https://github.com/BerriAI/litellm/issues/29511)] feat(guardrails): add sensitive data routing to on-premise models by [@mateo-berri](https://github.com/mateo-berri) in [#29531](https://github.com/BerriAI/litellm/pull/29531)
- fix(proxy/hooks): populate llm\_provider on internal rate-limit errors by [@mateo-berri](https://github.com/mateo-berri) in [#27707](https://github.com/BerriAI/litellm/pull/27707)
- fix(vertex/anthropic): handle namespace tools and strip client\_metadata for codex compatibility by [@Sameerlite](https://github.com/Sameerlite) in [#29489](https://github.com/BerriAI/litellm/pull/29489)
- Support OAuth M2M for Databricks Apps A2A agents by [@mateo-berri](https://github.com/mateo-berri) in [#29586](https://github.com/BerriAI/litellm/pull/29586)
- fix: small CLAUDE.md nit by [@mateo-berri](https://github.com/mateo-berri) in [#29749](https://github.com/BerriAI/litellm/pull/29749)
- fix(anthropic): route Claude Opus 4.8 through adaptive thinking by [@mateo-berri](https://github.com/mateo-berri) in [#29702](https://github.com/BerriAI/litellm/pull/29702)
- fix(proxy): persist oauth2\_flow on MCP server registration by [@michelligabriele](https://github.com/michelligabriele) in [#29690](https://github.com/BerriAI/litellm/pull/29690)
- \[internal copy of [#27491](https://github.com/BerriAI/litellm/issues/27491)] fix(realtime): Fix Realtime Audio Token Cost Tracking by [@mateo-berri](https://github.com/mateo-berri) in [#29722](https://github.com/BerriAI/litellm/pull/29722)
- fix(galileo): use ingest traces API and standard logging payload by [@Sameerlite](https://github.com/Sameerlite) in [#29651](https://github.com/BerriAI/litellm/pull/29651)
- fix(auth): expand all-team-models sentinel in can\_key\_call\_model for batch validation by [@Sameerlite](https://github.com/Sameerlite) in [#29746](https://github.com/BerriAI/litellm/pull/29746)
- test(vcr): stop refreshing cassette TTL on read so cassettes lapse after 24h by [@mateo-berri](https://github.com/mateo-berri) in [#29784](https://github.com/BerriAI/litellm/pull/29784)
- test(ci): record/replay OpenAI image gen so the spend E2E isn't outage-bound by [@mateo-berri](https://github.com/mateo-berri) in [#29787](https://github.com/BerriAI/litellm/pull/29787)
- fix(ui): route MCP playground auth by oauth2 mode instead of token\_url by [@tin-berri](https://github.com/tin-berri) in [#29714](https://github.com/BerriAI/litellm/pull/29714)
- refactor(ui): centralize proxy base URL resolution into tested resolver by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29793](https://github.com/BerriAI/litellm/pull/29793)
- Litellm oss staging 050626 by [@Sameerlite](https://github.com/Sameerlite) in [#29774](https://github.com/BerriAI/litellm/pull/29774)
- test(google): add google-genai SDK proxy integration tests by [@Sameerlite](https://github.com/Sameerlite) in [#29781](https://github.com/BerriAI/litellm/pull/29781)
- fix(jwt): use resolved DB user\_id for spend on legacy email match by [@milan-berri](https://github.com/milan-berri) in [#29217](https://github.com/BerriAI/litellm/pull/29217)
- feat(ui): generate dashboard API types from the proxy OpenAPI spec by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29816](https://github.com/BerriAI/litellm/pull/29816)
- fix(proxy): drop deleted team BYOK model name from team.models by [@yuneng-berri](https://github.com/yuneng-berri) in [#29820](https://github.com/BerriAI/litellm/pull/29820)
- feat(mcp): per-server env vars with global + per-user scopes by [@mateo-berri](https://github.com/mateo-berri) in [#28917](https://github.com/BerriAI/litellm/pull/28917)
- refactor(ui): route behavior-preserving networking calls through apiClient by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29806](https://github.com/BerriAI/litellm/pull/29806)
- fix(mcp): persist Tools-tab MCP OAuth token to DB by [@tin-berri](https://github.com/tin-berri) in [#29809](https://github.com/BerriAI/litellm/pull/29809)
- fix(ui): require new expiration when regenerating an expired key by [@milan-berri](https://github.com/milan-berri) in [#29838](https://github.com/BerriAI/litellm/pull/29838)
- refactor(ui): route query-building networking calls through apiClient by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29815](https://github.com/BerriAI/litellm/pull/29815)
- Make the image-gen record/replay proxy report cache mode and per-request HIT/MISS by [@mateo-berri](https://github.com/mateo-berri) in [#29802](https://github.com/BerriAI/litellm/pull/29802)
- feat(proxy): hot-reload .env in dev when running with --reload by [@mateo-berri](https://github.com/mateo-berri) in [#29783](https://github.com/BerriAI/litellm/pull/29783)
- fix(ui): stop MCP playground tool calls from sending twice by [@tin-berri](https://github.com/tin-berri) in [#29821](https://github.com/BerriAI/litellm/pull/29821)
- feat(fal\_ai): add Nano Banana / Gemini 2.5 Flash Image generation support by [@mateo-berri](https://github.com/mateo-berri) in [#29798](https://github.com/BerriAI/litellm/pull/29798)
- Title: Fix managed batch cancel credential resolution by [@shivamrawat1](https://github.com/shivamrawat1) in [#29734](https://github.com/BerriAI/litellm/pull/29734)
- Title: fix(proxy): resolve vector store file list credentials from team deployments by [@shivamrawat1](https://github.com/shivamrawat1) in [#29739](https://github.com/BerriAI/litellm/pull/29739)
- refactor: convert AWS and GCP Terraform stacks into reusable modules … by
[@yassin-berriai](https://github.com/yassin-berriai) in [#28103](https://github.com/BerriAI/litellm/pull/28103)
- chore(ui): build ui for release by [@yuneng-berri](https://github.com/yuneng-berri) in [#29853](https://github.com/BerriAI/litellm/pull/29853)
- fix(terraform/gcp): prompt for image\_registry in DeployStack one-click by [@yassin-berriai](https://github.com/yassin-berriai) in [#29852](https://github.com/BerriAI/litellm/pull/29852)
- fix(terraform/gcp): abandon SQL user on destroy by [@yassin-berriai](https://github.com/yassin-berriai) in [#29855](https://github.com/BerriAI/litellm/pull/29855)
- Extend the record/replay proxy to chat, embeddings, moderations, rerank, and Anthropic by [@mateo-berri](https://github.com/mateo-berri) in [#29847](https://github.com/BerriAI/litellm/pull/29847)
- chore(deps): bump deps by [@yuneng-berri](https://github.com/yuneng-berri) in [#29860](https://github.com/BerriAI/litellm/pull/29860)
- chore(ci): promote internal staging to main by [@yuneng-berri](https://github.com/yuneng-berri) in [#29861](https://github.com/BerriAI/litellm/pull/29861)
- fix: 400 on Anthropic context overflow; seed identity on failed auth by [@yassin-berriai](https://github.com/yassin-berriai) in [#29848](https://github.com/BerriAI/litellm/pull/29848)
- chore(ci): promote internal staging to main by [@yuneng-berri](https://github.com/yuneng-berri) in [#29862](https://github.com/BerriAI/litellm/pull/29862)
- chore(release): patch v1.89.0-rc.1 with [#30064](https://github.com/BerriAI/litellm/issues/30064) (Claude Fable 5) for v1.89.0-rc.2 by [@mateo-berri](https://github.com/mateo-berri) in [#30143](https://github.com/BerriAI/litellm/pull/30143)

**Full Changelog**: <https://github.com/BerriAI/litellm/compare/v1.88.0...v1.89.0>

### [`v1.88.2`](https://github.com/BerriAI/litellm/releases/tag/v1.88.2)

[Compare Source](https://github.com/BerriAI/litellm/compare/v1.88.1...v1.88.2)

##### Verify Docker Image Signature

All LiteLLM Docker images are signed with [cosign](https://docs.sigstore.dev/cosign/overview/). Every release is signed with the same key introduced in [commit `0112e53`](https://github.com/BerriAI/litellm/commit/0112e53046018d726492c814b3644b7d376029d0).

**Verify using the pinned commit hash (recommended):**

A commit hash is cryptographically immutable, so this is the strongest way to ensure you are using the original signing key:

```bash
cosign verify \
  --key https://raw.githubusercontent.com/BerriAI/litellm/0112e53046018d726492c814b3644b7d376029d0/cosign.pub \
  ghcr.io/berriai/litellm:v1.88.2
```

**Verify using the release tag (convenience):**

Tags are protected in this repository and resolve to the same key.
This option is easier to read but relies on tag protection rules:

```bash
cosign verify \
  --key https://raw.githubusercontent.com/BerriAI/litellm/v1.88.2/cosign.pub \
  ghcr.io/berriai/litellm:v1.88.2
```

Expected output:

```
The following checks were performed on each of these signatures:
  - The cosign claims were validated
  - The signatures were verified against the specified public key
```

***

##### What's Changed

- chore(release): backport Fable 5, batch-file auth, CrowdStrike AIDR, Mantle Responses SigV4, and NetApp streaming-cost fix to stable/1.88.x and cut 1.88.2 by [@mateo-berri](https://github.com/mateo-berri) in [#30144](https://github.com/BerriAI/litellm/pull/30144)
- chore(release): backport DB-resilience, passthrough, model-info, budget, and deps fixes to stable/1.88.x by [@yuneng-berri](https://github.com/yuneng-berri) in [#30408](https://github.com/BerriAI/litellm/pull/30408)

**Full Changelog**: <https://github.com/BerriAI/litellm/compare/v1.88.1...v1.88.2>

</details>

---

### Configuration

📅 **Schedule**: (in timezone Europe/London)

- Branch creation
  - At any time (no schedule defined)
- Automerge
  - At any time (no schedule defined)

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about these updates again.

---

- [ ] If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Mend Renovate](https://github.com/renovatebot/renovate).

Reviewed-on: https://forgejo.hayden.moe/hayden/phoebe/pulls/93

…to v1.89.0 (#200)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [https://github.com/BerriAI/litellm.git](https://github.com/BerriAI/litellm) | minor | `v1.85.1` → `v1.89.0` |

---

> ⚠️ **Warning**
>
> Some dependencies could not be looked up. Check the [Dependency Dashboard](issues/155) for more information.

---

### Release Notes

<details>
<summary>BerriAI/litellm (https://github.com/BerriAI/litellm.git)</summary>

### [`v1.89.0`](https://github.com/BerriAI/litellm/releases/tag/v1.89.0)

[Compare Source](https://github.com/BerriAI/litellm/compare/v1.88.2...v1.89.0)

#### Verify Docker Image Signature

All LiteLLM Docker images are signed with [cosign](https://docs.sigstore.dev/cosign/overview/). Every release is signed with the same key introduced in [commit `0112e53`](https://github.com/BerriAI/litellm/commit/0112e53046018d726492c814b3644b7d376029d0).

**Verify using the pinned commit hash (recommended):**

A commit hash is cryptographically immutable, so this is the strongest way to ensure you are using the original signing key:

```bash
cosign verify \
  --key https://raw.githubusercontent.com/BerriAI/litellm/0112e53046018d726492c814b3644b7d376029d0/cosign.pub \
  ghcr.io/berriai/litellm:v1.89.0
```

**Verify using the release tag (convenience):**

Tags are protected in this repository and resolve to the same key.
This option is easier to read but relies on tag protection rules:

```bash
cosign verify \
  --key https://raw.githubusercontent.com/BerriAI/litellm/v1.89.0/cosign.pub \
  ghcr.io/berriai/litellm:v1.89.0
```

Expected output:

```
The following checks were performed on each of these signatures:
  - The cosign claims were validated
  - The signatures were verified against the specified public key
```

***

#### What's Changed

- test(responses): bump deprecated gemini-3-pro-preview to gemini-3.1-pro-preview by [@mateo-berri](https://github.com/mateo-berri) in [#29433](https://github.com/BerriAI/litellm/pull/29433)
- fix: map mistral/ministral-8b-latest in model price map by [@mateo-berri](https://github.com/mateo-berri) in [#29453](https://github.com/BerriAI/litellm/pull/29453)
- fix(datadog): split oversized batches on 413 instead of re-queueing forever by [@yassin-berriai](https://github.com/yassin-berriai) in [#29444](https://github.com/BerriAI/litellm/pull/29444)
- feat(otel): allowlist team\_metadata sub-keys promoted to baggage by [@yassin-berriai](https://github.com/yassin-berriai) in [#29442](https://github.com/BerriAI/litellm/pull/29442)
- fix: stop use\_chat\_completions\_api flag from leaking into provider request body by [@mateo-berri](https://github.com/mateo-berri) in [#29447](https://github.com/BerriAI/litellm/pull/29447)
- fix(anthropic, fireworks): inline legacy $ref defs in tool schemas by [@milan-berri](https://github.com/milan-berri) in [#28646](https://github.com/BerriAI/litellm/pull/28646)
- fix(proxy): omit OpenAI \[DONE] on google-genai streamGenerateContent by [@Sameerlite](https://github.com/Sameerlite) in [#29426](https://github.com/BerriAI/litellm/pull/29426)
- ci(release): create stable/X.Y.x line branch on X.Y.0 tags by [@yuneng-berri](https://github.com/yuneng-berri) in [#29457](https://github.com/BerriAI/litellm/pull/29457)
- fix(vector-stores): support engines URL for Vertex AI Search by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri)
in [#27885](https://github.com/BerriAI/litellm/pull/27885) - fix(ui): render caller-supplied filter options in caller order by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29462](https://github.com/BerriAI/litellm/pull/29462) - fix(batches): skip unnecessary batch input file reads by [@Sameerlite](https://github.com/Sameerlite) in [#29114](https://github.com/BerriAI/litellm/pull/29114) - docs(agents): clarify when to create new test files by [@Sameerlite](https://github.com/Sameerlite) in [#29472](https://github.com/BerriAI/litellm/pull/29472) - Litellm OSS Staging by [@Sameerlite](https://github.com/Sameerlite) in [#29161](https://github.com/BerriAI/litellm/pull/29161) - fix(mcp): clear allowed\_tools and tool overrides on MCP server edit by [@Sameerlite](https://github.com/Sameerlite) in [#29411](https://github.com/BerriAI/litellm/pull/29411) - Litellm OSS Staging 010626 by [@Sameerlite](https://github.com/Sameerlite) in [#29422](https://github.com/BerriAI/litellm/pull/29422) - fix(ci): make CircleCI rerun-failed-tests collect tests when 2+ test files fail by [@mateo-berri](https://github.com/mateo-berri) in [#29475](https://github.com/BerriAI/litellm/pull/29475) - feat(a2a): watsonx Orchestrate agent provider by [@Sameerlite](https://github.com/Sameerlite) in [#29410](https://github.com/BerriAI/litellm/pull/29410) - fix(azure\_ai): strip tool-level extra fields on 400 and retry by [@Sameerlite](https://github.com/Sameerlite) in [#29479](https://github.com/BerriAI/litellm/pull/29479) - fix(docs): remove fixed dimensions from README hero image by [@mateo-berri](https://github.com/mateo-berri) in [#29496](https://github.com/BerriAI/litellm/pull/29496) - Litellm oss staging by [@Sameerlite](https://github.com/Sameerlite) in [#29492](https://github.com/BerriAI/litellm/pull/29492) - fix: small CLAUDE.md nits by [@mateo-berri](https://github.com/mateo-berri) in [#29504](https://github.com/BerriAI/litellm/pull/29504) - Add MCP semantic conventions to 
otelv2 by [@yassin-berriai](https://github.com/yassin-berriai) in [#29468](https://github.com/BerriAI/litellm/pull/29468) - fix(passthrough): emit otel guardrail span when a guardrail blocks by [@yassin-berriai](https://github.com/yassin-berriai) in [#29470](https://github.com/BerriAI/litellm/pull/29470) - fix(proxy): strip NUL bytes from spend log payloads to prevent PostgreSQL 22P05 by [@milan-berri](https://github.com/milan-berri) in [#29515](https://github.com/BerriAI/litellm/pull/29515) - \[internal copy of [#28008](https://github.com/BerriAI/litellm/issues/28008)] Support MCP OAuth passthrough and issuer-scoped JWT auth by [@mateo-berri](https://github.com/mateo-berri) in [#28356](https://github.com/BerriAI/litellm/pull/28356) - feat(vector-stores): forward per-request params to Vertex AI Search by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29459](https://github.com/BerriAI/litellm/pull/29459) - feat(proxy): add per-MCP-server RPM rate limiting for keys and teams by [@Sameerlite](https://github.com/Sameerlite) in [#29482](https://github.com/BerriAI/litellm/pull/29482) - fix(tests): drop module-level test calls that break local\_testing collection by [@mateo-berri](https://github.com/mateo-berri) in [#29520](https://github.com/BerriAI/litellm/pull/29520) - feat(agents): add LangFlow agent provider with A2A session bridging by [@Sameerlite](https://github.com/Sameerlite) in [#28963](https://github.com/BerriAI/litellm/pull/28963) - fix(ui/agents): make A2A skill tags enterable and validated by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29512](https://github.com/BerriAI/litellm/pull/29512) - \[internal copy of [#29232](https://github.com/BerriAI/litellm/issues/29232)] feat: route future Claude models to Anthropic provider via pattern matching by [@mateo-berri](https://github.com/mateo-berri) in [#29239](https://github.com/BerriAI/litellm/pull/29239) - fix(tests): drop import-time completion call in test\_register\_model 
by [@mateo-berri](https://github.com/mateo-berri) in [#29521](https://github.com/BerriAI/litellm/pull/29521) - test: stabilize batch VCR coverage and stop live upload/network leaks by [@mateo-berri](https://github.com/mateo-berri) in [#29477](https://github.com/BerriAI/litellm/pull/29477) - \[internal copy of [#29003](https://github.com/BerriAI/litellm/issues/29003)] fix(vertex\_ai): use user-supplied api\_base as is for Model Garden OpenAI-compat path by [@mateo-berri](https://github.com/mateo-berri) in [#29530](https://github.com/BerriAI/litellm/pull/29530) - feat(proxy): native /health/drain preStop hook for graceful shutdown by [@yassin-berriai](https://github.com/yassin-berriai) in [#29439](https://github.com/BerriAI/litellm/pull/29439) - fix(auth): preserve 401 status for expired JWTs in OTel traces by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29510](https://github.com/BerriAI/litellm/pull/29510) - fix(otel): capture 401 error details in management endpoint spans by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29535](https://github.com/BerriAI/litellm/pull/29535) - test(proxy/utils): pin bottom-of-file helper behavior by [@yuneng-berri](https://github.com/yuneng-berri) in [#29509](https://github.com/BerriAI/litellm/pull/29509) - test(proxy/utils): pin PrismaClient and spend-update behavior by [@yuneng-berri](https://github.com/yuneng-berri) in [#29488](https://github.com/BerriAI/litellm/pull/29488) - test(proxy/utils): pin ProxyLogging behavior by [@yuneng-berri](https://github.com/yuneng-berri) in [#29485](https://github.com/BerriAI/litellm/pull/29485) - fix: missing span for guardrail passthrough by [@yassin-berriai](https://github.com/yassin-berriai) in [#29552](https://github.com/BerriAI/litellm/pull/29552) - fix(auth): let internal users view search tools by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29542](https://github.com/BerriAI/litellm/pull/29542) - fix: missing mcp otel attributes by 
[@yassin-berriai](https://github.com/yassin-berriai) in [#29554](https://github.com/BerriAI/litellm/pull/29554) - fix(proxy): resolve managed video model ids for auth by [@shivamrawat1](https://github.com/shivamrawat1) in [#29545](https://github.com/BerriAI/litellm/pull/29545) - fix(key\_generate): allow team members to create keys on org-scoped teams by [@milan-berri](https://github.com/milan-berri) in [#29310](https://github.com/BerriAI/litellm/pull/29310) - test(pass-through): move Gemini pass-through tests to gemini-3.1-flash-lite by [@mateo-berri](https://github.com/mateo-berri) in [#29595](https://github.com/BerriAI/litellm/pull/29595) - Litellm oss staging 030626 by [@Sameerlite](https://github.com/Sameerlite) in [#29578](https://github.com/BerriAI/litellm/pull/29578) - Fix : a2a bugs 030626 by [@Sameerlite](https://github.com/Sameerlite) in [#29566](https://github.com/BerriAI/litellm/pull/29566) - \[internal copy of [#29533](https://github.com/BerriAI/litellm/issues/29533)] fix(anthropic/adapter): emit thinking block for reasoning\_content-only streaming chunks by [@mateo-berri](https://github.com/mateo-berri) in [#29600](https://github.com/BerriAI/litellm/pull/29600) - ci: reproduce default-Windows wheel install to guard MAX\_PATH by [@yuneng-berri](https://github.com/yuneng-berri) in [#29597](https://github.com/BerriAI/litellm/pull/29597) - fix(vertex): strip output\_config.effort for Vertex Claude models that reject it (Haiku 4.5) by [@mateo-berri](https://github.com/mateo-berri) in [#29585](https://github.com/BerriAI/litellm/pull/29585) - Litellm websocket improvements by [@Sameerlite](https://github.com/Sameerlite) in [#29563](https://github.com/BerriAI/litellm/pull/29563) - feat(arize/phoenix): OpenInference rendering parity — tool\_calls, cost, passthrough I/O, session/user, multimodal, cache tokens by [@milan-berri](https://github.com/milan-berri) in [#28800](https://github.com/BerriAI/litellm/pull/28800) - \[internal copy of 
[#29550](https://github.com/BerriAI/litellm/issues/29550)] fix: passthrough endpoints duplicate logs by [@mateo-berri](https://github.com/mateo-berri) in [#29598](https://github.com/BerriAI/litellm/pull/29598) - fix(ci): keep coverage rename green when a parallel node runs no tests by [@mateo-berri](https://github.com/mateo-berri) in [#29608](https://github.com/BerriAI/litellm/pull/29608) - test(vcr): close out the remaining VCR live-call leaks by [@mateo-berri](https://github.com/mateo-berri) in [#29603](https://github.com/BerriAI/litellm/pull/29603) - fix(key\_generate): exempt UI/CLI session tokens from the budget ceiling for team keys by [@yuneng-berri](https://github.com/yuneng-berri) in [#29612](https://github.com/BerriAI/litellm/pull/29612) - fix(realtime): allow null transcripts in stream logging payloads by [@milan-berri](https://github.com/milan-berri) in [#29625](https://github.com/BerriAI/litellm/pull/29625) - build(ui): migrate eslint to flat config + bump eslint-config-next to 16 by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29626](https://github.com/BerriAI/litellm/pull/29626) - fix(key\_generate): scope session-token team-key budget exemption to caller-supplied team\_id by [@yuneng-berri](https://github.com/yuneng-berri) in [#29641](https://github.com/BerriAI/litellm/pull/29641) - fix(proxy): disable proxy buffering on streaming SSE responses by [@mateo-berri](https://github.com/mateo-berri) in [#29557](https://github.com/BerriAI/litellm/pull/29557) - fix(mcp): gate /public/mcp\_hub strictly on litellm.public\_mcp\_servers by [@michelligabriele](https://github.com/michelligabriele) in [#27764](https://github.com/BerriAI/litellm/pull/27764) - ci(ui): frontend-lint job enforcing prettier + eslint on changed files by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29633](https://github.com/BerriAI/litellm/pull/29633) - fix(gemini): googleSearch + server-side tools and googleMaps JSON schema by 
[@Sameerlite](https://github.com/Sameerlite) in [#29582](https://github.com/BerriAI/litellm/pull/29582) - fix(proxy): passthrough 404 when SERVER\_ROOT\_PATH is set by [@Sameerlite](https://github.com/Sameerlite) in [#29658](https://github.com/BerriAI/litellm/pull/29658) - fix(gemini-realtime): use GA event names for Pipecat 1.3.x compatibility by [@Sameerlite](https://github.com/Sameerlite) in [#29662](https://github.com/BerriAI/litellm/pull/29662) - Litellm oss staging 040626 by [@Sameerlite](https://github.com/Sameerlite) in [#29671](https://github.com/BerriAI/litellm/pull/29671) - style(ui): prettier formatting pass over the dashboard by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29622](https://github.com/BerriAI/litellm/pull/29622) - chore: ignore prettier dashboard reformat in git blame by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29695](https://github.com/BerriAI/litellm/pull/29695) - fix(helm): Enable Backend Deployment to mount Gateway config.yaml by [@tin-berri](https://github.com/tin-berri) in [#29605](https://github.com/BerriAI/litellm/pull/29605) - \[internal copy of [#29277](https://github.com/BerriAI/litellm/issues/29277)] fix(proxy): add default=None to LiteLLM\_TeamMembership.litellm\_budget\_table by [@mateo-berri](https://github.com/mateo-berri) in [#29684](https://github.com/BerriAI/litellm/pull/29684) - test: make custom\_tokenizer proxy tests hermetic by [@yuneng-berri](https://github.com/yuneng-berri) in [#29643](https://github.com/BerriAI/litellm/pull/29643) - test(proxy): stop running real-DB tests in GitHub Actions unit jobs by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29700](https://github.com/BerriAI/litellm/pull/29700) - chore(ui): remove the bare-fetch lint rule by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29712](https://github.com/BerriAI/litellm/pull/29712) - Litellm jwt mapping virtualkeys by [@shivamrawat1](https://github.com/shivamrawat1) in 
[#28510](https://github.com/BerriAI/litellm/pull/28510) - refactor(ui): shared HTTP client + location-pinned fetch() lint rule by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29723](https://github.com/BerriAI/litellm/pull/29723) - fix(proxy): stop team BYOK model name corruption on model edit by [@yuneng-berri](https://github.com/yuneng-berri) in [#29731](https://github.com/BerriAI/litellm/pull/29731) - \[internal copy of [#29511](https://github.com/BerriAI/litellm/issues/29511)] feat(guardrails): add sensitive data routing to on-premise models by [@mateo-berri](https://github.com/mateo-berri) in [#29531](https://github.com/BerriAI/litellm/pull/29531) - fix(proxy/hooks): populate llm\_provider on internal rate-limit errors by [@mateo-berri](https://github.com/mateo-berri) in [#27707](https://github.com/BerriAI/litellm/pull/27707) - fix(vertex/anthropic): handle namespace tools and strip client\_metadata for codex compatibility by [@Sameerlite](https://github.com/Sameerlite) in [#29489](https://github.com/BerriAI/litellm/pull/29489) - Support OAuth M2M for Databricks Apps A2A agents by [@mateo-berri](https://github.com/mateo-berri) in [#29586](https://github.com/BerriAI/litellm/pull/29586) - fix: small CLAUDE.md nit by [@mateo-berri](https://github.com/mateo-berri) in [#29749](https://github.com/BerriAI/litellm/pull/29749) - fix(anthropic): route Claude Opus 4.8 through adaptive thinking by [@mateo-berri](https://github.com/mateo-berri) in [#29702](https://github.com/BerriAI/litellm/pull/29702) - fix(proxy): persist oauth2\_flow on MCP server registration by [@michelligabriele](https://github.com/michelligabriele) in [#29690](https://github.com/BerriAI/litellm/pull/29690) - \[internal copy of [#27491](https://github.com/BerriAI/litellm/issues/27491)] fix(realtime): Fix Realtime Audio Token Cost Tracking by [@mateo-berri](https://github.com/mateo-berri) in [#29722](https://github.com/BerriAI/litellm/pull/29722) - fix(galileo): use ingest traces API 
and standard logging payload by [@Sameerlite](https://github.com/Sameerlite) in [#29651](https://github.com/BerriAI/litellm/pull/29651) - fix(auth): expand all-team-models sentinel in can\_key\_call\_model for batch validation by [@Sameerlite](https://github.com/Sameerlite) in [#29746](https://github.com/BerriAI/litellm/pull/29746) - test(vcr): stop refreshing cassette TTL on read so cassettes lapse after 24h by [@mateo-berri](https://github.com/mateo-berri) in [#29784](https://github.com/BerriAI/litellm/pull/29784) - test(ci): record/replay OpenAI image gen so the spend E2E isn't outage-bound by [@mateo-berri](https://github.com/mateo-berri) in [#29787](https://github.com/BerriAI/litellm/pull/29787) - fix(ui): route MCP playground auth by oauth2 mode instead of token\_url by [@tin-berri](https://github.com/tin-berri) in [#29714](https://github.com/BerriAI/litellm/pull/29714) - refactor(ui): centralize proxy base URL resolution into tested resolver by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29793](https://github.com/BerriAI/litellm/pull/29793) - Litellm oss staging 050626 by [@Sameerlite](https://github.com/Sameerlite) in [#29774](https://github.com/BerriAI/litellm/pull/29774) - test(google): add google-genai SDK proxy integration tests by [@Sameerlite](https://github.com/Sameerlite) in [#29781](https://github.com/BerriAI/litellm/pull/29781) - fix(jwt): use resolved DB user\_id for spend on legacy email match by [@milan-berri](https://github.com/milan-berri) in [#29217](https://github.com/BerriAI/litellm/pull/29217) - feat(ui): generate dashboard API types from the proxy OpenAPI spec by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29816](https://github.com/BerriAI/litellm/pull/29816) - fix(proxy): drop deleted team BYOK model name from team.models by [@yuneng-berri](https://github.com/yuneng-berri) in [#29820](https://github.com/BerriAI/litellm/pull/29820) - feat(mcp): per-server env vars with global + per-user scopes by 
[@mateo-berri](https://github.com/mateo-berri) in [#28917](https://github.com/BerriAI/litellm/pull/28917) - refactor(ui): route behavior-preserving networking calls through apiClient by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29806](https://github.com/BerriAI/litellm/pull/29806) - fix(mcp): persist Tools-tab MCP OAuth token to DB by [@tin-berri](https://github.com/tin-berri) in [#29809](https://github.com/BerriAI/litellm/pull/29809) - fix(ui): require new expiration when regenerating an expired key by [@milan-berri](https://github.com/milan-berri) in [#29838](https://github.com/BerriAI/litellm/pull/29838) - refactor(ui): route query-building networking calls through apiClient by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29815](https://github.com/BerriAI/litellm/pull/29815) - Make the image-gen record/replay proxy report cache mode and per-request HIT/MISS by [@mateo-berri](https://github.com/mateo-berri) in [#29802](https://github.com/BerriAI/litellm/pull/29802) - feat(proxy): hot-reload .env in dev when running with --reload by [@mateo-berri](https://github.com/mateo-berri) in [#29783](https://github.com/BerriAI/litellm/pull/29783) - fix(ui): stop MCP playground tool calls from sending twice by [@tin-berri](https://github.com/tin-berri) in [#29821](https://github.com/BerriAI/litellm/pull/29821) - feat(fal\_ai): add Nano Banana / Gemini 2.5 Flash Image generation support by [@mateo-berri](https://github.com/mateo-berri) in [#29798](https://github.com/BerriAI/litellm/pull/29798) - Title: Fix managed batch cancel credential resolution by [@shivamrawat1](https://github.com/shivamrawat1) in [#29734](https://github.com/BerriAI/litellm/pull/29734) - Title: fix(proxy): resolve vector store file list credentials from team deployments by [@shivamrawat1](https://github.com/shivamrawat1) in [#29739](https://github.com/BerriAI/litellm/pull/29739) - refactor: convert AWS and GCP Terraform stacks into reusable modules … by 
[@yassin-berriai](https://github.com/yassin-berriai) in [#28103](https://github.com/BerriAI/litellm/pull/28103) - chore(ui): build ui for release by [@yuneng-berri](https://github.com/yuneng-berri) in [#29853](https://github.com/BerriAI/litellm/pull/29853) - fix(terraform/gcp): prompt for image\_registry in DeployStack one-click by [@yassin-berriai](https://github.com/yassin-berriai) in [#29852](https://github.com/BerriAI/litellm/pull/29852) - fix(terraform/gcp): abandon SQL user on destroy by [@yassin-berriai](https://github.com/yassin-berriai) in [#29855](https://github.com/BerriAI/litellm/pull/29855) - Extend the record/replay proxy to chat, embeddings, moderations, rerank, and Anthropic by [@mateo-berri](https://github.com/mateo-berri) in [#29847](https://github.com/BerriAI/litellm/pull/29847) - chore(deps): bump deps by [@yuneng-berri](https://github.com/yuneng-berri) in [#29860](https://github.com/BerriAI/litellm/pull/29860) - chore(ci): promote internal staging to main by [@yuneng-berri](https://github.com/yuneng-berri) in [#29861](https://github.com/BerriAI/litellm/pull/29861) - fix: 400 on Anthropic context overflow; seed identity on failed auth by [@yassin-berriai](https://github.com/yassin-berriai) in [#29848](https://github.com/BerriAI/litellm/pull/29848) - chore(ci): promote internal staging to main by [@yuneng-berri](https://github.com/yuneng-berri) in [#29862](https://github.com/BerriAI/litellm/pull/29862) - chore(release): patch v1.89.0-rc.1 with [#30064](https://github.com/BerriAI/litellm/issues/30064) (Claude Fable 5) for v1.89.0-rc.2 by [@mateo-berri](https://github.com/mateo-berri) in [#30143](https://github.com/BerriAI/litellm/pull/30143) **Full Changelog**: <https://github.com/BerriAI/litellm/compare/v1.88.0...v1.89.0> ### [`v1.88.2`](https://github.com/BerriAI/litellm/releases/tag/v1.88.2) [Compare Source](https://github.com/BerriAI/litellm/compare/v1.88.1...v1.88.2) #### Verify Docker Image Signature All LiteLLM Docker images are signed with 
[cosign](https://docs.sigstore.dev/cosign/overview/). Every release is signed with the same key introduced in [commit `0112e53`](https://github.com/BerriAI/litellm/commit/0112e53046018d726492c814b3644b7d376029d0).

**Verify using the pinned commit hash (recommended):**

A commit hash is cryptographically immutable, so this is the strongest way to ensure you are using the original signing key:

```bash
cosign verify \
  --key https://raw.githubusercontent.com/BerriAI/litellm/0112e53046018d726492c814b3644b7d376029d0/cosign.pub \
  ghcr.io/berriai/litellm:v1.88.2
```

**Verify using the release tag (convenience):**

Tags are protected in this repository and resolve to the same key. This option is easier to read but relies on tag protection rules:

```bash
cosign verify \
  --key https://raw.githubusercontent.com/BerriAI/litellm/v1.88.2/cosign.pub \
  ghcr.io/berriai/litellm:v1.88.2
```

Expected output:

```
The following checks were performed on each of these signatures:
  - The cosign claims were validated
  - The signatures were verified against the specified public key
```

***

#### What's Changed

- chore(release): backport Fable 5, batch-file auth, CrowdStrike AIDR, Mantle Responses SigV4, and NetApp streaming-cost fix to stable/1.88.x and cut 1.88.2 by [@mateo-berri](https://github.com/mateo-berri) in [#30144](https://github.com/BerriAI/litellm/pull/30144)
- chore(release): backport DB-resilience, passthrough, model-info, budget, and deps fixes to stable/1.88.x by [@yuneng-berri](https://github.com/yuneng-berri) in [#30408](https://github.com/BerriAI/litellm/pull/30408)

**Full Changelog**: <https://github.com/BerriAI/litellm/compare/v1.88.1...v1.88.2>

### [`v1.88.1`](https://github.com/BerriAI/litellm/releases/tag/v1.88.1)

[Compare Source](https://github.com/BerriAI/litellm/compare/v1.88.0...v1.88.1)

#### Verify Docker Image Signature

All LiteLLM Docker images are signed with [cosign](https://docs.sigstore.dev/cosign/overview/).
Every release is signed with the same key introduced in [commit `0112e53`](https://github.com/BerriAI/litellm/commit/0112e53046018d726492c814b3644b7d376029d0).

**Verify using the pinned commit hash (recommended):**

A commit hash is cryptographically immutable, so this is the strongest way to ensure you are using the original signing key:

```bash
cosign verify \
  --key https://raw.githubusercontent.com/BerriAI/litellm/0112e53046018d726492c814b3644b7d376029d0/cosign.pub \
  ghcr.io/berriai/litellm:v1.88.1
```

**Verify using the release tag (convenience):**

Tags are protected in this repository and resolve to the same key. This option is easier to read but relies on tag protection rules:

```bash
cosign verify \
  --key https://raw.githubusercontent.com/BerriAI/litellm/v1.88.1/cosign.pub \
  ghcr.io/berriai/litellm:v1.88.1
```

Expected output:

```
The following checks were performed on each of these signatures:
  - The cosign claims were validated
  - The signatures were verified against the specified public key
```

***

#### What's Changed

- build(deps): bump pyjwt to 2.13.0 and ws override to 8.20.1 (1.88.x) by [@yuneng-berri](https://github.com/yuneng-berri) in [#29987](https://github.com/BerriAI/litellm/pull/29987)
- chore(release): bump version to 1.88.1 by [@yuneng-berri](https://github.com/yuneng-berri) in [#29989](https://github.com/BerriAI/litellm/pull/29989)

**Full Changelog**: <https://github.com/BerriAI/litellm/compare/v1.88.0...v1.88.1>

### [`v1.88.0`](https://github.com/BerriAI/litellm/releases/tag/v1.88.0)

[Compare Source](https://github.com/BerriAI/litellm/compare/v1.87.3...v1.88.0)

#### Verify Docker Image Signature

All LiteLLM Docker images are signed with [cosign](https://docs.sigstore.dev/cosign/overview/). Every release is signed with the same key introduced in [commit `0112e53`](https://github.com/BerriAI/litellm/commit/0112e53046018d726492c814b3644b7d376029d0).
**Verify using the pinned commit hash (recommended):**

A commit hash is cryptographically immutable, so this is the strongest way to ensure you are using the original signing key:

```bash
cosign verify \
  --key https://raw.githubusercontent.com/BerriAI/litellm/0112e53046018d726492c814b3644b7d376029d0/cosign.pub \
  ghcr.io/berriai/litellm:v1.88.0
```

**Verify using the release tag (convenience):**

Tags are protected in this repository and resolve to the same key. This option is easier to read but relies on tag protection rules:

```bash
cosign verify \
  --key https://raw.githubusercontent.com/BerriAI/litellm/v1.88.0/cosign.pub \
  ghcr.io/berriai/litellm:v1.88.0
```

Expected output:

```
The following checks were performed on each of these signatures:
  - The cosign claims were validated
  - The signatures were verified against the specified public key
```

***

#### What's Changed

- fix(proxy): gate team allowed\_passthrough\_routes to proxy admins by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#28097](https://github.com/BerriAI/litellm/pull/28097)
- fix(tests): stabilize image-edit VCR cassettes to stop live gpt-image-1 spend by [@mateo-berri](https://github.com/mateo-berri) in [#28110](https://github.com/BerriAI/litellm/pull/28110)
- fix(bedrock/cohere): send embedding\_types as JSON array, not string by [@ishaan-berri](https://github.com/ishaan-berri) in [#28172](https://github.com/BerriAI/litellm/pull/28172)
- fix(tests): migrate realtime + rerank tests off shut-down upstream models by [@yuneng-berri](https://github.com/yuneng-berri) in [#28191](https://github.com/BerriAI/litellm/pull/28191)
- fix(caching): replay openai/responses bridge cache hits as chat streams by [@Sameerlite](https://github.com/Sameerlite) in [#28158](https://github.com/BerriAI/litellm/pull/28158)
- Litellm oss staging by [@Sameerlite](https://github.com/Sameerlite) in [#28161](https://github.com/BerriAI/litellm/pull/28161)
- feat(prometheus): add user\_email and user\_alias to user
budget metrics by [@Sameerlite](https://github.com/Sameerlite) in [#28155](https://github.com/BerriAI/litellm/pull/28155) - test(callbacks): harden flaky proxy callback-leak detector by [@yuneng-berri](https://github.com/yuneng-berri) in [#28195](https://github.com/BerriAI/litellm/pull/28195) - fix(bedrock): sanitize batch metadata to prevent Pydantic ValidationError by [@mateo-berri](https://github.com/mateo-berri) in [#28202](https://github.com/BerriAI/litellm/pull/28202) - fix(deepseek): use native /anthropic/v1/messages endpoint and sanitize tools by [@mateo-berri](https://github.com/mateo-berri) in [#28200](https://github.com/BerriAI/litellm/pull/28200) - feat(ui): add Interactions API endpoint to playground with SSE streaming by [@Sameerlite](https://github.com/Sameerlite) in [#28156](https://github.com/BerriAI/litellm/pull/28156) - fix(proxy): decode bytes and pass-through SSE for Google-native streamGenerateContent ([#27444](https://github.com/BerriAI/litellm/issues/27444)) by [@Sameerlite](https://github.com/Sameerlite) in [#28213](https://github.com/BerriAI/litellm/pull/28213) - refactor(bedrock/sagemaker): switch to lazy loading for response stre… by [@harish-berri](https://github.com/harish-berri) in [#28189](https://github.com/BerriAI/litellm/pull/28189) - \[Refactor] UI - Spend Logs: consolidate filter state and extract components by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#25847](https://github.com/BerriAI/litellm/pull/25847) - fix(tests): replace shut-down gpt-4o-audio-preview with gpt-audio-1.5 by [@yuneng-berri](https://github.com/yuneng-berri) in [#28281](https://github.com/BerriAI/litellm/pull/28281) - chore(ci): bump versions by [@yuneng-berri](https://github.com/yuneng-berri) in [#28287](https://github.com/BerriAI/litellm/pull/28287) - feat: propagate team\_id and team\_alias to all child OTEL spans by [@yassin-berriai](https://github.com/yassin-berriai) in [#28273](https://github.com/BerriAI/litellm/pull/28273) - Day 0 
support : Gemini 3.5 Flash by [@Sameerlite](https://github.com/Sameerlite) in [#28268](https://github.com/BerriAI/litellm/pull/28268) - Gemini managed agents support by [@Sameerlite](https://github.com/Sameerlite) in [#28270](https://github.com/BerriAI/litellm/pull/28270) - chore(ci): promote internal staging to main by [@yuneng-berri](https://github.com/yuneng-berri) in [#28292](https://github.com/BerriAI/litellm/pull/28292) - feat(gemini): add gemini-3.1-flash-lite model cost map by [@Sameerlite](https://github.com/Sameerlite) in [#28320](https://github.com/BerriAI/litellm/pull/28320) - fix(spend\_counter): seed Redis counter via SET NX to prevent cross-pod double-seed by [@milan-berri](https://github.com/milan-berri) in [#27854](https://github.com/BerriAI/litellm/pull/27854) - fix(proxy): normalize batch file IDs before ManagedObjectTable write by [@Sameerlite](https://github.com/Sameerlite) in [#28339](https://github.com/BerriAI/litellm/pull/28339) - fix(router): use forwarded model\_id for native Azure container IDs by [@Sameerlite](https://github.com/Sameerlite) in [#27921](https://github.com/BerriAI/litellm/pull/27921) - fix(ui): restore log filter loading indicator by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#28282](https://github.com/BerriAI/litellm/pull/28282) - test(e2e): migrate runner to uv, add All Proxy Models key test by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#28313](https://github.com/BerriAI/litellm/pull/28313) - feat(ui): team passthrough routes create parity + edit load fix by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#28098](https://github.com/BerriAI/litellm/pull/28098) - fix(mcp): JWT on tools/list and REST tools/call server resolution by [@Sameerlite](https://github.com/Sameerlite) in [#28227](https://github.com/BerriAI/litellm/pull/28227) - feat(interactions): migrate to Google Interactions API steps schema (May 2026) by [@Sameerlite](https://github.com/Sameerlite) in 
[#28153](https://github.com/BerriAI/litellm/pull/28153) - test(ui-e2e): admin key creation with a specific proxy model by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#28365](https://github.com/BerriAI/litellm/pull/28365) - fix(vertex\_ai): omit function\_call id on Vertex Gemini 3.5+ tool turns by [@Sameerlite](https://github.com/Sameerlite) in [#28324](https://github.com/BerriAI/litellm/pull/28324) - feat(mcp): allow native MCP OAuth support for cursor by [@Sameerlite](https://github.com/Sameerlite) in [#28327](https://github.com/BerriAI/litellm/pull/28327) - fix(interactions): never drop streamed text deltas; always emit terminal completion by [@mateo-berri](https://github.com/mateo-berri) in [#28394](https://github.com/BerriAI/litellm/pull/28394) - fix(proxy): expose Prisma idle/connect timeout + extra DB URL params by [@yassin-berriai](https://github.com/yassin-berriai) in [#28395](https://github.com/BerriAI/litellm/pull/28395) - Litellm oss staging 1 by [@Sameerlite](https://github.com/Sameerlite) in [#28337](https://github.com/BerriAI/litellm/pull/28337) - fix: serialize guardrail\_response to JSON in OTEL traces by [@yassin-berriai](https://github.com/yassin-berriai) in [#28362](https://github.com/BerriAI/litellm/pull/28362) - chore(ci): merge dev branch by [@yuneng-berri](https://github.com/yuneng-berri) in [#28314](https://github.com/BerriAI/litellm/pull/28314) - test(realtime): expect session.created as xAI realtime initial event by [@yuneng-berri](https://github.com/yuneng-berri) in [#28424](https://github.com/BerriAI/litellm/pull/28424) - feat(tests): behavior-pinning harness + Key Tier-1 matrix by [@yuneng-berri](https://github.com/yuneng-berri) in [#28321](https://github.com/BerriAI/litellm/pull/28321) - fix(proxy): hydrate wildcard discovery credentials ([#28284](https://github.com/BerriAI/litellm/issues/28284)) - CCI Run by [@yuneng-berri](https://github.com/yuneng-berri) in [#28419](https://github.com/BerriAI/litellm/pull/28419) 
- Litellm oss staging 04 21 2026 2 by [@Sameerlite](https://github.com/Sameerlite) in [#26569](https://github.com/BerriAI/litellm/pull/26569) - chore(ci): merge dev branch by [@yuneng-berri](https://github.com/yuneng-berri) in [#28290](https://github.com/BerriAI/litellm/pull/28290) - fix(vertex\_gemma): strip `context_management` from request body by [@mateo-berri](https://github.com/mateo-berri) in [#28438](https://github.com/BerriAI/litellm/pull/28438) - fix(logging): recalculate cost after router retry failures by [@milan-berri](https://github.com/milan-berri) in [#28476](https://github.com/BerriAI/litellm/pull/28476) - fix(otel): emit guardrail span on violation, surface status + categories by [@yassin-berriai](https://github.com/yassin-berriai) in [#28364](https://github.com/BerriAI/litellm/pull/28364) - test(proxy): behavior-pinning matrix for team management endpoints by [@yuneng-berri](https://github.com/yuneng-berri) in [#28441](https://github.com/BerriAI/litellm/pull/28441) - test(vertex\_ai): tolerate transient 500 in google maps grounding test by [@yuneng-berri](https://github.com/yuneng-berri) in [#28503](https://github.com/BerriAI/litellm/pull/28503) - fix(docker): restore npm to non\_root builder image by [@yuneng-berri](https://github.com/yuneng-berri) in [#28519](https://github.com/BerriAI/litellm/pull/28519) - chore(ci): bump deps by [@yuneng-berri](https://github.com/yuneng-berri) in [#28524](https://github.com/BerriAI/litellm/pull/28524) - build(deps-dev): bump black to 26.3.1 and apply formatting by [@yuneng-berri](https://github.com/yuneng-berri) in [#28525](https://github.com/BerriAI/litellm/pull/28525) - chore(deps): bump deps by [@yuneng-berri](https://github.com/yuneng-berri) in [#28528](https://github.com/BerriAI/litellm/pull/28528) - test(e2e): forward LITELLM\_LICENSE to UI e2e proxy by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#28398](https://github.com/BerriAI/litellm/pull/28398) - Add granian as a ASGI compliant 
web server. Provider better throughput stability, by [@harish-berri](https://github.com/harish-berri) in [#26027](https://github.com/BerriAI/litellm/pull/26027) - Fix conflicts and UI by [@Sameerlite](https://github.com/Sameerlite) in [#28477](https://github.com/BerriAI/litellm/pull/28477) - Add error\_description and hint for oauth flows by [@Sameerlite](https://github.com/Sameerlite) in [#28471](https://github.com/BerriAI/litellm/pull/28471) - feat(mcp): Add tool call and tool list support via UI for Oauth mcps by [@Sameerlite](https://github.com/Sameerlite) in [#28454](https://github.com/BerriAI/litellm/pull/28454) - feat(proxy): persist allowlisted OIDC claims in CLI SSO poll by [@Sameerlite](https://github.com/Sameerlite) in [#28463](https://github.com/BerriAI/litellm/pull/28463) - fix(responses): use OpenAI SSEDecoder for Responses API streaming by [@Sameerlite](https://github.com/Sameerlite) in [#28566](https://github.com/BerriAI/litellm/pull/28566) - Litellm oss staging 2 by [@Sameerlite](https://github.com/Sameerlite) in [#28582](https://github.com/BerriAI/litellm/pull/28582) - \[internal copy of [#28269](https://github.com/BerriAI/litellm/issues/28269)] Codex cli jwt team alias by [@mateo-berri](https://github.com/mateo-berri) in [#28621](https://github.com/BerriAI/litellm/pull/28621) - fix(check\_licenses): read PEP 639 license-expression metadata by [@yuneng-berri](https://github.com/yuneng-berri) in [#28529](https://github.com/BerriAI/litellm/pull/28529) - test(proxy): behavior-pinning matrix for tier-2/3 key + team management endpoints by [@yuneng-berri](https://github.com/yuneng-berri) in [#28620](https://github.com/BerriAI/litellm/pull/28620) - chore(test): remove dead old Playwright e2e suite by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#28632](https://github.com/BerriAI/litellm/pull/28632) - fix(sagemaker): send native Cohere embed payload to Cohere SageMaker endpoints by [@milan-berri](https://github.com/milan-berri) in 
[#28613](https://github.com/BerriAI/litellm/pull/28613) - style: apply black formatting to fix lint CI (LIT-3274) ([#28639](https://github.com/BerriAI/litellm/issues/28639)) by [@krrish-berri-2](https://github.com/krrish-berri-2) in [#28641](https://github.com/BerriAI/litellm/pull/28641) - fix(bedrock): decouple STS region from Bedrock aws\_region\_name by [@milan-berri](https://github.com/milan-berri) in [#28245](https://github.com/BerriAI/litellm/pull/28245) - test(streaming): tolerate Vertex 429 wrapped in MidStreamFallbackError by [@yuneng-berri](https://github.com/yuneng-berri) in [#28669](https://github.com/BerriAI/litellm/pull/28669) - feat(guardrails): add Microsoft Purview DLP guardrail by [@Sameerlite](https://github.com/Sameerlite) in [#24966](https://github.com/BerriAI/litellm/pull/24966) - fix(mcp): forward upstream initialize instructions on cold gateway init by [@milan-berri](https://github.com/milan-berri) in [#28231](https://github.com/BerriAI/litellm/pull/28231) - chore(ci): promote internal staging to main by [@yuneng-berri](https://github.com/yuneng-berri) in [#28680](https://github.com/BerriAI/litellm/pull/28680) - CI: copy of [#25177](https://github.com/BerriAI/litellm/issues/25177) (OCI GenAI: embeddings, streaming/reasoning fixes, model catalog) by [@mateo-berri](https://github.com/mateo-berri) in [#28223](https://github.com/BerriAI/litellm/pull/28223) - Encrypt callback\_vars in key/team metadata in DB by [@Michael-RZ-Berri](https://github.com/Michael-RZ-Berri) in [#27141](https://github.com/BerriAI/litellm/pull/27141) - perf: reduce per-request and per-chunk overhead across Anthropic streaming hot paths by [@yassin-berriai](https://github.com/yassin-berriai) in [#28289](https://github.com/BerriAI/litellm/pull/28289) - feat(azure): add Speech STT config support by [@ishaan-berri](https://github.com/ishaan-berri) in [#27482](https://github.com/BerriAI/litellm/pull/27482) - test(proxy): phase-4 payload behavior pinning for tier-2/3 key + team 
management endpoints by [@yuneng-berri](https://github.com/yuneng-berri) in [#28681](https://github.com/BerriAI/litellm/pull/28681) - feat(prometheus): emit per-token-type detail metrics (LIT-3220) ([#28372](https://github.com/BerriAI/litellm/issues/28372)) by [@ishaan-berri](https://github.com/ishaan-berri) in [#28378](https://github.com/BerriAI/litellm/pull/28378) - fix(otel): stamp http.response.status\_code on all error responses by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#28405](https://github.com/BerriAI/litellm/pull/28405) - chore(ui): build ui by [@yuneng-berri](https://github.com/yuneng-berri) in [#28707](https://github.com/BerriAI/litellm/pull/28707) - fix(helm): drop main- prefix from default image tag by [@yuneng-berri](https://github.com/yuneng-berri) in [#28710](https://github.com/BerriAI/litellm/pull/28710) - test(model\_prices): allow audio\_transcription\_config in schema by [@yuneng-berri](https://github.com/yuneng-berri) in [#28708](https://github.com/BerriAI/litellm/pull/28708) - chore(ci): promote internal staging to main by [@yuneng-berri](https://github.com/yuneng-berri) in [#28709](https://github.com/BerriAI/litellm/pull/28709) - fix(team): refresh team cache on team\_model\_add/delete (LIT-3244) by [@yuneng-berri](https://github.com/yuneng-berri) in [#28683](https://github.com/BerriAI/litellm/pull/28683) - fix(ui/add-model): stop vertex\_ai-anthropic\_models from leaking into Anthropic dropdown by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#28723](https://github.com/BerriAI/litellm/pull/28723) - Fix spend logs v2 route permissions by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#28705](https://github.com/BerriAI/litellm/pull/28705) - fix(proxy): Bedrock Knowledge Base pass-through: preserve SigV4 headers and signed request body by [@milan-berri](https://github.com/milan-berri) in [#27526](https://github.com/BerriAI/litellm/pull/27526) - chore(tests): migrate Bedrock CI to AWS 
account [`9412775`](https://github.com/BerriAI/litellm/commit/941277531214) by [@mateo-berri](https://github.com/mateo-berri) in [#28728](https://github.com/BerriAI/litellm/pull/28728) - fix(otel): export SERVER span on management-endpoint success without http\_request by [@yassin-berriai](https://github.com/yassin-berriai) in [#28794](https://github.com/BerriAI/litellm/pull/28794) - chore(ci): merge dev branch by [@yuneng-berri](https://github.com/yuneng-berri) in [#28801](https://github.com/BerriAI/litellm/pull/28801) - chore(ci): merge dev branch by [@yuneng-berri](https://github.com/yuneng-berri) in [#28657](https://github.com/BerriAI/litellm/pull/28657) - fix(ui): show 2-decimal precision for max\_budget on key overview by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#28809](https://github.com/BerriAI/litellm/pull/28809) - feat(proxy): allow `llm_api_routes` virtual keys to list MCP servers by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#28442](https://github.com/BerriAI/litellm/pull/28442) - chore(ci): merge dev branch by [@yuneng-berri](https://github.com/yuneng-berri) in [#28807](https://github.com/BerriAI/litellm/pull/28807) - fix(team): keep team\_alias cache in sync on \_cache\_team\_object writes by [@yuneng-berri](https://github.com/yuneng-berri) in [#28737](https://github.com/BerriAI/litellm/pull/28737) - chore(ci): merge dev branch by [@yuneng-berri](https://github.com/yuneng-berri) in [#28822](https://github.com/BerriAI/litellm/pull/28822) - ci: daily oss-agent-shin canonical branch by [@ishaan-berri](https://github.com/ishaan-berri) in [#28829](https://github.com/BerriAI/litellm/pull/28829) - test(proxy): add harness for proxy\_server.py behavior-pinning by [@yuneng-berri](https://github.com/yuneng-berri) in [#28827](https://github.com/BerriAI/litellm/pull/28827) - feat(openai): apply regional-processing cost uplift for EU/US data residency by [@mateo-berri](https://github.com/mateo-berri) in 
[#28626](https://github.com/BerriAI/litellm/pull/28626) - chore(admin-ui): regenerate static export with trailingSlash: true by [@mateo-berri](https://github.com/mateo-berri) in [#28112](https://github.com/BerriAI/litellm/pull/28112) - fix(azure): preserve AD token refresh in v1 OpenAI client path by [@mateo-berri](https://github.com/mateo-berri) in [#28627](https://github.com/BerriAI/litellm/pull/28627) - fix(ui): route API Reference back to query-param page by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#28726](https://github.com/BerriAI/litellm/pull/28726) - fix(model-edit): allow clearing custom pricing on wildcard models by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#28719](https://github.com/BerriAI/litellm/pull/28719) - fix(tests/vcr): make Redis cassette cache replay deterministically (zero VCR misses on consecutive runs) by [@mateo-berri](https://github.com/mateo-berri) in [#28826](https://github.com/BerriAI/litellm/pull/28826) - fix(proxy): strip LiteLLM policy tracking from OpenAI batch metadata by [@shivamrawat1](https://github.com/shivamrawat1) in [#28425](https://github.com/BerriAI/litellm/pull/28425) - Litellm OpenAI double prefix bug by [@shivamrawat1](https://github.com/shivamrawat1) in [#28661](https://github.com/BerriAI/litellm/pull/28661) - Litellm oss staging 250526 by [@Sameerlite](https://github.com/Sameerlite) in [#28770](https://github.com/BerriAI/litellm/pull/28770) - fix(bedrock): align toolUse/toolSpec names and allow hyphens by [@Sameerlite](https://github.com/Sameerlite) in [#28874](https://github.com/BerriAI/litellm/pull/28874) - fix(realtime): send TEXT frames and valid guardrail session.update by [@Sameerlite](https://github.com/Sameerlite) in [#28848](https://github.com/BerriAI/litellm/pull/28848) - fix(mcp): extend key access-group union to MCP servers by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#28890](https://github.com/BerriAI/litellm/pull/28890) - fix(galileo): 
support hosted v2 spans API and string output extraction by [@Sameerlite](https://github.com/Sameerlite) in [#28771](https://github.com/BerriAI/litellm/pull/28771) - fix(proxy): exclude proxy\_server\_request from its own body snapshot by [@michelligabriele](https://github.com/michelligabriele) in [#28618](https://github.com/BerriAI/litellm/pull/28618) - \[Feat] Add tool calling support for gemini and vertex ai live api by [@Sameerlite](https://github.com/Sameerlite) in [#26590](https://github.com/BerriAI/litellm/pull/26590) - refactor(ui): remove dead App Router scaffolding in (dashboard)/\* by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#28891](https://github.com/BerriAI/litellm/pull/28891) - fix(docker): use system Node in componentized builders + retry apk add by [@yassin-berriai](https://github.com/yassin-berriai) in [#28888](https://github.com/BerriAI/litellm/pull/28888) - docs(agents): require consent before writing new third-party names by [@yuneng-berri](https://github.com/yuneng-berri) in [#28908](https://github.com/BerriAI/litellm/pull/28908) - refactor(ui): extract auth state into AuthContext by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#28910](https://github.com/BerriAI/litellm/pull/28910) - fix(mcp): resolve team.access\_group\_ids → MCP servers by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#28997](https://github.com/BerriAI/litellm/pull/28997) - test(ui): e2e cover team model edit + admin identity in navbar by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#28652](https://github.com/BerriAI/litellm/pull/28652) - test(e2e): cover add-fallback flow in Router Settings by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29069](https://github.com/BerriAI/litellm/pull/29069) - test(e2e): cover Team-BYOK add-model flow as proxy admin by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29068](https://github.com/BerriAI/litellm/pull/29068) - 
fix(containers): record ownership for service-account keys + fix Prisma Json serialization by [@Sameerlite](https://github.com/Sameerlite) in [#28990](https://github.com/BerriAI/litellm/pull/28990) - test(e2e): cover add-MCP-server flow via discovery → custom form by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29070](https://github.com/BerriAI/litellm/pull/29070) - test(e2e): cover AI Hub make-public flow and public model\_hub\_table by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29071](https://github.com/BerriAI/litellm/pull/29071) - \[internal copy of [#28877](https://github.com/BerriAI/litellm/issues/28877)] feat: add support for claude code goal mode for bedrock opus output config by [@mateo-berri](https://github.com/mateo-berri) in [#28898](https://github.com/BerriAI/litellm/pull/28898) - feat(guardrails): wire apply\_guardrail into proxy logging callbacks by [@Sameerlite](https://github.com/Sameerlite) in [#28970](https://github.com/BerriAI/litellm/pull/28970) - chore(ci): merge dev brach by [@yuneng-berri](https://github.com/yuneng-berri) in [#29192](https://github.com/BerriAI/litellm/pull/29192) - perf(streaming): cut per-chunk overhead \~30% on Anthropic + Bedrock hot path by [@yassin-berriai](https://github.com/yassin-berriai) in [#28720](https://github.com/BerriAI/litellm/pull/28720) - fix(proxy): enforce tag budgets for key-level tags by [@Sameerlite](https://github.com/Sameerlite) in [#29108](https://github.com/BerriAI/litellm/pull/29108) - fix(vertex-ai): use DB credentials in video handlers + implement Veo video edit by [@Sameerlite](https://github.com/Sameerlite) in [#29098](https://github.com/BerriAI/litellm/pull/29098) - fix(datadog): drain cost-management queue + opt-in FinOps tag allowlist by [@michelligabriele](https://github.com/michelligabriele) in [#28487](https://github.com/BerriAI/litellm/pull/28487) - feat(helm): split per-component ServiceAccounts for gateway, backend, and UI by 
[@yassin-berriai](https://github.com/yassin-berriai) in [#28712](https://github.com/BerriAI/litellm/pull/28712) - chore(ci): bump deps ([#29208](https://github.com/BerriAI/litellm/issues/29208)) by [@yuneng-berri](https://github.com/yuneng-berri) in [#29226](https://github.com/BerriAI/litellm/pull/29226) - fix(tests/vcr): mint Google OAuth tokens live to prevent stale-token replay by [@yuneng-berri](https://github.com/yuneng-berri) in [#29229](https://github.com/BerriAI/litellm/pull/29229) - chore(cookbook): bump Go directive to 1.26.3 in gollem example by [@yuneng-berri](https://github.com/yuneng-berri) in [#29234](https://github.com/BerriAI/litellm/pull/29234) - chore(ci): bump version by [@yuneng-berri](https://github.com/yuneng-berri) in [#29242](https://github.com/BerriAI/litellm/pull/29242) - feat(anthropic): add Claude Opus 4.8 and prune reasoning-effort flags by [@mateo-berri](https://github.com/mateo-berri) in [#29238](https://github.com/BerriAI/litellm/pull/29238) - chore(ci): promote internal staging to main by [@yuneng-berri](https://github.com/yuneng-berri) in [#29243](https://github.com/BerriAI/litellm/pull/29243) - fix(ci): restore real Bedrock batch S3 bucket/role in oai\_misc\_config by [@mateo-berri](https://github.com/mateo-berri) in [#29245](https://github.com/BerriAI/litellm/pull/29245) - fix(guardrails): persist disable\_global\_guardrails on keys by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29233](https://github.com/BerriAI/litellm/pull/29233) - test(e2e): cover Team Admin view + member + key flows by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29072](https://github.com/BerriAI/litellm/pull/29072) - docs: hand-written CLAUDE.md; remove AGENTS.md, point GEMINI.md at it by [@mateo-berri](https://github.com/mateo-berri) in [#29252](https://github.com/BerriAI/litellm/pull/29252) - fix(teams): expose keys\_count on /v2/team/list and wire UI Resources badge by 
[@michelligabriele](https://github.com/michelligabriele) in [#28502](https://github.com/BerriAI/litellm/pull/28502) - fix(anthropic): stop injecting unsupported output\_config.effort=xhigh for Claude Code on Sonnet/Opus 4.6 by [@mateo-berri](https://github.com/mateo-berri) in [#29304](https://github.com/BerriAI/litellm/pull/29304) - test(e2e): cover Internal Viewer nav, key, and team-info gating by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29075](https://github.com/BerriAI/litellm/pull/29075) - test(e2e): cover Internal User key modal, team info, key page by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29074](https://github.com/BerriAI/litellm/pull/29074) - test(e2e): cover navbar Logout flow as proxy admin by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29076](https://github.com/BerriAI/litellm/pull/29076) - fix(mcp): resolve key.access\_group\_ids → MCP servers (ungated) by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29195](https://github.com/BerriAI/litellm/pull/29195) - fix(router): enforce deployment budgets for dynamically added models by [@Sameerlite](https://github.com/Sameerlite) in [#29273](https://github.com/BerriAI/litellm/pull/29273) - fix(proxy): map stripped batch body.model to proxy alias for auth by [@Sameerlite](https://github.com/Sameerlite) in [#29264](https://github.com/BerriAI/litellm/pull/29264) - feat(mcp): support stateless and stateful clients via session-id routing by [@Sameerlite](https://github.com/Sameerlite) in [#26857](https://github.com/BerriAI/litellm/pull/26857) - fix(bedrock): support tool search results + chat annotations by [@Sameerlite](https://github.com/Sameerlite) in [#29120](https://github.com/BerriAI/litellm/pull/29120) - fix(mcp): ignore stale ids on key save by [@Sameerlite](https://github.com/Sameerlite) in [#29128](https://github.com/BerriAI/litellm/pull/29128) - feat(a2a): well-known agent-card discovery + LangGraph Platform mode by 
[@Sameerlite](https://github.com/Sameerlite) in [#28860](https://github.com/BerriAI/litellm/pull/28860) - fix(proxy): link passthrough success spans to the SERVER root OTEL span by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29315](https://github.com/BerriAI/litellm/pull/29315) - \[internal copy of [#29089](https://github.com/BerriAI/litellm/issues/29089)] fix: duplicate claude code traces by [@mateo-berri](https://github.com/mateo-berri) in [#29311](https://github.com/BerriAI/litellm/pull/29311) - feat(otel): typed semconv-aligned OpenTelemetry instrumentation by [@yassin-berriai](https://github.com/yassin-berriai) in [#28909](https://github.com/BerriAI/litellm/pull/28909) - tests(proxy\_server): surface current behavior in tests by [@yuneng-berri](https://github.com/yuneng-berri) in [#29309](https://github.com/BerriAI/litellm/pull/29309) - test(e2e): cover Internal User create-key flow when in no teams by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29083](https://github.com/BerriAI/litellm/pull/29083) - test(e2e): assert internal-user navbar identity is scoped to that user by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29077](https://github.com/BerriAI/litellm/pull/29077) - feat(otel): add team\_metadata, http.route, and model names to inference spans by [@yassin-berriai](https://github.com/yassin-berriai) in [#29319](https://github.com/BerriAI/litellm/pull/29319) - feat(context\_management): compact\_20260112 polyfill for non-Anthropic providers by [@Sameerlite](https://github.com/Sameerlite) in [#28868](https://github.com/BerriAI/litellm/pull/28868) - feat(enterprise): add RESEND\_FROM\_EMAIL for self-hosted Resend sends by [@shivamrawat1](https://github.com/shivamrawat1) in [#28830](https://github.com/BerriAI/litellm/pull/28830) - Revert Bedrock CI back to the reactivated AWS account ([`8886022`](https://github.com/BerriAI/litellm/commit/888602223428)) by [@mateo-berri](https://github.com/mateo-berri) 
in [#29326](https://github.com/BerriAI/litellm/pull/29326) - fix(mcp): preserve source\_url in GET /v1/mcp/server list responses by [@shivamrawat1](https://github.com/shivamrawat1) in [#29249](https://github.com/BerriAI/litellm/pull/29249) - fix(mcp): preserve omitted fields on PUT /v1/mcp/server partial updates by [@shivamrawat1](https://github.com/shivamrawat1) in [#29253](https://github.com/BerriAI/litellm/pull/29253) - fix(ci): make litellm\_internal\_staging green (logging test + Bedrock Opus 4.7 self-heal) by [@mateo-berri](https://github.com/mateo-berri) in [#29344](https://github.com/BerriAI/litellm/pull/29344) - refactor(proxy/auth): normalize Bearer prefix in safe-hash helper by [@yuneng-berri](https://github.com/yuneng-berri) in [#29343](https://github.com/BerriAI/litellm/pull/29343) - test(reasoning-effort-grid): cover Claude Opus 4.8 across provider routes by [@mateo-berri](https://github.com/mateo-berri) in [#29327](https://github.com/BerriAI/litellm/pull/29327) - fix(guardrails): return HTTP 400 for litellm content filter blocks by [@shivamrawat1](https://github.com/shivamrawat1) in [#28418](https://github.com/BerriAI/litellm/pull/28418) - fix(proxy): restrict vector store index create/delete to proxy admins by [@shivamrawat1](https://github.com/shivamrawat1) in [#29202](https://github.com/BerriAI/litellm/pull/29202) - feat(pass\_through): extend passthrough\_managed\_object\_ids to Azure by [@Sameerlite](https://github.com/Sameerlite) in [#29160](https://github.com/BerriAI/litellm/pull/29160) - fix(proxy): enforce allowed\_passthrough\_routes for auth=true pass-thr… by [@shivamrawat1](https://github.com/shivamrawat1) in [#29256](https://github.com/BerriAI/litellm/pull/29256) - feat(mcp/auth): additive key access-group grants + opt-in member assignment by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29313](https://github.com/BerriAI/litellm/pull/29313) - fix(reset\_budget): write only {spend, budget\_reset\_at} and stop pre-zeroing 
counter by [@yuneng-berri](https://github.com/yuneng-berri) in [#29358](https://github.com/BerriAI/litellm/pull/29358) - test(e2e): cover PROXY\_LOGOUT\_URL redirect on Logout by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29080](https://github.com/BerriAI/litellm/pull/29080) - fix(ui): break logout redirect loop across dev and proxy origins by [@yuneng-berri](https://github.com/yuneng-berri) in [#29360](https://github.com/BerriAI/litellm/pull/29360) - fix(openai-moderation): wire streaming flags through to unified dispatcher by [@michelligabriele](https://github.com/michelligabriele) in [#27324](https://github.com/BerriAI/litellm/pull/27324) - chore(ci): build ui by [@yuneng-berri](https://github.com/yuneng-berri) in [#29366](https://github.com/BerriAI/litellm/pull/29366) - fix(v3 limiter): cap no-max\_tokens TPM floor at smallest configured limit by [@michelligabriele](https://github.com/michelligabriele) in [#28805](https://github.com/BerriAI/litellm/pull/28805) - fix(e2e): tolerate trailing slash in SERVER\_ROOT\_PATH login redirect by [@yuneng-berri](https://github.com/yuneng-berri) in [#29369](https://github.com/BerriAI/litellm/pull/29369) - chore(deps): bump deps by [@yuneng-berri](https://github.com/yuneng-berri) in [#29373](https://github.com/BerriAI/litellm/pull/29373) - chore(ci): promote internal staging to main by [@yuneng-berri](https://github.com/yuneng-berri) in [#29372](https://github.com/BerriAI/litellm/pull/29372) - chore(release): patch v1.88.0-rc.1 with four staged fixes by [@mateo-berri](https://github.com/mateo-berri) in [#29632](https://github.com/BerriAI/litellm/pull/29632) - chore(release): patch v1.88.0-rc.1 with [#29612](https://github.com/BerriAI/litellm/issues/29612) (session-token budget-ceiling exemption) by [@mateo-berri](https://github.com/mateo-berri) in [#29637](https://github.com/BerriAI/litellm/pull/29637) - fix(key\_generate): harden GHSA-q775 …

* fix(key_generate): allow team members to create keys on org-scoped teams (#29310)

* fix(key_generate): allow team members to create keys on org-scoped teams

When a virtual key is created for a team, enterprise logic inherits the team's organization_id onto the key (add_team_organization_id). Since the VERIA-55 org-IDOR fix, /key/generate then required the caller to be an explicit LiteLLM_OrganizationMembership member of that org, returning 403 "Caller is not a member of organization_id=<uuid>". Admins normally only add users to teams (not orgs), so self-serve key creation regressed for any user on an org-scoped team (regression since v1.84.0-rc.1).

Skip the org-membership check when organization_id was inherited from the key's team (organization_id == team_table.organization_id). Team-level authorization already gates this path, so team membership is sufficient. The membership check still runs when a caller assigns an organization_id that did not come from the key's team, preserving the IDOR protection.

Adds regression tests covering both the team-inherited (allowed) and foreign-org (still blocked) cases.

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(key_generate): cover mismatched team org IDOR path on generate

Add test_generate_key_foreign_org_with_mismatched_team_still_enforces_membership for the case where a team is present but request organization_id differs from team_table.organization_id. Enterprise inheritance is no-op'd in the test so the guard is exercised directly; membership validation must still run.

Addresses Greptile review on #29310.
Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(pass-through): move Gemini pass-through tests to gemini-3.1-flash-lite (#29595)

* test(pass-through): move Gemini pass-through tests to gemini-3.1-flash-lite

gemini-2.5-flash-lite is a generation behind and is slated for discontinuation on Vertex AI no earlier than October 16, 2026, so the pass-through suite was exercising an aging model. Every reference now points at gemini-3.1-flash-lite, which is GA and already priced in the cost map so the spend-logging assertions still compute a real cost

test_vertex.test.js also gains jest.retryTimes(3) to match the sibling spend tests. The CI failures were intermittent 429 RESOURCE_EXHAUSTED from Vertex quota pressure, and that file was the only one without a retry, so a single rate-limited request was failing the whole job

* test(pass-through): point Vertex tests at the global endpoint for gemini-3.1-flash-lite

gemini-3.1-flash-lite is not served on the Vertex us-central1 regional endpoint for the CI project, so the Vertex pass-through tests were returning a deterministic 404 "Publisher Model ... was not found or your project does not have access to it" while the Gemini API tests passed. Move the Vertex clients to the global location, which the pass-through router maps to aiplatform.googleapis.com, where the 3.1 family is served

* Litellm oss staging 030626 (#29578)

* Fix incorrect agent API request example payload structure (#29556)

* fix(otel): add litellm_metadata fallback in _get_span_context and _end_proxy_span_from_kwargs (#29427)

* fix(otel): add litellm_metadata fallback in _get_span_context and _end_proxy_span_from_kwargs

On /v1/messages and other LITELLM_METADATA_ROUTES, the parent OTel span is stored in litellm_params['litellm_metadata'] instead of litellm_params['metadata']. When the request body contains a native 'metadata' field (e.g.
Anthropic's {"user_id": "..."}), litellm_params['metadata'] gets overwritten and the parent span is lost, producing orphan root spans with a different trace_id.

Add fallback checks to litellm_metadata in:
- _get_span_context(): so child spans find the correct parent
- _end_proxy_span_from_kwargs(): so the proxy span gets closed

Fixes: https://github.com/BerriAI/litellm/issues/27934

* test(otel): tighten assertions per Greptile review

- test_span_context_metadata_takes_priority: assert litellm_metadata span is never accessed, proving metadata takes priority
- test_span_context_no_parent_when_neither_has_span: assert both ctx and detected_span are None

---------

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: Aneesh-Fiddler <aneeshfiddler@gmail.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>

* fix: remove premature end-user budget check from get_end_user_object (#29420)

* fix(proxy): remove premature end-user budget check from get_end_user_object

Problem:
- `_check_end_user_budget()` was called inside `get_end_user_object()`
- This caused budget checks to run BEFORE `skip_budget_checks` could be evaluated
- Zero-cost models (e.g., local vLLM) were incorrectly blocked when end-users exceeded their budget, even though they should bypass budget checks

Solution:
- Remove `_check_end_user_budget()` calls from `get_end_user_object()`
- Budget enforcement now happens exclusively in `common_checks()` where `skip_budget_checks` context is available
- `get_end_user_object()` keeps `route` as optional in function parameter for backwards compatibility and future implementation.
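The ordering problem in the end-user budget fix can be illustrated with a minimal sketch. The names below (`EndUser`, `fetch_end_user`, `run_common_checks`, `BudgetExceeded`) are hypothetical stand-ins rather than LiteLLM's actual signatures, and the "spend greater than budget" comparison is an assumption. The sketch only shows fetching kept separate from enforcement, with enforcement placed where a `skip_budget_checks` flag is visible.

```python
from dataclasses import dataclass
from typing import Dict, Optional


class BudgetExceeded(Exception):
    """Hypothetical error raised when an end-user is over budget."""


@dataclass
class EndUser:
    user_id: str
    spend: float
    max_budget: Optional[float]  # None means no budget is configured


def fetch_end_user(store: Dict[str, EndUser], user_id: str) -> Optional[EndUser]:
    # Data fetching only: no budget enforcement happens at this stage.
    return store.get(user_id)


def run_common_checks(end_user: Optional[EndUser], skip_budget_checks: bool) -> None:
    # Enforcement lives here because this is where the skip flag is known.
    if end_user is None or skip_budget_checks:
        return
    if end_user.max_budget is not None and end_user.spend > end_user.max_budget:
        raise BudgetExceeded(f"end user {end_user.user_id} is over budget")


store = {"u1": EndUser(user_id="u1", spend=12.0, max_budget=10.0)}
user = fetch_end_user(store, "u1")  # succeeds even though the user is over budget

# Zero-cost model: the caller sets the skip flag, so the request is allowed.
run_common_checks(user, skip_budget_checks=True)

# Paid model: the same user is blocked at the enforcement step.
try:
    run_common_checks(user, skip_budget_checks=False)
    blocked = False
except BudgetExceeded:
    blocked = True
```

If the check ran inside the fetch step instead, the over-budget user would be rejected before the skip flag could be consulted, which is the regression the commit describes.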
* refactor(tests): update budget enforcement tests to reflect changes in get_end_user_object

- test_get_end_user_object() verifies data fetching
- test_check_end_user_budget() verifies enforcement
- test_budget_enforcement_blocks_over_budget_users() integrates _check_end_user_budget()
- test_resolve_end_user_reraises_budget_exceeded() is now test_resolve_end_user since no budget exceeded is thrown in get_end_user_object()

* Gemini /images/generate and /images/edits billing fixes + add support for size and aspect ratio params (#29534)

* Fix Gemini image config mapping
* Address Gemini image config review
* Format Gemini image generation transform
* Fix Gemini image token usage logging
* Share Gemini image request helpers
* Fix Gemini Imagen model routing
* Fixes as per self code review
* Fixes per internal code review
* Stop gating Imagen imageSize forwarding
* Document Gemini image size mapping source
* chore: retrigger lint
* Clarify Gemini candidate count precedence

* Add Inception provider (#29522)

* add inception as provider (chat, fim)
* linting
* seperate test suite for chat and fim
* fix test coverage

* fix: model hub custom pricing model info (#29293)

* Opik user auth key metadata extractors (#28397)

* fix: enhance Opik metadata extraction to include user API key auth context

fixed after refactoring to extractor logic

* test: add unit tests for OPik metadata extraction logic
* fix: enhance extract_opik_metadata function to prioritize metadata sources for improved accuracy
* fix(ci): clarified comments and edited unit tests
* test: add unit tests for OPik metadata extraction with auth and requester overrides

* fix(ui): replace fixed favicon.ico with current api get /get_favicon (#29532)

Signed-off-by: José Luis Di Biase <josx@interorganic.com.ar>

* fix(vertex/gemini): keep tool_call reference when a text-only assistant message follows (#29561)

`_gemini_convert_messages_with_history` tracks `last_message_with_tool_calls` so a following tool result can be
matched back to its tool call. The assignment was inside a branch guarded by `assistant_msg.get("tool_calls", []) is not None`, which is also True for a text-only assistant message (an empty list is not None). As a result, an assistant message with no tool calls that appears between a tool call and its tool result overwrote the reference, and conversion failed with: Exception: Missing corresponding tool call for tool response message. This shape is common: a model emits a short narration/assistant message after a tool call before the tool result is appended. Only update `last_message_with_tool_calls` when the assistant message actually carries tool_calls (or a function_call). Adds a regression test. Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> * Add 1-hour cache write pricing for EU/AU/JP Bedrock Anthropic models (#28572) * fix(thinking): handle None thinking param in is_thinking_enabled (#28598) Squash-merged by litellm-agent from Terrajlz's PR. * feat(helm): support tpl rendering in podAnnotations (#28609) Squash-merged by litellm-agent from devauxbr's PR. * Forward custom_llm_provider through the Responses API bridge (Fixes #28505) (#28575) * Forward custom_llm_provider through the Responses API bridge (Fixes #28505) When a Chat Completions request to a GPT-5.4+ model contains both `tools` and `reasoning_effort`, `completion()` auto-routes through `responses_api_bridge`. The bridge handler called `litellm.responses()` / `litellm.aresponses()` without forwarding the already-resolved `custom_llm_provider`, so the downstream call re-invoked `get_llm_provider()` with `custom_llm_provider=None` and stripped a second provider prefix from a `provider/provider/model` deployment string. For a deployment configured as `openai/openai/openai/gpt-5.5`, the bridge flow sent `openai/gpt-5.5` to the upstream API instead of the correct `openai/openai/gpt-5.5`. 
Upstream APIs that enforce model-name allow-lists rejected this as `key_model_access_denied`. Fix: pass the locally-resolved `custom_llm_provider` into both the sync `responses()` and async `aresponses()` calls so the downstream `_resolve_model_provider_for_responses` sees an explicit provider and skips the second prefix-strip. New regression test `tests/test_litellm/completion_extras/test_responses_bridge_provider_propagation.py` pins both call sites: each must forward `custom_llm_provider`. * fix(28505): set custom_llm_provider on request_data instead of as duplicate kwarg Greptile flagged that the previous patch passed custom_llm_provider as an explicit kwarg to responses()/aresponses() while request_data already carried it via the spread of sanitized_litellm_params, which would raise TypeError: got multiple values for keyword argument on every real bridge call. Switches to assigning request_data['custom_llm_provider'] before the call so the resolved provider wins over whatever sanitized_litellm_params spread in, without duplicating the kwarg. Updates the regression test to seed request_data with a sentinel custom_llm_provider so it actually exercises the overwrite path (the previous test mocked transform_request with a minimal dict and never hit the conflict). * chore: trigger shin-agent re-eval on retargeted staging base * chore: trigger shin-agent re-eval against updated Greptile state * Add 1-hour cache write pricing for EU/AU/JP Bedrock Anthropic models The 1-hour prompt-cache write tier (`cache_creation_input_token_cost_above_1hr`) was added to the us./global. variants of the Claude 4.5/4.6/4.7 family on Bedrock, but the eu./au./jp. cross-region inference profiles were left without it. AWS Bedrock pricing applies the same +10% regional premium across all geo profiles, so eu./au./jp. should carry the same 1-hour rates as us. (1.6x the 5-minute regional rate). 
Without these fields, cost tracking on EU/AU/JP Bedrock 1-hour-TTL prompt caching falls back to the 5-minute write rate and undercounts spend by ~60% for European, Australian, and Japanese tenants. Adds the 1-hour tier (and Sonnet 4.5's long-context >200K tier where AWS publishes one) to 14 regional Bedrock entries in both `model_prices_and_context_window.json` and the bundled `model_prices_and_context_window_backup.json`: - eu./au. Opus 4.6 ($11.00 / MTok) - eu./au. Opus 4.7 ($11.00 / MTok) - eu./au./jp. Sonnet 4.6 ($6.60 / MTok) - eu./au./jp. Sonnet 4.5 ($6.60 / MTok regular, $13.20 / MTok LC) - eu./au./jp. Haiku 4.5 ($2.20 / MTok) Also extends `tests/test_litellm/test_bedrock_anthropic_1hr_cache_pricing.py` with a `REGIONAL_EXPECTED` parametrized block covering all 13 new entries plus the existing 1.6x ratio invariant. Note: `eu.anthropic.claude-opus-4-5-20251101-v1:0` carries the wrong 5m rate today (base 6.25e-06 instead of regional 6.875e-06), which would break the 1.6x ratio check. It is intentionally left out of this PR so the scope stays "1-hour cache tier addition" — a separate follow-up should correct the EU 5m rates for Opus 4.5. --------- Co-authored-by: Terrajlz <info@jouleselectrictech.com> Co-authored-by: Bruno Devaux <devaux.br@gmail.com> Co-authored-by: Sameer Kankute <sameer@berri.ai> * Add 1-hour cache write pricing tier for Vertex AI Anthropic models (#28569) * fix(thinking): handle None thinking param in is_thinking_enabled (#28598) Squash-merged by litellm-agent from Terrajlz's PR. * feat(helm): support tpl rendering in podAnnotations (#28609) Squash-merged by litellm-agent from devauxbr's PR. * Forward custom_llm_provider through the Responses API bridge (Fixes #28505) (#28575) * Forward custom_llm_provider through the Responses API bridge (Fixes #28505) When a Chat Completions request to a GPT-5.4+ model contains both `tools` and `reasoning_effort`, `completion()` auto-routes through `responses_api_bridge`. 
The bridge handler called `litellm.responses()` / `litellm.aresponses()` without forwarding the already-resolved `custom_llm_provider`, so the downstream call re-invoked `get_llm_provider()` with `custom_llm_provider=None` and stripped a second provider prefix from a `provider/provider/model` deployment string. For a deployment configured as `openai/openai/openai/gpt-5.5`, the bridge flow sent `openai/gpt-5.5` to the upstream API instead of the correct `openai/openai/gpt-5.5`. Upstream APIs that enforce model-name allow-lists rejected this as `key_model_access_denied`. Fix: pass the locally-resolved `custom_llm_provider` into both the sync `responses()` and async `aresponses()` calls so the downstream `_resolve_model_provider_for_responses` sees an explicit provider and skips the second prefix-strip. New regression test `tests/test_litellm/completion_extras/test_responses_bridge_provider_propagation.py` pins both call sites: each must forward `custom_llm_provider`. * fix(28505): set custom_llm_provider on request_data instead of as duplicate kwarg Greptile flagged that the previous patch passed custom_llm_provider as an explicit kwarg to responses()/aresponses() while request_data already carried it via the spread of sanitized_litellm_params, which would raise TypeError: got multiple values for keyword argument on every real bridge call. Switches to assigning request_data['custom_llm_provider'] before the call so the resolved provider wins over whatever sanitized_litellm_params spread in, without duplicating the kwarg. Updates the regression test to seed request_data with a sentinel custom_llm_provider so it actually exercises the overwrite path (the previous test mocked transform_request with a minimal dict and never hit the conflict). 
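The duplicate-kwarg failure described in the #28505 follow-up above can be reproduced in isolation. This is a minimal sketch, not the bridge code: `fake_responses` and the contents of `request_data` are hypothetical stand-ins chosen only to show why an explicit kwarg collides with a spread dict, and why assigning onto the dict first avoids it.

```python
def fake_responses(model, custom_llm_provider=None, **kwargs):
    """Hypothetical stand-in for the downstream responses() call."""
    return custom_llm_provider


# Hypothetical request payload that already carries a provider value.
request_data = {"model": "openai/gpt-5.5", "custom_llm_provider": "stale"}


def call_with_duplicate_kwarg():
    # Explicit kwarg plus the same key inside the spread dict:
    # Python raises TypeError (multiple values for keyword argument).
    return fake_responses(custom_llm_provider="openai", **request_data)


def call_with_overwrite():
    # Assign onto a copy of the dict so the resolved provider wins
    # and the key is passed exactly once.
    data = dict(request_data)
    data["custom_llm_provider"] = "openai"
    return fake_responses(**data)
```
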
* chore: trigger shin-agent re-eval on retargeted staging base * chore: trigger shin-agent re-eval against updated Greptile state * Add 1-hour cache write pricing tier for Vertex AI Anthropic models GCP Vertex AI publishes a separate 1-hour cache write column for the Claude family (1.6x the 5-minute write rate, matching the documented Bedrock ratio). LiteLLM's Vertex AI Anthropic entries only carry the 5-minute tier, so any request that uses `cache_control: {"ttl": "1h"}` on Vertex AI Claude is undercounted in cost tracking by ~60%. The runtime side already supports the 1-hour tier — `VertexAIAnthropicConfig` extends `AnthropicConfig`, populating `ephemeral_1h_input_tokens`, and `_calculate_cache_creation_cost` reads `cache_creation_input_token_cost_above_1hr`. Only the price registry was missing data. Adds the field to 19 vertex_ai/claude-* entries across both `model_prices_and_context_window.json` and the bundled `model_prices_and_context_window_backup.json`: - Haiku 4.5 ($1.25 -> $2.00 / MTok) - Sonnet 3.7 / 4 / 4.5 / 4.6 ($3.75 -> $6.00 / MTok) - Opus 4.5 / 4.6 / 4.7 ($6.25 -> $10.00 / MTok) - Opus 4 / 4.1 ($18.75 -> $30.00 / MTok) Adds `tests/test_litellm/test_vertex_anthropic_1hr_cache_pricing.py` mirroring the Bedrock equivalent — pins each (5m, 1h) pair per model and asserts the 1.6x ratio across the family. Fixes #27781. 
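The 1.6x relationship between the 5-minute and 1-hour write rates quoted above can be checked arithmetically. A small sketch using only the per-MTok dollar figures listed in this commit message; the dictionary keys are informal labels, not registry model names.

```python
import math

# (5-minute write rate, 1-hour write rate) in USD per MTok,
# copied from the figures listed in the commit message above.
VERTEX_CACHE_WRITE_RATES = {
    "Haiku 4.5": (1.25, 2.00),
    "Sonnet 3.7 / 4 / 4.5 / 4.6": (3.75, 6.00),
    "Opus 4.5 / 4.6 / 4.7": (6.25, 10.00),
    "Opus 4 / 4.1": (18.75, 30.00),
}


def ratio_holds(five_minute: float, one_hour: float, ratio: float = 1.6) -> bool:
    """Return True when the 1-hour rate is `ratio` times the 5-minute rate."""
    return math.isclose(one_hour, five_minute * ratio, rel_tol=1e-9)


checks = {
    name: ratio_holds(five_min, one_hr)
    for name, (five_min, one_hr) in VERTEX_CACHE_WRITE_RATES.items()
}
```
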
--------- Co-authored-by: Terrajlz <info@jouleselectrictech.com> Co-authored-by: Bruno Devaux <devaux.br@gmail.com> Co-authored-by: Sameer Kankute <sameer@berri.ai> * Fix Gemini multimodal function responses (#29325) Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> * address greptile review: add _transform_image_usage method and model-map supports_image_size flag - Add _transform_image_usage instance method to GoogleImageGenConfig that delegates to transform_gemini_image_usage, fixing the regression test - Replace hardcoded "2.5-flash" string check in supports_gemini_image_size with a get_model_info lookup on supports_image_size (default true) - Add supports_image_size: false to all gemini-2.5-flash model entries in model_prices_and_context_window.json so capability is controlled via the model map rather than embedded in code * fix test failures: schema validation, mypy type, model info plumbing, pricing test - Add supports_image_size to ModelInfoBase TypedDict so get_model_info surfaces it - Pass supports_image_size through _get_model_info_helper constructor call - Fix supports_gemini_image_size to use value is not False (None means unset, defaults to True) - Add supports_image_size to JSON schema in test_aaamodel_prices_and_context_window_json_is_valid - Correct gemini-3.1-flash-lite pricing assertions in test to match JSON values * Add Azure AI Kimi K2.6 metadata (#27052) * Add Azure AI Kimi K2.6 metadata * Scope Kimi metadata test cost map setup * fall back to substring check for models not in model_prices_and_context_window.json Models like gemini-2.5-flash-image-preview are not in the pricing JSON, so get_model_info raises. Fall back to "2.5-flash" not in model when the JSON has no explicit supports_image_size entry for the model. 
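The capability lookup with its substring fallback, as described in the two commits above, might look roughly like the following. This is a sketch under assumptions: `supports_image_size` here is a hypothetical free function that receives the model-info dict directly (or `None` when the model is not in the pricing JSON), rather than calling `get_model_info`.

```python
from typing import Optional


def supports_image_size(model: str, model_info: Optional[dict]) -> bool:
    """Hypothetical helper: decide whether image size may be forwarded.

    model_info is None when the model is absent from the pricing JSON.
    """
    explicit = None if model_info is None else model_info.get("supports_image_size")
    if explicit is not None:
        # An explicit entry in the model map wins; only False disables it.
        return explicit is not False
    # No explicit entry: fall back to the substring check.
    return "2.5-flash" not in model
```
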
* fix(inception): don't forward global litellm.api_key to Inception FIM Match the Inception chat config: resolve only an Inception-specific key (param, litellm.inception_key, or INCEPTION_API_KEY) for the text-completion FIM path. The global litellm.api_key (often an OpenAI key) was both leaking to api.inceptionlabs.ai and taking precedence over the configured Inception key when set. * fix(auth): enforce end-user budget on custom-auth path that skips common_checks get_end_user_object() no longer raises BudgetExceededError, so custom-auth deployments with custom_auth_run_common_checks unset (which skip the centralized common_checks gate) stopped enforcing the end-user budget, letting an over-budget end user keep making requests. Re-enforce the budget in _run_post_custom_auth_checks on that path. --------- Signed-off-by: José Luis Di Biase <josx@interorganic.com.ar> Co-authored-by: Isha <72744901+IshaMeera@users.noreply.github.com> Co-authored-by: aneeshsangvikar <aneeshsangvikar@fiddler.ai> Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: Aneesh-Fiddler <aneeshfiddler@gmail.com> Co-authored-by: Suleiman Elkhoury <108065141+suleimanelkhoury@users.noreply.github.com> Co-authored-by: Dmitriy Alergant <93501479+DmitriyAlergant@users.noreply.github.com> Co-authored-by: Yanis Miraoui <yanis.miraoui19@imperial.ac.uk> Co-authored-by: Lovro Seder <vrovro@gmail.com> Co-authored-by: Thomas Mildner <12685945+Thomas-Mildner@users.noreply.github.com> Co-authored-by: José Luis Di Biase <josx@interorganic.com.ar> Co-authored-by: Lai Quang Huy <64073540+1qh@users.noreply.github.com> Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> Co-authored-by: Filippo Menghi <113345637+Cyberfilo@users.noreply.github.com> Co-authored-by: Terrajlz <info@jouleselectrictech.com> Co-authored-by: Bruno Devaux <devaux.br@gmail.com> Co-authored-by: ZHONG Ziwen <67355585+zzw-math@users.noreply.github.com> Co-authored-by: Emerson Gomes 
<emerson.gomes@thalesgroup.com> Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com> * Fix : a2a bugs 030626 (#29566) * Fix error code and context id injection bug * Add support for all A2A methods * Add logging * address greptile review: relay upstream JSON-RPC errors, move _PASCAL_TO_WIRE to module level, add error path tests * fix(a2a): run pre_call_hook for tasks/resubscribe SSE path to enforce guardrails tasks/resubscribe was returning the raw SSE stream without calling proxy_logging_obj.pre_call_hook, silently bypassing any guardrails configured on the agent. This patch calls pre_call_hook before streaming begins and wires post_call_failure_hook into the SSE generator so errors are logged. Adds a regression test verifying the hook is called. * fix(a2a): use get_async_httpx_client instead of creating httpx clients per request Creating httpx.AsyncClient instances per-request adds ~500ms latency. Switch _forward_jsonrpc and _forward_jsonrpc_sse to use the shared client from get_async_httpx_client(httpxSpecialProvider.A2A). * fix(a2a): forward caller identity headers on task ops; validate push notification URL Two security fixes for task management methods: 1. All task operations (tasks/get, tasks/list, tasks/cancel, tasks/resubscribe, push notification config methods) now forward X-LiteLLM-User-Id and X-LiteLLM-Team-Id headers to the upstream agent, so the agent can scope task access to the authenticated caller. 2. tasks/pushNotificationConfig/set validates the callback URL before forwarding: requires HTTPS and rejects private/loopback/reserved IP ranges and localhost hostnames to prevent SSRF. 
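The push-notification URL rule above (HTTPS only, no private/loopback/reserved addresses, no localhost) can be sketched with the standard library. This is an illustration, not the proxy's validator: `is_safe_callback_url` is a hypothetical name, and the sketch only inspects IP literals and the hostname string; it does not resolve DNS, which a real SSRF guard would also need to consider.

```python
import ipaddress
from urllib.parse import urlparse


def is_safe_callback_url(url: str) -> bool:
    """Hypothetical check: HTTPS only, no localhost, no internal IP literals."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        return False
    host = parsed.hostname
    if not host or host.lower() == "localhost":
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal; treated as a regular hostname in this sketch.
        return True
    return not (
        addr.is_private
        or addr.is_loopback
        or addr.is_reserved
        or addr.is_link_local
    )
```
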
* Fix A2A task hook and push URL handling * fix(a2a): fix mypy type errors for request_id and header_name dict key types * Fix A2A request id and params forwarding * Forward trace IDs for A2A task calls * fix(a2a): strip client-forwarded X-LiteLLM-* headers before applying authenticated identity A client could send x-a2a-<agent>-x-litellm-user-id in their request and have it forwarded to the upstream agent as an authenticated identity header. Fix: sanitize any X-LiteLLM-* headers from agent_extra_headers before merging, then apply the authenticated identity headers last so they always override client-supplied values. * Fix A2A SSE fallback JSON-RPC error code * Fix A2A SSE error id backfill * fix(a2a): validate both push notification url fields to close SSRF bypass * fix(a2a): widen request_id annotation to match JSON-RPC id call sites * fix(a2a): run post-call streaming hook for tasks/resubscribe so agent guardrails apply tasks/resubscribe returned the raw upstream SSE stream without routing events through the post-call streaming hook, so output guardrails configured on the agent were silently skipped for streaming task subscriptions while every other task method and message/stream applied them. Parse upstream JSON-RPC SSE events and feed them through async_streaming_data_generator, matching message/stream, so guardrails inspect the streamed task content. Adds a regression test that fails when the streamed events bypass the guardrail hook. --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com> * fix(anthropic/adapter): emit thinking block for reasoning_content-only streaming chunks (#29600) * fix(anthropic/adapter): open thinking block for reasoning_content-only streaming chunks The /v1/messages streaming content-block classifier (_translate_streaming_openai_chunk_to_anthropic_content_block) only recognized thinking_blocks. 
OpenAI-compatible reasoning backends (vLLM/SGLang reasoning parsers: DeepSeek-R1, Qwen3, gpt-oss, ...) populate reasoning_content with thinking_blocks=None, so the classifier fell through to a text block. The delta translator already emits thinking_delta for reasoning_content, so those deltas landed inside a text block and Anthropic streaming clients (Claude Code, SDK .stream()) silently dropped the chain-of-thought. Mirror the reasoning_content fallback already present in the non-stream translator and the streaming delta translator so the classifier opens a thinking block. Adds a focused regression test. * fix(anthropic/adapter): reach reasoning_content branch when thinking_blocks attr is absent Delta deletes the thinking_blocks attribute when unset, so the prior nested check was unreachable for reasoning-only chunks (vLLM/SGLang). Make it a sibling elif so the content block is classified as thinking. * test(proxy): stop component-allowlist test leaking DATABASE_URL into xdist peers The component-allowlist test pins throwaway DATABASE_URL/LITELLM_MASTER_KEY values at import time via os.environ so importing proxy_server doesn't need a live database. Those values persisted for the whole pytest-xdist worker, so a sibling test sharing the worker (test_key_rotation_e2e's DB-backed E2E case) saw the leaked sqlite DATABASE_URL, treated it as an available database instead of skipping, and the Prisma engine rejected the non-postgres URL (P1012 -> httpx.ConnectError). Restore the prior environment after the import so the throwaway values never escape the module. 
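The "restore the prior environment after the import" fix above follows a common snapshot-and-restore pattern. A minimal sketch, assuming a hypothetical `temporary_env` context manager; the actual test may restore the values differently.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(**overrides: str):
    """Set environment variables for the duration of the block, then restore."""
    # Snapshot prior values; None marks a variable that was unset.
    saved = {key: os.environ.get(key) for key in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        for key, old_value in saved.items():
            if old_value is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = old_value
```

Wrapping the import in `with temporary_env(DATABASE_URL=..., LITELLM_MASTER_KEY=...):` keeps the throwaway values from reaching other tests that share the same worker process.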
--------- Co-authored-by: Tai An <antai12232931@outlook.com> * ci: reproduce default-Windows wheel install to guard MAX_PATH (#29597) * ci: reproduce default-Windows wheel install to guard MAX_PATH The existing using_litellm_on_windows job installs the project with `uv sync`, an editable source install that never copies package files into a deep site-packages path, so it cannot see the 260-char MAX_PATH overflow that breaks `pip install litellm` on default Windows. The content-filter benchmark fixtures have hit that limit three times (#21941, #22039, #29536), each caught only after release. This adds a guard to the same job that builds the wheel and installs it the way an end user would: into a venv whose site-packages prefix is padded to a realistic worst-case Windows length (~100 chars), then asserts the install completes and litellm imports. Any packaged path long enough to bust MAX_PATH at that prefix is reported up front, so the check is deterministic regardless of the runner's long-path setting, while the real install also covers failure modes a length heuristic cannot (half-unpacked packages, reserved names, case collisions). This commit is the guard only; on the current tree it correctly fails because nine fixtures still exceed the limit. The rename that brings them back under it follows on this branch. * fix(packaging): shorten content-filter benchmark fixtures under MAX_PATH The 10 content-filter benchmark result fixtures used the legacy block_{topic}_-_contentfilter_({yaml}).json naming, up to 176 chars inside the wheel, which busts the Windows 260-char MAX_PATH limit once extracted under a realistic site-packages prefix and aborts `pip install litellm` on default Windows. Rename them to the short {topic}_cf.json scheme that _save_confusion_results already emits today (it splits the label on the em-dash and writes f"{topic}_cf"), matching the insults_cf.json and investment_cf.json files fixed earlier. 
Re-running the eval suite now regenerates these same short names rather than recreating the long ones. This drops the longest packaged path from 176 to 128, so the guard added in the previous commit goes from red to green with a 32-char margin. * test(windows): tidy MAX_PATH guard per review Close the wheel zip via a context manager rather than leaning on refcount collection, and select the wheel under dist/ by newest mtime so a stale artifact from an earlier build cannot be tested instead of the one just produced. Also pin down the venv-depth formula with a short note: the +2 is the separator joining the venv root to "Lib" plus the trailing separator before the entry, which lands the simulated site-packages prefix at exactly 100 chars. * fix(vertex): strip output_config.effort for Vertex Claude models that reject it (Haiku 4.5) (#29585) * fix(vertex): strip output_config.effort for models that reject it Haiku 4.5 on Vertex AI does not support output_config.effort and 400s with "output_config.effort: Extra inputs are not permitted". PR #27074 emptied VERTEX_UNSUPPORTED_OUTPUT_CONFIG_KEYS so effort would forward for Opus/Sonnet 4.6+, but that made the strip unconditional across every Vertex Anthropic model, including ones that don't support it. Claude Code injects effort into its default Messages payload, so `claude --model claude-haiku-4.5` started failing. Make the sanitizer model-aware: drop output_config.effort for models that don't advertise output_config support (or any reasoning effort level) while forwarding it for those that do. The fix covers both the chat-completion and Messages pass-through transformation paths since they share the helper. * chore(vertex): log at debug when dropping unsupported output_config.effort Operators pointing an unregistered Vertex Claude alias that does support effort would otherwise see it stripped with no signal. Debug level keeps it out of normal logs since Claude Code sends effort on every request. 
* Litellm websocket improvements (#29563) * Add support for websocket via codex * Add model alias and creds support * fix: skip cost tracking for WS session wrapper call types The @client decorator on _aresponses_websocket fires async_success_handler with result=None after the session ends. This triggered cost tracking errors because standard_logging_object is never built for None results. Per-turn costs are correctly tracked by individual litellm.aresponses calls inside the session. The outer session-level logging obj should not attempt cost tracking. Fix: skip _aresponses_websocket and _arealtime call types in deployment_callback_on_success, RouterBudgetLimiting.async_log_success_event, and _PROXY_track_cost_callback. * fix: address Greptile review comments Fix JSON injection: use json.dumps instead of f-string interpolation for model name in WS body. Add 30s timeout for first WS frame to prevent unbounded connection resource tie-up. Restore per-event model override in streaming_iterator; fall back to connection-level model when event omits it. Strengthen regression test: inject alias into kwargs via _update_kwargs_with_deployment mock so the test would fail on un-fixed code. * fix: handle nested response.create format in first-frame model extraction When ?model= is omitted, the first WS frame can carry the model in either flat format (first_event["model"]) or nested format (first_event["response"]["model"]). The flat-only check would silently reject clients using the nested wire format. Mirrors the same two-format logic in _build_base_call_kwargs. * fix: don't force connection-level custom_llm_provider on per-event model overrides If a client sends a different model per response.create turn, litellm needs to re-resolve the provider from that model string. Forcing the connection-level custom_llm_provider would silently route the request to the wrong backend. Only inject custom_llm_provider when the per-event model matches the connection-level model. 
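The first-frame handling described above (valid JSON, `type: response.create`, model in flat or nested form, and `json.dumps` rather than f-string interpolation) can be sketched as follows. The function names, the error messages, the flat-before-nested precedence, and the exact body shape are assumptions made for illustration; they are not taken from the proxy code.

```python
import json
from typing import Optional


def extract_model(first_event: dict) -> Optional[str]:
    """Read the model from the flat form, then from the nested response form."""
    model = first_event.get("model")
    if not model:
        response = first_event.get("response")
        if isinstance(response, dict):
            model = response.get("model")
    return model if isinstance(model, str) and model else None


def parse_first_frame(raw: str) -> str:
    """Validate a first frame and return its model, raising ValueError if bad."""
    try:
        event = json.loads(raw)
    except json.JSONDecodeError:
        raise ValueError("first frame is not valid JSON")
    if not isinstance(event, dict) or event.get("type") != "response.create":
        raise ValueError("first frame must be a response.create event")
    model = extract_model(event)
    if model is None:
        raise ValueError("first frame does not carry a model")
    return model


def build_create_frame(model: str) -> str:
    # json.dumps escapes quotes in the model name, so a crafted name
    # cannot break out of the string and inject extra JSON keys.
    return json.dumps({"type": "response.create", "model": model})
```
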
* refactor: extract WS model extraction into testable function Pull the flat/nested model extraction into _extract_model_from_first_ws_event so tests import and exercise the real function rather than a copy. * fix: compare providers not full model strings in _inject_credentials The model == self.model guard was too strict: same-provider model variants (e.g., vertex_ai/gemini-2.0 -> vertex_ai/gemini-1.5 on one connection) would lose custom_llm_provider, breaking routing when a custom api_base is in use. Compare the provider extracted by get_llm_provider instead, so same-provider variants still inherit the connection-level provider while cross-provider overrides let litellm re-resolve. * style: black formatting * refactor: extract first-frame model resolution to fix PLR0915 (too many statements) * Fix responses WebSocket first-frame validation * fix: classify WS first-frame read errors and clarify cost-skip log Distinguish client disconnects from server errors when reading the responses WebSocket first frame, make the cost-tracking skip log message accurate for session wrappers (which do carry a model), and resolve the connection-level provider once per session instead of on every response.create event. * test: cover WS first-frame read errors and same-provider credential injection Adds regression tests for the still-uncovered responses WebSocket paths: the timeout, invalid-JSON and missing-model branches of _read_ws_model_from_first_frame, plus the provider comparison in ManagedResponsesWebSocketHandler._same_provider and _inject_credentials (same-provider model variants keep the connection provider; cross-provider models re-resolve). 
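The provider comparison described above can be illustrated without litellm itself. In this sketch `provider_of` is a hypothetical stand-in that treats the text before the first `/` as the provider; the real code resolves the provider with `get_llm_provider`, whose behavior is not reproduced here. `same_provider` and `inject_credentials` are likewise simplified free functions, not the handler's methods.

```python
from typing import Optional


def provider_of(model: str) -> Optional[str]:
    """Hypothetical resolver: the prefix before the first '/' is the provider."""
    provider, separator, _ = model.partition("/")
    return provider if separator and provider else None


def same_provider(event_model: str, connection_model: str) -> bool:
    """Compare providers, not full model strings."""
    event_provider = provider_of(event_model)
    return event_provider is not None and event_provider == provider_of(connection_model)


def inject_credentials(
    kwargs: dict, event_model: str, connection_model: str, connection_provider: str
) -> dict:
    """Add the connection-level provider only for same-provider models."""
    result = dict(kwargs)
    if same_provider(event_model, connection_model):
        result["custom_llm_provider"] = connection_provider
    # Cross-provider models are left without it so the provider is re-resolved.
    return result
```
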
* fix(responses-ws): fall back to explicit custom_llm_provider when connection model is unresolvable When a WebSocket session is opened with a custom deployment alias that litellm cannot resolve to a provider, _connection_provider was None, so _same_provider returned False for every resolvable per-event model and the connection-level custom_llm_provider was dropped. Use the explicitly-set custom_llm_provider as the connection provider in that case so same-provider per-event models still inherit it while genuinely cross-provider models continue to re-resolve. --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com> * feat(arize/phoenix): OpenInference rendering parity — tool_calls, cost, passthrough I/O, session/user, multimodal, cache tokens (#28800) * feat(arize): enrich OpenInference attributes for better span rendering Pure rendering enhancements to the Arize / Arize Phoenix integration. No existing attribute keys or values are removed or overwritten; every new emit is independently try/except-wrapped and fires only when its source data is present so existing behavior is preserved. What this adds - Coerce non-dict response objects (e.g. httpx.Response from passthrough routes) via JSON decode so id/model/usage extraction stops crashing with "'Response' object has no attribute 'get'". Dicts and Pydantic objects with .get pass through unchanged. - Set OPENINFERENCE_SPAN_KIND defensively early so a downstream failure can't blank the kind; the original late write (incl. TOOL upgrade) is preserved. - Add "passthrough" keyword to _infer_open_inference_span_kind so allm_passthrough_route / llm_passthrough_route resolve to LLM instead of UNKNOWN. - Emit cache token breakdown: LLM_TOKEN_COUNT_PROMPT_DETAILS_CACHE_READ / _CACHE_WRITE / _AUDIO. Sources covered: OpenAI prompt_tokens_details and Anthropic / Bedrock cache_{read,creation}_input_tokens. 
- Render assistant tool_calls on both input and output messages via MESSAGE_TOOL_CALLS.* (Pydantic-aware, handles ModelResponse choices). Tool-result input messages also get MESSAGE_TOOL_CALL_ID and MESSAGE_NAME. - Render multimodal list-shaped content via MESSAGE_CONTENTS.* (OpenAI image_url, Anthropic source.{media_type,data} as data: URI). Legacy MESSAGE_CONTENT write is unchanged. - Emit SESSION_ID (end_user_id / trace_id), USER_ID (only when not already set by optional_params.user or model_params.user), and litellm.{team_id,team_alias,key_alias} from StandardLoggingPayload metadata. - Emit llm.response.cost as float from StandardLoggingPayload.response_cost. - Bedrock / Anthropic passthrough normalization: extract input from additional_args.complete_input_dict and output from the coerced provider response so INPUT_VALUE / OUTPUT_VALUE / LLM_INPUT_MESSAGES / LLM_OUTPUT_MESSAGES are populated. Only runs when call_type contains "passthrough" / "pass_through". Tests - 15 new unit tests covering each addition plus explicit regression guards (USER_ID overwrite protection, passthrough normalizer scope, coerce identity for dicts/.get-bearing objects, no spurious cache emits). - Existing test_arize_set_attributes count bumped from 26 to 27 to account for the additional defensive span.kind write (same value, written twice). - tests/test_litellm/integrations/arize/: 70 passed (55 baseline + 15 new). tests/test_litellm/integrations/test_opentelemetry.py: 221 passed. Co-authored-by: Cursor <cursoragent@cursor.com> * refactor(arize): collapse additive try/except blocks into _safe_emit helper The additive attribute emitters all share the same shape: run a callable, swallow any exception to debug log so it cannot blank the span. Hoisting that pattern into a single _safe_emit(label, fn, *args, **kwargs) helper removes 5 repeated try/except blocks. Behavior unchanged; arize test suite still passes (70/70). 
Co-authored-by: Cursor <cursoragent@cursor.com> * fix(arize): emit cost under canonical llm.cost.total key Arize's "Total Cost" column reads the OpenInference-standard `llm.cost.total` attribute. The previous custom `llm.response.cost` key never surfaced in the trace list. Now emits both keys (canonical + legacy) so renderers + any existing consumers both work. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(arize): keep span.kind=LLM for tool-using completions + render tool_calls in Output A chat completion that passes `tools=[...]` or returns `tool_calls` is still an LLM call per the OpenInference spec — TOOL is reserved for actual tool execution. The previous override demoted these to TOOL, breaking Arize's LLM-scoped dashboards/evals and skewing token/cost analytics for any tool-using traffic. Additionally, when an assistant response had no text content but did request tool calls, `output.value` was set to the empty string so Arize's "Output" pane rendered blank. Now serializes the tool_calls into a compact JSON summary in `output.value` (the structured `MESSAGE_TOOL_CALLS.*` attributes are still emitted unchanged). Cleanups: - extract `_get_tool_calls` and `_normalize_tool_call` helpers, deduplicating the dict-vs-Pydantic + function-dict logic across `_set_choice_outputs`, `_emit_message_tool_calls`, and the new `_summarize_tool_calls_for_output`. - drop redundant late `OPENINFERENCE_SPAN_KIND` write — the defensive early write is now the single source of truth. - remove a dead local re-import of `MessageAttributes`/`SpanAttributes`. Tests: 73 pass (added regression guard asserting span.kind stays LLM for completions that pass tools AND return tool_calls; existing call_count assertion restored to 26). 
Co-authored-by: Cursor <cursoragent@cursor.com> * chore(arize): tighten cleanup — fold _get_tool_calls into _safe_get Two tiny cleanups, no behavior change: - collapse `_get_tool_calls` to use `_safe_get`, removing a 7-line hand-rolled dict-vs-attribute fallback that duplicated existing logic. - trim the `_set_choice_outputs` tool-call summary comment from 4 lines to 2 (was over-explaining). Co-authored-by: Cursor <cursoragent@cursor.com> * fix(arize): address Greptile review — drop session_id=trace_id fallback, remove dead code, fix Black Three Greptile-flagged issues + the Black formatting CI failure. 1. SESSION_ID no longer falls back to trace_id. Previously every span without an explicit `user_api_key_end_user_id` would have its session.id set to the per-request trace_id, which creates one distinct "session" per request and breaks Arize's Session-grouping analytics. Now SESSION_ID is emitted only when an explicit end-user identifier exists, and the trace_id is emitted under its own `litellm.trace_id` key so spans remain filterable by trace. 2. Removed dead `ArizeOTELAttributes.set_response_output_messages` override. Confirmed zero callers in the entire repo (the live path is `_set_choice_outputs` via `_set_response_attributes`). The override was preexisting dead code, but the expansion of `_set_choice_outputs` in this PR made the divergence misleading. 3. Removed permanently-dead first branch in cache_write detection. `_safe_get(prompt_token_details, "cache_creation_tokens")` looks for a key that neither OpenAI's `prompt_tokens_details` nor Anthropic's payload ever exposes. Now reads straight off `usage` for `cache_creation_input_tokens`. 4. Reformatted both files under Black 26.3.1 (the version CI uses via `uv sync --frozen`). Local previously used 24.10.0. Tests: 74/74 pass in the arize suite (added `test_arize_does_not_use_trace_id_as_session_id_fallback`). Combined arize + opentelemetry suite: 295/295 pass. 
End-to-end verified live: tool-call still emits `span.kind=LLM` and JSON tool_calls in `output.value`; `session.id` is now correctly unset when no end_user_id is provided; `litellm.trace_id` is populated; Bedrock passthrough input/output unchanged. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(arize): gate passthrough prompt export on message redaction - Skip the complete_input_dict bridge in _maybe_normalize_passthrough when should_redact_message_logging() is true, so enabling redaction no longer leaks raw passthrough prompts into Arize (Veria security finding). - Split passthrough input/output rendering into helpers to satisfy PLR0915. - Remove dead call_type assignment (F841). Validated live against a Bedrock passthrough proxy exporting to Arize: non-redacted renders the real prompt on litellm_request; global turn_off_message_logging yields input.value=redacted-by-litellm with the raw_gen_ai_request child span suppressed and no SSN/marker leakage. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> * fix: passthrough endpoints duplicate logs (#29598) * fix duplicate cost callbacks for anthropic streaming pass-through Two bugs caused _PROXY_track_cost_callback to see stream=True + complete_streaming_response=None on every streaming pass-through request, making the dedup guard in dispatch_success_handlers permanently inactive: 1. pass_through_endpoints.py created the Logging object with stream=False for all requests. _is_assembled_stream_success short-circuits on self.stream is not True, so has_dispatched_final_stream_success was never set and any second dispatch went through unchecked. Fix: set logging_obj.stream = True after stream detection. 2. _create_anthropic_response_logging_payload set complete_streaming_response inside the try block after litellm.completion_cost(), so a pricing error caused an early return without setting it on model_call_details. 
Fix: set complete_streaming_response before the try block. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix stream * add stream to logging obj * test(pass_through): give mock logging object a real model_call_details dict The anthropic passthrough logging payload now records the assembled response on model_call_details before cost calculation, which requires model_call_details to support item assignment. In production it is always a dict; the existing unit test stubbed the logging object with a bare Mock whose attribute is not subscriptable, so the new assignment raised TypeError. Use a real dict to match the production logging object. * test(pass_through): cover streaming logging-obj stream flag The streaming branch of pass_through_request that marks the logging object as streaming (logging_obj.stream and model_call_details["stream"]) had no unit coverage, so the patch coverage gate flagged it. Add a regression test that drives a streaming pass-through request through pass_through_request and asserts the logging object is flagged as a stream before dispatch. * test(pass_through): cover SSE-response stream flag fallback branch The auto-detected streaming branch of pass_through_request (when a request that was not flagged as streaming returns a text/event-stream response) sets logging_obj.stream and model_call_details["stream"] but had no unit coverage, so the codecov patch gate failed at 60%. Drive a non-streaming pass-through request whose upstream response is SSE through pass_through_request and assert the logging object is flagged as a stream before dispatch. * fix(pass_through): gate complete_streaming_response on stream flag perform_redaction only scrubs complete_streaming_response when model_call_details["stream"] is True. Setting it unconditionally for non-streaming Anthropic pass-through responses left the assembled response unredacted in model_call_details, which is handed to logging callbacks as kwargs when message logging is disabled. 
Only record it for actual streaming responses so redaction always applies. --------- Co-authored-by: mubashir1osmani <mubashir.osmani777@gmail.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(ci): keep coverage rename green when a parallel node runs no tests (#29608) * fix(ci): keep coverage rename green when a parallel node runs no tests local_testing_part1 and local_testing_part2 run with parallelism 4. When CircleCI reruns only the failed tests, the failed test lands on a single node and the other nodes receive an empty bucket, so pytest never writes coverage.xml or .coverage. The unguarded "mv coverage.xml ..." then exits 1 and turns the whole job red even though the rerun passed; the next persist_to_workspace step would fail the same way on the missing paths. Guard the rename so a node with no coverage emits empty placeholders instead. coverage combine tolerates the empty files, so the downstream upload-coverage job keeps the real nodes' data intact. * fix(ci): pre-create test-results in litellm_router_testing for empty-bucket reruns litellm_router_testing also runs with parallelism 4. On a rerun of only the failed tests, a node can receive no tests, so the test command never creates test-results and the final store_test_results step can fail on the missing path. Pre-create the directory up front, matching what local_testing_part1 and part2 already do and CircleCI's own guidance for parallel reruns. * test(openai): retry wildcard chat completion on transient OpenAI 500 build_and_test reddened on test_openai_wildcard_chat_completion when the real gpt-3.5-turbo-0125 call returned an OpenAI 500 ("The server had an error while processing your request"). The base branch passed the same call concurrently, so the 500 is an intermittent OpenAI server error, not a regression. Add the same pytest-retry marker the sibling real-call tests in this file already use so a transient upstream 500 no longer fails CI. 
* test(vcr): close out the remaining VCR live-call leaks (#29603) * Fix remaining VCR live-call leaks * test(vcr): dedupe live-test helpers and drop spurious kwargs Extract the duplicated isVertexQuotaError/runVertexRequestOrSkip Vertex quota-skip helpers into tests/pass_through_tests/vertex_test_helpers.js and the duplicated _skip_live_prompt_caching_test guard into tests/_live_test_helpers.py so each lives in one place. In test_aarun_thread_litellm, build a separate message_data carrying role/content for add_message and a thread_data without them for run_thread/run_thread_stream/get_messages, which no longer receive the spurious message fields. * test(overhead): assert mock transport is exercised in non-streaming and stream tests * fix(key_generate): exempt UI/CLI session tokens from the budget ceiling for team keys (#29612) Non-admin users creating a team key through the UI were rejected with "max_budget cannot exceed the caller's own max_budget (0.25)". The request is authenticated by a UI/CLI session token whose max_budget is the per-session chat spend cap (max_ui_session_budget, default $0.25), and the delegated-authority budget ceiling (GHSA-q775-qw9r-2r4g) treated that cap as a delegation limit. Skip the ceiling only when a session token creates a team key (data.team_id set); that key's spend is bounded by the team budget at request time. Personal keys and every other non-admin caller keep the ceiling, so a session token cannot mint an arbitrary-budget personal key. * fix(realtime): allow null transcripts in stream logging payloads (#29625) Allow realtime event transcript fields to be nullable so GA conversation.item payloads with transcript=null don't fail logging normalization and suppress success callbacks. 
Co-authored-by: Cursor <cursoragent@cursor.com> * build(ui): migrate eslint to flat config and bump eslint-config-next to 16 (#29626) ESLint 9 defaults to flat config and eslint-config-next was pinned at 15 while Next is on 16, so eslint only ran with ESLINT_USE_FLAT_CONFIG=false and next lint is gone on Next 16. Replace .eslintrc.json with a native flat eslint.config.mjs (config-next 16 ships flat configs, so no FlatCompat shim is needed), bump eslint-config-next to 16.2.6, add @eslint/js and typescript-eslint as explicit devDeps for the recommended rule sets, and point the lint script at eslint directly. This only makes eslint runnable on modern tooling; it does not wire it into CI. The same rules carry over (next/core-web-vitals, eslint and typescript-eslint recommended, prettier, unused-imports) * fix(key_generate): scope session-token team-key budget exemption to caller-supplied team_id (#29641) #29612 exempts UI/CLI session tokens from the key budget ceiling when they create a team key, keyed on data.team_id. That value is read after the default_key_generate_params loop can populate team_id, so on deployments that set default_key_generate_params.team_id a request the caller did not scope to a team is treated as a team key and skips the ceiling. Capture _requested_team_id before defaults run and key the exemption off it, mirroring how _requested_max_budget is already captured. Requests the caller did not scope to a team keep the ceiling. * fix(proxy): disable proxy buffering on streaming SSE responses (#29557) Streaming responses from the proxy (/chat/completions, /v1/messages, /v1/responses, assistants) all return through create_response() but never sent the headers that tell an intermediary reverse proxy not to buffer the SSE stream. 
nginx with the default proxy_buffering, k8s ingress-nginx, and Envoy/Istio sidecars therefore hold the whole stream and release it in one batch, which looks like a broken/buffered stream to the client even though litellm is yielding chunks incrementally. Add Cache-Control: no-cache and X-Accel-Buffering: no to every StreamingResponse create_response() returns, matching what the proxy already does for its own usage/policy SSE endpoints. Fixes #28384. * fix(mcp): gate /public/mcp_hub strictly on litellm.public_mcp_servers (#27764) * fix(mcp): gate /public/mcp_hub strictly on litellm.public_mcp_servers * fix(mcp): add public_mcp_hub_strict_whitelist flag (default True) for migration * ci(ui): frontend-lint job enforcing prettier + eslint on changed files (#29633) * ci(ui): add frontend-lint job enforcing prettier and eslint on changed files Lints only the files a PR adds or modifies under ui/litellm-dashboard, so new and touched code must be prettier-clean and eslint-clean while the existing tree is grandfathered. Skips cleanly when a PR touches no lintable UI files. This lets us adopt the formatters incrementally without a repo-wide reformat * ci(ui): write frontend-lint file lists to $RUNNER_TEMP Keep the prettier/eslint changed-file lists out of the checkout dir so they cannot collide with a future source file of the same name * lint(ui): baseline existing eslint findings so only new ones block Capture the current error-level eslint findings (318 across 183 files) in a committed suppressions baseline via eslint --suppress-all. Every rule stays at its error severity, so any newly introduced violation fails the frontend-lint gate, while the existing tree is grandfathered; touching a legacy file never forces fixing its pre-existing issues. CI runs eslint with --pass-on-unpruned-suppressions so that fixing a baselined issue does not fail on a now-stale suppression, and the generated baseline is prettier-ignored since eslint owns its format. 
Burn the baseline down over time with eslint --prune-suppressions * lint(ui): enforce a count budget for explicit any Make @typescript-eslint/no-explicit-any a warning and cap the total instead of hard-blocking each new one. A frontend-lint step counts the repo-wide explicit any and fails only when it exceeds the committed budget in eslint-any-budget.json. max starts at 2031, ten above the current 2021, so the next ten land as warnings and the build fails once that headroom is gone. Lower max over time toward target to ratchet the count down. New anys still surface as warnings on changed files via the normal eslint step * lint(ui): enable zero-cost rules no-var, no-self-assign, react/no-danger These have no existing violations, so they need no baseline; turning them on purely blocks new instances. react/no-danger guards against new dangerouslySetInnerHTML (XSS), no-var enforces let/const, and no-self-assign catches self-assignment typos. no-debugger is already enforced by the recommended preset * lint(ui): add baselined complexity rules Enable complexity:20, max-depth:4, max-params:4, max-nested-callbacks:4, with thresholds set near the codebase p99 so only genuine outliers are flagged. The 272 existing over-threshold functions are grandfathered in the suppressions baseline; new over-threshold functions block. Lower the thresholds over time to ratchet complexity down. max-lines-per-function is intentionally left off since React components are legitimately long * lint(ui): ban new raw fetch, standardize on React Query Add a no-restricted-syntax rule flagging bare fetch() calls, pointing contributors at React Query (@tanstack/react-query). The rule is not exempted anywhere, including the already-bloated networking.tsx, so all 331 existing fetch calls are grandfathered but no new ones can be added there or elsewhere. 
New data access goes through React Query, and the networking layer can be migrated out and pruned from the baseline over time * lint(ui): ban new @tremor/react imports Add a no-restricted-imports rule flagging imports from @tremor/react so tremor is phased out rather than spread further. The 232 existing tremor imports are grandfathered in the baseline; new ones block and point at antd. Migrate components off tremor and prune the baseline over time * lint(ui): widen explicit-any budget headroom to 2040 Raise max from 2031 to 2040, giving ~19 of slack over the current 2021 instead of 10 * style(ui): prettier-format eslint.config.mjs The frontend-lint gate flagged its own config file. Format it so the prettier check on this PR's changed files passes * lint(ui): soften complexity and max-depth to warnings These two are smell metrics with arbitrary thresholds where a legit new function can trip them, so make them advisory rather than hard-blocking. They drop out of the baseline (now 963). max-params, max-nested-callbacks, and the react-hooks rules stay strict since those are clear-cut * lint(ui): move complexity and max-depth to the count-budget pattern Generalize the explicit-any budget into a shared lint-budget mechanism: eslint-budgets.json maps a rule to {max, target} and check-lint-budgets.mjs counts each across the repo and fails when a count exceeds its max. complexity (129, max 140) and max-depth (61, max 70) now use the same slack-plus-counter model as explicit-any (2021, max 2040): they warn per-file and the build only fails if the repo-wide total crosses the ceiling. 
Lower each max toward its target over time * docs(ui): note pruning the eslint suppressions baseline when fixing lint debt * fix(gemini): googleSearch + server-side tools and googleMaps JSON schema (#29582) * fix(gemini): keep googleSearch with server-side tools and googleMaps JSON schema Wire include_server_side_tool_invocations through completion() so mixed google_search and function tools are not dropped on Gemini 3+. Rewrite generationConfig to responseFormat when googleMaps is used with JSON schema. Fixes #27479 Fixes #29451 Co-authored-by: Cursor <cursoragent@cursor.com> * address greptile review feedback (greploop iteration 1) * style: fix black formatting in main.py for py312 compat * Fix Gemini Google Maps extra_body JSON rewrite --------- Co-authored-by: Cursor <cursoragent@cursor.com> * fix(proxy): passthrough 404 when SERVER_ROOT_PATH is set (#29658) * fix(proxy): match passthrough registry routes bare-to-bare with SERVER_ROOT_PATH After #28547, get_request_route strips the deployment prefix while registry lookup still re-inflated stored paths via SERVER_ROOT_PATH, causing 404s under paths like /llmproxy/ml. Compare normalized bare routes in both is_registered_pass_through_route and get_registered_pass_through_route. Co-authored-by: Cursor <cursoragent@cursor.com> * test(proxy): patch utils.get_server_root_path in passthrough auth tests After removing get_server_root_path from pass_through_endpoints, route and JWT tests must mock litellm.proxy.utils where normalization reads it. 
Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(gemini-realtime): use GA event names for Pipecat 1.3.x compatibility (#29662)

* fix(gemini-realtime): use GA event names for Pipecat 1.3.x compatibility

Pipecat v1.3.0 adopted the OpenAI Realtime API GA event naming:

response.audio.delta -> response.output_audio.delta
response.text.delta -> response.output_text.delta
response.audio.done -> response.output_audio.done
response.text.done -> response.output_text.done

The proxy was still emitting the old beta names; Pipecat's `parse_server_event` raises "Unimplemented server event type" for any unknown type, which killed the receive task handler and broke audio playback and tool-call delivery.

Also:
- conversation.item.created -> conversation.item.added (already handled)
- client audio is buffered until backend setupComplete in deferred mode
- call_id fallback UUID when Gemini returns empty id
- status_details / token detail fields added to Pydantic-strict events

The _GA_TO_BETA_EVENT_TYPES map in RealTimeStreaming already translates GA names back to beta for clients that opt in with the openai-beta header, so legacy clients are unaffected.
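The GA-to-beta translation described above can be sketched as follows. This is a minimal illustration, not the proxy's code: the mapping holds only the four renames listed in the commit message, and `translate_event_for_client` and its `client_wants_beta` flag are hypothetical names standing in for whatever RealTimeStreaming does once it has seen the openai-beta header.

```python
from typing import Any, Dict

# Assumption: only the four renames named in the commit message.
# The real _GA_TO_BETA_EVENT_TYPES map in LiteLLM may hold more entries.
GA_TO_BETA_EVENT_TYPES: Dict[str, str] = {
    "response.output_audio.delta": "response.audio.delta",
    "response.output_text.delta": "response.text.delta",
    "response.output_audio.done": "response.audio.done",
    "response.output_text.done": "response.text.done",
}


def translate_event_for_client(
    event: Dict[str, Any], client_wants_beta: bool
) -> Dict[str, Any]:
    """Hypothetical helper: rename a GA event type for a beta client.

    GA clients, and events with no beta equivalent, get the event unchanged.
    """
    if not client_wants_beta:
        return event
    beta_type = GA_TO_BETA_EVENT_TYPES.get(event.get("type", ""))
    if beta_type is None:
        return event
    # Copy instead of mutating, so the GA event can still be logged as emitted.
    return {**event, "type": beta_type}
```

Returning a copy keeps the GA-named event intact for logging while the legacy client sees the beta name.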
Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(gemini-realtime): address greptile review comments

- emit outputTranscription as response.output_audio_transcript.delta instead of suppressing it; GA_TO_BETA map handles translation for legacy clients
- cap pre-setup audio buffer at 200 frames to prevent memory exhaustion; log a warning when the limit is hit and additional frames are dropped
- log remaining dropped message count on flush error

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(gemini-realtime): address veria review comments

- remove unused OpenAIRealtimeConversationItemCreated import
- fix guardrail bypass: semantic_vad early-return now preserves create_response when set so a guardrail-injected create_response:false is not silently dropped
- add per-connection 10 MB byte cap alongside the 200-frame count cap for the pre-setup audio buffer to prevent memory exhaustion

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(gemini-realtime): fix mypy arg-type on _finalize_gemini_live_setup

setup parameter typed as BidiGenerateContentSetup to match the TypedDict passed at both call sites; was dict which mypy rejected.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(gemini-realtime): widen _finalize_gemini_live_setup to Dict[str, Any]

BidiGenerateContentSetup (TypedDict) is a subtype of Dict[str,Any] so both call sites (one passing a plain dict, one passing the TypedDict) satisfy mypy.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(gemini-realtime): cast BidiGenerateContentSetup to Dict at _finalize call site

mypy rejects TypedDict as dict[str, Any] argument; cast at the call site where follow_up_setup is BidiGenerateContentSetup to satisfy the checker.
Co-authored-by: Cursor <cursoragent@cursor.com>

* Fix Gemini realtime beta compatibility

* Fix deferred Gemini setup audio ordering

* fix: preserve Gemini audio transcript ids

* fix(realtime): cap pre-setup client buffer on all append paths

Route every append to the deferred-setup pending buffer through the per-connection message/byte caps. Previously only the audio-buffer fast path enforced the caps; once one frame was buffered, a client that withheld session.update could stream arbitrary frames into _pending_messages_until_setup unbounded and exhaust proxy memory.

* style(gemini-realtime): apply black formatting to transformation.py

* fix(gemini-realtime): log beta-translation fallback and name native-audio marker

Surface the previously swallowed exception in _send_event_to_client so a failed GA->beta translation is observable instead of silently forwarding the untranslated event. Extract the native-audio model substring used by _finalize_gemini_live_setup into a named constant documenting why speechConfig is dropped on those setups.

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>

* Litellm oss staging 040626 (#29671)

* fix(azure): apply api_version fallback chain to image edit URL

`AzureImageEditConfig.get_complete_url` only read `api_version` from `litellm_params`. When callers configured it via `litellm.api_version` or `AZURE_API_VERSION`, the constructed URL had no `?api-version=` and Azure responded `404 Resource not found`.

Apply the same fallback chain the Azure chat path already uses in `common_utils.py`:

litellm_params > litellm.api_version > AZURE_API_VERSION env > litellm.AZURE_DEFAULT_API_VERSION

Adds 5 unit tests pinning each layer of the chain plus a regression guard for `api_base` that already carries `?api-version=`.
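The fallback chain above can be sketched roughly as below. It is an illustration under stated assumptions, not `AzureImageEditConfig.get_complete_url` itself: `resolve_api_version` and `append_api_version` are hypothetical helpers, the module-level setting is passed in as a plain argument, and `ASSUMED_DEFAULT_API_VERSION` is a placeholder rather than LiteLLM's real default.

```python
import os
from typing import Any, Mapping, Optional

# Placeholder for illustration only; NOT the value of litellm.AZURE_DEFAULT_API_VERSION.
ASSUMED_DEFAULT_API_VERSION = "placeholder-default-version"


def resolve_api_version(
    litellm_params: Mapping[str, Any],
    module_api_version: Optional[str],
    env: Optional[Mapping[str, str]] = None,
    default: str = ASSUMED_DEFAULT_API_VERSION,
) -> str:
    """Hypothetical helper: first non-empty value wins, in the order
    litellm_params > module-level setting > AZURE_API_VERSION env > default."""
    env = os.environ if env is None else env
    return (
        litellm_params.get("api_version")
        or module_api_version
        or env.get("AZURE_API_VERSION")
        or default
    )


def append_api_version(url: str, api_version: str) -> str:
    """Hypothetical helper: add ?api-version= unless the URL already carries one."""
    if "api-version=" in url:
        return url
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}api-version={api_version}"
```

Passing `env` explicitly keeps each layer of the chain testable without touching the process environment, which is roughly what pinning each layer in a unit test requires.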
* feat(mcp): core sampling and elicitation flow with security hardening

- Add sampling_handler.py: full MCP sampling/createMessage flow with model selection (hint-based + priority-based), auth enforcement, budget checks, route restriction gates, and tag policy pre-auth
- Add elicitation_handler.py: MCP elicitation/create relay with downstream client capability detection
- Wire sampling/elicitation callbacks in mcp_server_manager.py gated behind allow_sampling/allow_elicitation config flags
- Add allow_sampling/allow_elicitation fields to MCPServer type
- Fix session lock deadlock: skip lock for JSON-RPC response POSTs (elicitation/sampling replies) with truncated-body heuristic
- Extend client.py with sampling_callback and elicitation_callback
- Security: RouteChecks gate, tag-budget bypass fix, x-forwarded-for spoofing fix, Latin-1 header encoding guard
- Add 4 new test modules (model access, priority selection, request builder, tool conversion) + update existing MCP tests

* fix(security): run pre-call guardrails before MCP sampling acompletion

Without this, an upstream MCP server with allow_sampling enabled could send prompts that bypass every guardrail (content filtering, PII redaction, prompt-injection detection) configured on /chat/completions.
- Call proxy_logging_obj.pre_call_hook(call_type='acompletion') before llm_router.acompletion so guardrails fire for sampling sub-calls - Add HTTPException to the re-raise list so guardrail rejections propagate correctly instead of being swallowed as generic errors * feat(bedrock_mantle): add Responses API support (/openai/v1/responses) (#29490) * feat(bedrock_mantle): add Responses API transformation config * test(bedrock_mantle): cover trailing-slash api_base normalization * feat(bedrock_mantle): export BedrockMantleResponsesAPIConfig * feat(bedrock_mantle): register gpt-5.x Responses config (gpt-oss unchanged) * feat(bedrock_mantle): add gpt-5.5/gpt-5.4 Responses price-map entries * refactor(bedrock_mantle): exclude gpt-oss instead of allow-listing gpt-5 for Responses routing Frontier OpenAI models on Bedrock Mantle are Responses-only on /openai/v1/responses; gpt-oss is the legacy family that also speaks chat-completions. Gate by excluding gpt-oss (which keeps its chat-completions emulation) and defaulting everything else to the native Responses config, so future frontier models (gpt-6, etc.) route correctly without a code change. Verified against the live us-east-2 Mantle endpoint: gpt-oss 400s on /openai/v1/responses while gpt-5.5 400s on both standard paths. * test(bedrock_mantle): cover supports_native_websocket opt-out Closes the one uncovered line flagged by codecov on the Responses config. The assertion documents that Mantle Responses has no realtime/websocket transport, so realtime routing must not attempt a socket it cannot serve. * fix(bedrock_mantle): route file_search through emulation instead of forwarding to Mantle BedrockMantleResponsesAPIConfig inherited supports_native_file_search() -> True from OpenAIResponsesAPIConfig but never overrode it. Mantle has no OpenAI vector stores, so a forwarded file_search tool is rejected with a 400 (verified upstream: Tool type 'file_search' is not supported). 
Opting out, like the existing supports_native_websocket override, routes the tool through LiteLLM's file_search emulation instead. * fix(bedrock_mantle): only route ope…

Sameerlite added 3 commits June 3, 2026 10:15

Add support for websocket via codex

bcdc58a

Add model alias and creds support

c9f8aa9

greptile-apps Bot reviewed Jun 3, 2026

Comment thread tests/test_litellm/test_router.py

Comment thread litellm/proxy/response_api_endpoints/endpoints.py Outdated

Comment thread litellm/responses/streaming_iterator.py Outdated

veria-ai Bot reviewed Jun 3, 2026

Comment thread litellm/proxy/response_api_endpoints/endpoints.py Outdated

Sameerlite changed the title ~~Litellm websocker improvements~~ Litellm websocket improvements Jun 3, 2026

greptile-apps Bot reviewed Jun 3, 2026

Comment thread litellm/responses/streaming_iterator.py Outdated

refactor: extract WS model extraction into testable function

ac05320

Pull the flat/nested model extraction into _extract_model_from_first_ws_event so tests import and exercise the real function rather than a copy.
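A minimal sketch of what flat/nested extraction from the first frame could look like, assuming the frame is a JSON text message and the nested form carries the model under a `response` object. The function name here is hypothetical and is not the PR's `_extract_model_from_first_ws_event`; it returns `None` on any invalid input, whereas the PR describes replying with a structured invalid_request_error frame.

```python
import json
from typing import Optional


def sketch_extract_model(raw_frame: str) -> Optional[str]:
    """Hypothetical helper: return the model named in a first
    `response.create` frame, or None if the frame is unusable."""
    try:
        event = json.loads(raw_frame)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    if not isinstance(event, dict) or event.get("type") != "response.create":
        return None

    # Flat form: {"type": "response.create", "model": "..."}
    model = event.get("model")
    if isinstance(model, str) and model:
        return model

    # Nested form (assumed shape): {"type": "response.create", "response": {"model": "..."}}
    nested = event.get("response")
    if isinstance(nested, dict):
        nested_model = nested.get("model")
        if isinstance(nested_model, str) and nested_model:
            return nested_model
    return None
```

Keeping the extraction a pure function of the raw frame is what lets tests exercise it directly, without opening a socket.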

Sameerlite added 2 commits June 3, 2026 18:21

style: black formatting

362d919

refactor: extract first-frame model resolution to fix PLR0915 (too many statements)

dc1d4ab

cursor Bot reviewed Jun 3, 2026

Comment thread litellm/responses/streaming_iterator.py

Comment thread litellm/proxy/response_api_endpoints/endpoints.py

Comment thread litellm/proxy/response_api_endpoints/endpoints.py

Fix responses WebSocket first-frame validation

348853e

cursor Bot reviewed Jun 3, 2026

Merge branch 'litellm_internal_staging' into litellm_websocker_improvements

877f346

mateo-berri approved these changes Jun 3, 2026

mateo-berri merged commit 2453936 into litellm_internal_staging Jun 3, 2026
119 checks passed

mateo-berri deleted the litellm_websocker_improvements branch June 3, 2026 18:48

Conversation

Sameerlite commented Jun 3, 2026 • edited by cursor Bot

codecov Bot commented Jun 3, 2026 • edited

Codecov Report

greptile-apps Bot commented Jun 3, 2026 • edited

Greptile Summary

Confidence Score: 4/5

Important Files Changed

veria-ai Bot commented Jun 3, 2026 • edited

PR overview

Security review

Sameerlite commented Jun 3, 2026

Sameerlite commented Jun 3, 2026

Sameerlite commented Jun 3, 2026

Sameerlite commented Jun 3, 2026

Sameerlite commented Jun 3, 2026

Sameerlite commented Jun 3, 2026

Sameerlite commented Jun 3, 2026

cursor Bot left a comment • edited

CLAassistant commented Jun 3, 2026 • edited

Sameerlite commented Jun 3, 2026

cursor Bot left a comment

Sameerlite commented Jun 3, 2026

mateo-berri commented Jun 3, 2026

mateo-berri commented Jun 3, 2026

mateo-berri commented Jun 3, 2026

mateo-berri commented Jun 3, 2026

mateo-berri left a comment

krrish-berri-2 commented Jun 4, 2026

krrish-berri-2 commented Jun 4, 2026

Sameerlite commented Jun 4, 2026 • edited
