[Infra] Promote internal staging to main by yuneng-berri · Pull Request #27906 · BerriAI/litellm

yuneng-berri · 2026-05-14T04:43:49Z

Relevant issues

Linear ticket

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

I have added testing in the tests/test_litellm/ directory. Adding at least 1 test is a hard requirement (see details)
My PR passes all unit tests on make test-unit
My PR's scope is as isolated as possible, it only solves 1 specific problem
I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

50-55 passing tests: main is stable with minor issues.

45-49 passing tests: acceptable but needs attention

<= 40 passing tests: unstable; be careful with your merges and assess the risk.

Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:

Screenshots / Proof of Fix

Type

🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test

Changes

… path ``get_instance_fn`` previously routed any ``s3://`` / ``gcs://`` value into ``_load_instance_from_remote_storage`` regardless of how the value got there. The function ultimately calls ``spec.loader.exec_module(module)`` — Python in the proxy process. On admin-callable endpoints that accept a ``target`` / ``custom_handler`` field from the request body (e.g. ``/config/pass_through_endpoint``, custom-callback registration), that is a one-step admin-to-RCE primitive: any future privilege-escalation bug becomes immediate code execution. The documented operator flow for remote-module loading is ``litellm_settings.callbacks: ["s3://bucket/module.instance"]`` in ``config.yaml``. That path always carries the YAML's ``config_file_path`` through to ``get_instance_fn``. Use the presence of ``config_file_path`` as the discriminator: refuse remote URLs when it is absent (the request-body path) unless the operator explicitly opts back in via ``LITELLM_ALLOW_REMOTE_INSTANCE_FN_FROM_API=true``. The three success/failure/audit-log callback-loop call sites in ``proxy_server.py:load_config`` were already running inside the startup config-file load but had stopped threading ``config_file_path`` through. Pass it through so the documented ``s3://`` callback flow continues to work unchanged. Tests cover: remote URL without ``config_file_path`` raises; remote URL with the opt-in env reaches the loader; remote URL with ``config_file_path`` passes (documented startup flow); local dotted-name imports unaffected.

The pre-existing s3:// / gcs:// custom-logger tests called ``get_instance_fn`` without ``config_file_path``, which means the new runtime gate (refuse remote URLs unless invoked from a config-file load) now raises ``ValueError`` before reaching the mocked download paths. Each test was exercising the documented startup config-file load scenario; pass ``config_file_path="/any/path"`` to make that intent explicit and route past the gate. Affected: test_s3_download_success, test_gcs_download_success, test_invalid_url_format, test_download_failure_handling, test_file_cleanup.

The runtime gate on s3:// / gcs:// loading in get_instance_fn previously allowed an opt-in via LITELLM_ALLOW_REMOTE_INSTANCE_FN_FROM_API. That env var is admin-flippable at runtime (DB-overlay environment_variables flow into os.environ), which defeats the gate's purpose, and it isn't needed for the documented operator flow: config.yaml callbacks always pass config_file_path through to the loader. Remove the helper, raise unconditionally when config_file_path is None, and drop the corresponding test for the opt-in branch.
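A minimal sketch of the final gate, assuming a hypothetical helper `check_remote_instance_allowed` that `get_instance_fn` would call before handing a value to `_load_instance_from_remote_storage`. Only the `s3://` / `gcs://` prefixes and the `config_file_path` discriminator come from the commits above; the helper name and error text are illustrative.

```python
from typing import Optional

# Remote schemes that end in exec_module() inside the proxy process.
_REMOTE_PREFIXES = ("s3://", "gcs://")


def check_remote_instance_allowed(value: str, config_file_path: Optional[str]) -> None:
    """Hypothetical gate: remote module URLs are only honored from a config-file load."""
    if value.startswith(_REMOTE_PREFIXES) and config_file_path is None:
        # No opt-in env var: anything reachable via the admin API stays refused.
        raise ValueError(
            f"Refusing to load remote module {value!r}: s3:// / gcs:// loading "
            "is only allowed from config.yaml"
        )
```

YAML-sourced callers pass the config path and are unaffected; request-body callers leave it as `None` and hit the `ValueError`, while local dotted-name imports never match the prefixes.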

…ML loaders
The previous commit's gate broke two legitimate startup paths for operators using s3:// / gcs:// remote module loading from their config.yaml:
- general_settings.pass_through_endpoints[].custom_handler
- mcp_tools[].handler
Both call sites called get_instance_fn without a config_file_path, so the new gate rejected them at startup. Thread config_file_path through:
- create_pass_through_route accepts config_file_path and forwards it to get_instance_fn. add_exact_path_route, add_subpath_route, _register_pass_through_endpoint, and initialize_pass_through_endpoints accept and propagate it.
- The YAML-load call site in proxy_server.load_config now passes config_file_path; the DB-overlay call site in _update_general_settings leaves it as the default None so the gate still fires on admin-written s3:// values.
- MCPToolRegistry.load_tools_from_config accepts config_file_path and threads it into get_instance_fn; _init_non_llm_configs forwards it from load_config.
Adds two regression tests verifying that the YAML-source callers thread the path through to get_instance_fn.

@Sameerlite

)
* fix(gemini): normalize response_schema on native generateContent
The /v1beta/models/{model}:generateContent passthrough forwarded generationConfig.response_schema verbatim, so schemas containing $defs, $ref, anyOf-with-null, default, or title were rejected by Gemini even though /chat/completions already handles them. GoogleGenAIConfig.transform_generate_content_request now calls a new _normalize_response_schema helper that mirrors the chat/completions path: Gemini 2.0+ models get the schema promoted to responseJsonSchema via _build_json_schema (preserving $defs/$ref natively), older models keep responseSchema but the schema is flattened with _build_vertex_schema. VertexAIGoogleGenAIConfig (which overrides the transform entirely) calls the same helper before building the request.
* fix(gemini): preserve caller-supplied responseJsonSchema when responseSchema co-present
Previously, when both responseJsonSchema and responseSchema were present on Gemini 2.0+, _normalize_response_schema processed responseJsonSchema first (no-op normalization) then unconditionally promoted responseSchema to responseJsonSchema, clobbering the caller-supplied value. Now skip the promotion (and drop the redundant responseSchema) when the caller already supplied responseJsonSchema.
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* chore: strip restating comments from response-schema normalize
Drop the docstring on _normalize_response_schema and the two inline comments that just restated what the surrounding code/asserts already say. Function name + variable names carry the intent; PR description covers the why-it-exists context.
* perf(gemini): drop redundant deepcopy on responseJsonSchema normalize
_build_json_schema is a no-op (returns its argument unchanged), so the deepcopy + round-trip on the responseJsonSchema branch allocated a full schema copy on every request with no observable effect. Forward the caller's value as-is, and just move the popped responseSchema value when promoting on Gemini 2.0+.
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* style: remove unneeded comment
* fix(gemini): drop unsupported responseJsonSchema for older models
* test(gemini): add parity test between native and chat schema normalization
Per @Sameerlite review: lock the two Gemini schema-normalization paths together. If either GoogleGenAIConfig._normalize_response_schema (native generateContent) or VertexGeminiConfig.apply_response_schema_transformation (/chat/completions) drifts, the parity test fails, forcing both to be updated together.
* fix(google_genai): preserve key naming convention in _normalize_response_schema
When the input schema key is snake_case (response_schema), the promoted JSON schema key should also be snake_case (response_json_schema) instead of mixing in camelCase (responseJsonSchema). This matters for the Vertex AI google_genai path, which converts all keys to snake_case before calling _normalize_response_schema.
---------
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
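The key-handling rules in these commits can be sketched as below. This is an illustrative function, not the actual `_normalize_response_schema`: the boolean `supports_json_schema` stands in for the Gemini 2.0+ model check, and `flatten` stands in for `_build_vertex_schema`, which this sketch does not reimplement.

```python
from typing import Any, Callable, Dict


def normalize_response_schema(
    generation_config: Dict[str, Any],
    supports_json_schema: bool,
    flatten: Callable[[Any], Any] = lambda schema: schema,
) -> Dict[str, Any]:
    """Hypothetical sketch of the promotion/drop rules for Gemini response schemas."""
    config = dict(generation_config)  # shallow copy: schemas are moved, not deep-copied
    # Keep the caller's naming convention (snake_case in, snake_case out).
    snake = "response_schema" in config or "response_json_schema" in config
    schema_key = "response_schema" if snake else "responseSchema"
    json_key = "response_json_schema" if snake else "responseJsonSchema"

    if supports_json_schema:
        legacy = config.pop(schema_key, None)
        # Never clobber a caller-supplied JSON schema; otherwise promote the legacy one.
        if json_key not in config and legacy is not None:
            config[json_key] = legacy
    else:
        # Older models: the JSON-schema field is unsupported, so drop it.
        config.pop(json_key, None)
        if schema_key in config:
            config[schema_key] = flatten(config[schema_key])
    return config
```

The parity test described above would compare a function like this against the `/chat/completions` path for the same inputs.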

…crypted_content (#27820)

…th when flag set (#27716)
* feat(proxy): skip disable_background_health_check models on GET /health when flag set
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix comment
* fix greptile comments
* Fix health check fallback kwargs
* Format health endpoint
* Harden direct health check kwargs compatibility for monkeypatched perform_health_check
Replace substring-based TypeError detection with unexpected-keyword checks and a short retry chain (full kwargs, instrumentation only, filter only, minimal) so partial stubs work regardless of which optional kwarg fails first. Add proxy unit tests for legacy three-arg stubs and single-kwarg variants.
Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
* fix black
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
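A synchronous sketch of the retry chain, assuming hypothetical kwarg names (`instrumentation`, `model_id_filter`); the real `perform_health_check` is async and its exact optional parameters are not listed in the commit. Falling back only on "unexpected keyword argument" errors keeps genuine `TypeError`s raised inside the function from being masked.

```python
from typing import Any, Callable, Dict, Optional, Sequence


def call_with_kwarg_fallbacks(
    fn: Callable[..., Any],
    args: Sequence[Any],
    kwarg_variants: Sequence[Dict[str, Any]],
) -> Any:
    """Try each kwargs variant in order, richest first, until one is accepted."""
    last_error: Optional[TypeError] = None
    for kwargs in kwarg_variants:
        try:
            return fn(*args, **kwargs)
        except TypeError as exc:
            if "unexpected keyword argument" not in str(exc):
                raise  # a real TypeError from inside fn: do not retry
            last_error = exc  # stub rejected an optional kwarg: try a smaller set
    if last_error is None:
        raise ValueError("kwarg_variants must not be empty")
    raise last_error
```

Ordering the variants as full, instrumentation only, filter only, minimal means a stub accepting either single kwarg still receives it.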

…ocks (#27850)
* fix(bedrock-converse): drop blank-text fallback for empty thinking blocks
Claude Code with extended thinking replays prior assistant turns that include an empty thinking block (`thinking=""`, `signature=""`) alongside tool_use blocks. The unsigned-reasoning fallback in `add_thinking_blocks_to_assistant_content` was emitting `BedrockContentBlock(text="")`, which Bedrock Converse rejects with: "The text field in the ContentBlock object at messages.X.content.0 is blank." Guard the fallback with a strip() check, matching the existing empty-text guards elsewhere in `_bedrock_converse_messages_pt`.
* style: remove unneeded comments
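The guard amounts to a `strip()` check before emitting the fallback block. A sketch with plain dicts standing in for `BedrockContentBlock`; the function name is hypothetical:

```python
from typing import Any, Dict, List, Optional


def append_unsigned_reasoning(content: List[Dict[str, Any]], thinking: Optional[str]) -> None:
    """Append unsigned reasoning as a text block, skipping empty or whitespace-only text."""
    if thinking and thinking.strip():
        content.append({"text": thinking})
    # Otherwise emit nothing: Bedrock Converse rejects blank text fields.
```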

…lidate LiteLLM_JWTAuth.__init__ calls get_instance_fn(custom_validate) without config_file_path, so an operator who configures custom_validate: s3://bucket/module.fn in their YAML JWT auth section would hit the runtime gate on startup and break their deployment. Accept config_file_path as a non-field kwarg (popped before the invalid-keys check), thread it into get_instance_fn, and pass it from the startup-load callsite via the existing user_config_file_path module-level path. Admin-API JWT config writes leave the kwarg at None and still hit the gate.
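The pop-before-validate pattern can be sketched with a plain class. `JWTAuthSettings` is a hypothetical stand-in for `LiteLLM_JWTAuth` (a Pydantic model in LiteLLM), and `example_field` is an illustrative field name:

```python
from typing import Any, Optional


class JWTAuthSettings:
    """Hypothetical stand-in showing a non-field kwarg popped before key validation."""

    _fields = {"custom_validate", "example_field"}

    def __init__(self, **kwargs: Any) -> None:
        # Pop first so the invalid-keys check below never sees config_file_path.
        config_file_path: Optional[str] = kwargs.pop("config_file_path", None)
        invalid = set(kwargs) - self._fields
        if invalid:
            raise ValueError(f"Invalid keys: {sorted(invalid)}")
        # Startup loads pass the YAML path; admin-API writes leave it as None.
        self.config_file_path = config_file_path
        for key, value in kwargs.items():
            setattr(self, key, value)
```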

* fix(mcp): surface upstream 401 for token-forwarding MCP servers
For MCP servers configured with extra_headers: [Authorization], the gateway forwards the client token directly to the upstream. When that token is rejected (expired or invalid) the upstream returns 401, but the MCP SDK starts the SSE stream with 200 OK before calling handlers, so the 401 can't be returned mid-stream.
Fix: add a pre-flight httpx probe in handle_streamable_http_mcp, before the SDK opens the session, so the gateway can still return HTTP 401 with WWW-Authenticate: Bearer authorization_uri=<gateway-discovery-url> when the upstream rejects the token. The probe fails-open (returns 200) on network errors so a transient hiccup does not block valid requests.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(mcp): parallelize pre-flight auth probes and use HEAD to avoid side effects
  - Extract forwarded_auth outside the pass-through server loop (was called N times for the same scope value)
  - Gather all upstream auth probes concurrently with asyncio.gather instead of sequentially; eliminates N×5 s worst-case latency
  - Switch probe from POST+initialize JSON-RPC body to HEAD request; HEAD carries the Authorization header so the upstream rejects invalid tokens with 401 but never allocates a session or writes an audit entry
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(mcp): use get_async_httpx_client in _probe_upstream_auth
Replaces bare httpx.AsyncClient with the project-standard get_async_httpx_client(httpxSpecialProvider.MCP) to satisfy the ensure_async_clients_test code coverage check and avoid the +500 ms per-request overhead of creating a new client on every probe call.
Co-authored-by: Cursor <cursoragent@cursor.com>
* refactor(mcp): extract pre-flight probe into _check_passthrough_upstream_auth
Moves the parallel upstream auth probe logic out of handle_streamable_http_mcp into a dedicated helper to satisfy Ruff PLR0915 (Too many statements > 50).
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(mcp): gate pre-flight probes on authorized server set to prevent bypass
_check_passthrough_upstream_auth was resolving user-supplied server names directly before authorization ran, letting any permitted LiteLLM key trigger an upstream HEAD probe to a server it was not allowed to use. Changes:
  - Call _get_allowed_mcp_servers inside the helper so only servers the caller's key is authorized for are probed.
  - Move the call site to after toolset scoping so the auth context is fully resolved before the probe list is built.
  - Thread user_api_key_auth into the helper signature (replaces the raw mcp_servers name list).
Co-authored-by: Cursor <cursoragent@cursor.com>
* Add async HTTP HEAD support
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(mcp): use Scope type annotation in _get_forwarded_auth_from_scope
Co-authored-by: Cursor <cursoragent@cursor.com>
* Fix MCP upstream auth probe method
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* Remove unused AsyncHTTPHandler head method
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(mcp): exclude has_client_credentials servers from pre-flight auth probe
_prepare_mcp_server_headers skips caller Authorization when the server uses OAuth client-credentials (M2M), but the pre-flight probe was still selecting those servers and forwarding the caller's raw token in the HEAD request. Exclude servers with has_client_credentials from the probe list to match the actual downstream header-preparation logic.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(mcp): propagate upstream 403 as 403, not 401 with WWW-Authenticate
Per RFC 9110, 401 means "go get new credentials." Mapping an upstream 403 to a gateway 401 causes OAuth clients to restart the authorization flow, obtain a fresh token with identical scopes, hit 403 again, and loop indefinitely.
  - 401 from upstream → gateway 401 + WWW-Authenticate (re-authorize)
  - 403 from upstream → gateway 403 (no WWW-Authenticate hint)
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(mcp): skip auth probe when Authorization may be the LiteLLM proxy key
The pre-flight upstream probe must not forward the caller's Authorization header when it could itself be the LiteLLM proxy API key. Restrict the probe to requests that supply x-litellm-api-key explicitly; only then is the Authorization header unambiguously the upstream OAuth token the caller wants forwarded.
* Fix MCP ASGI HTTPException propagation
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(mcp): use public AsyncHTTPHandler.post() in auth probe
Use AsyncHTTPHandler.post() and catch httpx.HTTPStatusError explicitly so the 401/403 we want to surface is not silently swallowed by the broad fail-open except Exception block. Avoids reaching into the handler's private client attribute, which would silently regress to fail-open if AsyncHTTPHandler is ever refactored.
* Fix MCP auth probe tests
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* test(mcp): add coverage for httpx.HTTPStatusError path in auth probe
AsyncHTTPHandler.post() calls raise_for_status() internally, so a real upstream 401/403 lands as httpx.HTTPStatusError. Add a test that exercises that specific exception path so a regression that swallows the error in the broad fail-open except Exception would be caught.
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
Co-authored-by: claude-bot <claude-bot@anthropic.com>
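A sketch of the probe aggregation: probes run concurrently, network errors fail open, and 401 and 403 are surfaced separately. Each probe here is a hypothetical coroutine returning a status code; the real helper uses `AsyncHTTPHandler` and catches `httpx.HTTPStatusError`. Checking 401 before 403 when both appear is an assumption of this sketch.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional

Probe = Callable[[], Awaitable[int]]  # hypothetical: returns the upstream status code


async def _safe_probe(probe: Probe) -> int:
    try:
        return await probe()
    except Exception:
        return 200  # fail open: a transient network error must not block the request


async def preflight_status(probes: List[Probe]) -> Optional[int]:
    """Run all probes concurrently; return the status to surface, or None to proceed."""
    statuses = await asyncio.gather(*(_safe_probe(p) for p in probes))
    if 401 in statuses:
        return 401  # gateway adds WWW-Authenticate so the client re-authorizes
    if 403 in statuses:
        return 403  # valid token, insufficient scope: no re-auth hint
    return None
```

Only a 401 result would carry the `WWW-Authenticate: Bearer authorization_uri=...` header; returning 403 unchanged keeps OAuth clients from looping through re-authorization.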

…timodal pricing (#27848)
* fix(cost): align vertex_ai/gemini-embedding-2-preview with Vertex multimodal pricing
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(cost): align vertex_ai/gemini-embedding-2 GA source URL with preview
Per Greptile review on #27848: GA entry referenced ai.google.dev while the preview entry was updated to the canonical Vertex AI pricing page. Both share identical pricing values; sync the source URL for consistency.
https://claude.ai/code/session_01W8jRwstnmduadGw8Z8egxe
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Claude <noreply@anthropic.com>

…27834)
* feat(mcp): add delegate_auth_to_upstream flag for PKCE passthrough
Adds an opt-in per-server flag that lets clients (e.g. VS Code) complete PKCE directly with an upstream OAuth2 MCP server, instead of LiteLLM double-gating with its own API-key/SSO check. Only honored when auth_type=oauth2 and the operator explicitly sets the flag; mixed-target or non-oauth2 requests fail closed.
  - Adds the field to Pydantic models, Prisma schema, and a migration
  - New MCPRequestHandler._target_servers_delegate_auth_to_upstream gate that runs only when no x-litellm-api-key is present, so authenticated users still get user_id resolution + stored-credential lookup
  - Anonymous callers now see delegate servers in get_allowed_mcp_servers (scoped to delegate servers only; the upstream still enforces auth)
  - mcp_management_endpoints: allow anonymous /authorize and /token for delegate servers so VS Code can complete PKCE without a LiteLLM session
  - UI toggle (shown only for oauth2) + payload/view wiring
  - Tests covering: oauth2 on/off, non-oauth2 with flag, mixed targets, no resolvable target, explicit key precedence, and 401 emission
Co-authored-by: Cursor <cursoragent@cursor.com>
* Enforce oauth2 for delegated MCP auth bypass
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(mcp): close secondary Authorization bypass for delegate servers
The delegate-auth bypass gated only on the primary `x-litellm-api-key` header, so a LiteLLM key sent via `Authorization: Bearer sk-...` (the secondary header) was silently dropped, skipping spend tracking and rate limiting. Gate on the resolved litellm_api_key (which considers both headers) so the bypass fires only when neither is present. Also update the existing "Authorization header present" test to reflect that an upstream OAuth token now flows through the existing oauth2 fallback (LiteLLM auth attempt → fail → anonymous), not via the delegate branch.
Co-authored-by: Cursor <cursoragent@cursor.com>
* Avoid duplicate MCP OAuth credential lookup
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(mcp): block delegate bypass for M2M and internal-only servers
Two security issues flagged in code review:
  1. High – client_credentials (M2M) servers must not be delegatable: LiteLLM auto-fetches the upstream token using stored credentials, so allowing anonymous bypass would let any external caller invoke tools authenticated as LiteLLM's service account. Fix: check `server.has_client_credentials` in `_target_servers_delegate_auth_to_upstream`, the anonymous allow-list in `get_allowed_mcp_servers`, and `_mcp_oauth_user_api_key_auth`.
  2. Medium – internal-only servers exposed to public internet: The anonymous delegate allow-list was not filtering by `available_on_public_internet`, so external callers with an upstream OAuth token could invoke tools on servers marked internal-only. Fix: add `available_on_public_internet` guard to the anonymous delegate server list in `get_allowed_mcp_servers`.
Tests added for both cases.
Co-authored-by: Cursor <cursoragent@cursor.com>
* Require public MCP delegate auth servers
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(mcp): align delegate auth path parsing with downstream routing
`_extract_target_server_names_from_path` used a naive segments-based split while `server.py::_get_mcp_servers_in_path` uses a regex that allows server names with one embedded slash and comma-separated lists. With the old parser, a request to `/mcp/<delegated>/<garbage>` was parsed as targeting `<delegated>` by the auth gate (bypassing LiteLLM auth) while the routing layer parsed it as `<delegated>/<garbage>`; when that name did not resolve, the request fell back to the anonymous allow-list, which can include `allow_all_keys` servers that normally require a LiteLLM key. Replace the parser with the same regex logic as `_get_mcp_servers_in_path` so auth gating sees the exact target name(s) downstream routing sees. Add regression tests covering parser parity and the specific extra-path-segment bypass attempt.
https://claude.ai/code/session_01SjyPmwfmrq8fveFgw9iHW9
* fix(mcp): close header/path TOCTOU in MCP delegate auth gate
`_target_servers_delegate_auth_to_upstream` and `_target_servers_use_oauth2` trusted the `x-mcp-servers` header when present, but `server.py::extract_mcp_auth_context` overrides that header with the path-derived list for `/mcp/...` routes. An attacker could set `x-mcp-servers: <delegated>` while pointing the URL path at a non-delegate server, flipping the auth gate without changing the target downstream routing actually uses. Extract a shared `_resolve_target_server_names` helper that mirrors the downstream override (path-derived names for `/mcp/...` routes, header value otherwise). Add regression tests covering the TOCTOU attempt and the helper's path-vs-header precedence.
https://claude.ai/code/session_01SjyPmwfmrq8fveFgw9iHW9
* Fix delegated MCP OAuth test mock
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(mcp): drop unreachable /{server}/mcp branch in auth path parser
`_extract_target_server_names_from_path` also matched the ``/{server_name}/mcp`` form, but the downstream parser ``_get_mcp_servers_in_path`` only handles ``/mcp/...``, and ``dynamic_mcp_route`` in ``proxy_server`` rewrites ``/{name}/mcp`` to ``/mcp/{name}`` on the scope before the MCP handler runs. Parsing the un-rewritten form on the auth side was therefore unreachable in production, and contradicted the docstring's claim of mirroring the downstream parser: exactly the kind of mismatch that risks a future header/path TOCTOU if any new entry point skips the rewrite. Drop the branch; the canonical ``/mcp/...`` path matches both parsers. Update the regression test to assert the new behavior.
https://claude.ai/code/session_01SjyPmwfmrq8fveFgw9iHW9
* Fix MCP path auth target resolution
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(mcp): require auth for refresh_token grants on delegate-auth servers
`_mcp_oauth_user_api_key_auth` gates the unauthenticated PKCE flow for ``delegate_auth_to_upstream`` servers, but the bypass applied to BOTH ``/authorize`` and ``/token`` regardless of grant type. ``mcp_token`` accepts ``grant_type=refresh_token`` as well as ``authorization_code``, and ``exchange_token_with_server`` attaches the server's stored ``client_secret`` to whatever is forwarded upstream. An unauthenticated caller holding a refresh token issued to that OAuth client could mint fresh upstream access tokens through LiteLLM. Limit the anonymous bypass on ``/token`` to ``grant_type=authorization_code`` (the only grant PKCE actually protects via ``code_verifier``); fall through to normal LiteLLM auth for ``refresh_token`` and any other grant. ``/authorize`` continues to allow anonymous PKCE redirects.
https://claude.ai/code/session_01SjyPmwfmrq8fveFgw9iHW9
* fix(ui): clear delegate_auth_to_upstream when switching off oauth2
The ``delegate_auth_to_upstream`` form field is rendered inside an ``isOAuth2 && (...)`` conditional, so the Form.Item unmounts when the user changes ``auth_type`` away from ``oauth2``. The follow-up ``form.setFieldValue("delegate_auth_to_upstream", false)`` runs after the field has already deregistered, so ``onFinish`` receives ``undefined`` and the fallback ``?? mcpServer.delegate_auth_to_upstream`` preserved the old ``true``. The flag then persisted in the database for a non-oauth2 server and silently re-activated if ``auth_type`` was later switched back to ``oauth2``. In the edit payload, force the flag to ``false`` whenever ``auth_type !== oauth2``; only trust the form value (and the existing DB fallback) when the server is actually oauth2. Backend defense-in-depth already ignores the flag for non-oauth2 servers, but the DB state should stay clean too.
https://claude.ai/code/session_01SjyPmwfmrq8fveFgw9iHW9
* Fix MCP delegate auth reset on edit
Co-authored-by: Yassin Kortam <yassin@berri.ai>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
Co-authored-by: Claude <claude@anthropic.com>
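The eligibility rules accumulated across these commits can be summarized as one decision function. This is a hypothetical sketch, not `_target_servers_delegate_auth_to_upstream` itself: `ServerInfo` carries only the fields named above, `endpoint` is an illustrative label for the route (`"authorize"`, `"token"`, or a tool call), and target-name resolution (path over header) is assumed to have already happened.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ServerInfo:
    # Illustrative subset of the MCP server fields named in the commits above.
    auth_type: str
    delegate_auth_to_upstream: bool = False
    has_client_credentials: bool = False
    available_on_public_internet: bool = False


def anonymous_bypass_allowed(
    targets: List[ServerInfo],
    litellm_api_key: Optional[str],
    endpoint: str,
    grant_type: Optional[str] = None,
) -> bool:
    if litellm_api_key:  # resolved from either header: an explicit key always wins
        return False
    if not targets:  # no resolvable target fails closed
        return False
    eligible = all(
        s.auth_type == "oauth2"
        and s.delegate_auth_to_upstream
        and not s.has_client_credentials  # M2M servers would expose LiteLLM's identity
        and s.available_on_public_internet
        for s in targets
    )
    if not eligible:  # mixed or ineligible targets fail closed
        return False
    if endpoint == "token":
        # Only the PKCE-protected grant may skip LiteLLM auth.
        return grant_type == "authorization_code"
    return True
```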

…etion transformation (#27727)
* fix(responses): preserve cache_control in Responses API -> Chat Completion transformation
cache_control injected by AnthropicCacheControlHook was silently dropped when _transform_responses_api_content_to_chat_completion_content rebuilt content blocks with only {type, text}. Now copies cache_control through so Anthropic prompt caching works correctly when using client.responses.create with cache_control_injection_points.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(responses): preserve cache_control for input_image and input_file blocks
Extends the cache_control fix to image and file content blocks, which were also silently dropping cache_control during the Responses API -> Chat Completion transformation. Adds tests for all three content block types.
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Claude Babysitter <claude@anthropic.com>
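A simplified sketch of carrying `cache_control` through the block rebuild. The function name is hypothetical and `input_file` handling is omitted; the point is that every rebuilt block copies `cache_control` when the source block has one.

```python
from typing import Any, Dict


def transform_block(block: Dict[str, Any]) -> Dict[str, Any]:
    """Map one Responses API content block to a Chat Completions content part."""
    block_type = block.get("type")
    if block_type == "input_text":
        out: Dict[str, Any] = {"type": "text", "text": block["text"]}
    elif block_type == "input_image":
        out = {"type": "image_url", "image_url": {"url": block["image_url"]}}
    else:
        raise ValueError(f"unsupported block type: {block_type!r}")
    # Previously dropped: without this, Anthropic prompt caching silently stops working.
    if "cache_control" in block:
        out["cache_control"] = block["cache_control"]
    return out
```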

External readiness probes consumed the legacy detailed payload's `db` field to drive alerting and pod-rotation decisions. Stripping the body to `{"status": "healthy"}` broke those probes silently — the HTTP code still flipped to 503, but probes checking `body.db == "connected"` treated the response as healthy. Add `db` back to the unauthenticated payload. Keep the rest of the diagnostic fields (litellm_version, callbacks, cache, log_level) gated behind /health/readiness/details so the recon-leak gate from #26912 holds. Values match the legacy contract: "connected", "disconnected", "Not connected".

fix(proxy): expose db status on public /health/readiness

Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com>
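The split between the public and detailed readiness payloads can be expressed as an allow-list projection. A sketch, assuming a hypothetical helper that receives the detailed payload; the field names are the ones listed above:

```python
from typing import Any, Dict

# Fields safe to expose without auth; everything else stays behind /details.
_PUBLIC_READINESS_KEYS = ("status", "db")


def public_readiness_payload(detailed: Dict[str, Any]) -> Dict[str, Any]:
    """Project the detailed readiness payload down to the unauthenticated fields."""
    return {key: detailed[key] for key in _PUBLIC_READINESS_KEYS if key in detailed}
```

An allow-list means a diagnostic field added to the detailed payload later stays private by default.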

Document the purpose of the daemon thread that backs the sync branch of the timeout decorator. Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com>
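The commit only documents the daemon thread, so the following is a generic sketch of the pattern rather than LiteLLM's decorator: the wrapped call runs in a daemon worker, the caller waits with `join(timeout)`, and a worker still alive afterwards is abandoned (Python threads cannot be killed). `daemon=True` keeps such a worker from blocking interpreter shutdown. `sync_timeout` is a hypothetical name.

```python
import functools
import threading
from typing import Any, Callable, Dict, TypeVar, cast

F = TypeVar("F", bound=Callable[..., Any])


def sync_timeout(seconds: float) -> Callable[[F], F]:
    def decorator(func: F) -> F:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            result: Dict[str, Any] = {}

            def target() -> None:
                try:
                    result["value"] = func(*args, **kwargs)
                except BaseException as exc:  # re-raised in the caller's thread
                    result["error"] = exc

            # daemon=True: an abandoned worker won't keep the process alive at exit.
            worker = threading.Thread(target=target, daemon=True)
            worker.start()
            worker.join(seconds)
            if worker.is_alive():
                raise TimeoutError(f"{func.__name__} timed out after {seconds}s")
            if "error" in result:
                raise result["error"]
            return result["value"]

        return cast(F, wrapper)

    return decorator
```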

#26302)
* fix: Fix Redis Sentinel client handling to solve authentication error with password protected sentinel (#25625)
* fix Redis Sentinel authentication handling
* test: cover Redis Sentinel auth routing
* refactor: align Redis Sentinel kwargs threading
* fix: avoid duplicate Redis Sentinel socket timeouts
* Address review comments
* refactor(_redis): return set from _get_redis_kwargs for O(1) lookup
Align _get_redis_kwargs() with the cluster helper by returning a set instead of a list, so the sentinel connection-kwargs filter uses O(1) membership tests. Addresses Greptile review feedback on PR #26302.
* fix(_redis): restore Azure-specific kwargs in cluster kwargs set
The set-literal refactor of _get_redis_cluster_kwargs dropped four LiteLLM-custom Azure keys (azure_redis_ad_token, azure_client_id, azure_tenant_id, azure_client_secret) that the prior list form had explicitly appended. Because they are not in RedisCluster's argspec, they were silently stripped, breaking Azure IAM auth on cluster clients. Re-add them to the explicit include set.
---------
Co-authored-by: Kristin Cowalcijk <kristincowalcijk@gmail.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>
Co-authored-by: krrish-berri-2 <krrish-berri-2@users.noreply.github.com>
Co-authored-by: claude <claude@anthropic.com>
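A sketch of the set-based filter, assuming the allowed keys are derived from the client constructor's signature via `inspect.signature` plus an explicit set of LiteLLM-custom keys. The helper names and `FakeCluster` below are hypothetical; the four Azure key names come from the commit.

```python
import inspect
from typing import Any, Callable, Dict, Set

# LiteLLM-custom keys absent from the client's signature, so they must be
# included explicitly or the filter silently strips them.
_AZURE_EXTRA_KWARGS = {
    "azure_redis_ad_token",
    "azure_client_id",
    "azure_tenant_id",
    "azure_client_secret",
}


def allowed_kwargs(client_cls: Callable[..., Any]) -> Set[str]:
    params = {
        name
        for name, p in inspect.signature(client_cls).parameters.items()
        if p.kind not in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    }
    return params | _AZURE_EXTRA_KWARGS


def filter_kwargs(kwargs: Dict[str, Any], allowed: Set[str]) -> Dict[str, Any]:
    return {k: v for k, v in kwargs.items() if k in allowed}  # O(1) set membership
```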

* fix(ollama): Include provider in model list for ollama (#26135)
* Include provider in model names for ollama
* Fix unit tests
* fix(ollama): process both thinking and content in same streaming chunk (#26098)
* fix(health_check): skip max_tokens for image_generation mode (#26417)
* fix(health_check): skip max_tokens for image_generation mode
`_update_litellm_params_for_health_check` injected `max_tokens` for every deployment. OpenAI `/v1/images/generations` strictly rejects unknown fields, so health checks for dall-e-* and gpt-image-1 always failed with `400 "Unknown parameter: 'max_tokens'"` even though the actual image endpoint calls succeed. Skip the `max_tokens` injection when `model_info.mode == "image_generation"`. `messages` still gets injected (downstream `_filter_model_params` already strips it for non-chat handlers).
* Switch to allow-list with per-deployment override
Per @krrishdholakia review: deny-listing image_generation only re-introduces the same bug for every other non-chat mode (embedding, audio_*, rerank, video_generation, ocr, search, moderation, ...). Replace the single image_generation skip with `_MAX_TOKEN_SUPPORT_MODES = {chat, completion, responses}`. Missing `mode` is treated as chat for backward compatibility. New modes are safe by default. Add `model_info.health_check_supports_max_tokens` as an operator escape hatch: True forces injection on a non-listed deployment (operator wants to bound probe tokens), False suppresses it on a chat-style deployment behind a strict-schema provider. Tests: parametrize over 3 chat-style + 10 non-chat modes, plus override on/off and the no-mode legacy path.
* fix(http_handler): handle RequestNotRead in MaskedHTTPStatusError for multipart uploads (#26718)
Squash-merged by litellm-agent from dawidkulpa's PR.
* fix(ollama): guard against double 'ollama/' prefix in live model listing
Greptile flagged that Ollama servers can return names that already start with 'ollama/'. Check the prefix before prepending so we don't produce 'ollama/ollama/...'. Adds a regression test.
* Fix Ollama empty reasoning stream chunks
Co-authored-by: Yassin Kortam <yassin@berri.ai>
---------
Co-authored-by: James Myatt <james@jamesmyatt.co.uk>
Co-authored-by: VHash <225398745+vhash0@users.noreply.github.com>
Co-authored-by: hayden <sewhan.kim+@a-bly.com>
Co-authored-by: dawidkulpa <84176950+dawidkulpa@users.noreply.github.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
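The allow-list rule and the prefix guard each fit in a few lines. A sketch with hypothetical function names; `_MAX_TOKEN_SUPPORT_MODES` and `health_check_supports_max_tokens` are the names given in the commit:

```python
from typing import Any, Dict

_MAX_TOKEN_SUPPORT_MODES = {"chat", "completion", "responses"}


def should_inject_max_tokens(model_info: Dict[str, Any]) -> bool:
    override = model_info.get("health_check_supports_max_tokens")
    if override is not None:
        return bool(override)  # operator escape hatch wins in both directions
    mode = model_info.get("mode") or "chat"  # missing mode: legacy chat behavior
    return mode in _MAX_TOKEN_SUPPORT_MODES  # unknown/new modes: safe by default


def ollama_model_name(name: str) -> str:
    # Some Ollama servers already return prefixed names; avoid "ollama/ollama/...".
    return name if name.startswith("ollama/") else f"ollama/{name}"
```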

* fix: strip Gemini thought-signature from tool_use.id in non-streaming path; example websearch config (#27873)
  - adapters/transformation.py: mirror the streaming path and strip the `__thought__<b64>` suffix off `tool_call.id` before building the AnthropicResponseContentBlockToolUse. Base64's `+ / =` characters violate Anthropic's `^[a-zA-Z0-9_-]+$` tool_use.id pattern, so when a conversation that flowed through Gemini is later replayed to an Anthropic-native provider (Bedrock or Anthropic API), the request 400s.
  - example_config_yaml/websearch_interception_config.yaml: register the interceptor under `callbacks:`, not `success_callback:`. `success_callback` does not run pre-request hooks, so the tool-conversion step never fires on `/v1/messages` and the raw `web_search_20250305` tool is forwarded to Bedrock, which 400s.
  - adds a unit test pinning the non-streaming strip behavior and the surviving `^[a-zA-Z0-9_-]+$` shape of the resulting id.

  Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com>
* Fix/azure image edit auth header (#27863)
  * fix(azure/image_edit): use api-key header instead of Authorization Bearer

    Delegate `AzureImageEditConfig.validate_environment` to `BaseAzureLLM._base_validate_azure_environment` so the image-edit route follows the same auth resolution as every other Azure provider:
    - prefer the Azure-native `api-key` header when an API key is available
    - fall back to `Authorization: Bearer <azure_ad_token>` only for AAD auth

    The previous implementation unconditionally set `Authorization: Bearer <api_key>`, which is the OpenAI-direct convention and is rejected by Azure OpenAI / APIM-fronted deployments with `401 Access denied due to missing subscription key`. Adds regression tests covering the api_key kwarg, litellm_params.api_key, and the AAD-token fallback path.

    Co-authored-by: Cursor <cursoragent@cursor.com>
  * docs(azure/image_edit): pin api-key precedence semantics + add regression test

    Address review feedback that the move to ``BaseAzureLLM._base_validate_azure_environment`` changed the relative priority of the positional ``api_key`` kwarg vs. ``litellm_params["api_key"]``. The new behavior (``litellm_params["api_key"]`` wins; the positional kwarg only fills in when ``litellm_params["api_key"]`` is empty) is intentional and matches every other Azure ``validate_environment``: ``AzureVideosConfig`` uses the exact same merge logic, while ``AzureVectorStoresConfig`` and ``AzureResponsesAPIConfig`` don't accept a positional ``api_key`` at all. The old ``or`` chain (positional wins) was the outlier and was part of the same OpenAI-vs-Azure convention drift that produced the original ``Authorization: Bearer`` bug. The only production caller (``llm_http_handler.image_edit``) sources both values from the same ``litellm_params.api_key``, so this change is behaviorally a no-op there. Document the precedence in the docstring and lock it in with an explicit test so future refactors can't quietly re-invert it.

    Co-authored-by: Cursor <cursoragent@cursor.com>

  ---------

  Co-authored-by: yuneng-jiang <yuneng@berri.ai>
  Co-authored-by: ryan-crabbe-berri <ryan@berri.ai>
  Co-authored-by: Adam Kirstein <adam.kirstein@disney.com>
  Co-authored-by: Cursor <cursoragent@cursor.com>
* test(azure/image_edit): expect api-key header instead of Authorization Bearer

  PR #27863 fixed Azure image edit to use the Azure-native api-key header instead of OpenAI's Authorization: Bearer convention, but did not update test_azure_image_edit_litellm_sdk to match. The test still asserted 'Authorization' in headers, which now fails because the new code routes through BaseAzureLLM._base_validate_azure_environment and emits api-key when an api_key is provided. Update the assertion to pin the correct Azure behavior: api-key header present with the resolved key, and no Authorization header.

---------

Co-authored-by: oss-agent-shin <ext-agent-shin@berri.ai>
Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com>
Co-authored-by: Adam Kirstein <107421694+justalittleadam@users.noreply.github.com>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: ryan-crabbe-berri <ryan@berri.ai>
Co-authored-by: Adam Kirstein <adam.kirstein@disney.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>
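A minimal sketch of the two behaviors described in the commits above, written as standalone helpers. `strip_thought_signature` and `resolve_azure_auth_headers` are hypothetical names, not LiteLLM functions; they assume the `__thought__` separator and the Azure header precedence exactly as the commit messages state them.

```python
import re
from typing import Dict, Optional

# Pattern Anthropic enforces on tool_use.id, as quoted in the commit message.
ANTHROPIC_TOOL_ID_RE = re.compile(r"^[a-zA-Z0-9_-]+$")
# Separator that precedes the base64 thought signature on Gemini-routed ids.
THOUGHT_SEPARATOR = "__thought__"


def strip_thought_signature(tool_call_id: str) -> str:
    """Hypothetical helper: drop a trailing `__thought__<b64>` suffix if present."""
    return tool_call_id.split(THOUGHT_SEPARATOR, 1)[0]


def resolve_azure_auth_headers(
    positional_api_key: Optional[str],
    litellm_params_api_key: Optional[str],
    azure_ad_token: Optional[str],
) -> Dict[str, str]:
    """Hypothetical helper mirroring the documented Azure header precedence."""
    # litellm_params["api_key"] wins; the positional kwarg only fills in when it is empty.
    api_key = litellm_params_api_key or positional_api_key
    if api_key:
        # Azure-native header; Azure OpenAI / APIM reject an API key sent as Bearer.
        return {"api-key": api_key}
    if azure_ad_token:
        # Bearer is reserved for AAD tokens.
        return {"Authorization": f"Bearer {azure_ad_token}"}
    return {}
```

For example, `strip_thought_signature("call_1__thought__c2ln+/=")` returns `"call_1"`, which satisfies `ANTHROPIC_TOOL_ID_RE`, while an id without the separator passes through unchanged.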

…Fireworks API call (#27881)

* fix(fireworks_ai): strip thinking_blocks from chat messages before API call

  Fireworks' OpenAI-compatible ChatMessage schema uses additionalProperties:false and rejects Anthropic-style messages[].thinking_blocks (e.g. Claude Code replays), returning invalid_request_error. Remove the field in _transform_messages_helper alongside provider_specific_fields. Adds unit test test_transform_messages_helper_strips_thinking_blocks.

  Co-authored-by: Cursor <cursoragent@cursor.com>
* chore(fireworks_ai): drop inline comments from message sanitization

  Co-authored-by: Cursor <cursoragent@cursor.com>
* docs(fireworks_ai): explain why provider_specific_fields and thinking_blocks are stripped

  Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
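The sanitization step amounts to dropping two keys from each outgoing message. A minimal sketch, assuming messages are plain dicts; `sanitize_messages_for_fireworks` is a hypothetical name, not the actual `_transform_messages_helper`:

```python
from typing import Any, Dict, List

# Fields the commit names as rejected by Fireworks' strict
# (additionalProperties: false) ChatMessage schema.
_FIELDS_TO_STRIP = ("provider_specific_fields", "thinking_blocks")


def sanitize_messages_for_fireworks(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: return copies of the messages without the stripped fields."""
    cleaned: List[Dict[str, Any]] = []
    for message in messages:
        msg = dict(message)  # shallow copy so the caller's dicts are left untouched
        for field in _FIELDS_TO_STRIP:
            msg.pop(field, None)  # no-op when the field is absent
        cleaned.append(msg)
    return cleaned
```

Copying rather than mutating in place is a design choice of this sketch: the same message list may be replayed later to a provider (such as Anthropic) that does accept `thinking_blocks`.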

Authenticated clients could supply CustomPricingLiteLLMParams fields (input_cost_per_token, output_cost_per_token, etc.) in the request body. These were forwarded to register_model() in main.py, permanently mutating the shared global litellm.model_cost dict for all users on the instance. Adds all CustomPricingLiteLLMParams fields to _BANNED_REQUEST_BODY_PARAMS so is_request_body_safe() rejects them before they reach completion(). New pricing fields added to CustomPricingLiteLLMParams are auto-covered. Admin opt-in via allow_client_side_credentials or configurable_clientside_auth_params still works as before. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
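Deriving the banned list from the pricing class is what makes new fields "auto-covered". A minimal sketch of that idea, assuming a dataclass stand-in: `PricingParams`, `check_request_body`, and the two base entries are hypothetical and illustrative, not LiteLLM's `CustomPricingLiteLLMParams` or `is_request_body_safe`.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class PricingParams:
    """Hypothetical stand-in for CustomPricingLiteLLMParams (two fields shown)."""

    input_cost_per_token: Optional[float] = None
    output_cost_per_token: Optional[float] = None


# Illustrative pre-existing entries; the real banned list is longer.
_BASE_BANNED = frozenset({"api_base", "aws_sts_endpoint"})
# Names come from the class itself, so a newly added pricing field is banned automatically.
BANNED_BODY_PARAMS = _BASE_BANNED | frozenset(f.name for f in fields(PricingParams))


def check_request_body(body: Dict[str, Any], allow_client_side_credentials: bool = False) -> None:
    """Raise ValueError if the body sets a banned key, unless the admin opted in."""
    if allow_client_side_credentials:
        return  # admin opt-in escape hatch
    for key in body:
        if key in BANNED_BODY_PARAMS:
            raise ValueError(f"'{key}' is not allowed in the request body")
```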

When ``ProxyConfig`` merges DB-persisted ``litellm_settings`` / ``general_settings`` on top of the YAML config, the merged dict is later iterated by ``load_config``, which threads ``config_file_path`` (the YAML path) into ``get_instance_fn``. The runtime gate that refuses ``s3://`` / ``gcs://`` modules when ``config_file_path`` is ``None`` therefore can't distinguish a YAML-sourced value from a DB-sourced one: both look the same to ``get_instance_fn``.

Strip ``s3://`` / ``gcs://`` entries from the DB-overlay value for every field whose contents reach ``get_instance_fn`` during config load:
- litellm_settings: ``callbacks``, ``success_callback``, ``failure_callback``, ``audit_log_callbacks``, ``post_call_rules``, ``custom_provider_map[].custom_handler``
- general_settings: ``custom_auth``, ``custom_key_generate``, ``custom_key_update``, ``custom_sso``, ``custom_ui_sso_sign_in_handler``, ``litellm_jwtauth.custom_validate``

The YAML config-file load path is unchanged: the documented operator flow (``callbacks: ["s3://bucket/module.instance"]`` in ``config.yaml``) still works. Only DB-overlay writes (e.g. via ``/config/update``) are stripped. Adds 16 regression tests covering the scrub matrix.
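A minimal sketch of the scrub applied to a DB-sourced settings dict before the merge. `scrub_db_overlay` and its field defaults are hypothetical; the sketch assumes list-valued fields should lose their remote entries while scalar fields (such as ``custom_auth``) are set to ``None``, and it does not cover the nested ``litellm_jwtauth.custom_validate`` path.

```python
from typing import Any, Dict, Iterable

REMOTE_PREFIXES = ("s3://", "gcs://")


def is_remote(value: Any) -> bool:
    return isinstance(value, str) and value.startswith(REMOTE_PREFIXES)


def scrub_db_overlay(
    overlay: Dict[str, Any],
    list_fields: Iterable[str] = (
        "callbacks", "success_callback", "failure_callback",
        "audit_log_callbacks", "post_call_rules",
    ),
    scalar_fields: Iterable[str] = (),
) -> Dict[str, Any]:
    """Hypothetical helper: copy of a DB-sourced settings dict with remote module refs removed."""
    scrubbed = dict(overlay)
    for name in list_fields:
        value = scrubbed.get(name)
        if isinstance(value, list):
            scrubbed[name] = [item for item in value if not is_remote(item)]
        elif is_remote(value):
            scrubbed[name] = None
    for name in scalar_fields:
        if is_remote(scrubbed.get(name)):
            scrubbed[name] = None
    # custom_provider_map[].custom_handler: null the handler, keep the entry.
    providers = scrubbed.get("custom_provider_map")
    if isinstance(providers, list):
        scrubbed["custom_provider_map"] = [
            {**p, "custom_handler": None}
            if isinstance(p, dict) and is_remote(p.get("custom_handler"))
            else p
            for p in providers
        ]
    return scrubbed
```

Because only the DB overlay passes through this function, a YAML value such as ``callbacks: ["s3://bucket/module.instance"]`` is never touched.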

A pass-through endpoint's ``target`` field is passed through ``create_pass_through_route`` into ``get_instance_fn`` during config load. A PROXY_ADMIN persisting ``target: "s3://attacker/m.i"`` via the DB-overlay ``pass_through_endpoints`` write path was not covered by the previous scrub matrix, so the remote module load would still reach the loader because the YAML-load chain has ``config_file_path`` set. Walk each entry in ``general_settings.pass_through_endpoints`` and null out any ``target`` that starts with ``s3://`` or ``gcs://``. The entry itself is preserved so the path-registration helper can choose how to handle a missing target (the existing code skips the route when ``target is None``). Adds two regression tests.
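A minimal sketch of the per-entry walk, assuming each pass-through endpoint is a dict with an optional ``target``; `scrub_pass_through_targets` is a hypothetical name. The entry itself is kept so route registration can apply its existing skip-when-``None`` behavior.

```python
from typing import Any, List

REMOTE_PREFIXES = ("s3://", "gcs://")


def scrub_pass_through_targets(endpoints: List[Any]) -> List[Any]:
    """Hypothetical helper: null out remote-URL targets but keep every entry."""
    scrubbed: List[Any] = []
    for entry in endpoints:
        target = entry.get("target") if isinstance(entry, dict) else None
        if isinstance(target, str) and target.startswith(REMOTE_PREFIXES):
            # Route registration skips entries whose target is None.
            entry = {**entry, "target": None}
        scrubbed.append(entry)
    return scrubbed
```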

…nd Vertex (#27705)

* fix(prometheus): emit remaining_tokens/requests gauges for bedrock + vertex (LIT-2719)

  Bedrock and Vertex AI never return x-ratelimit-remaining-* response headers, so litellm_remaining_tokens_metric / litellm_remaining_requests_metric only fired for OpenAI / Azure / Anthropic deployments, even when tpm/rpm was configured on the router. Add a provider-agnostic fallback in PrometheusLogger.async_log_success_event that asks Router.get_remaining_model_group_usage() for the same model_group and emits the gauges with configured_limit - current_usage when the upstream provider didn't populate the headers itself. Existing OpenAI / Azure / Anthropic flows are unchanged because the fallback short-circuits when both header values are already present.

  Tests: 8 new tests covering bedrock + vertex emission, header short-circuit, partial-header fill, llm_router=None, missing model_group, empty router result, and router exception swallowing.

  Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* fix(prometheus): narrow except to ImportError, log router lookup failures via verbose_logger.exception

  Address greptile review:
  - The optional 'from litellm.proxy.proxy_server import llm_router' should guard against ImportError specifically, not all exceptions, so that unexpected errors (e.g. AttributeError from partially-initialized state) stay visible.
  - get_remaining_model_group_usage failures are now logged via verbose_logger.exception (with traceback) instead of debug, matching the PR description's intent and avoiding silent loss of router-cache errors in production.

  Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* fix(prometheus): subtract in-flight delta in router-remaining fallback

  The router's TPM/RPM counter is incremented by Router.deployment_callback_on_success, which fires alongside this prometheus callback in the success-log fan-out. Prometheus wins the race, so get_remaining_model_group_usage returns the pre-decrement counter for the current request, while vendor headers (OpenAI/Anthropic/Azure) are already post-decrement. That broke parity between providers on the same gauge: dashboards plotting litellm_remaining_requests_metric showed Bedrock/Vertex perpetually one request behind Anthropic for the same throughput. Replay the in-flight increment before emit: subtract total_tokens from remaining_tokens and 1 from remaining_requests.
* Revert "fix(prometheus): subtract in-flight delta in router-remaining fallback"

  This reverts commit 001ce95ecdd952b4b5a23dd2b1e62c4562c932bc.
* fix(router): post-decrement router-derived ratelimit headers

  Router.set_response_headers injects x-ratelimit-remaining-{tokens, requests} for providers that don't return them natively (Bedrock, Vertex). The values come from get_remaining_model_group_usage, which reads the router's TPM/RPM counter, and that counter is incremented post-response by deployment_callback_on_success. So the headers reflected the counter state before the current request was counted: pre-decrement. Vendor headers from OpenAI/Anthropic/Azure are post-decrement (the vendor counted the request before responding). Same metric name, two semantics: dashboards plotting litellm_remaining_requests_metric showed Bedrock/Vertex perpetually one request behind for the same throughput, and the HTTP response headers exposed the same skew to clients.

  Subtract the in-flight delta before writing: 1 from remaining-requests, response.usage.total_tokens from remaining-tokens. This fixes both the response headers and (transitively) the prometheus gauges that read from standard_logging_payload.additional_headers.

---------

Co-authored-by: cursor <cursor@example.com>
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
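The final fix is a small arithmetic adjustment on router-derived values. A minimal sketch, assuming ``None`` means the router has no configured limit for that dimension and that results should not go negative (both assumptions; the commit message does not say). `post_decrement` is a hypothetical name.

```python
from typing import Optional, Tuple


def post_decrement(
    remaining_requests: Optional[int],
    remaining_tokens: Optional[int],
    total_tokens: int,
) -> Tuple[Optional[int], Optional[int]]:
    """Hypothetical helper: count the in-flight request the router counter has not seen yet."""
    # None is assumed to mean "no limit configured", so it is passed through untouched.
    requests = None if remaining_requests is None else max(remaining_requests - 1, 0)
    tokens = None if remaining_tokens is None else max(remaining_tokens - total_tokens, 0)
    return requests, tokens
```

With an RPM budget of 100 and a 120-token response, a pre-decrement reading of `(100, 5000)` becomes `(99, 4880)`, matching what a vendor header would report after counting the same request.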

* Update gpt-4o-transcribe price
* Update test for gpt-4o-transcribe pricing fix
* Update gpt-4o-mini-transcribe price

aws_sts_endpoint, aws_web_identity_token, and aws_bedrock_runtime_endpoint in ingest_options.vector_store were passed directly to the Bedrock ingestion class, which reads them into boto3 STS client construction. Any authenticated caller could redirect AssumeRole calls to an attacker-controlled server, leaking the proxy's instance profile credentials. Calls is_request_body_safe() on ingest_options["vector_store"] before forwarding to litellm.aingest(). Same banned-params list and admin opt-in escape hatch (allow_client_side_credentials) as the /chat/completions path. ValueError from the safety check is caught and re-raised as HTTP 400. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

…verlay A guardrail entry's ``callbacks`` list (v1: ``{name: {callbacks:[...]}}``, v2: ``{guardrail_name, litellm_params: {callbacks: [...], guardrail: "module.path"}}``) is iterated during config load and threaded through ``get_instance_fn``. A PROXY_ADMIN persisting ``litellm_settings.guardrails[*].callbacks: ["s3://..."]`` or ``litellm_settings.guardrails[*].litellm_params.guardrail: "s3://..."`` via ``/config/update`` was not covered by the previous scrub matrix. Walk both v1 and v2 entry shapes and null out remote-URL callbacks / module-path values before the merge. Adds four regression tests.
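A minimal sketch that walks both guardrail entry shapes. `scrub_guardrails` is a hypothetical name; the sketch assumes v2 entries are recognized by a ``guardrail_name`` key, drops remote entries from ``callbacks`` lists, and sets a remote ``guardrail`` module path to ``None``.

```python
from typing import Any, List

REMOTE_PREFIXES = ("s3://", "gcs://")


def _is_remote(value: Any) -> bool:
    return isinstance(value, str) and value.startswith(REMOTE_PREFIXES)


def _drop_remote(callbacks: Any) -> Any:
    if isinstance(callbacks, list):
        return [cb for cb in callbacks if not _is_remote(cb)]
    return None if _is_remote(callbacks) else callbacks


def scrub_guardrails(guardrails: List[Any]) -> List[Any]:
    """Hypothetical helper covering both v1 and v2 guardrail entry shapes."""
    scrubbed: List[Any] = []
    for entry in guardrails:
        if isinstance(entry, dict) and "guardrail_name" in entry:
            # v2: {guardrail_name, litellm_params: {callbacks: [...], guardrail: "module.path"}}
            params = entry.get("litellm_params")
            if isinstance(params, dict):
                params = dict(params)
                if "callbacks" in params:
                    params["callbacks"] = _drop_remote(params["callbacks"])
                if _is_remote(params.get("guardrail")):
                    params["guardrail"] = None
                entry = {**entry, "litellm_params": params}
        elif isinstance(entry, dict):
            # v1: {name: {callbacks: [...]}}
            entry = {
                name: {**cfg, "callbacks": _drop_remote(cfg["callbacks"])}
                if isinstance(cfg, dict) and "callbacks" in cfg
                else cfg
                for name, cfg in entry.items()
            }
        scrubbed.append(entry)
    return scrubbed
```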

…27726)

* feat(mcp): support MCP access group names in URL-based namespacing

  Extends dynamic_mcp_route to resolve /{name}/mcp requests where {name} is an MCP access group tag or a comma-separated list of servers/groups, matching what the documentation promised but the handler did not implement. Resolution order: registered server alias → toolset → comma-separated list → single access group tag (404 if none match). Adds unit tests covering all four resolution paths plus 404 cases.

  Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(mcp): address Greptile review comments on dynamic_mcp_route
  - Move comma-separated check before toolset DB lookup so comma names short-circuit without hitting the database
  - Cache access-group DB lookups via user_api_key_cache to avoid a raw find_many on every request (matches toolset caching pattern)
  - Remove unused response_started variable from _forward_as_mcp_path
  - Update tests to assert comma list skips toolset call and to mock cache

  Co-authored-by: Cursor <cursoragent@cursor.com>
* refactor(mcp): extract helpers to fix PLR0915 too-many-statements in dynamic_mcp_route

  Extract _mcp_forward_as_path and _is_mcp_access_group_cached as module-level helpers so dynamic_mcp_route stays under the 50-statement limit. Update tests to patch the new module-level symbols directly.

  Co-authored-by: Cursor <cursoragent@cursor.com>
* Avoid caching missing MCP access groups
* fix(mcp): stream MCP responses via _stream_mcp_asgi_response instead of buffering

  _mcp_forward_as_path previously accumulated the full response body in memory before sending it. Replace the buffering custom_send pattern with _stream_mcp_asgi_response, which uses an asyncio.Queue bridge so chunks are yielded to the client as they arrive, preventing unbounded memory growth on large or long-lived MCP responses.

  Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(mcp): short-TTL negative cache for access-group existence lookup

  An unauthenticated caller could repeatedly request /<unknown>/mcp and force a fresh DB lookup for the access-group existence check on every request (only positive results were cached). Cache negative results for a short DEFAULT_MCP_ACCESS_GROUP_NEGATIVE_CACHE_TTL window (10s by default) so the DB is shielded from flooding, while a transient DB error (which surfaces as an empty list) cannot hide a real group for long.

  https://claude.ai/code/session_01SjyPmwfmrq8fveFgw9iHW9
* fix(mcp): use plain int for access-group negative cache TTL

  Drop the os.getenv wrapper around DEFAULT_MCP_ACCESS_GROUP_NEGATIVE_CACHE_TTL to avoid the documentation_test_env_keys check failing on the new variable. The negative-cache window is a small internal tuning constant, not a user-facing knob, so a plain integer is clearer than an env override.

  https://claude.ai/code/session_01SjyPmwfmrq8fveFgw9iHW9
* fix(mcp): validate, dedupe, and cap CSV tokens in dynamic MCP route

  For /{name1,name2,...}/mcp, validate that every token resolves to a known server alias or access group, dedupe case-insensitively, and cap at DEFAULT_MCP_NAMESPACE_CSV_MAX_TOKENS=16 before forwarding.
  - Bounds the per-request DB / cache fan-out an authenticated caller can trigger by stuffing the path with tokens (raised by veria-ai).
  - Returns 404 instead of forwarding when no token resolves, so the downstream server filter cannot silently fall back to the full allowed_mcp_servers list (raised by Cursor agentic security review).
  - Forwards only the resolved subset, so unknown tokens cannot ride along into the downstream filter.

  Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* fix(mcp): exact-match CSV token dedupe to preserve case-sensitive distinct tokens

  Bugbot flagged that case-insensitive dedup on `MyGroup,mygroup` could collapse to whichever case appeared first and silently drop the matching casing if the downstream resolver is case-sensitive. Switch to exact-match dedup so distinct casings survive; whitespace-only differences still collapse via the .strip() before comparison.

  Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: mateo-berri <mateo@berri.ai>
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
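A minimal sketch of the final CSV-token handling (strip, exact-match dedupe, cap, resolve, 404 on empty). `resolve_csv_namespace` and the `is_known` callback are hypothetical; the sketch assumes the cap truncates the token list before any lookups, though rejecting an oversized list outright would be an equally valid reading of "cap".

```python
from typing import Callable, List, Optional

# Value named in the commit message (DEFAULT_MCP_NAMESPACE_CSV_MAX_TOKENS).
MAX_CSV_TOKENS = 16


def resolve_csv_namespace(
    name: str,
    is_known: Callable[[str], bool],
    max_tokens: int = MAX_CSV_TOKENS,
) -> Optional[List[str]]:
    """Hypothetical helper: resolved subset, or None when nothing resolves (caller returns 404)."""
    seen = set()
    tokens: List[str] = []
    for raw in name.split(","):
        token = raw.strip()  # whitespace-only differences collapse here
        if token and token not in seen:  # exact match keeps MyGroup and mygroup distinct
            seen.add(token)
            tokens.append(token)
    # Cap before any lookups so a long path cannot fan out into many DB/cache calls.
    tokens = tokens[:max_tokens]
    resolved = [t for t in tokens if is_known(t)]
    return resolved or None
```

For instance, `"a, b ,a,MyGroup,mygroup,zzz"` against known names `{"a", "b", "MyGroup", "mygroup"}` yields `["a", "b", "MyGroup", "mygroup"]`; the unknown `zzz` is never forwarded.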

``extra_body`` is the OpenAI-SDK passthrough container. Provider modules read provider-auth fields out of it directly (Azure's ``extra_body.azure_ad_token``, Bedrock's ``extra_body.aws_web_identity_token``, etc.) without re-validating, so the boundary check has to walk it the same way it walks ``litellm_embedding_config``. Adding it to ``_NESTED_CONFIG_KEYS`` extends single-level banned-key descent into the container — top-level admin opt-ins (``allow_client_side_credentials`` / ``configurable_clientside_auth_params``) still apply. ``azure_ad_token`` was not in ``_BANNED_REQUEST_BODY_PARAMS`` despite being the bearer-token field the Azure transformer resolves through ``get_secret`` (same shape as ``aws_web_identity_token`` on the Bedrock STS path). Added so it can't be supplied per-request without an admin opt-in.
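A minimal sketch of single-level banned-key descent into nested containers. `find_banned_key`, `BANNED`, and `NESTED_CONFIG_KEYS` are hypothetical stand-ins; the banned set is an illustrative subset. Note the `isinstance(nested, dict)` guard: a container sent as a JSON-encoded string is skipped entirely.

```python
from typing import Any, Dict, Optional

# Illustrative subset of banned names; azure_ad_token is the one this change adds.
BANNED = frozenset({"api_base", "aws_sts_endpoint", "aws_web_identity_token", "azure_ad_token"})
NESTED_CONFIG_KEYS = ("litellm_embedding_config", "extra_body")


def find_banned_key(body: Dict[str, Any]) -> Optional[str]:
    """Hypothetical helper: return the first banned key found (top level or one level down)."""
    for key in body:
        if key in BANNED:
            return key
    for container in NESTED_CONFIG_KEYS:
        nested = body.get(container)
        # Single-level descent only; a JSON-string container slips past this check.
        if isinstance(nested, dict):
            for key in nested:
                if key in BANNED:
                    return f"{container}.{key}"
    return None
```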

…27896)

* fix(ui): fetch version + debug flag from /health/readiness/details

  The proxy moved `litellm_version`, `is_detailed_debug`, and other diagnostic fields off the public `/health/readiness` payload behind an auth-gated `/health/readiness/details` endpoint. The navbar version tag and the detailed-debug-mode banner stopped working because they were still reading those fields from the unauthed response, which no longer contains them.

  Replace `useHealthReadiness` with a `useHealthReadinessDetails` hook that takes an `accessToken` argument and sends a Bearer header to the auth-gated endpoint. The hook stays disabled while `accessToken` is falsy, so the navbar can keep rendering on the public model hub (where the token is null) without triggering an auth redirect or a 401-loop.
* fix(ui): disable retries on readiness/details + cover token forwarding

  Two small follow-ups on the readiness/details migration:
  - Set `retry: false` on the query. The payload feeds a passive navbar tag and a debug banner; a 401 from an expired token shouldn't fan out into three retries against the proxy.
  - Add navbar specs that assert the `accessToken` prop is forwarded into the hook (matches the DebugWarningBanner spec). Without this, the navbar could silently regress to passing `undefined` and the existing tests wouldn't catch it.

``_NESTED_CONFIG_KEYS`` descent used ``isinstance(nested, dict)``, so a caller sending ``extra_body`` as a JSON-encoded string instead of an object (the same shape multipart/form-data clients use for ``litellm_metadata``) skipped the banned-key check entirely. Switched to ``_coerce_metadata_to_dict`` so the JSON-string path is parsed before descent — mirrors the existing handling on ``_NESTED_METADATA_KEYS``.
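A minimal sketch of coercing the nested value before descent. `coerce_to_dict` is a hypothetical stand-in for ``_coerce_metadata_to_dict``, whose exact behavior is not shown here; in particular, treating an unparseable string as an empty dict is an assumption of this sketch.

```python
import json
from typing import Any, Dict, FrozenSet, List


def coerce_to_dict(value: Any) -> Dict[str, Any]:
    """Hypothetical stand-in: accept a dict as-is or parse a JSON-object string."""
    if isinstance(value, dict):
        return value
    if isinstance(value, str):
        try:
            parsed = json.loads(value)
        except json.JSONDecodeError:
            return {}  # assumption: unparseable strings carry no keys to check
        if isinstance(parsed, dict):
            return parsed
    return {}


def banned_keys_in(body: Dict[str, Any], container: str, banned: FrozenSet[str]) -> List[str]:
    """List banned keys inside `container`, whether it arrived as an object or a JSON string."""
    nested = coerce_to_dict(body.get(container))
    return sorted(key for key in nested if key in banned)
```

With this coercion, `{"extra_body": '{"api_base": "http://evil"}'}` is caught just like the object form `{"extra_body": {"api_base": "http://evil"}}`.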

``test_azure_ad_token_is_in_banned_list`` only asserted tuple membership of a name the parametrized test already exercises end-to-end through ``is_request_body_safe``. Removed. Tightened the admin-opt-in test comment.

…over chore(proxy): cover extra_body + azure_ad_token in banned-params check

…-gate chore(proxy): refuse remote-URL instance-fn loads outside config-file path

Worktree fix mcp byok oauth

* fix: patch Host-header auth bypass in get_request_route

  Starlette reconstructs request.url from the Host header. A malformed Host like `localhost/?x=1` causes Starlette to build the full URL as `http://localhost/?x=1/health`, which url-parses to path="/". Since "/" is in LiteLLMRoutes.public_routes, all protected routes became reachable without authentication.

  Fix: read scope["path"] (set by uvicorn from the HTTP request line, not derivable from headers) instead of request.url.path. Sub-path deployments are handled via scope["app_root_path"] / scope["root_path"], mirroring Starlette's own base_url construction logic.

  Affected variants confirmed fixed:
  - Host: localhost/?x=1
  - Host: localhost:4000/?x=1
  - Host: localhost/#test
  - Host: localhost:4000/#test

  Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* style: reduce comments in route fix

  Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix: block credential fields in RAG ingest vector_store options

  Credential fields (vertex_credentials, aws_access_key_id, api_key, etc.) in ingest_options.vector_store are now rejected at the API boundary with a 400 error. Credentials must be configured server-side.

  Previously, any authenticated user could supply a vertex_credentials dict with type=external_account pointing credential_source.file at an arbitrary path (e.g. /proc/1/environ) and token_url at an attacker-controlled server. google-auth's identity_pool.Credentials refresh() would read the file and POST its contents to the attacker.

  Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix: block /key/update self-escalation by assigned users

  Non-admin users who were assigned a key (created_by != caller) could update any non-budget field (models, rpm_limit, guardrails, etc.) without admin authorization, allowing privilege self-escalation.

  Gate: only the key creator (created_by == caller) may edit their own key without an admin check; budget changes always require admin regardless of creator status. All other callers must pass _check_key_admin_access.

  Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix: block user-controlled api_base in RAG ingest vector_store options

  A user-supplied api_base in ingest_options.vector_store caused the server to forward its configured provider credentials (Gemini, OpenAI) to an attacker-controlled endpoint via SSRF. Add api_base to the blocked credential params set alongside api_key and the existing credential fields.

  Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix: restrict /utils/transform_request to PROXY_ADMIN and apply body safety check

  Any authenticated internal_user could POST arbitrary provider config (aws_sts_endpoint, api_base, etc.) to /utils/transform_request and have the server forward its credentials to an attacker-controlled endpoint.
  - Gate the endpoint on PROXY_ADMIN role (403 for all other roles)
  - Call is_request_body_safe() to reject banned params even for admins
  - Convert ValueError from safety check to HTTP 400

  Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix: apply banned-param check to /utils/transform_request

  Without is_request_body_safe(), any authenticated user could pass aws_sts_endpoint, api_base, or aws_web_identity_token to /utils/transform_request and have the server forward its configured provider credentials to an attacker-controlled endpoint during SDK credential resolution. Applies the same banned-param blocklist already used by LLM endpoints. Endpoint remains accessible to all authenticated users.

  Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix: block SSRF via api_base in /prompts/test dotprompt YAML frontmatter

  Any frontmatter key not in ["model","input","output"] flowed into optional_params and was merged into the LLM call data dict, bypassing is_request_body_safe. An attacker with any bearer key could set api_base in YAML to redirect the outbound LLM request (including the provider API key) to an attacker-controlled host.

  Fix: call is_request_body_safe on the constructed data dict after optional_params are merged, before invoking ProxyBaseLLMRequestProcessing. ValueError from the banned-param check is surfaced as HTTP 400.

  Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* Update litellm/proxy/rag_endpoints/endpoints.py

  Co-authored-by: veria-ai[bot] <224490171+veria-ai[bot]@users.noreply.github.com>
* fix: coerce nested config strings before banned-param check

  _NESTED_CONFIG_KEYS descent used isinstance(nested, dict), which silently skipped litellm_embedding_config when delivered as a JSON string via multipart/form-data. Banned params (api_base, aws_sts_endpoint, etc.) nested inside the stringified value were invisible to is_request_body_safe. _NESTED_METADATA_KEYS already used _coerce_metadata_to_dict, which parses JSON strings before checking. Apply the same coercion to _NESTED_CONFIG_KEYS.

  Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix: replace substring match with prefix match in is_llm_api_route

  mapped_pass_through_routes used `_llm_passthrough_route in route` (substring), so any admin-only path whose URL contained a provider name (openai, anthropic, azure, bedrock, etc.) was misclassified as an LLM API route and bypassed the admin gate in non_proxy_admin_allowed_routes_check. Confirmed live: a non-admin key could GET /credentials/by_name/openai (read masked provider API key) and DELETE /credentials/openai (delete credential).

  Fix: use exact match or startswith(prefix + "/"), the same pattern used everywhere else in RouteChecks, so only routes that actually start with a passthrough prefix are allowed through.

  Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix: stabilize PR #27878 test failures
  - key_management_endpoints: extend can_skip_admin_check to team keys so team members with /key/update permission can update non-budget fields. can_team_member_execute_key_management_endpoint already validates team membership + permission and raises if unauthorized; reaching the admin check on a team key means the caller was authorized.
  - test: set created_by on mock key in test_update_key_non_budget_fields_allowed_for_internal_user so caller_is_creator resolves correctly (MagicMock default ≠ user_id).
  - auth_utils.get_request_route: guard against non-dict request.scope (e.g. MagicMock in unit tests) to prevent a MagicMock leaking into UserAPIKeyAuth.request_route and failing Pydantic validation.
  - ci: assign test_multipart_bypass_repro.py to the proxy-runtime shard in test-unit-proxy-db.yml to satisfy the shard-coverage check.

  Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix(lint): add explicit str() cast in get_request_route for MyPy

  scope.get() returns Any|None, which MyPy cannot coerce to str implicitly. Wrap both scope.get() calls in str() to satisfy the type checker.

  Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix: guard bare-/ root_path strip + make total_spend migration idempotent

  auth_utils.get_request_route: when Starlette sets scope["app_root_path"] to "/" (e.g. behind some middleware), the old stripping logic would remove the leading slash from every path ("/team/new" → "team/new"), breaking route matching and causing auth to misclassify protected routes. Skip stripping when root_path is bare "/".

  migration: add IF NOT EXISTS to the total_spend ALTER TABLE so the migration is safe to replay when a prior partial run already created the column. Without this guard, prisma migrate deploy fails on CI DBs that were partially migrated, causing all subsequent DB operations (including /team/new) to 500.

  Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix: require creator still owns key for personal-key bypass in /key/update

  caller_is_creator now requires both created_by == caller AND user_id == caller. Previously, checking only created_by let a demoted admin who originally created a key for another user continue editing non-budget fields on it after reassignment, bypassing _check_key_admin_access. Adds regression test: a creator whose key was reassigned is blocked (403).

  Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix: extract auth checks to fix PLR0915 + broaden max_budget assertion

  internal_user_endpoints._update_single_user_helper exceeded 50 statements (PLR0915). Extract authorization checks into a _check_user_update_authz helper to bring the statement count under the limit.

  test_validate_max_budget: assert "negative" (a substring of both the local "cannot be negative" and the CI "non-negative finite number" messages) so the test is stable regardless of which exact wording the function uses.

  Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: veria-ai[bot] <224490171+veria-ai[bot]@users.noreply.github.com>
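A minimal sketch contrasting the Host-header bug with the scope-based fix, plus the prefix-vs-substring route check. All three functions are hypothetical illustrations, not LiteLLM's `get_request_route` or `is_llm_api_route`; the root-path handling is a simplified assumption of how a sub-path prefix is stripped, including the bare-"/" guard.

```python
from typing import Any, Dict, Iterable
from urllib.parse import urlsplit


def route_from_host_header(host: str, request_path: str) -> str:
    """The vulnerable pattern: rebuild the URL from the Host header, then parse it."""
    return urlsplit(f"http://{host}{request_path}").path


def route_from_scope(scope: Dict[str, Any]) -> str:
    """Sketch of the fix: trust scope["path"] (from the request line), strip a sub-path prefix."""
    path = str(scope.get("path") or "/")
    root_path = str(scope.get("app_root_path") or scope.get("root_path") or "")
    # A bare "/" root must not be stripped, or "/team/new" would become "team/new".
    if root_path and root_path != "/" and (path == root_path or path.startswith(root_path + "/")):
        path = path[len(root_path):] or "/"
    return path


def is_passthrough_route(route: str, prefixes: Iterable[str]) -> bool:
    """Exact match or prefix + "/"; a bare substring test would match /credentials/by_name/openai."""
    return any(route == p or route.startswith(p + "/") for p in prefixes)
```

For example, `route_from_host_header("localhost/?x=1", "/health")` returns `"/"` (the query swallows the real path), whereas `route_from_scope({"path": "/health"})` returns `"/health"`.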

greptile-apps · 2026-05-14T04:43:53Z

Too many files changed for review. (105 files found, 100 file limit)

CLAassistant · 2026-05-14T04:43:59Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
6 out of 10 committers have signed the CLA.

✅ stuxf
✅ mateo-berri
✅ Sameerlite
✅ yuneng-berri
✅ lmcdonald-godaddy
✅ milan-berri
❌ oss-agent-shin
❌ ishaan-berri
❌ krrish-berri-2
❌ yassin-berriai

mateo-berri

💯

codspeed-hq · 2026-05-14T04:45:46Z

Merging this PR will not alter performance

✅ 16 untouched benchmarks

Comparing litellm_internal_staging (de1747d) with main (7af0f05)

codecov · 2026-05-14T04:46:53Z

Codecov Report

❌ Patch coverage is 89.41176% with 18 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
litellm/proxy/_experimental/mcp_server/server.py	75.47%	13 Missing ⚠️
litellm/llms/gemini/google_genai/transformation.py	90.90%	2 Missing ⚠️
...erimental/mcp_server/auth/user_api_key_auth_mcp.py	95.55%	2 Missing ⚠️
...ellm/llms/vertex_ai/google_genai/transformation.py	50.00%	1 Missing ⚠️


….20.2)

 from litellm.litellm_core_utils.url_utils import SSRFError, validate_url
 from litellm.proxy._types import *
 from litellm.types.router import CONFIGURABLE_CLIENTSIDE_AUTH_PARAMS
+from litellm.types.utils import CustomPricingLiteLLMParams


 )
-from litellm.proxy.auth.auth_utils import check_response_size_is_safe
+from litellm.proxy.auth.auth_utils import (
+    check_response_size_is_safe,


+                    or getattr(user_api_key_auth, "api_key", None)
+                )
+            )
+            if is_anonymous:


    _safe_get_request_headers,
    get_form_data,
 )
+from litellm.proxy.auth.auth_utils import is_request_body_safe


-from litellm.proxy.auth.auth_utils import check_response_size_is_safe
+from litellm.proxy.auth.auth_utils import (
+    check_response_size_is_safe,
+    is_request_body_safe,



 from litellm._logging import verbose_proxy_logger
 from litellm.proxy._types import CommonProxyErrors, LitellmUserRoles, UserAPIKeyAuth
+from litellm.proxy.auth.auth_utils import is_request_body_safe


+from litellm.llms.custom_httpx.http_handler import (
+    get_async_httpx_client,
+    httpxSpecialProvider,
+)


+from litellm.llms.vertex_ai.common_utils import (
+    _build_vertex_schema,
+    supports_response_json_schema,
+)


+from litellm.llms.azure.common_utils import BaseAzureLLM
 from litellm.llms.openai.image_edit.transformation import OpenAIImageEditConfig
 from litellm.secret_managers.main import get_secret_str
+from litellm.types.router import GenericLiteLLMParams


 import httpx

 import litellm
+from litellm.llms.azure.common_utils import BaseAzureLLM


[Infra] Bump Extras Version

[Infra] Build UI

#27689)

Provider validation errors (e.g. an OpenAI RateLimitError carrying 178 pydantic errors, each with its own 'input': [...]) were stored verbatim in LiteLLM_SpendLogs.metadata.error_information.error_message via str(original_exception), producing rows >12 MB. Sanitize before metadata is serialized:
- redact 'input'/'messages' values in both error_message and traceback when store_prompts_in_spend_logs is False (back-door leak paths)
- always apply the MAX_STRING_LENGTH_PROMPT_IN_DB size cap to error_message and traceback (DB-storage safeguard)

Value scanning uses a parser-based balanced-bracket walk that respects string quoting, so multi-modal payloads ('messages': [{'content': [...]}]) and user text containing literal brackets ("secret[123") are handled correctly instead of leaking past a depth-1 regex. Scoped to the spend-log path so OTEL/Datadog/etc. callbacks still receive the untruncated error per LITELLM_TRUNCATION_DB_SAFEGUARD_NOTE.

Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
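A minimal sketch of a quote-aware, balanced-bracket redaction pass followed by a size cap. The function names, the placeholder, and the `max_len` parameter are hypothetical (the real cap is MAX_STRING_LENGTH_PROMPT_IN_DB, whose value is not shown here). The walk only tracks quoting inside a matched value, so a `'input': ` sequence inside some unrelated string would also be redacted (over-redaction, not a leak).

```python
from typing import Iterable

REDACT_KEYS = ("input", "messages")
PLACEHOLDER = "'[REDACTED]'"


def _value_end(text: str, start: int) -> int:
    """Index just past the value beginning at `start`, honoring quotes and nested brackets."""
    depth, quote, i = 0, None, start
    while i < len(text):
        ch = text[i]
        if quote is not None:
            if ch == "\\":
                i += 2  # skip the escaped character
                continue
            if ch == quote:
                quote = None
                if depth == 0:
                    return i + 1  # a bare string value just closed
        elif ch in "'\"":
            quote = ch  # brackets inside strings (e.g. "secret[123") are ignored from here
        elif ch in "[{(":
            depth += 1
        elif ch in "]})":
            if depth == 0:
                return i  # closing bracket of the enclosing container
            depth -= 1
            if depth == 0:
                return i + 1
        elif ch == "," and depth == 0:
            return i  # end of a scalar value
        i += 1
    return len(text)


def redact_values(text: str, keys: Iterable[str] = REDACT_KEYS) -> str:
    """Replace the value after each `'key': ` / `"key": ` with a placeholder."""
    prefixes = [f"{q}{k}{q}: " for k in keys for q in ("'", '"')]
    out, i = [], 0
    while i < len(text):
        prefix = next((p for p in prefixes if text.startswith(p, i)), None)
        if prefix is None:
            out.append(text[i])
            i += 1
            continue
        out.append(prefix + PLACEHOLDER)
        i = _value_end(text, i + len(prefix))
    return "".join(out)


def sanitize_error_for_spend_log(message: str, store_prompts: bool, max_len: int) -> str:
    """Redact unless prompts are stored, then always apply the size cap."""
    if not store_prompts:
        message = redact_values(message)
    return message[:max_len]
```

Because the walk tracks depth and quotes, a nested value like `[{'role': 'user', 'content': 'secret[123'}]` is consumed as a single unit, so neither the bracket inside the string nor the nested dict ends the value early.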

stuxf and others added 30 commits May 13, 2026 01:05

fix(responses): register cooldowns on failure + fail fast on stale en…

2e5ebf8

…crypted_content (#27820)

Merge pull request #27866 from BerriAI/litellm_/bold-kare-aebc4b

5e706f8

fix(proxy): expose db status on public /health/readiness

docs(budget_manager): add docstring to BudgetManager.reset_cost (#27867)

ca82761

Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com>

docs: add class docstring to _LoopWrapper (#27870)

714664f

Document the purpose of the daemon thread that backs the sync branch of the timeout decorator. Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com>

fix(pricing): GPT-4o-Transcribe Pricing (#27875)

baa68eb

* Update gpt-4o-transcribe price * Update test for gpt-4o-transcribe pricing fix * Update gpt-4o-mini-transcribe price

yuneng-berri and others added 7 commits May 13, 2026 20:33

chore(tests): drop redundant membership check; trim test comment

626b768

``test_azure_ad_token_is_in_banned_list`` only asserted tuple membership of a name the parametrized test already exercises end-to-end through ``is_request_body_safe``. Removed. Tightened the admin-opt-in test comment.

Merge pull request #27898 from stuxf/chore/banned-params-extra-body-c…

a6a9d8e

…over chore(proxy): cover extra_body + azure_ad_token in banned-params check

Merge pull request #27801 from stuxf/chore/get-instance-fn-runtime-s3…

e3e5209

…-gate chore(proxy): refuse remote-URL instance-fn loads outside config-file path

Merge pull request #27892 from BerriAI/worktree-fix-mcp-byok-oauth

0c49820

Worktree fix mcp byok oauth

mateo-berri approved these changes May 14, 2026


chore: update Next.js build artifacts (2026-05-14 04:47 UTC, node v20…

2d578a7

….20.2)

github-advanced-security AI found potential problems May 14, 2026


yuneng-berri and others added 5 commits May 13, 2026 21:51

bump: version 0.4.71 → 0.4.72

0aa439d

uv lock

e838a40

Merge pull request #27908 from BerriAI/yj_bump_may13

a2bb7aa

[Infra] Bump Extras Version

Merge pull request #27907 from BerriAI/yj_build_2_may13

117b036

[Infra] Build UI

shin-berri approved these changes May 14, 2026


shin-berri merged commit e58a561 into main May 14, 2026
126 of 130 checks passed

shin-berri merged 43 commits into main from litellm_internal_staging

13 participants
