Litellm oss staging 250526 by Sameerlite · Pull Request #28770 · BerriAI/litellm

Sameerlite · 2026-05-25T06:38:12Z

Relevant issues

Linear ticket

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

I have Added testing in the tests/test_litellm/ directory, Adding at least 1 test is a hard requirement - see details
My PR passes all unit tests on make test-unit
My PR's scope is as isolated as possible, it only solves 1 specific problem
I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

50-55 passing tests: main is stable with minor issues.

45-49 passing tests: acceptable but needs attention

<= 40 passing tests: unstable; be careful with your merges and assess the risk.

Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:

Screenshots / Proof of Fix

Type

🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test

Changes

Note

High Risk
Arize Phoenix tracing and proxy metadata rules are security-sensitive and behavior-changing; MCP OAuth callback and pass-through HTTP method changes affect auth and routing paths.

Overview
This PR bundles observability, proxy, Helm, and model-catalog updates.

Arize Phoenix now routes traces via an LRU-cached per-project TracerProvider (shared span processor, project on OTEL Resource instead of span attrs). Project resolution adds override/env rules and proxy-safe selection: only user_api_key_auth_metadata on proxy requests; client phoenix_project_name in metadata is ignored on the proxy. flush_tracer_providers() is added for shutdown. Init no longer mutates OTEL_RESOURCE_ATTRIBUTES for a single global project.

Proxy security & ops: Request bodies cannot set phoenix_project_name_override or user_api_key_auth_metadata. MCP OAuth /callback accepts IdP error responses (RFC 6749) instead of 422 when code/state are missing, with HTML fallback or trusted redirect propagation. Management audit logs strip credential fields from responses. Pass-through uses the incoming HTTP method instead of hardcoded POST.

Metrics & small fixes: Datadog emits litellm.overhead.latency from hidden_params.litellm_overhead_time_ms. is_thinking_enabled tolerates thinking: None. Regional cost check uses a prebuilt residency frozenset.

Helm: podAnnotations are rendered with tpl so values like custom checksum/config can reference chart values (test added).

Data: Large refresh of github_copilot/* entries in model_prices_and_context_window_backup.json; example batch config uses placeholder AWS account IDs.

^{Reviewed by Cursor Bugbot for commit ccf368e. Bugbot is set up for automated code reviews on this repo. Configure here.}

[Infra] Promote internal staging to main

chore(ci): promote internal staging to main

Per RFC 6749 section 4.1.2.1, when the IdP rejects an OAuth authorization request it redirects back to the client with ?error=...&error_description=... and no code. The MCP /callback handler declared code and state as required query params, so FastAPI rejected such error responses with a 422 before the handler ran -- stranding the MCP client waiting on the loopback. This change: - Makes code and state optional and accepts the RFC-defined error, error_description, and error_uri params. - When state decodes to a trusted client redirect_uri, propagates the error params back to that URI with the client's original (un-wrapped) state preserved, so the client's OAuth library can surface the failure. - When state is missing/undecryptable or the encoded redirect_uri is no longer trusted, renders a 400 HTML page with the (HTML-escaped) error details instead of leaking to an attacker-controlled redirect. - Preserves the existing success path (code + state -> 302 to validated client redirect_uri with original state). Fixes LIT-2750.

…s (LIT-2750) Adds a new test module covering the LIT-2750 fix: the MCP OAuth /callback endpoint must accept IdP error responses (e.g. ?error=access_denied) per RFC 6749 section 4.1.2.1 instead of returning a 422 because ``code`` is missing. Coverage: - IdP error with no state -> 400 HTML page surfacing the error. - HTML escaping of user-controlled error / error_description fields. - IdP error with a trusted (loopback) state -> 302 propagating error / error_description / original client state to the client. - IdP error with an untrusted redirect_uri encoded in state -> 400 inline (no open-redirect to attacker-controlled origin). - IdP error with an undecryptable state -> 400 HTML fallback. - Bare GET /callback with no params -> 400 HTML (not Pydantic 422). - Success path (code + state) still 302 to validated client redirect_uri with the original (un-wrapped) state preserved.

The tuple was leftover scaffolding from an earlier draft of the LIT-2750 fix; nothing references it. The explanatory RFC 6749 §4.1.2.1 comment block above the callback handler covers the same intent.

…28743) * fix(mcp): handle OAuth IdP error responses in /callback (LIT-2750) Per RFC 6749 section 4.1.2.1, when the IdP rejects an OAuth authorization request it redirects back to the client with ?error=...&error_description=... and no code. The MCP /callback handler declared code and state as required query params, so FastAPI rejected such error responses with a 422 before the handler ran -- stranding the MCP client waiting on the loopback. This change: - Makes code and state optional and accepts the RFC-defined error, error_description, and error_uri params. - When state decodes to a trusted client redirect_uri, propagates the error params back to that URI with the client's original (un-wrapped) state preserved, so the client's OAuth library can surface the failure. - When state is missing/undecryptable or the encoded redirect_uri is no longer trusted, renders a 400 HTML page with the (HTML-escaped) error details instead of leaking to an attacker-controlled redirect. - Preserves the existing success path (code + state -> 302 to validated client redirect_uri with original state). Fixes LIT-2750. * test(mcp): regression tests for /callback handling IdP error responses (LIT-2750) Adds a new test module covering the LIT-2750 fix: the MCP OAuth /callback endpoint must accept IdP error responses (e.g. ?error=access_denied) per RFC 6749 section 4.1.2.1 instead of returning a 422 because ``code`` is missing. Coverage: - IdP error with no state -> 400 HTML page surfacing the error. - HTML escaping of user-controlled error / error_description fields. - IdP error with a trusted (loopback) state -> 302 propagating error / error_description / original client state to the client. - IdP error with an untrusted redirect_uri encoded in state -> 400 inline (no open-redirect to attacker-controlled origin). - IdP error with an undecryptable state -> 400 HTML fallback. - Bare GET /callback with no params -> 400 HTML (not Pydantic 422). - Success path (code + state) still 302 to validated client redirect_uri with the original (un-wrapped) state preserved. * refactor(mcp): drop unused _OAUTH_ERROR_PARAMS constant (Greptile P2) The tuple was leftover scaffolding from an earlier draft of the LIT-2750 fix; nothing references it. The explanatory RFC 6749 §4.1.2.1 comment block above the callback handler covers the same intent. --------- Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai>

CLAassistant · 2026-05-25T06:38:19Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
6 out of 9 committers have signed the CLA.

✅ yuneng-berri
✅ Sameerlite
✅ Terrajlz
✅ mubashir1osmani
✅ devauxbr
✅ ririnto
❌ shin-berri
❌ cursoragent
❌ oss-agent-shin
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

greptile-apps · 2026-05-25T06:41:32Z

Greptile Summary

This PR bundles several operational fixes and observability improvements: Arize Phoenix tracing is refactored to use a per-project LRU-cached TracerProvider architecture, the MCP OAuth /callback endpoint now handles IdP error responses per RFC 6749, and pass-through upstream calls are fixed to use the incoming HTTP method instead of always using POST.

Arize Phoenix: Replaces single-provider + span-level project naming with an LRU cache of per-project TracerProviders sharing one SpanProcessor; proxy-safe routing restricts project overrides to user_api_key_auth_metadata only; global OTEL_RESOURCE_ATTRIBUTES mutation at init is removed.
Management OTEL spans: _emit_management_endpoint_otel_span now serialises the full response dict onto the success span — but generate_key_fn returns a GenerateKeyResponse whose key field carries the plaintext API key, which will be written to every configured OTEL backend as response.key.
Other fixes: is_thinking_enabled handles None thinking param; Datadog emits litellm.overhead.latency; Helm podAnnotations use tpl; phoenix_project_name_override is banned from client request metadata.

Confidence Score: 4/5

Safe to merge after the management OTEL span change is revisited — all other changes are well-tested fixes.

The management endpoint OTEL span change serializes the full response of key-generation endpoints into telemetry, including the key field that holds the freshly-generated plaintext API key. Every OTEL backend configured for the proxy would receive this credential. The rest of the PR — Phoenix per-project tracing, MCP OAuth error handling, pass-through method fix — is well-structured and has good test coverage.

litellm/proxy/management_helpers/utils.py — the response serialization into OTEL spans needs to filter or denylist credential fields before this is safe to merge.

Security Review

Credential leak into OTEL telemetry (litellm/proxy/management_helpers/utils.py): _emit_management_endpoint_otel_span now calls dict(result) and sets every response field as a span attribute. generate_key_fn (wrapped by management_endpoint_wrapper) returns GenerateKeyResponse which includes key: str — the plaintext generated API key. This key is written to every configured OTEL backend as response.key, violating the principle of not storing credentials in telemetry.

Important Files Changed

Filename	Overview
litellm/proxy/management_helpers/utils.py	Adds response body to OTEL management spans, but `dict(result)` on GenerateKeyResponse exposes the plaintext `key` field (the generated API key) to every OTEL backend.
litellm/integrations/arize/arize_phoenix.py	Major refactor introducing per-project LRU-cached TracerProviders, proxy-safe project routing, and unified _handle_phoenix_trace. Architecture and thread safety look sound; double-checked locking pattern is correct.
litellm/proxy/_experimental/mcp_server/discoverable_endpoints.py	New /callback error path accepts optional code/state/error per RFC 6749 §4.1.2.1, renders HTML 400s for untrusted redirects, and correctly HTML-escapes error strings before rendering.
litellm/proxy/pass_through_endpoints/pass_through_endpoints.py	Fixes pass-through upstream calls to use the incoming HTTP method instead of hardcoded POST — correct bug fix.
litellm/proxy/auth/auth_utils.py	Adds phoenix_project_name_override to the banned observability params set, preventing clients from injecting project routing overrides via request metadata.
litellm/integrations/datadog/datadog_metrics.py	Adds litellm.overhead.latency gauge metric from hidden_params.litellm_overhead_time_ms; straightforward and well-tested addition.
litellm/litellm_core_utils/litellm_logging.py	Removes the OTEL_RESOURCE_ATTRIBUTES mutation at logger init, deferring project name to per-request Resource attributes. Clean-up that prevents cross-request project pinning.
deploy/charts/litellm-helm/templates/deployment.yaml	Wraps podAnnotations with tpl to support Helm templating expressions in annotation values; standard Helm pattern with no issues.

_{Reviews (4): Last reviewed commit: "feat(arize): route Phoenix traces via pe..." | Re-trigger Greptile}

codecov · 2026-05-25T06:41:48Z

Codecov Report

❌ Patch coverage is 93.12169% with 13 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
litellm/integrations/arize/arize_phoenix.py	93.00%	10 Missing ⚠️
litellm/proxy/management_helpers/utils.py	60.00%	2 Missing ⚠️
..._experimental/mcp_server/discoverable_endpoints.py	96.87%	1 Missing ⚠️

📢 Thoughts on this report? Let us know!

cursor

Cursor Bugbot has reviewed your changes using high effort and found 2 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for both issues found in the latest run.

✅ Fixed: Misleading error message when only one parameter missing
- The error description is now built dynamically from the actually-missing parameter names so it no longer claims both are missing when only one is.
✅ Fixed: Error path drops empty-string state unlike success path
- Changed the error-path guard to if original_state is not None, matching the success path so an empty-string state is still echoed back per RFC 6749 §4.1.2.1.

Preview (3398a043dc)

diff --git a/litellm/proxy/_experimental/mcp_server/discoverable_endpoints.py b/litellm/proxy/_experimental/mcp_server/discoverable_endpoints.py
--- a/litellm/proxy/_experimental/mcp_server/discoverable_endpoints.py
+++ b/litellm/proxy/_experimental/mcp_server/discoverable_endpoints.py
@@ -1,3 +1,4 @@
+import html as _html
 import json
 from typing import Any, Dict, Optional
 from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse
@@ -618,8 +619,105 @@
     )
 
 
+# Per RFC 6749 §4.1.2.1, an IdP that rejects an OAuth authorization request
+# redirects back to the configured redirect URI with ``error`` /
+# ``error_description`` / ``error_uri`` query params and no ``code``. The MCP
+# loopback flow funnels that response through this /callback endpoint, so
+# the endpoint must accept either a successful (``code``+``state``) or an
+# error response. Declaring ``code``/``state`` as required would cause
+# FastAPI to reject the error response with a 422 before the handler runs,
+# which strands the MCP client waiting on the loopback (see LIT-2750).
+
+
+def _render_oauth_error_html(error: str, description: Optional[str]) -> HTMLResponse:
+    """Render an actionable HTML page for an IdP-reported OAuth error.
+
+    Used when we cannot propagate the error back to the registered
+    ``redirect_uri`` (state missing or undecryptable). Returned with a 400
+    status so the failure is observable to operators while still being a
+    human-readable page for the end user.
+    """
+    safe_error = _html.escape(error or "unknown_error")
+    safe_description = _html.escape(description) if description else ""
+    description_html = f"<p>{safe_description}</p>" if safe_description else ""
+    body = (
+        "<html><body>"
+        "<h2>Authentication failed</h2>"
+        f"<p><strong>Error:</strong> {safe_error}</p>"
+        f"{description_html}"
+        "<p>You can close this window and try again.</p>"
+        "</body></html>"
+    )
+    return HTMLResponse(body, status_code=400)
+
+
 @router.get("/callback")
-async def callback(request: Request, code: str, state: str):
+async def callback(
+    request: Request,
+    code: Optional[str] = None,
+    state: Optional[str] = None,
+    error: Optional[str] = None,
+    error_description: Optional[str] = None,
+    error_uri: Optional[str] = None,
+):
+    """OAuth 2.0 authorization response handler for MCP loopback clients.
+
+    Accepts either:
+
+    - A successful authorization response (``code`` + ``state``), which is
+      forwarded back to the validated client ``redirect_uri`` with the
+      original (un-wrapped) ``state``.
+    - An error response (``error``[+``error_description``/``error_uri``]), per
+      RFC 6749 §4.1.2.1. When ``state`` is present and decodes to a trusted
+      ``redirect_uri``, the error params are propagated back to the client so
+      its OAuth library can surface them. Otherwise we render an HTML error
+      page so the user is not left on an opaque 422 / blank screen.
+    """
+    # 1. IdP-reported error path (e.g. ``?error=access_denied``).
+    if error:
+        verbose_logger.info(
+            "MCP /callback received IdP error: error=%s, error_description=%s",
+            error,
+            error_description,
+        )
+        if state:
+            try:
+                state_data = decode_state_hash(state)
+                original_state = state_data.get("original_state")
+                redirect_uri = _get_validated_client_redirect_uri(request, state_data)
+            except HTTPException:
+                # Untrusted/invalid client redirect_uri — surface inline rather
+                # than blindly forwarding the error to an attacker-controlled URL.
+                return _render_oauth_error_html(error, error_description)
+            except Exception:
+                # State could not be decrypted (expired key, tampered, etc.).
+                return _render_oauth_error_html(error, error_description)
+
+            params: Dict[str, str] = {"error": error}
+            if error_description:
+                params["error_description"] = error_description
+            if error_uri:
+                params["error_uri"] = error_uri
+            if original_state is not None:
+                params["state"] = original_state
+            complete_returned_url = _append_query_params(redirect_uri, params)
+            return RedirectResponse(url=complete_returned_url, status_code=302)
+
+        # No state — nothing to round-trip to. Show the user the error.
+        return _render_oauth_error_html(error, error_description)
+
+    # 2. Neither success nor error parameters present — most likely a stray
+    #    GET / dropped SSO redirect chain. Surface a 400 instead of 422.
+    if not code or not state:
+        missing = [
+            name for name, value in (("code", code), ("state", state)) if not value
+        ]
+        return _render_oauth_error_html(
+            "invalid_request",
+            f"Missing authorization {' and '.join(repr(m) for m in missing)} parameter(s).",
+        )
+
+    # 3. Successful authorization response.
     try:
         state_data = decode_state_hash(state)
         original_state = state_data["original_state"]

diff --git a/tests/test_litellm/proxy/_experimental/mcp_server/test_callback_oauth_error_responses.py b/tests/test_litellm/proxy/_experimental/mcp_server/test_callback_oauth_error_responses.py
new file mode 100644
--- /dev/null
+++ b/tests/test_litellm/proxy/_experimental/mcp_server/test_callback_oauth_error_responses.py
@@ -1,0 +1,210 @@
+"""Regression tests for LIT-2750.
+
+The MCP OAuth ``/callback`` endpoint must handle IdP error responses
+(e.g. ``?error=access_denied``) gracefully instead of returning a 422
+because ``code`` and ``state`` were declared as required FastAPI query
+params. Per RFC 6749 §4.1.2.1 the IdP redirects to the configured
+redirect URI with ``error`` / ``error_description`` / ``error_uri``
+query params and no ``code`` when the user denies access.
+
+These tests cover both the propagate-to-client path (when state decodes
+to a trusted ``redirect_uri``) and the in-page fallback (when state is
+missing, undecryptable, or carries an untrusted redirect_uri). They also
+pin the success path (``code`` + ``state``) against accidental
+regressions.
+"""
+
+import pytest
+
+
+@pytest.fixture(autouse=True)
+def _mock_mcp_client_ip():
+    """Bypass IP-based access control for the in-process TestClient.
+
+    Mirrors the autouse fixture in ``test_discoverable_endpoints.py`` so
+    these tests don't require a real client IP context.
+    """
+    from unittest.mock import patch
+
+    with patch(
+        "litellm.proxy._experimental.mcp_server.discoverable_endpoints.IPAddressUtils.get_mcp_client_ip",
+        return_value=None,
+    ):
+        yield
+
+
+@pytest.fixture
+def callback_test_client(monkeypatch):
+    """FastAPI TestClient mounted with the MCP discoverable router.
+
+    Sets a deterministic ``LITELLM_SALT_KEY`` so encoded states minted
+    in-test can be decrypted by the handler.
+    """
+    from fastapi import FastAPI
+    from fastapi.testclient import TestClient
+
+    monkeypatch.setenv("LITELLM_SALT_KEY", "sk-test-salt-for-LIT-2750")
+
+    from litellm.proxy._experimental.mcp_server.discoverable_endpoints import (
+        router,
+    )
+
+    app = FastAPI()
+    app.include_router(router)
+    return TestClient(app)
+
+
+class TestCallbackOAuthErrorResponses:
+    """LIT-2750: IdP error responses to ``/callback`` must not 422."""
+
+    def test_idp_error_with_no_state_returns_400_html(self, callback_test_client):
+        """Pre-fix: 422 Pydantic. Post-fix: 400 HTML with the IdP's error."""
+        resp = callback_test_client.get(
+            "/callback",
+            params={
+                "error": "access_denied",
+                "error_description": "User declined access",
+            },
+            follow_redirects=False,
+        )
+        assert resp.status_code == 400
+        assert "text/html" in resp.headers["content-type"]
+        body = resp.text
+        assert "access_denied" in body
+        assert "User declined access" in body
+        # Sanity: must not leak the Pydantic validation error.
+        assert "Field required" not in body
+
+    def test_idp_error_html_escapes_user_controlled_fields(
+        self, callback_test_client
+    ):
+        """A malicious IdP must not be able to inject HTML/JS via error params."""
+        resp = callback_test_client.get(
+            "/callback",
+            params={
+                "error": "<script>alert(1)</script>",
+                "error_description": "<img src=x onerror=alert(2)>",
+            },
+            follow_redirects=False,
+        )
+        assert resp.status_code == 400
+        body = resp.text
+        # Raw tags must be escaped, not present verbatim.
+        assert "<script>alert(1)</script>" not in body
+        assert "<img src=x onerror=alert(2)>" not in body
+        assert "&lt;script&gt;alert(1)&lt;/script&gt;" in body
+
+    def test_idp_error_with_trusted_state_propagates_to_client_redirect_uri(
+        self, callback_test_client
+    ):
+        """When state decodes to a trusted (loopback) redirect_uri, propagate
+        the error back so the MCP client's OAuth library can surface it
+        instead of timing out waiting on the loopback."""
+        from litellm.proxy._experimental.mcp_server.discoverable_endpoints import (
+            encode_state_with_base_url,
+        )
+
+        state = encode_state_with_base_url(
+            base_url="http://localhost:3000/",
+            original_state="client-original-state-xyz",
+            client_redirect_uri="http://127.0.0.1:60108/callback",
+        )
+
+        resp = callback_test_client.get(
+            "/callback",
+            params={
+                "error": "access_denied",
+                "error_description": "User declined access",
+                "state": state,
+            },
+            follow_redirects=False,
+        )
+        assert resp.status_code == 302
+        location = resp.headers["location"]
+        assert location.startswith("http://127.0.0.1:60108/callback?")
+        assert "error=access_denied" in location
+        # Original client state must be round-tripped, not our wrapped state.
+        assert "state=client-original-state-xyz" in location
+        # error_description percent-encoded but present.
+        assert "error_description=User" in location
+        # Wrapped/encrypted state must NOT leak to the client.
+        assert state not in location
+
+    def test_idp_error_with_untrusted_redirect_uri_does_not_open_redirect(
+        self, callback_test_client
+    ):
+        """If the state minted earlier carries a redirect_uri that the proxy
+        no longer trusts, we must surface the error inline rather than
+        302-ing to an attacker-controlled URL (open-redirect)."""
+        from litellm.proxy._experimental.mcp_server.discoverable_endpoints import (
+            encode_state_with_base_url,
+        )
+
+        state = encode_state_with_base_url(
+            base_url="http://localhost:3000/",
+            original_state="x",
+            client_redirect_uri="https://attacker.example.com/steal",
+        )
+
+        resp = callback_test_client.get(
+            "/callback",
+            params={"error": "access_denied", "state": state},
+            follow_redirects=False,
+        )
+        # Must not 3xx — open redirect would defeat the redirect_uri allowlist.
+        assert resp.status_code == 400
+        assert "attacker.example.com" not in resp.headers.get("location", "")
+        assert "access_denied" in resp.text
+
+    def test_idp_error_with_undecryptable_state_falls_back_to_html(
+        self, callback_test_client
+    ):
+        resp = callback_test_client.get(
+            "/callback",
+            params={
+                "error": "server_error",
+                "error_description": "boom",
+                "state": "not-a-valid-encrypted-state",
+            },
+            follow_redirects=False,
+        )
+        assert resp.status_code == 400
+        assert "server_error" in resp.text
+        assert "boom" in resp.text
+
+    def test_bare_callback_with_no_params_returns_400_not_422(
+        self, callback_test_client
+    ):
+        """An SSO redirect chain that drops the original /authorize query
+        params should land on a human-readable 400, not a Pydantic 422."""
+        resp = callback_test_client.get("/callback", follow_redirects=False)
+        assert resp.status_code == 400
+        assert "invalid_request" in resp.text
+        assert "Field required" not in resp.text
+
+    def test_success_path_still_redirects_with_code_and_state(
+        self, callback_test_client
+    ):
+        """Regression: the successful (``code``+``state``) flow must still
+        redirect back to the trusted client redirect_uri with the original
+        state preserved."""
+        from litellm.proxy._experimental.mcp_server.discoverable_endpoints import (
+            encode_state_with_base_url,
+        )
+
+        state = encode_state_with_base_url(
+            base_url="http://localhost:3000/",
+            original_state="orig-state-success",
+            client_redirect_uri="http://127.0.0.1:60108/callback",
+        )
+
+        resp = callback_test_client.get(
+            "/callback",
+            params={"code": "auth-code-abc", "state": state},
+            follow_redirects=False,
+        )
+        assert resp.status_code == 302
+        location = resp.headers["location"]
+        assert location.startswith("http://127.0.0.1:60108/callback?")
+        assert "code=auth-code-abc" in location
+        assert "state=orig-state-success" in location

_{You can send follow-ups to the cloud agent here.}

…am error in /callback Co-authored-by: Yassin Kortam <yassin@berri.ai>

Sameerlite · 2026-05-25T09:21:49Z

@greptileai

) Squash-merged by litellm-agent from Terrajlz's PR.

Squash-merged by litellm-agent from devauxbr's PR.

Fix CI black --check failure on is_thinking_enabled return formatting. Co-authored-by: Cursor <cursoragent@cursor.com>

Sameerlite · 2026-05-25T13:04:00Z

@greptileai

* fix(proxy): Bedrock Knowledge Base pass-through: preserve SigV4 headers and signed request body (#27526) * Fix Bedrock KB pass-through SigV4 headers and signed body Coerce botocore HeadersDict to a dict for pass-through routes. When forward_headers is true, drop request headers that collide case-insensitively with signed headers so client Bearer auth does not shadow AWS SigV4. Send prepped.body as raw content so the outbound payload matches the signature after logging hooks mutate the parsed dict. Co-authored-by: Cursor <cursoragent@cursor.com> * Simplify pass-through raw body handling Read the SigV4-signed bytes directly from request.state inside pass_through_request instead of threading a custom_raw_body argument through three functions. Helper methods are restored to their original signatures, and the new branch lives in one place at each httpx call site. Co-authored-by: Cursor <cursoragent@cursor.com> * Harden pass-through raw body read from request.state Guard missing request.state (test fixtures) and ignore non-bytes/str values so MagicMock does not trigger the SigV4 raw-body path. Co-authored-by: Cursor <cursoragent@cursor.com> * Test pass_through_request state_raw_body uses httpx content= Cover non-streaming (async_client.request) and streaming (build_request) paths so SigV4 bytes on request.state are not replaced by json= of a hook-mutated dict. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> * chore(tests): migrate Bedrock CI to AWS account 941277531214 (#28728) * chore(tests): migrate Bedrock CI from AWS account 888602223428 to 941277531214 The original account (888602223428) was put under a security restriction by AWS after a root access key leaked in a PR comment. While that account works its way through the AWS Support unlock process, Bedrock-touching CI tests have been migrated to a fresh account (941277531214). Changes: - Replace 26 hardcoded references to 888602223428 with 941277531214 across 8 files (provisioned-model ARNs, imported-model ARNs, AgentCore runtime ARNs, batch execution role ARN, and example proxy config). - The provisioned-model and imported-model ARNs are referenced only from mocked unit tests — no AWS resources to recreate. - The batch execution IAM role has been recreated in the new account with the same name and equivalent permissions. - The two AgentCore runtimes (hosted_agent_r9jvp-3ySZuRHjLC, hosted_agent_13sf6-cALnp38iZD) are being recreated in the new account under the same names — see tools/agentcore-deploy/ in a follow-up. CircleCI env vars AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_REGION_NAME were updated separately via the CircleCI API to point at the new account. Smoke-tested locally against the new account: aws bedrock-runtime converse --region us-west-2 \ --model-id us.anthropic.claude-sonnet-4-5-20250929-v1:0 \ --messages '[{"role":"user","content":[{"text":"ping"}]}]' → 200, model returned 'pong' Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * chore(tests): refresh AgentCore ARN suffixes to match newly-deployed runtimes The first migration commit replaced just the account ID, but AgentCore auto-assigns a random 10-char suffix to every runtime on creation — we can't reuse the original suffixes (`3ySZuRHjLC`, `cALnp38iZD`) in the new account. Updated the AgentCore-runtime ARNs in the three files that reference real runtime IDs (not the mock-based unit-test ARNs). Deployed runtimes: arn:aws:bedrock-agentcore:us-west-2:941277531214:runtime/hosted_agent_r9jvp-Rq79QFC2fp arn:aws:bedrock-agentcore:us-west-2:941277531214:runtime/hosted_agent_13sf6-4046UzHSwy Both runtimes are status=READY and pass a smoke invoke: $ aws bedrock-agentcore invoke-agent-runtime --agent-runtime-arn ... --payload '{"prompt":"ping"}' → 200, {"result": "echo: ping"} The agent is a minimal echo (see /tmp/agentcore_deploy/agent.py for the deploy artifacts). Tests that only verify the SDK wiring will pass; if any test asserts on agent output content, swap the echo for the real agent. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * chore(tests): point Bedrock batch tests at new-account S3 bucket The account migration (888602223428 -> 941277531214) was a flat account-ID swap, which only rewrites ARNs that embed the account number. S3 bucket names carry no account ID, so the live Bedrock batch tests still uploaded to `litellm-proxy` — a bucket that lives in the old account. S3 names are globally unique, and the old account still holds that name, so it can't be recreated in the new account. Rename to `litellm-proxy-941277531214` (account-ID suffix guarantees global uniqueness). The bucket must be created in 941277531214 and the batch execution role granted s3:GetObject/PutObject/ListBucket on it before this job is run in CI. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(tests): point live S3 logging test at new-account bucket Same account-ID-free blind spot as the batch bucket: `load-testing-oct` lives in the old account and its name can't be reused globally. The `logging_testing` CI job is wired into the workflow and runs test_basic_s3_logging, which uploads to this bucket with the CI env creds, then lists and deletes objects — a live dependency. Rename to `load-testing-oct-941277531214`. The bucket must exist in the new account with the CI IAM principal granted s3:PutObject/GetObject/ListBucket/DeleteObject before this job runs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(tests): repoint Bedrock guardrail IDs to new-account guardrails The migration left guardrail IDs untouched (no account ID in them), so all live guardrail tests failed with "guardrail identifier or version does not exist" against 941277531214. Recreated both guardrails in the new account and updated the hardcoded IDs: - wf0hkdb5x07f -> zgkmukebruil (PII mask: PHONE + CREDIT_DEBIT_CARD, with explicit inputAction=ANONYMIZE so masking applies to INPUT, which is the source litellm's moderation hook sends) - ff6ujrregl1q -> 4w3d1di3snt5 (blocks "coffee"; blocked message set to the exact string the tests assert on) Updated test_bedrock_guardrails.py, otel_test_config.yaml, and the guardrailConfig in test_bedrock_completion.py. Verified locally: the 5 previously-failing guardrail tests now pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(bedrock): migrate legacy models to current inference profiles The new CI account (941277531214) cannot invoke legacy Bedrock models (AWS gates them: "marked by provider as Legacy... not actively using in the last 30 days"). Migrated the live-call tests: - anthropic.claude-3-sonnet-20240229 -> us.anthropic.claude-sonnet-4-5-20250929-v1:0 - anthropic.claude-3-haiku-20240307 -> us.anthropic.claude-haiku-4-5-20251001-v1:0 Current Claude models on Bedrock require the us. inference-profile prefix (bare on-demand ids are rejected). cohere.command-r-plus has no working replacement (all Cohere is legacy- gated in the new account): swapped to claude-haiku-4-5 in provider- agnostic param lists. amazon.titan-image-generator skipped (no working replacement). Mocked/transformation/cost tests that reference the legacy strings are intentionally left unchanged. Verified live against the new account. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(bedrock): repoint SageMaker + Knowledge Base to new-account resources These referenced account-scoped resources by hardcoded id that only existed in the old account, so the migration's account-ID swap missed them. Recreated in 941277531214 and repointed: - SageMaker endpoint jumpstart-dft-hf-textgeneration1-mp-20240815-185614 -> litellm-ci-textgen (gpt2 on a TGI container, ml.g5.xlarge) - Bedrock Knowledge Base T37J8R4WTM -> LCYXFBR2TU (OpenSearch Serverless vector store + titan-embed-text-v2, seeded with a LiteLLM doc) Verified live: test_sagemaker.py (12 passed) and test_bedrock_knowledgebase_hook.py (12 passed). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(reasoning_effort_grid): skip bedrock claude-opus-4-7 cells (not entitled on 941277531214) claude-opus-4-7 is listed in the new Bedrock CI account's foundation models but invoke is denied (AccessDeniedException: "not available for this account"). Bedrock access to the flagship Opus requires an AWS Sales request, not the self-serve model-access toggle, so it can't be enabled inline with the rest of the account migration. Add an optional `skip_reason` to ModelEntry and set it on the bedrock-claude-opus-4-7 entry; the grid test honors it via pytest.skip. Cell count (231) and route coverage are unchanged, so the structural asserts still pass. Restore coverage by deleting the one skip_reason line once access is granted. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(bedrock): swap/skip legacy-gated models unavailable on new CI account The migrated AWS account (941277531214) cannot access several models that the old account could, so the remaining red CI jobs were hitting real Bedrock "Access denied / Legacy" and "account not authorized" errors: - image_gen: skip both Nova Canvas test classes (amazon.nova-canvas-v1:0 is legacy-gated), matching the existing titan skip. - batches: skip test_async_file_and_batch (Bedrock batch inference is not authorized on the new account; requires an AWS support case). - litellm_overhead: swap legacy claude-3-5-haiku for the active us.anthropic.claude-haiku-4-5 inference profile. - test_completion_claude_3_function_call: swap legacy claude-3-sonnet for the active us.anthropic.claude-sonnet-4-5 inference profile. https://claude.ai/code/session_01Y7zgHYu9GX29YRwV4yiWAa * test(bedrock): fix remaining e2e legacy-model + batch failures on new CI account - e2e_openai_endpoints: skip test_bedrock_batches_api (Bedrock batch inference is not authorized on account 941277531214) and migrate the missed s3_bucket_name in oai_misc_config.yaml to litellm-proxy-941277531214. - build_and_test: swap legacy bedrock claude-3-sonnet for the active us.anthropic.claude-sonnet-4-5 inference profile in the proxy structured output e2e test. https://claude.ai/code/session_01Y7zgHYu9GX29YRwV4yiWAa * test(bedrock): make opus-4-7 + batch cells fail loudly and mock image-gen (#28791) Replace the silent skips added for the new CI account with noisier behavior: - reasoning-effort grid: opus-4-7 cells now fail (when AWS creds are present) instead of skipping, so the missing entitlement stays visible in CI; they still skip when AWS creds are absent (local dev) - Bedrock batch inference tests: drop the skip so they run and fail until batch access is granted - Titan + Nova Canvas image-gen tests: mock the Bedrock HTTP call so the transform + cost-tracking path stays under test without live model access https://claude.ai/code/session_01MT7SWDnXUjv6e6EPG7BDjT Co-authored-by: Claude <noreply@anthropic.com> * test(bedrock): use pytest.xfail for known-failing opus-4-7 cells Replace pytest.fail with pytest.xfail when a model has a fail_reason, so known-broken cells stay visible as XFAIL without keeping CI red. Co-authored-by: Yassin Kortam <yassin@berri.ai> --------- Co-authored-by: Mateo <mateo@Mateos-MacBook-Pro.local> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(otel): export SERVER span on management-endpoint success without http_request (#28794) Co-authored-by: Yassin Kortam <yassinkortam@Yassins-MacBook-Pro.local> * chore(ci): merge dev branch (#28801) * chore(proxy): route path-dependent call sites through get_request_route Replace direct ``request.url.path`` reads in auth, ACL, routing, and audit-log decisions with ``get_request_route(request)`` — the helper already added in ``auth/auth_utils.py`` that returns the ASGI ``scope["path"]`` with ``root_path`` stripped. Starlette reconstructs ``url.path`` from the Host header; ``scope["path"]`` is uvicorn's parse of the request line and matches what FastAPI dispatches on, so it's the authoritative route for any decision that should agree with the actual handler. Sites: - _experimental/mcp_server/auth/user_api_key_auth_mcp.py - management_endpoints/mcp_management_endpoints.py - vector_store_endpoints/utils.py - pass_through_endpoints/pass_through_endpoints.py - auth/route_checks.py - litellm_pre_call_utils.py - spend_tracking/spend_management_endpoints.py - common_utils/http_parsing_utils.py - management_helpers/utils.py - health_endpoints/_health_endpoints.py Adds regression tests in tests/proxy_unit_tests/test_proxy_routes.py that construct a Request with scope["path"] set to a benign route and the Host header crafted so url.path would resolve differently; each site's decision is asserted against scope["path"]. * chore(proxy): make get_request_route imports lazy at call sites Move the ``from litellm.proxy.auth.auth_utils import get_request_route`` imports added in the prior commit back to the function bodies that use them. The module-level form participates in a long-standing import cycle through ``auth_utils -> _types -> ...`` and was flagged by CodeQL on the PR; the lazy form matches the pattern the proxy already uses for ``user_api_key_auth`` and related helpers elsewhere in these files. Also drop the ``RouteChecks._is_assistants_api_request`` delegation in ``_get_metadata_variable_name`` introduced in the prior commit — the delegation pulled ``RouteChecks`` into the same cycle, and the call site reuses the resolved route for its other branches, so inlining the substring check is both cycle-free and avoids a redundant second ``get_request_route`` call. Comment in test_proxy_routes.py acknowledges that the two MCP table entries exercise ``get_request_route`` directly rather than the full production handler (which needs ASGI scope + MCP state to invoke). --------- Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: user <70670632+stuxf@users.noreply.github.com> * chore(ci): merge dev branch (#28657) * feat(dashboard): navbar hierarchy + Agent Platform notifications (#27543) * feat(dashboard): refine navbar zones and Agent Platform notice Restructure the admin navbar for production users: clear product vs community vs personal columns with vertical dividers, icon-only Slack/GitHub in a shared chip, and Docs/Blog typography aligned on an 8px rhythm. Add a notifications bell with popover linking to the LiteLLM Agent Platform repo and optional mark-as-read persistence. Promote the account control with initials avatar, single-line display name, and navDisplayName mapping for placeholder user ids (e.g. default_user_id). Co-authored-by: Cursor <cursoragent@cursor.com> * fix(dashboard): address PR review — AntD buttons, public page guard, dedupe regex - Replace raw <button> with AntD Button in BlogDropdown, NotificationsBell, UserDropdown, and test mock - Guard NotificationsBell + container behind !isPublicPage to avoid rendering on public pages - Remove redundant equality checks in navDisplayName (regex already covers them) - Remove unused `lower` variable after simplification Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: yuneng-jiang <yuneng@berri.ai> * fix(dashboard): drop dead useHealthReadiness import in navbar The module was removed in #27896 (replaced by useHealthReadinessDetails), but the import survived the rebase. The symbol is unused — only useHealthReadinessDetails is consumed in the file. Removing the dead import unblocks the UI TypeScript build. * fix(dashboard): align CommunityEngagementButtons test with icon-only aria-labels The component was refactored to an icon-only chip with aria-label='LiteLLM on GitHub' (squash #27543), but the test still asserted /star us on github/i. Update the query to match the rendered accessible name. * refactor(dashboard): drop unused props from NavbarProps The navbar refactor moved user identity + dark-mode state to internal hooks (useAuthorized, useWorker), but the NavbarProps interface still declared userID, userEmail, userRole, premiumUser, isDarkMode, and toggleDarkMode as required, forcing every caller to thread them through. Drop them from the interface and all four call sites (page.tsx, (dashboard)/layout.tsx, public_model_hub.tsx, navbar.test.tsx). Also shrinks the destructure in layout.tsx so the now-unused locals stop being pulled out of useAuthorized(). * refactor(dashboard): use useSyncExternalStore for NotificationsBell dismiss flag Reads/writes of the litellmHideAgentPlatformBanner key were done directly inside NotificationsBell via a useEffect + useState pair. Every other localStorage-backed flag in the dashboard (Disable ShowPrompts, DisableBouncingIcon, DisableShowNewBadge, DisableUsageIndicator, DisableBlogPosts) is wrapped in a useSyncExternalStore hook over localStorageUtils so all mounted components stay in sync. Extract useHideAgentPlatformBanner to follow the same shape, swap NotificationsBell to consume it, and add a regression test that two sibling bells stay in sync without a remount when one is dismissed. * refactor: mask credential fields in proxy settings GET responses (#28682) * refactor: mask credential fields in proxy settings GET responses Brings SSO settings, cache settings, and the email/Slack alerting view in /get/config/callbacks in line with the HashiCorp Vault config-override pattern, so persisted credentials are not transported back to the UI in plaintext. * refactor: harden short-value masking and hoist alerting var constant Closes two review observations: - mask_sensitive_keys now replaces short values (below the visible prefix+suffix length) with an all-mask string instead of returning them unchanged, so a 1-7 character credential is no longer round-tripped verbatim. - _ALERTING_SENSITIVE_VARS is moved out of get_config() to a module-level constant, matching the analogous _SSO_SENSITIVE_FIELDS and _CACHE_SENSITIVE_FIELDS in the SSO and cache endpoint files. --------- Co-authored-by: Krrish Dholakia <krrish+github@berri.ai> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> * fix(ui): show 2-decimal precision for max_budget on key overview (#28809) The Key Info Overview tab's Spend card truncated sub-dollar budgets to "$0" because formatNumberWithCommas defaults to 0 decimals. The Settings tab passes 2; align the overview so a $0.10 budget renders as "$0.10". Resolves LIT-2845 * feat(proxy): allow `llm_api_routes` virtual keys to list MCP servers (#28442) * feat(proxy): allow llm_api_routes virtual keys to list MCP servers Add a new `mcp_discovery_routes` group (GET /v1/mcp/server and GET /v1/mcp/server/{server_id}) and include it in `llm_api_routes` so that virtual keys configured with `allowed_routes=["llm_api_routes"]` can discover the MCP servers they have access to. Previously these calls failed with 'Virtual key is not allowed to call this route. Only allowed to call routes: [llm_api_routes]'. The GET handlers already sanitize the response for restricted virtual keys via `_sanitize_mcp_server_list_for_virtual_key`, stripping credential-bearing fields (url, headers, env). Write methods (POST/PUT/DELETE) on the same paths remain gated by the existing handler-level admin role checks. The new discovery list is intentionally kept OUT of `mcp_inference_routes`, so `is_llm_api_route()` still returns False for these paths — this preserves the existing contract that DISABLE_LLM_API_ENDPOINTS must not block the Admin UI from listing MCP servers. Co-authored-by: ryan-crabbe-berri <ryan-crabbe-berri@users.noreply.github.com> * refactor(proxy): make MCP discovery carve-out method-aware Replace the `mcp_discovery_routes` group in `llm_api_routes` with a method-aware special case inside `is_virtual_key_allowed_to_call_route`. Virtual keys with allowed_routes=["llm_api_routes"] are now permitted to call only GET /v1/mcp/server and GET /v1/mcp/server/{server_id} — non-GET methods and multi-segment admin sub-paths fall through to the existing 403. This keeps the general llm_api_routes list free of management paths and avoids accidentally exposing POST/PUT/DELETE writes through the route-check layer. --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: ryan-crabbe-berri <ryan-crabbe-berri@users.noreply.github.com> * chore(ci): merge dev branch (#28807) * chore(proxy): route path-dependent call sites through get_request_route Replace direct ``request.url.path`` reads in auth, ACL, routing, and audit-log decisions with ``get_request_route(request)`` — the helper already added in ``auth/auth_utils.py`` that returns the ASGI ``scope["path"]`` with ``root_path`` stripped. Starlette reconstructs ``url.path`` from the Host header; ``scope["path"]`` is uvicorn's parse of the request line and matches what FastAPI dispatches on, so it's the authoritative route for any decision that should agree with the actual handler. Sites: - _experimental/mcp_server/auth/user_api_key_auth_mcp.py - management_endpoints/mcp_management_endpoints.py - vector_store_endpoints/utils.py - pass_through_endpoints/pass_through_endpoints.py - auth/route_checks.py - litellm_pre_call_utils.py - spend_tracking/spend_management_endpoints.py - common_utils/http_parsing_utils.py - management_helpers/utils.py - health_endpoints/_health_endpoints.py Adds regression tests in tests/proxy_unit_tests/test_proxy_routes.py that construct a Request with scope["path"] set to a benign route and the Host header crafted so url.path would resolve differently; each site's decision is asserted against scope["path"]. * chore(proxy): make get_request_route imports lazy at call sites Move the ``from litellm.proxy.auth.auth_utils import get_request_route`` imports added in the prior commit back to the function bodies that use them. The module-level form participates in a long-standing import cycle through ``auth_utils -> _types -> ...`` and was flagged by CodeQL on the PR; the lazy form matches the pattern the proxy already uses for ``user_api_key_auth`` and related helpers elsewhere in these files. Also drop the ``RouteChecks._is_assistants_api_request`` delegation in ``_get_metadata_variable_name`` introduced in the prior commit — the delegation pulled ``RouteChecks`` into the same cycle, and the call site reuses the resolved route for its other branches, so inlining the substring check is both cycle-free and avoids a redundant second ``get_request_route`` call. Comment in test_proxy_routes.py acknowledges that the two MCP table entries exercise ``get_request_route`` directly rather than the full production handler (which needs ASGI scope + MCP state to invoke). --------- Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: user <70670632+stuxf@users.noreply.github.com> * fix(team): keep team_alias cache in sync on _cache_team_object writes (#28737) * fix(team): keep team_alias cache in sync on _cache_team_object writes _cache_team_object wrote only to the team_id:<id> cache key, but the JWT auth path that uses team_alias_jwt_field reads from a separate team_alias:<alias> key (get_team_object_by_alias caches under both keys on miss, but reads only the alias-keyed one). After any team-mutation endpoint (team_model_add, team_model_delete, update_team, the two access-group writes) the team_id cache was refreshed but the team_alias cache stayed stale until TTL — JWT callers using team_alias_jwt_field kept seeing the pre-mutation team for the full cache window. Mirror the write under the alias key inside _cache_team_object so every existing caller stays in sync without further changes. Skip the alias write when team_alias is None/empty so we don't collide across alias-less teams. Surfaced testing the LIT-3244 cherry-pick on patch/1.86.0: the LIT-3244 fix correctly invalidated the team_id cache but the customer's JWT used team_alias_jwt_field, so they kept hitting the stale alias-keyed entry. * fix(team): delete (not overwrite) team_alias cache on _cache_team_object The prior shape of this PR wrote both team_id:<id> AND team_alias:<alias> from _cache_team_object. team_alias is NOT unique in the schema (no @unique on LiteLLM_TeamTable.team_alias), and get_team_object_by_alias enforces uniqueness on its own DB-fetch path (len(teams) > 1 raises). Writing the alias-keyed cache from the generic refresh path bypassed that check: a team admin renaming their team to collide with another team's alias could silently overwrite the cached team for JWT-by-alias auth, swapping the resolved team under that alias for the cache window. Switch the alias-keyed operation from a write to a delete (mirroring the dual-cache delete pattern in _delete_cache_key_object). After every team write, the next JWT-by-alias reader cache-misses and falls through to get_team_object_by_alias, which (a) re-fetches the fresh team from DB, closing the LIT-3244 staleness gap that motivated this PR, and (b) enforces alias uniqueness before populating either cache key. team_id:<id> writes are unchanged — team_id is the table PK and is guaranteed unique. Surfaced in veria-ai review on #28739. * fix(managed-files): anchor model_id regex so it doesn't match llm_output_file_model_id extract_model_id_from_unified_id used `re.search(r"model_id,([^;]+)", ...)` which substring-matches the `model_id,` inside the file-ID encoding's `llm_output_file_model_id,<deployment_uuid>` field. parse_unified_id then fed that deployment UUID back into the auth path as a model candidate via _extract_models_from_managed_resource_id, and every team-BYOK file attach 403'd with: team not allowed to access model. This team can only access models=['openai/*']. Tried to access <deployment-uuid> The team's models list correctly contains the public name (`openai/*`) that target_model_names matches, but the bogus UUID candidate fails the wildcard check first. Anchor the regex to a field boundary (`(?:^|;)model_id,`) so it matches the legitimate top-level `model_id,<value>` field on vector_store unified IDs and skips substring matches inside other fields. File-IDs (which have no top-level `model_id` field) now return None and contribute no spurious UUID candidate. Surfaced reproducing LIT-3244 on patch/1.86.0 with the customer's exact flow: team with openai/* BYOK deployment, JWT-scoped user, POST /v1/vector_stores/{id}/files attaching a file uploaded with target_model_names=openai/gpt-4o. * fix(proxy): hydrate wildcard discovery credentials (#28284) (#28822) * fix(proxy): hydrate wildcard discovery credentials * fix(proxy): constrain wildcard credential hydration Co-authored-by: Dibyo Mukherjee <dibyo@adobe.com> * ci: add daily oss-agent-shin branch creation workflow (#28829) Creates litellm_oss_agent_shin_MM_DD_YYYY from main every day at 00:00 UTC. Lets us retarget oss-agent-shin fork PRs onto a canonical branch so CircleCI runs with secrets, without granting the agent write access. Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com> * test(proxy): add harness for proxy_server.py behavior-pinning (#28827) * test(proxy): add harness for proxy_server.py behavior-pinning Creates tests/test_litellm/proxy/proxy_server/ with: - conftest.py: 11 shared fixtures (app, client, mock_prisma, auth_as, mock_router with parametrized response builders, normalize, etc.) - _coverage_check.py: per-PR coverage gate (line + branch) against a baseline, self-selects target by inspecting which placeholder files have been filled - _pin_check.py: AST-based gate that verifies every pin-list item has >=1 happy + >=1 error test with a real assertion (no status-only) - test_harness_smoke.py: 19 smoke tests covering every fixture + both scripts end-to-end - 26 placeholder test files (one docstring each) reserved for follow-up PRs per the directory ownership in the Notion plan - .coverage_baseline pinned at 0% so future PRs measure deltas against new-tests-only and aren't entangled with the broader scattered test suite Adds a dedicated proxy-server job to test-unit-proxy-endpoints.yml so this directory's runtime + coverage are tracked independently. Plan: https://www.notion.so/36c43b8acdab81ee845fd5365128a2fc * ci(proxy-endpoints): allow workflow_dispatch Lets the workflow be triggered manually on a branch via `gh workflow run`, which is needed for the verify-first flow on workflow changes before opening a PR. * test(proxy): address review feedback on proxy_server harness - conftest.py: anchor sys.path insert to __file__ (Path(__file__).resolve().parents[4]) instead of CWD-relative os.path.abspath("../../../../") which resolved to the wrong directory when pytest is launched from the repo root. - _coverage_check.py: actually read .coverage_baseline and use it as the floor (line_min = max(target, baseline)). Closes the gap between the PR description's "delta semantics" and what the script was doing. With baseline=0.0 today this is a no-op; future PRs that update the baseline cause regressions (test deletions etc.) to trip the gate even if the static PR target is still met. - _pin_check.py: drop unreachable startswith("_") guard (test_*.py glob never yields underscore-prefixed names) and read each test file once instead of twice. * feat(openai): apply regional-processing cost uplift for EU/US data residency (#28626) * feat(openai): apply regional-processing cost uplift for EU/US data residency OpenAI charges a 10% uplift on the latest GPT models when requests are served from a regionalized hostname (eu./us.api.openai.com). Infer the region from `api_base`, expose it on `kwargs["litellm_params"]["data_residency"]`, and multiply the computed cost by a per-model `regional_processing_uplift_multiplier_<region>` field. https://claude.ai/code/session_012ebH44s7ohYxjoix5CXzTW * test: allow regional_processing_uplift_multiplier_{eu,us} in model_prices schema * fix(cost): tighten data_residency inference and restore model_cost in tests - Only infer OpenAI data_residency when custom_llm_provider == "openai"; drop the implicit None fallback so non-OpenAI callers can't accidentally pick up a regional tag from a stray OpenAI hostname. - _local_model_cost_map fixture now snapshots and restores litellm.model_cost and LITELLM_LOCAL_MODEL_COST_MAP so tests don't leak state across the session. * refactor(openai): move data_residency helper under llms/openai * fix: thread data_residency through realtime stream cost calculation Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(cost): thread data_residency through batch_cost_calculator Apply the OpenAI regional-processing uplift multiplier to retrieve_batch cost paths so Batch API requests served via eu./us.api.openai.com are priced at the same uplifted token rates as completions/transcriptions. * refactor(openai): encapsulate provider check inside infer_openai_data_residency Move the custom_llm_provider == "openai" guard from get_litellm_params into the helper itself so the core utility no longer carries provider-specific dispatch logic. Callers pass through the provider unconditionally; the helper returns None for any non-OpenAI provider. * fix(responses): thread data_residency through Responses logging params The Responses API paths build their logging litellm_params dict after provider resolution but did not include data_residency, so cost calc saw None even when the effective api_base was a regional OpenAI host. --------- Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Yassin Kortam <yassin@berri.ai> --------- Co-authored-by: milan-berri <milan@berri.ai> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Mateo Wang <277851410+mateo-berri@users.noreply.github.com> Co-authored-by: Mateo <mateo@Mateos-MacBook-Pro.local> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: Yassin Kortam <yassin@berri.ai> Co-authored-by: Yassin Kortam <yassinkortam@Yassins-MacBook-Pro.local> Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: user <70670632+stuxf@users.noreply.github.com> Co-authored-by: Krrish Dholakia <krrish+github@berri.ai> Co-authored-by: ryan-crabbe-berri <ryan@berri.ai> Co-authored-by: ryan-crabbe-berri <ryan-crabbe-berri@users.noreply.github.com> Co-authored-by: Dibyo Mukherjee <dibyo@adobe.com> Co-authored-by: ishaan-berri <155045088+ishaan-berri@users.noreply.github.com> Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>

- _emit_management_endpoint_otel_span now passes result as response on success - remove duplicate _CREDENTIAL_LITELLM_PARAM_FIELDS assignment in model_checks Co-authored-by: Yassin Kortam <yassin@berri.ai>

cursor

Cursor Bugbot has reviewed your changes using high effort and found 3 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for all 3 issues found in the latest run.

✅ Fixed: Streaming path hardcodes POST, non-streaming uses request.method
- Replaced the hardcoded "POST" in async_client.build_request with request.method, matching the non-streaming SigV4 path.
✅ Fixed: Set comprehension rebuilt on every uplift multiplier call
- Hoisted the DataResidency value set to a module-level frozenset (_VALID_DATA_RESIDENCIES) and reused it in _get_regional_uplift_multiplier.
✅ Fixed: Credential leak: example config contains real AWS ARN
- Replaced the real-looking AWS account ID 941277531214 with placeholder 123456789012 (and a generic role suffix) in the s3_bucket_name and aws_batch_role_arn example values.

Preview (57c84a22d7)

diff --git a/deploy/charts/litellm-helm/templates/deployment.yaml b/deploy/charts/litellm-helm/templates/deployment.yaml
--- a/deploy/charts/litellm-helm/templates/deployment.yaml
+++ b/deploy/charts/litellm-helm/templates/deployment.yaml
@@ -30,7 +30,7 @@
         checksum/config: {{ include (print $.Template.BasePath "/configmap-litellm.yaml") . | sha256sum }}
         {{- end }}
         {{- with .Values.podAnnotations }}
-        {{- toYaml . | nindent 8 }}
+        {{- tpl (toYaml .) $ | nindent 8 }}
         {{- end }}
       labels:
         {{- include "litellm.labels" . | nindent 8 }}

diff --git a/deploy/charts/litellm-helm/tests/deployment_tests.yaml b/deploy/charts/litellm-helm/tests/deployment_tests.yaml
--- a/deploy/charts/litellm-helm/tests/deployment_tests.yaml
+++ b/deploy/charts/litellm-helm/tests/deployment_tests.yaml
@@ -377,3 +377,28 @@
           content:
             name: sidecar-tpl
             image: "ghcr.io/berriai/litellm-database:test"
+  - it: should support tpl in podAnnotations
+    template: deployment.yaml
+    set:
+      image:
+        repository: ghcr.io/berriai/litellm-database
+        tag: test
+      # Mirrors the real-world scenario this feature unblocks:
+      # user disables the built-in ConfigMap (and its built-in checksum/config
+      # annotation) and re-implements checksum/config themselves via tpl.
+      proxyConfigMap:
+        create: false
+      podAnnotations:
+        checksum/config: "{{ .Values.image.tag }}"
+        example.com/some-key: "{{ .Values.image.repository }}"
+        example.com/literal: "plain-string-value"
+    asserts:
+      - equal:
+          path: spec.template.metadata.annotations["checksum/config"]
+          value: "test"
+      - equal:
+          path: spec.template.metadata.annotations["example.com/some-key"]
+          value: "ghcr.io/berriai/litellm-database"
+      - equal:
+          path: spec.template.metadata.annotations["example.com/literal"]
+          value: "plain-string-value"

diff --git a/litellm/litellm_core_utils/llm_cost_calc/utils.py b/litellm/litellm_core_utils/llm_cost_calc/utils.py
--- a/litellm/litellm_core_utils/llm_cost_calc/utils.py
+++ b/litellm/litellm_core_utils/llm_cost_calc/utils.py
@@ -30,7 +30,10 @@
     }
 )
 
+# Pre-resolved DataResidency enum values for fast membership checks
+_VALID_DATA_RESIDENCIES = frozenset(r.value for r in DataResidency)
 
+
 def _is_above_128k(tokens: float) -> bool:
     if tokens > 128000:
         return True
@@ -636,7 +639,7 @@
     if data_residency is None:
         return 1.0
     residency = data_residency.lower()
-    if residency not in {r.value for r in DataResidency}:
+    if residency not in _VALID_DATA_RESIDENCIES:
         return 1.0
     multiplier = model_info.get(f"regional_processing_uplift_multiplier_{residency}")
     if multiplier is None:

diff --git a/litellm/llms/base_llm/chat/transformation.py b/litellm/llms/base_llm/chat/transformation.py
--- a/litellm/llms/base_llm/chat/transformation.py
+++ b/litellm/llms/base_llm/chat/transformation.py
@@ -108,10 +108,9 @@
         return type_to_response_format_param(response_format=response_format)
 
     def is_thinking_enabled(self, non_default_params: dict) -> bool:
-        return (
-            non_default_params.get("thinking", {}).get("type") == "enabled"
-            or non_default_params.get("reasoning_effort") is not None
-        )
+        return (non_default_params.get("thinking") or {}).get(
+            "type"
+        ) == "enabled" or non_default_params.get("reasoning_effort") is not None
 
     def is_max_tokens_in_request(self, non_default_params: dict) -> bool:
         """

diff --git a/litellm/proxy/_experimental/mcp_server/discoverable_endpoints.py b/litellm/proxy/_experimental/mcp_server/discoverable_endpoints.py
--- a/litellm/proxy/_experimental/mcp_server/discoverable_endpoints.py
+++ b/litellm/proxy/_experimental/mcp_server/discoverable_endpoints.py
@@ -1,3 +1,4 @@
+import html as _html
 import json
 from typing import Any, Dict, Optional
 from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse
@@ -618,8 +619,105 @@
     )
 
 
+# Per RFC 6749 §4.1.2.1, an IdP that rejects an OAuth authorization request
+# redirects back to the configured redirect URI with ``error`` /
+# ``error_description`` / ``error_uri`` query params and no ``code``. The MCP
+# loopback flow funnels that response through this /callback endpoint, so
+# the endpoint must accept either a successful (``code``+``state``) or an
+# error response. Declaring ``code``/``state`` as required would cause
+# FastAPI to reject the error response with a 422 before the handler runs,
+# which strands the MCP client waiting on the loopback (see LIT-2750).
+
+
+def _render_oauth_error_html(error: str, description: Optional[str]) -> HTMLResponse:
+    """Render an actionable HTML page for an IdP-reported OAuth error.
+
+    Used when we cannot propagate the error back to the registered
+    ``redirect_uri`` (state missing or undecryptable). Returned with a 400
+    status so the failure is observable to operators while still being a
+    human-readable page for the end user.
+    """
+    safe_error = _html.escape(error or "unknown_error")
+    safe_description = _html.escape(description) if description else ""
+    description_html = f"<p>{safe_description}</p>" if safe_description else ""
+    body = (
+        "<html><body>"
+        "<h2>Authentication failed</h2>"
+        f"<p><strong>Error:</strong> {safe_error}</p>"
+        f"{description_html}"
+        "<p>You can close this window and try again.</p>"
+        "</body></html>"
+    )
+    return HTMLResponse(body, status_code=400)
+
+
 @router.get("/callback")
-async def callback(request: Request, code: str, state: str):
+async def callback(
+    request: Request,
+    code: Optional[str] = None,
+    state: Optional[str] = None,
+    error: Optional[str] = None,
+    error_description: Optional[str] = None,
+    error_uri: Optional[str] = None,
+):
+    """OAuth 2.0 authorization response handler for MCP loopback clients.
+
+    Accepts either:
+
+    - A successful authorization response (``code`` + ``state``), which is
+      forwarded back to the validated client ``redirect_uri`` with the
+      original (un-wrapped) ``state``.
+    - An error response (``error``[+``error_description``/``error_uri``]), per
+      RFC 6749 §4.1.2.1. When ``state`` is present and decodes to a trusted
+      ``redirect_uri``, the error params are propagated back to the client so
+      its OAuth library can surface them. Otherwise we render an HTML error
+      page so the user is not left on an opaque 422 / blank screen.
+    """
+    # 1. IdP-reported error path (e.g. ``?error=access_denied``).
+    if error:
+        verbose_logger.info(
+            "MCP /callback received IdP error: error=%s, error_description=%s",
+            error,
+            error_description,
+        )
+        if state:
+            try:
+                state_data = decode_state_hash(state)
+                original_state = state_data.get("original_state")
+                redirect_uri = _get_validated_client_redirect_uri(request, state_data)
+            except HTTPException:
+                # Untrusted/invalid client redirect_uri — surface inline rather
+                # than blindly forwarding the error to an attacker-controlled URL.
+                return _render_oauth_error_html(error, error_description)
+            except Exception:
+                # State could not be decrypted (expired key, tampered, etc.).
+                return _render_oauth_error_html(error, error_description)
+
+            params: Dict[str, str] = {"error": error}
+            if error_description:
+                params["error_description"] = error_description
+            if error_uri:
+                params["error_uri"] = error_uri
+            if original_state is not None:
+                params["state"] = original_state
+            complete_returned_url = _append_query_params(redirect_uri, params)
+            return RedirectResponse(url=complete_returned_url, status_code=302)
+
+        # No state — nothing to round-trip to. Show the user the error.
+        return _render_oauth_error_html(error, error_description)
+
+    # 2. Neither success nor error parameters present — most likely a stray
+    #    GET / dropped SSO redirect chain. Surface a 400 instead of 422.
+    if not code or not state:
+        missing = [
+            name for name, value in (("code", code), ("state", state)) if not value
+        ]
+        return _render_oauth_error_html(
+            "invalid_request",
+            f"Missing authorization {' and '.join(repr(m) for m in missing)} parameter(s).",
+        )
+
+    # 3. Successful authorization response.
     try:
         state_data = decode_state_hash(state)
         original_state = state_data["original_state"]

diff --git a/litellm/proxy/example_config_yaml/oai_misc_config.yaml b/litellm/proxy/example_config_yaml/oai_misc_config.yaml
--- a/litellm/proxy/example_config_yaml/oai_misc_config.yaml
+++ b/litellm/proxy/example_config_yaml/oai_misc_config.yaml
@@ -23,11 +23,11 @@
       model: bedrock/us.anthropic.claude-haiku-4-5-20251001-v1:0
       #########################################################
       ########## batch specific params ########################
-      s3_bucket_name: litellm-proxy-941277531214
+      s3_bucket_name: litellm-proxy-123456789012
       s3_region_name: us-west-2
       s3_access_key_id: os.environ/AWS_ACCESS_KEY_ID
       s3_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
-      aws_batch_role_arn: arn:aws:iam::941277531214:role/service-role/AmazonBedrockExecutionRoleForAgents_BB9HNW6V4CV
+      aws_batch_role_arn: arn:aws:iam::123456789012:role/service-role/AmazonBedrockExecutionRoleForAgents_EXAMPLE
     model_info: 
       mode: batch
 

diff --git a/litellm/proxy/management_helpers/utils.py b/litellm/proxy/management_helpers/utils.py
--- a/litellm/proxy/management_helpers/utils.py
+++ b/litellm/proxy/management_helpers/utils.py
@@ -471,10 +471,17 @@
         route = func.__name__
         request_body = {}
 
+    _response: Optional[dict] = None
+    if exception is None and result is not None:
+        try:
+            _response = dict(result)
+        except Exception:
+            _response = None
+
     logging_payload = ManagementEndpointLoggingPayload(
         route=route,
         request_data=request_body,
-        response=None,
+        response=_response,
         start_time=start_time,
         end_time=end_time,
         exception=exception,

diff --git a/litellm/proxy/pass_through_endpoints/pass_through_endpoints.py b/litellm/proxy/pass_through_endpoints/pass_through_endpoints.py
--- a/litellm/proxy/pass_through_endpoints/pass_through_endpoints.py
+++ b/litellm/proxy/pass_through_endpoints/pass_through_endpoints.py
@@ -908,7 +908,7 @@
                     else {"json": _parsed_body}
                 )
                 req = async_client.build_request(
-                    "POST",
+                    request.method,
                     url,
                     params=requested_query_params,
                     headers=headers,

diff --git a/tests/test_litellm/proxy/_experimental/mcp_server/test_callback_oauth_error_responses.py b/tests/test_litellm/proxy/_experimental/mcp_server/test_callback_oauth_error_responses.py
new file mode 100644
--- /dev/null
+++ b/tests/test_litellm/proxy/_experimental/mcp_server/test_callback_oauth_error_responses.py
@@ -1,0 +1,210 @@
+"""Regression tests for LIT-2750.
+
+The MCP OAuth ``/callback`` endpoint must handle IdP error responses
+(e.g. ``?error=access_denied``) gracefully instead of returning a 422
+because ``code`` and ``state`` were declared as required FastAPI query
+params. Per RFC 6749 §4.1.2.1 the IdP redirects to the configured
+redirect URI with ``error`` / ``error_description`` / ``error_uri``
+query params and no ``code`` when the user denies access.
+
+These tests cover both the propagate-to-client path (when state decodes
+to a trusted ``redirect_uri``) and the in-page fallback (when state is
+missing, undecryptable, or carries an untrusted redirect_uri). They also
+pin the success path (``code`` + ``state``) against accidental
+regressions.
+"""
+
+import pytest
+
+
+@pytest.fixture(autouse=True)
+def _mock_mcp_client_ip():
+    """Bypass IP-based access control for the in-process TestClient.
+
+    Mirrors the autouse fixture in ``test_discoverable_endpoints.py`` so
+    these tests don't require a real client IP context.
+    """
+    from unittest.mock import patch
+
+    with patch(
+        "litellm.proxy._experimental.mcp_server.discoverable_endpoints.IPAddressUtils.get_mcp_client_ip",
+        return_value=None,
+    ):
+        yield
+
+
+@pytest.fixture
+def callback_test_client(monkeypatch):
+    """FastAPI TestClient mounted with the MCP discoverable router.
+
+    Sets a deterministic ``LITELLM_SALT_KEY`` so encoded states minted
+    in-test can be decrypted by the handler.
+    """
+    from fastapi import FastAPI
+    from fastapi.testclient import TestClient
+
+    monkeypatch.setenv("LITELLM_SALT_KEY", "sk-test-salt-for-LIT-2750")
+
+    from litellm.proxy._experimental.mcp_server.discoverable_endpoints import (
+        router,
+    )
+
+    app = FastAPI()
+    app.include_router(router)
+    return TestClient(app)
+
+
+class TestCallbackOAuthErrorResponses:
+    """LIT-2750: IdP error responses to ``/callback`` must not 422."""
+
+    def test_idp_error_with_no_state_returns_400_html(self, callback_test_client):
+        """Pre-fix: 422 Pydantic. Post-fix: 400 HTML with the IdP's error."""
+        resp = callback_test_client.get(
+            "/callback",
+            params={
+                "error": "access_denied",
+                "error_description": "User declined access",
+            },
+            follow_redirects=False,
+        )
+        assert resp.status_code == 400
+        assert "text/html" in resp.headers["content-type"]
+        body = resp.text
+        assert "access_denied" in body
+        assert "User declined access" in body
+        # Sanity: must not leak the Pydantic validation error.
+        assert "Field required" not in body
+
+    def test_idp_error_html_escapes_user_controlled_fields(
+        self, callback_test_client
+    ):
+        """A malicious IdP must not be able to inject HTML/JS via error params."""
+        resp = callback_test_client.get(
+            "/callback",
+            params={
+                "error": "<script>alert(1)</script>",
+                "error_description": "<img src=x onerror=alert(2)>",
+            },
+            follow_redirects=False,
+        )
+        assert resp.status_code == 400
+        body = resp.text
+        # Raw tags must be escaped, not present verbatim.
+        assert "<script>alert(1)</script>" not in body
+        assert "<img src=x onerror=alert(2)>" not in body
+        assert "&lt;script&gt;alert(1)&lt;/script&gt;" in body
+
+    def test_idp_error_with_trusted_state_propagates_to_client_redirect_uri(
+        self, callback_test_client
+    ):
+        """When state decodes to a trusted (loopback) redirect_uri, propagate
+        the error back so the MCP client's OAuth library can surface it
+        instead of timing out waiting on the loopback."""
+        from litellm.proxy._experimental.mcp_server.discoverable_endpoints import (
+            encode_state_with_base_url,
+        )
+
+        state = encode_state_with_base_url(
+            base_url="http://localhost:3000/",
+            original_state="client-original-state-xyz",
+            client_redirect_uri="http://127.0.0.1:60108/callback",
+        )
+
+        resp = callback_test_client.get(
+            "/callback",
+            params={
+                "error": "access_denied",
+                "error_description": "User declined access",
+                "state": state,
+            },
+            follow_redirects=False,
+        )
+        assert resp.status_code == 302
+        location = resp.headers["location"]
+        assert location.startswith("http://127.0.0.1:60108/callback?")
+        assert "error=access_denied" in location
+        # Original client state must be round-tripped, not our wrapped state.
+        assert "state=client-original-state-xyz" in location
+        # error_description percent-encoded but present.
+        assert "error_description=User" in location
+        # Wrapped/encrypted state must NOT leak to the client.
+        assert state not in location
+
+    def test_idp_error_with_untrusted_redirect_uri_does_not_open_redirect(
+        self, callback_test_client
+    ):
+        """If the state minted earlier carries a redirect_uri that the proxy
+        no longer trusts, we must surface the error inline rather than
+        302-ing to an attacker-controlled URL (open-redirect)."""
+        from litellm.proxy._experimental.mcp_server.discoverable_endpoints import (
+            encode_state_with_base_url,
+        )
+
+        state = encode_state_with_base_url(
+            base_url="http://localhost:3000/",
+            original_state="x",
+            client_redirect_uri="https://attacker.example.com/steal",
+        )
+
+        resp = callback_test_client.get(
+            "/callback",
+            params={"error": "access_denied", "state": state},
+            follow_redirects=False,
+        )
+        # Must not 3xx — open redirect would defeat the redirect_uri allowlist.
+        assert resp.status_code == 400
+        assert "attacker.example.com" not in resp.headers.get("location", "")
+        assert "access_denied" in resp.text
+
+    def test_idp_error_with_undecryptable_state_falls_back_to_html(
+        self, callback_test_client
+    ):
+        resp = callback_test_client.get(
+            "/callback",
+            params={
+                "error": "server_error",
+                "error_description": "boom",
+                "state": "not-a-valid-encrypted-state",
+            },
+            follow_redirects=False,
+        )
+        assert resp.status_code == 400
+        assert "server_error" in resp.text
+        assert "boom" in resp.text
+
+    def test_bare_callback_with_no_params_returns_400_not_422(
+        self, callback_test_client
+    ):
+        """An SSO redirect chain that drops the original /authorize query
+        params should land on a human-readable 400, not a Pydantic 422."""
+        resp = callback_test_client.get("/callback", follow_redirects=False)
+        assert resp.status_code == 400
+        assert "invalid_request" in resp.text
+        assert "Field required" not in resp.text
+
+    def test_success_path_still_redirects_with_code_and_state(
+        self, callback_test_client
+    ):
+        """Regression: the successful (``code``+``state``) flow must still
+        redirect back to the trusted client redirect_uri with the original
+        state preserved."""
+        from litellm.proxy._experimental.mcp_server.discoverable_endpoints import (
+            encode_state_with_base_url,
+        )
+
+        state = encode_state_with_base_url(
+            base_url="http://localhost:3000/",
+            original_state="orig-state-success",
+            client_redirect_uri="http://127.0.0.1:60108/callback",
+        )
+
+        resp = callback_test_client.get(
+            "/callback",
+            params={"code": "auth-code-abc", "state": state},
+            follow_redirects=False,
+        )
+        assert resp.status_code == 302
+        location = resp.headers["location"]
+        assert location.startswith("http://127.0.0.1:60108/callback?")
+        assert "code=auth-code-abc" in location
+        assert "state=orig-state-success" in location

diff --git a/tests/test_litellm/test_thinking_enabled.py b/tests/test_litellm/test_thinking_enabled.py
new file mode 100644
--- /dev/null
+++ b/tests/test_litellm/test_thinking_enabled.py
@@ -1,0 +1,74 @@
+"""
+Unit tests for is_thinking_enabled method in BaseConfig.
+
+Tests the fix for issue #28576: handle None thinking param without crashing.
+"""
+
+import pytest
+from litellm.llms.base_llm.chat.transformation import BaseConfig
+
+
+class TestIsThinkingEnabled:
+    """Test is_thinking_enabled handles various thinking parameter values."""
+
+    @pytest.fixture
+    def transformer(self):
+        """Create a BaseConfig instance for testing."""
+        # BaseConfig is abstract, so we create a minimal concrete subclass
+        class ConcreteConfig(BaseConfig):
+            def __init__(self):
+                pass
+
+            def get_complete_url(self, *args, **kwargs):
+                return ""
+
+            def validate_environment(self, *args, **kwargs):
+                return {}
+
+            def transform_request(self, *args, **kwargs):
+                return {}, {}
+
+            def transform_response(self, *args, **kwargs):
+                return None
+
+            def get_supported_openai_params(self, model: str):
+                return []
+
+            def map_openai_params(self, *args, **kwargs):
+                return {}
+
+            def get_error_class(self, *args, **kwargs):
+                from litellm.llms.base_llm.chat.transformation import BaseLLMException
+                return BaseLLMException(500, "test error")
+
+        return ConcreteConfig()
+
+    @pytest.mark.parametrize(
+        "non_default_params,expected",
+        [
+            # thinking=None should not crash, returns False
+            ({"thinking": None}, False),
+            # thinking={'type': 'enabled'} returns True
+            ({"thinking": {"type": "enabled"}}, True),
+            # thinking key missing returns False
+            ({}, False),
+            # thinking={} returns False
+            ({"thinking": {}}, False),
+            # thinking with different type returns False
+            ({"thinking": {"type": "disabled"}}, False),
+            # reasoning_effort present returns True
+            ({"reasoning_effort": "medium"}, True),
+            # both thinking enabled and reasoning_effort returns True
+            ({"thinking": {"type": "enabled"}, "reasoning_effort": "high"}, True),
+            # falsy thinking values should not crash
+            ({"thinking": False}, False),
+            ({"thinking": 0}, False),
+            ({"thinking": ""}, False),
+        ],
+    )
+    def test_is_thinking_enabled(self, transformer, non_default_params, expected):
+        """Test is_thinking_enabled with various parameter combinations."""
+        result = transformer.is_thinking_enabled(non_default_params)
+        assert result == expected, (
+            f"Expected {expected} for params {non_default_params}, got {result}"
+        )

_{You can send follow-ups to the cloud agent here.}

- pass_through_endpoints: use request.method instead of hardcoded POST in streaming SigV4-signed request path for consistency with the non-streaming branch - llm_cost_calc/utils: hoist DataResidency value set to a module-level frozenset to avoid rebuilding it on every cost calculation - example_config_yaml/oai_misc_config: replace real-looking AWS account ID with placeholder 123456789012 in example bucket and role ARN Co-authored-by: Yassin Kortam <yassin@berri.ai>

veria-ai · 2026-05-26T04:21:13Z

+    _response: Optional[dict] = None
+    if exception is None and result is not None:
+        try:
+            _response = dict(result)


Medium: API keys in telemetry

dict(result) captures full management responses and the OpenTelemetry hook writes each field as a response.* span attribute. Wrapped endpoints such as /key/generate return GenerateKeyResponse.key, so anyone with access to exported traces can retrieve freshly generated virtual API keys; avoid logging management responses wholesale or redact secret fields like key, token, api_key, secret, and password before building the payload.

veria-ai · 2026-05-26T04:21:20Z

PR overview

There is one open security issue remaining, while one prior issue has already been addressed. The outstanding risk is that management API responses are being copied into telemetry span attributes, which can expose newly generated API keys to anyone with access to exported traces. This should be fixed by redacting secret fields or avoiding wholesale response logging before the PR is considered secure.

Open issues (1)

litellm/proxy/management_helpers/utils.py:477 — Medium: API keys in telemetry dict(result) captures full management responses and the OpenTelemetry hook writes each field as a `respo

Fixed/addressed: 1 · PR risk: 7/10

#28055) Aligns the github_copilot catalog with values returned by Copilot's public /models endpoint (capabilities.limits + capabilities.supports + model.supported_endpoints). - Adds 10 new model entries: claude-opus-4.7, claude-sonnet-4.6, gemini-3-flash-preview, gemini-3.1-pro-preview, gpt-4-0125-preview, gpt-5.2-codex, gpt-5.4, gpt-5.4-mini, gpt-5.5, oswe-vscode-prime. - Updates max_input_tokens for existing entries to reflect each model's true context window (e.g. gpt-4o-mini 64000 -> 128000, gpt-5-mini 128000 -> 264000, gpt-5.3-codex 128000 -> 400000, claude-haiku-4.5 128000 -> 200000). - Adds supports_reasoning, supports_response_schema, supports_function_calling, supports_parallel_function_calling, supports_vision based on capabilities.supports. - Declares supported_endpoints for entries missing it (e.g. gpt-3.5-turbo, gpt-4o, embeddings). - For responses-only models (gpt-5.2-codex, gpt-5.4, gpt-5.4-mini, gpt-5.5), sets mode to 'responses'. - gpt-41-copilot.mode changes from 'completion' to 'chat' because Copilot reports capabilities.type = 'chat'. Revertible on request. Pricing fields and other manually-curated values are preserved.

…metric (#28831) Adds a new `litellm.overhead.latency` gauge metric to `DatadogMetricsLogger` (the `/api/v2/series` path). The value is sourced from `hidden_params["litellm_overhead_time_ms"]` already computed in `ResponseMetadata` and exposed in `StandardLoggingPayload`. Matches the Prometheus integration which exposes the same value via `litellm_overhead_latency_metric`. Emitted in seconds (ms ÷ 1000) for consistency with the other latency series. Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: Shin <shin@litellm.ai> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: ishaan-berri <155045088+ishaan-berri@users.noreply.github.com>

…8876) Use LRU-cached TracerProviders with project-scoped OTEL Resources so team/key metadata routes traces correctly. On the proxy, project selection is limited to server-controlled user_api_key_auth_metadata; client metadata fields stay banned.

Sameerlite · 2026-05-26T13:38:03Z

@greptileai

cursor

Cursor Bugbot has reviewed your changes using high effort and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

✅ Fixed: Failure path now calls _emit_semantic_logs with None response
- Gated the _emit_semantic_logs call on success so failure paths (where response_obj is typically None) no longer invoke it and trigger AttributeError.

_{You can send follow-ups to the cloud agent here.}

Co-authored-by: Yassin Kortam <yassin@berri.ai>

cursor

Cursor Bugbot has reviewed your changes using high effort and found 1 potential issue.

Bugbot Autofix prepared a fix for the issue found in the latest run.

✅ Fixed: Failure path now logs raw request and metrics unexpectedly
- Gated _maybe_log_raw_request and _record_metrics calls behind if success: in _handle_phoenix_trace so the failure path no longer emits raw-request sub-spans or OTEL metrics, restoring the original _handle_failure behavior.

_{You can send follow-ups to the cloud agent here.}

^{Reviewed by Cursor Bugbot for commit 1ef30bf. Configure here.}

Restores pre-refactor behavior: _handle_failure no longer emits raw-request sub-spans or records OTEL metrics, matching the original _handle_failure that did not call these helpers. Co-authored-by: Yassin Kortam <yassin@berri.ai>

Issue 1 (arize_phoenix.py — caller-controlled telemetry routing): - _is_proxy_request no longer detects proxy mode by checking user_api_key_auth_metadata in request metadata. That field is user-supplied, so an authenticated caller could fake proxy-mode detection and have _project_from_metadata_dict read their own dict for project selection, routing telemetry to arbitrary Arize/Phoenix projects. Proxy mode is now determined solely by the server-set proxy_server_request field in litellm_params. - auth_utils.py adds user_api_key_auth_metadata to the banned request body params list so the proxy rejects any attempt to supply the field at the HTTP layer. The field is server-reserved: it is written exclusively by add_user_api_key_auth_to_request_metadata from the authenticated key's database record after the ban check runs. Issue 2 (management_helpers/utils.py — API key in OTEL span): - _emit_management_endpoint_otel_span stripped plaintext credential fields (key, token, api_key, secret, …) from the response dict before passing it to the OTEL success hook. dict(result) on a Pydantic GenerateKeyResponse includes the freshly-generated key field, which would previously be written as a span attribute to every configured OTEL collector/backend. Co-authored-by: Cursor <cursoragent@cursor.com>

Sameerlite · 2026-05-26T15:40:36Z

Both medium security issues are now fixed in commit ccf368e.

Issue 1 — Caller-controlled telemetry routing (arize_phoenix.py)

Two changes:

_is_proxy_request no longer inspects user_api_key_auth_metadata in request metadata to detect proxy mode. That field is user-supplied, so an authenticated caller could set metadata.user_api_key_auth_metadata = {…} in their request body to fake proxy-mode detection, then have _project_from_metadata_dict read their own dict for project selection. Proxy mode is now determined solely by the server-set proxy_server_request field in litellm_params.
auth_utils.py adds user_api_key_auth_metadata to _EXTRA_BANNED_OBSERVABILITY_PARAMS. The proxy now rejects any request that carries this field in the body or in metadata. The field is server-reserved (written exclusively by add_user_api_key_auth_to_request_metadata from the authenticated key's DB record after the ban check runs), so legitimate clients have no reason to supply it.

Issue 2 — Plaintext API key in OTEL span (management_helpers/utils.py)

_emit_management_endpoint_otel_span now strips a _CREDENTIAL_FIELDS set (key, token, api_key, secret, password, access_token, refresh_token, private_key, service_account_key) from dict(result) before the dict is handed to the OTEL success hook. Previously dict(result) on a Pydantic GenerateKeyResponse included the freshly-generated plaintext key field, which was then written as a span attribute to every configured OTEL collector/backend.

All 52 Arize Phoenix unit tests and 188 auth_utils tests pass with no changes to test expectations.

@greptileai

codecov · 2026-05-26T15:44:44Z

Codecov Report

❌ Patch coverage is 94.27083% with 11 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
litellm/integrations/arize/arize_phoenix.py	94.40%	8 Missing ⚠️
litellm/proxy/management_helpers/utils.py	75.00%	2 Missing ⚠️
..._experimental/mcp_server/discoverable_endpoints.py	96.87%	1 Missing ⚠️

📢 Thoughts on this report? Let us know!

greptile-apps · 2026-05-26T15:58:38Z

Both fixes look correct. Here's what I verified:

Issue 1 — arize_phoenix.py:

_is_proxy_request now reads only litellm_params.proxy_server_request, which is set server-side — the docstring even explicitly calls out that inspecting user_api_key_auth_metadata is "intentionally avoided." The _project_from_metadata_dict path still correctly reads from user_api_key_auth_metadata for project selection in proxy mode (the server-populated version), while ignoring the top-level metadata dict in proxy mode. The trust boundary is cleanly enforced.

Issue 2 — management_helpers/utils.py:

The credential stripping is correctly implemented:

raw = dict(result)
_response = {k: v for k, v in raw.items() if k not in _CREDENTIAL_FIELDS}

The _CREDENTIAL_FIELDS set covers the right fields (key, api_key, token, secret, password, access_token, refresh_token, private_key, service_account_key). The try/except fallback to None on a non-iterable result is also a safe choice.

One nit: _CREDENTIAL_FIELDS is constructed inside the function body on every invocation. Since it's a constant, moving it to module level would be marginally more efficient and makes the intent clearer — but this is cosmetic and not a correctness or security concern.

Both original issues are resolved. The fixes are well-scoped and the explanations match the code.

mateo-berri

LGTM; thanks!

shin-berri and others added 7 commits May 13, 2026 22:37

Merge pull request #27906 from BerriAI/litellm_internal_staging

e58a561

[Infra] Promote internal staging to main

Merge pull request #28100 from BerriAI/litellm_internal_staging

a72414a

[Infra] Promote internal staging to main

Merge pull request #28292 from BerriAI/litellm_internal_staging

79b4578

chore(ci): promote internal staging to main

refactor(mcp): drop unused _OAUTH_ERROR_PARAMS constant (Greptile P2)

5192a4c

The tuple was leftover scaffolding from an earlier draft of the LIT-2750 fix; nothing references it. The explanatory RFC 6749 §4.1.2.1 comment block above the callback handler covers the same intent.

greptile-apps Bot reviewed May 25, 2026

View reviewed changes

Comment thread litellm/proxy/_experimental/mcp_server/discoverable_endpoints.py Outdated

Comment thread litellm/proxy/_experimental/mcp_server/discoverable_endpoints.py

cursor Bot reviewed May 25, 2026

View reviewed changes

Comment thread litellm/proxy/_experimental/mcp_server/discoverable_endpoints.py

Comment thread litellm/proxy/_experimental/mcp_server/discoverable_endpoints.py

fix(mcp/oauth): preserve empty original_state and clarify missing-par…

3398a04

…am error in /callback Co-authored-by: Yassin Kortam <yassin@berri.ai>

Terrajlz and others added 3 commits May 25, 2026 18:33

fix(thinking): handle None thinking param in is_thinking_enabled (#28598

31664db

) Squash-merged by litellm-agent from Terrajlz's PR.

feat(helm): support tpl rendering in podAnnotations (#28609)

18ab237

Squash-merged by litellm-agent from devauxbr's PR.

fix: apply black formatting to base_llm chat transformation

43334f8

Fix CI black --check failure on is_thinking_enabled return formatting. Co-authored-by: Cursor <cursoragent@cursor.com>

Sameerlite mentioned this pull request May 25, 2026

[litellm-agent] Staging → litellm_internal_staging (5/22/2026) #28601

Closed

Sameerlite requested a review from a team May 26, 2026 03:43

cursor Bot reviewed May 26, 2026

View reviewed changes

Comment thread litellm/proxy/management_helpers/utils.py Outdated

Comment thread litellm/proxy/auth/model_checks.py

fix: preserve OTEL response payload and remove duplicate constant

3c77cc3

- _emit_management_endpoint_otel_span now passes result as response on success - remove duplicate _CREDENTIAL_LITELLM_PARAM_FIELDS assignment in model_checks Co-authored-by: Yassin Kortam <yassin@berri.ai>

cursor Bot reviewed May 26, 2026

View reviewed changes

Comment thread litellm/proxy/pass_through_endpoints/pass_through_endpoints.py

Comment thread litellm/litellm_core_utils/llm_cost_calc/utils.py

Comment thread litellm/proxy/example_config_yaml/oai_misc_config.yaml Outdated

Sameerlite and others added 2 commits May 26, 2026 09:24

Merge branch 'litellm_internal_staging' into litellm_oss_staging_250526

485492d

veria-ai Bot reviewed May 26, 2026

View reviewed changes

ririnto and others added 2 commits May 26, 2026 18:14

Sameerlite mentioned this pull request May 26, 2026

oss staging 05/25/2026 #28790

Closed

7 tasks

cursor Bot reviewed May 26, 2026

View reviewed changes

Comment thread litellm/integrations/arize/arize_phoenix.py

greptile-apps Bot reviewed May 26, 2026

View reviewed changes

Comment thread litellm/proxy/management_helpers/utils.py

fix(arize_phoenix): skip _emit_semantic_logs on failure path

1ef30bf

Co-authored-by: Yassin Kortam <yassin@berri.ai>

cursor Bot reviewed May 26, 2026

View reviewed changes

Comment thread litellm/integrations/arize/arize_phoenix.py Outdated

veria-ai Bot reviewed May 26, 2026

View reviewed changes

Comment thread litellm/integrations/arize/arize_phoenix.py

mateo-berri enabled auto-merge (squash) May 26, 2026 18:57

mateo-berri approved these changes May 26, 2026

View reviewed changes

mateo-berri merged commit d52fbfb into litellm_internal_staging May 26, 2026
117 of 119 checks passed

mateo-berri deleted the litellm_oss_staging_250526 branch May 26, 2026 18:57

Uh oh!

Conversation

Sameerlite commented May 25, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Relevant issues

Linear ticket

Pre-Submission checklist

Delays in PR merge?

CI (LiteLLM team)

Screenshots / Proof of Fix

Type

Changes

Uh oh!

CLAassistant commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

greptile-apps Bot commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Security Review

Important Files Changed

Uh oh!

Uh oh!

Uh oh!

codecov Bot commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

cursor Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Sameerlite commented May 25, 2026

Uh oh!

Sameerlite commented May 25, 2026

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

veria-ai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

veria-ai Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR overview

Open issues (1)

Uh oh!

Sameerlite commented May 26, 2026

Uh oh!

cursor Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Sameerlite commented May 26, 2026

Uh oh!

codecov Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

greptile-apps Bot commented May 26, 2026

Uh oh!

mateo-berri left a comment

Choose a reason for hiding this comment

Sameerlite commented May 25, 2026 •

edited by cursor Bot

Loading

CLAassistant commented May 25, 2026 •

edited

Loading

greptile-apps Bot commented May 25, 2026 •

edited

Loading

codecov Bot commented May 25, 2026 •

edited

Loading

cursor Bot left a comment •

edited

Loading

cursor Bot left a comment •

edited

Loading

veria-ai Bot commented May 26, 2026 •

edited

Loading

cursor Bot left a comment •

edited

Loading

cursor Bot left a comment •

edited

Loading

codecov Bot commented May 26, 2026 •

edited

Loading