fix(mcp): preserve auth errors + advertise upstream IdP for delegate-auth discovery#28550
Conversation
…ption
The ASGI MCP handlers caught ``HTTPException`` explicitly but let
``ProxyException`` (the rest-of-proxy convention raised by the auth
stack) fall through to the catch-all ``except Exception`` clause, which
collapsed every auth failure into a generic
``500 {"error":"MCP request failed","details":""}``. This stripped the
real status code (401/403) and any auth-bootstrap headers — most
importantly the ``WWW-Authenticate: Bearer authorization_uri=...`` hint
MCP OAuth 2.1 clients use to discover the upstream IdP. Most lethal in
delegate-auth mode, where the discovery hint is the only signal a client
gets to start its PKCE flow.
Add a ``_proxy_exception_to_http_exception`` helper that re-shapes the
ProxyException into an HTTPException, preserving status code, message
and headers. Wire it into both ``handle_streamable_http_mcp`` and
``handle_sse_mcp`` ahead of the catch-all clause. The unexpected-error
path (true 500s) is unchanged.
…h discovery When an MCP server is configured with delegate_auth_to_upstream=True, the proxy is a transparent network relay — the end-user authenticates directly with the upstream IdP via PKCE in their MCP client. The /.well-known/oauth-authorization-server response was unconditionally advertising LiteLLM's own /authorize and /token routes as the issuer, so spec-conforming MCP clients (OpenWebUI, Cursor, mcp-inspector) would attempt to drive the flow against endpoints that don't speak OAuth and either dead-end or mint tokens tied to the wrong issuer. Re-advertise the upstream values resolved at registration time (MCPServer.authorization_url / token_url / registration_url / scopes). Non- delegate servers continue to receive the existing LiteLLM-issuer response, since the proxy still mediates that flow.
…p PRM route Mirrors the AS-metadata fix to RFC 9728 protected-resource metadata so delegate-auth servers stay coherent end-to-end: - ``_build_oauth_protected_resource_response`` reuses ``_delegate_auth_authorization_server_response`` to advertise the upstream IdP issuer in ``authorization_servers``. Without this, a client following PRM still ends up at LiteLLM's AS metadata. - New route ``/.well-known/oauth-protected-resource/mcp/<name>/mcp``. Spec-conforming clients (mcp-inspector >= 0.21) construct PRM URLs by inserting the well-known segment between host and the streamable-http resource path. Without this route they 404 and fall back to root PRM. Verified end-to-end against mcp.atlassian.com via mcp-inspector: OAuth discovery resolves to Atlassian, DCR + PKCE complete, and 31 Atlassian tools list successfully under delegate auth.
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
Greptile SummaryThis PR fixes two MCP OAuth discovery bugs: auth errors (401/403) were being collapsed to generic 500 responses in the ASGI handlers, and delegate-auth servers were advertising LiteLLM's own endpoints instead of the upstream IdP's metadata in both AS metadata and PRM responses. A new canonical
Confidence Score: 4/5Safe to merge for mainstream IdPs (Auth0, Atlassian, Okta SaaS); the two edge-case concerns in discoverable_endpoints.py do not affect the end-to-end test scenario but could surface with path-based issuers or strict RFC 9728 clients. The auth-error preservation fix in server.py is correct and thoroughly tested. The delegate-auth issuer override works well for the demonstrated Atlassian case. Two gaps remain in discoverable_endpoints.py: the canonical PRM route returns resource = /mcp/{name} while spec-conforming clients probing for the streamable-http resource at /mcp/{name}/mcp would expect the field to match; and the issuer is derived by stripping the path from the authorization endpoint, which silently produces the wrong issuer for path-based IdPs like Keycloak realms. Neither gap breaks the verified end-to-end scenario, but they could surface with different client or IdP configurations. discoverable_endpoints.py — the canonical PRM route's resource field and the issuer derivation logic warrant a second look
|
| Filename | Overview |
|---|---|
| litellm/proxy/_experimental/mcp_server/discoverable_endpoints.py | Adds delegate-auth upstream-issuer override for AS metadata and PRM; adds canonical /.well-known/oauth-protected-resource/mcp/{name}/mcp route. The canonical route's resource field is derived with use_standard_pattern=True, which omits the trailing /mcp from the actual streamable-http endpoint URL, creating a potential RFC 9728 compliance gap. Issuer derivation from authorization_endpoint also drops path components, which breaks path-based IdPs like Keycloak realms. |
| litellm/proxy/_experimental/mcp_server/server.py | Adds _proxy_exception_to_http_exception helper and inserts ProxyException catch clauses in both handle_streamable_http_mcp and handle_sse_mcp. The fix correctly preserves status code, detail message, and auth headers. Defensive int(exc.code) coercion handles the None string edge case. Logic looks correct. |
| tests/test_litellm/proxy/_experimental/mcp_server/test_handler_proxy_exception_mapping.py | New test file covering ProxyException to HTTPException mapping with 6 targeted unit/integration tests; all mock-only, no real network calls. Good coverage of status preservation, header passthrough, non-numeric code defaulting, and graceful 500 fallback. |
| tests/test_litellm/proxy/_experimental/mcp_server/test_discoverable_endpoints.py | Adds 10 new tests for delegate-auth discovery paths; correctly uses global_mcp_server_manager.registry manipulation and ASGI transports (no real network calls). The canonical-route test does not assert the resource field, leaving the URL-mismatch issue undetected. |
Reviews (1): Last reviewed commit: "fix(mcp): point PRM at upstream issuer +..." | Re-trigger Greptile
0c7f860 to
bbd69b2
Compare
cff01fb to
01280db
Compare
| @router.get( | ||
| f"/.well-known/oauth-protected-resource{'' if get_server_root_path() == '/' else get_server_root_path()}/mcp/{{mcp_server_name}}/mcp" | ||
| ) | ||
| async def oauth_protected_resource_mcp_canonical( | ||
| request: Request, mcp_server_name: str | ||
| ): | ||
| return _build_oauth_protected_resource_response( | ||
| request=request, | ||
| mcp_server_name=mcp_server_name, | ||
| use_standard_pattern=True, | ||
| ) |
There was a problem hiding this comment.
Canonical PRM route returns wrong
resource identifier
The canonical route handles discovery for the streamable-http endpoint at /mcp/{server_name}/mcp, but _build_oauth_protected_resource_response with use_standard_pattern=True sets resource = {base}/mcp/{server_name} (missing the trailing /mcp). RFC 9728 §2 states that resource MUST equal the URL clients use to access the protected resource; a client probing /.well-known/oauth-protected-resource/mcp/atlassian1/mcp implicitly declares that its resource URL is {base}/mcp/atlassian1/mcp. Any spec-conforming client that validates the resource field against the URL it is connecting to will see a mismatch and reject the metadata. The test for this route (test_canonical_protected_resource_route_returns_200_for_streamable_path) does not assert on body["resource"], so this gap is undetected. Consider adding a third resource-URL mode (e.g. use_canonical_pattern) that appends the extra /mcp suffix, or pass resource_url_override to the builder.
| # Issuer is the upstream authorization server's origin per RFC 8414. | ||
| parsed = urlparse(authorization_endpoint) | ||
| if not parsed.scheme or not parsed.netloc: | ||
| return None | ||
| issuer = f"{parsed.scheme}://{parsed.netloc}" |
There was a problem hiding this comment.
Issuer derivation strips path for path-based issuers
issuer is constructed as scheme://netloc, dropping any path component of authorization_endpoint. For most popular IdPs (Auth0, Okta SaaS, Atlassian) the issuer really is the origin, so the derivation is correct. However, for multi-tenant or self-hosted deployments — Keycloak realms (https://keycloak.example.com/realms/myrealm/protocol/openid-connect/auth), Azure AD B2C tenants, or any RFC 8414 issuer that includes a path — the emitted issuer would be https://keycloak.example.com instead of https://keycloak.example.com/realms/myrealm. Clients that use this issuer to locate the upstream discovery document (via {issuer}/.well-known/oauth-authorization-server) would then 404. Storing the issuer explicitly at registration time (rather than re-deriving it each request) would be safer; alternatively, verify the expected issuer against the upstream discovery document response.
Summary
WWW-AuthenticateonProxyException. The MCP ASGI handlers caught all exceptions and collapsed them to500 {"error":"MCP request failed"}, so a 401 from the auth stack lost its status and discovery headers.authorization_endpoint+token_endpoint+registration_endpoint) for OAuth servers configured withdelegate_auth_to_upstream=true. Previously the proxy returned its own URLs and the client followed them right back to LiteLLM, defeating the delegate flow.authorization_serversadvertises the upstream issuer; add the canonical/.well-known/oauth-protected-resource/mcp/<name>/mcproute that spec-conforming clients (mcp-inspector ≥ 0.21) actually probe.Stacked on top of #28543. Once that merges, this PR auto-retargets to
litellm_internal_staging.Test plan
uv run pytest tests/test_litellm/proxy/_experimental/mcp_server/test_handler_proxy_exception_mapping.py— Bug G regression suite (6 tests covering 401/403 preservation, header passthrough, non-numericcodedefaulting to 500, empty-headers normalization, and graceful 500 fallback for arbitrary exceptions)uv run pytest tests/test_litellm/proxy/_experimental/mcp_server/test_discoverable_endpoints.py -k "delegate_auth or protected_resource"— 10 tests covering the helper, AS metadata, PRM, and the new canonical routemcp.atlassian.comviamcp-inspector: 401 +WWW-Authenticatefrom/mcp/atlassian1/mcptriggers discovery → PRM resolves to Atlassian → AS metadata resolves to Atlassian → DCR succeeds → user consents at Atlassian → token exchange succeeds → 31 Atlassian tools list successfully