feat(proxy): serve Anthropic-native /v1/models for Claude Code gateway discovery by Ar-maan05 · Pull Request #30273 · BerriAI/litellm

Ar-maan05 · 2026-06-12T07:01:28Z

Relevant issues

Pre-Submission checklist

I have added meaningful tests
My PR passes all unit tests on the affected directory
My PR's scope is as isolated as possible; it only solves 1 specific problem
I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Type

🆕 New Feature

Changes

Claude Code 2.1.126 and later support gateway model discovery: when ANTHROPIC_BASE_URL points at a gateway, Claude Code queries {base_url}/v1/models at startup and populates the /model picker with the discovered models. That discovery only parses the Anthropic-native Models API shape, so against litellm (which returns OpenAI's {id, object, created, owned_by} list) Claude Code finds nothing and the picker stays empty, even though /v1/messages already works.

This PR serves the Anthropic-native shape from the same /v1/models route via content negotiation on the anthropic-version header. Claude Code already sends that header for /v1/messages, so when it is present the endpoint returns the Anthropic Models envelope (type / display_name / created_at per entry, plus top-level has_more / first_id / last_id); otherwise the response is byte-for-byte the existing OpenAI shape, so aider and other OpenAI-compatible clients are unaffected. A separate endpoint was not used because Claude Code discovers at the gateway root's /v1/models, and a global config flag would break the OpenAI clients that share the route.
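As a rough illustration of the negotiation rule described above, the sketch below picks a response shape purely from the presence of the anthropic-version header. The function names `wants_anthropic_shape` and `negotiate_model_list` are hypothetical and are not taken from the PR's diff; passing `None` for the headers stands in for a direct, non-HTTP caller.

```python
from typing import Any, Mapping, Optional

# Header named in the PR description; HTTP header names are case-insensitive.
ANTHROPIC_VERSION_HEADER = "anthropic-version"


def wants_anthropic_shape(headers: Optional[Mapping[str, str]]) -> bool:
    """Return True when the caller sent an anthropic-version header.

    Hypothetical helper: `headers` may be None to model a direct
    (non-HTTP) call, which always gets the default OpenAI shape.
    """
    if not headers:
        return False
    # Compare lowercased names so "Anthropic-Version" also matches.
    return any(name.lower() == ANTHROPIC_VERSION_HEADER for name in headers)


def negotiate_model_list(
    headers: Optional[Mapping[str, str]],
    openai_payload: Any,
    anthropic_payload: Any,
) -> Any:
    """Pick the payload to return; the OpenAI payload is the untouched default."""
    if wants_anthropic_shape(headers):
        return anthropic_payload
    return openai_payload
```

Only the presence of the header matters in this sketch; its value (for example 2023-06-01) is not inspected, which matches the description's "when it is present" wording.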

The full model list is returned and Claude Code applies its own claude/anthropic id-prefix filter client-side, so no server-side filtering is imposed (a model aliased to claude-* that points at any backend still shows up, which is the point for gateway users). display_name falls back to the model id, the stable label a gateway can offer for arbitrary upstream models, and created_at is the ISO 8601 form of the same timestamp the OpenAI shape already returns. Hidden/unhealthy models are filtered before formatting in both the normal and scope=expand branches, exactly as for the OpenAI shape.
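The envelope rules in this paragraph can be sketched as a small formatter. This is an illustrative reimplementation under stated assumptions, not the PR's `create_anthropic_model_list_response`: the name `build_anthropic_model_list`, its signature, and the constant `ASSUMED_CREATED_AT` are hypothetical, and the timestamp value is simply the one visible in the OpenAI-shaped output further down.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional

# Assumption: the epoch timestamp shown as "created" in the OpenAI-shaped output.
ASSUMED_CREATED_AT = 1677610602


def to_iso8601_z(epoch_seconds: int) -> str:
    """Format epoch seconds as a UTC ISO 8601 string with a literal Z suffix."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


def build_anthropic_model_list(
    model_ids: List[str], created: int = ASSUMED_CREATED_AT
) -> Dict[str, Any]:
    """Build an Anthropic-style model list envelope from plain model ids."""
    data = [
        {
            "type": "model",
            "id": model_id,
            # display_name falls back to the id, as the description states.
            "display_name": model_id,
            "created_at": to_iso8601_z(created),
        }
        for model_id in model_ids
    ]
    first_id: Optional[str] = data[0]["id"] if data else None
    last_id: Optional[str] = data[-1]["id"] if data else None
    # No server-side filtering or pagination in this sketch: has_more is always False.
    return {"data": data, "has_more": False, "first_id": first_id, "last_id": last_id}
```

Called with `["claude-opus-4-6", "claude-haiku-4-5", "gpt-4o"]`, this reproduces the shape shown in the proof section, including `"created_at": "2023-02-28T18:56:42Z"`.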

Tests

In tests/test_litellm/llms/anthropic/test_anthropic_common_utils.py, two unit tests pin create_anthropic_model_list_response: the full envelope (per-entry type/display_name/created_at with a Z-suffixed ISO timestamp, top-level has_more/first_id/last_id, no object) and the empty-list case (first_id/last_id null). In tests/test_litellm/proxy/proxy_server/test_routes_models.py, a route test drives GET /v1/models (and /models) with the anthropic-version header and asserts the negotiated Anthropic shape, while the existing happy-path test pins that the default response stays OpenAI. Mutation-checked by hand: forcing the negotiation off, emitting object instead of type, and dropping the Z normalization each fail a test. The affected suites are green.
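The properties those tests pin can be expressed as a standalone shape check. The sketch below is not the PR's test code; `anthropic_envelope_problems` is a hypothetical helper that lists every way a payload deviates from the envelope described above, so each hand-made mutation (an `object` key, a missing `type`, a timestamp without `Z`) would surface as a reported problem.

```python
import re
from typing import Any, Dict, List

# Matches second-precision UTC timestamps such as 2023-02-28T18:56:42Z.
ISO_8601_Z = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$")


def anthropic_envelope_problems(payload: Dict[str, Any]) -> List[str]:
    """Return a list of deviations from the Anthropic-style envelope (empty if none)."""
    problems: List[str] = []
    if "object" in payload:
        problems.append("top-level 'object' key present (OpenAI shape)")
    for key in ("has_more", "first_id", "last_id"):
        if key not in payload:
            problems.append(f"missing top-level key: {key}")

    entries = payload.get("data", [])
    for index, entry in enumerate(entries):
        if entry.get("type") != "model":
            problems.append(f"data[{index}]: 'type' is not 'model'")
        if "object" in entry:
            problems.append(f"data[{index}]: 'object' key present (OpenAI shape)")
        if "display_name" not in entry:
            problems.append(f"data[{index}]: missing 'display_name'")
        if not ISO_8601_Z.match(str(entry.get("created_at", ""))):
            problems.append(f"data[{index}]: 'created_at' is not Z-suffixed ISO 8601")

    # Empty-list case: both cursor ids must be null.
    if not entries:
        for key in ("first_id", "last_id"):
            if payload.get(key) is not None:
                problems.append(f"{key} should be null for an empty list")
    return problems
```

A test can then assert `anthropic_envelope_problems(response_json) == []` for the negotiated response and a non-empty list for the default OpenAI response.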

Screenshots / Proof of Fix

Live proxy on localhost:4000 with two claude-* deployments and one gpt-4o.

python litellm/proxy/proxy_cli.py --config litellm/proxy/dev_config_27180.yaml --port 4000

Default request (no header): OpenAI format, unchanged.

curl -s http://127.0.0.1:4000/v1/models -H "x-api-key: sk-1234" | python3 -m json.tool

{
    "data": [
        {"id": "claude-opus-4-6", "object": "model", "created": 1677610602, "owned_by": "openai"},
        {"id": "claude-haiku-4-5", "object": "model", "created": 1677610602, "owned_by": "openai"},
        {"id": "gpt-4o", "object": "model", "created": 1677610602, "owned_by": "openai"}
    ],
    "object": "list"
}

With the anthropic-version header (what Claude Code sends): Anthropic-native format.

curl -s http://127.0.0.1:4000/v1/models -H "x-api-key: sk-1234" -H "anthropic-version: 2023-06-01" | python3 -m json.tool

{
    "data": [
        {
            "type": "model",
            "id": "claude-opus-4-6",
            "display_name": "claude-opus-4-6",
            "created_at": "2023-02-28T18:56:42Z"
        },
        {
            "type": "model",
            "id": "claude-haiku-4-5",
            "display_name": "claude-haiku-4-5",
            "created_at": "2023-02-28T18:56:42Z"
        },
        {
            "type": "model",
            "id": "gpt-4o",
            "display_name": "gpt-4o",
            "created_at": "2023-02-28T18:56:42Z"
        }
    ],
    "has_more": false,
    "first_id": "claude-opus-4-6",
    "last_id": "gpt-4o"
}

The shape matches the Anthropic Models API. All models are returned and Claude Code keeps the claude/anthropic-prefixed ones for the picker; clients that do not send anthropic-version get the unchanged OpenAI list.
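For readers wondering what the client-side step might look like, here is a minimal sketch of a prefix filter over the negotiated response. Claude Code's actual filter is not shown in this PR; the prefixes, the case-sensitive match, and the name `picker_model_ids` are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Tuple

# Assumption: the picker keeps ids that start with one of these prefixes.
ASSUMED_PICKER_PREFIXES: Tuple[str, ...] = ("claude", "anthropic")


def picker_model_ids(envelope: Dict[str, Any]) -> List[str]:
    """Return the ids a prefix-filtering client would keep, in server order."""
    kept: List[str] = []
    for entry in envelope.get("data", []):
        model_id = entry.get("id", "")
        # str.startswith accepts a tuple and matches any of its members.
        if model_id.startswith(ASSUMED_PICKER_PREFIXES):
            kept.append(model_id)
    return kept
```

Applied to the response above, this keeps `claude-opus-4-6` and `claude-haiku-4-5` and drops `gpt-4o`, which is why a gateway alias named claude-* stays visible regardless of the backend it points at.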

greptile-apps · 2026-06-12T07:04:50Z

Greptile Summary

This PR adds Anthropic-native model list content negotiation to the existing /v1/models endpoint: when the anthropic-version request header is present (as Claude Code sends for all Anthropic API calls), the endpoint returns the Anthropic Models API envelope (type/display_name/created_at per entry, plus has_more/first_id/last_id); all other callers continue to receive the unchanged OpenAI-compatible shape.

create_anthropic_model_list_response is correctly placed in litellm/llms/anthropic/common_utils.py, satisfying the rule against provider-specific code outside of llms/; it reuses the existing DEFAULT_MODEL_CREATED_AT_TIME constant and produces a Z-suffixed ISO 8601 timestamp consistent with the Anthropic Models API.
model_list in proxy_server.py gains a request: Request = None injection and a single wants_anthropic_format flag checked in both the scope=expand and normal branches — the existing OpenAI response path is untouched when the header is absent, so no backwards-incompatible change is introduced.
Tests cover the full envelope shape, the empty-list case, and both affected HTTP routes, all fully mocked with no real network calls.

Confidence Score: 5/5

Safe to merge — the change is additive and strictly backward-compatible; callers without the anthropic-version header receive a byte-for-byte identical response to today.

The implementation is a clean, additive content-negotiation layer. Provider-specific formatting logic is correctly placed in the llms/anthropic/ module rather than shared proxy utilities. The request: Request = None default guards all non-HTTP call sites. Both affected branches in model_list handle the new flag consistently, tests are fully mocked, and the existing OpenAI-format path is untouched.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| litellm/llms/anthropic/common_utils.py | Adds `create_anthropic_model_list_response` helper that builds the Anthropic-native `/v1/models` envelope; uses the existing `DEFAULT_MODEL_CREATED_AT_TIME` constant and ISO 8601 `Z`-suffix formatting. No fastapi imports, correctly placed in the provider-specific module. |
| litellm/proxy/proxy_server.py | Injects `request: Request = None` into `model_list` and performs content negotiation on the `anthropic-version` header in both the `scope=expand` and normal branches. Existing OpenAI-format response is byte-for-byte unchanged when the header is absent. |
| tests/test_litellm/llms/anthropic/test_anthropic_common_utils.py | Adds two unit tests for `create_anthropic_model_list_response`: full envelope shape (type/display_name/created_at with Z suffix, has_more/first_id/last_id, no `object`) and empty-list case. All mocked, no network calls. |
| tests/test_litellm/proxy/proxy_server/test_routes_models.py | Adds `test_get_models_anthropic_format_when_header_present` that drives both `/v1/models` and `/models` with the `anthropic-version` header and pins the Anthropic-native shape; existing happy-path test is untouched, confirming OpenAI format is unchanged without the header. |

Reviews (3): Last reviewed commit: "fix(proxy): make model_list request para..."

codecov · 2026-06-12T07:06:58Z

Codecov Report

❌ Patch coverage is 91.66667% with 1 line in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| litellm/proxy/proxy_server.py | 83.33% | 1 Missing ⚠️ |

Ar-maan05 · 2026-06-12T07:12:29Z

@greptile-apps

Ar-maan05 · 2026-06-16T17:35:04Z

Hi, my PR is passing CI and has a 5/5 score from greptile. Please let me know if any changes are required.

Sameerlite · 2026-06-17T03:57:42Z

@greptileai

Ar-maan05 · 2026-06-17T06:57:58Z

Hi @Sameerlite, please let me know if any changes are needed. Always a pleasure to contribute to this repo.

…e gateway discovery (#30273)" This reverts commit 4e31885.

@Sameerlite

* fix(proxy): allow non-admin virtual keys to call GA Realtime WebRTC HTTP routes (#30089)
* fix(proxy): allow non-admin virtual keys to call GA Realtime WebRTC HTTP routes
  Add the realtime WebRTC HTTP sub-routes (/realtime/client_secrets, /realtime/calls and their /v1 + /openai/v1 variants) to LiteLLMRoutes.openai_routes so is_llm_api_route() classifies them as LLM API routes. Without this, non-admin virtual keys received 401 'Only proxy admin can be used to generate, delete, update info for new keys/users/teams' when calling these endpoints.
  Fixes #29923
* fix(proxy): validate session.model for realtime routes in model-access check
  The GA Realtime WebRTC HTTP routes resolve the effective model from the nested session.model (falling back to the top-level model), but the auth layer's get_model_from_request() only extracted the top-level model. A model-restricted virtual key could therefore place a disallowed model in session.model, leave the top-level model unset, and skip can_key_call_model() entirely - obtaining an ephemeral token for a model it is not allowed to use. Extract session.model for the realtime client_secrets/calls routes so the model-access check runs against the model the request will actually use. Legitimate callers are unaffected; their permitted model still validates.
  Relates to #29923
* fix(proxy): classify realtime transcription_sessions routes as LLM API routes
  Add the GA Realtime WebRTC transcription_sessions HTTP routes to openai_routes so is_llm_api_route() returns True for them, matching the client_secrets and calls routes already fixed. These endpoints are registered with user_api_key_auth in realtime_endpoints/endpoints.py, so without this a non-admin virtual key calling POST /v1/realtime/transcription_sessions would hit the admin-only 401 branch. Extends the regression test parametrization accordingly.
---------
Co-authored-by: habonlaci <4699494+habonlaci@users.noreply.github.com>
* feat(proxy): surface max_input_tokens/max_output_tokens on /v1/models (#30272)
* feat(proxy): surface max_input_tokens/max_output_tokens on /v1/models
* fix(proxy): degrade /v1/models gracefully when model-group lookup fails
---------
Co-authored-by: Sameer Kankute <sameer@berri.ai>
* fix: sort tiered token-cost thresholds numerically (#30375)
* fix: sort tiered token-cost thresholds numerically
  _get_token_base_cost iterated input_cost_per_token_above_<N>_tokens keys with a lexicographic sort, so for tiers whose thresholds have different digit lengths (e.g. 90k vs 128k) a request crossing both was billed at the lower tier that sorted first. Sort by the parsed numeric threshold instead, so the highest tier the request actually crosses is applied.
* refactor: reuse _parse_above_token_threshold for inline threshold parse
---------
Co-authored-by: Eric (GabiDevFamily) <271972409+santino18727-debug@users.noreply.github.com>
* fix(openai): preserve cache_control for openai-compatible custom endpoints (#30387)
* fix(openai): preserve cache_control for openai-compatible custom endpoints
* fix(openai): use parsed hostname to detect real OpenAI for cache_control preservation
* fix(proxy): drain all daily-spend batches per flush cycle (#30281) (#30505)
* fix(types): prevent internal parallel_request_limiter fields from leaking to upstream providers (#30545)
* fix(types): add internal parallel_request_limiter fields to all_litellm_params to prevent forwarding to upstream providers
* test(types): add regression test for internal rate-limit fields in all_litellm_params
* fix(init): add bool type annotation to suppress_debug_info (#30531)
  Module-level `suppress_debug_info = False` had no annotation, so strict type checkers (e.g. ty) infer it as `Literal[False]`. Reassigning it to `True` (as done in proxy_server.py and router.py) then fails with an invalid-assignment error.
  Annotate it as `bool` to match every other flag in this module.
* fix: coalesce null aggregates in update_metrics for no-spend keys (#29945)
* feat(team_endpoints): add query parameter `key_limit` to `/team/info` endpoint (#30006)
* feat(team_endpoints): Add query parameter key_limit to /team/info
* feat(team_endpoints): update schema.d.ts to include the new query parameter
* feat(team_endpoints): add tests for limitting key count in /team/info response
* feat(team_endpoints): Apply suggestions from greptile
* Set greater-than constraint on key-limit
* Fix type
* fix(router): release aiohttp connection when stream iteration ends abnormally (#30271)
* fix(router): release aiohttp connection when stream iteration ends abnormally
  A streaming response that terminates with a mid-stream read timeout, a task cancellation (client disconnect), or GeneratorExit never closed the underlying aiohttp ClientResponse. aiohttp only auto-releases the connector slot at body EOF, so each abnormally terminated stream permanently leaked one slot from the shared TCPConnector pool. During a backend traffic spike the pool drains; once exhausted every subsequent request to that host waits for a slot, times out and surfaces as a 408, indefinitely, even after the backend recovers. Only a proxy restart cleared the in-memory sessions, which matched the reported symptom of a router stuck returning 408 for a healthy vLLM backend. Close the response in a finally clause when iteration ends. On a fully read response the connection was already released at EOF and close() is a no-op, so keep-alive reuse for normal requests is unchanged.
  Fixes #30192
* test(aiohttp): cover GeneratorExit path with a mock instead of a live socket
  The previous slot-release test started a real aiohttp TCP server, which can flake in offline CI and does not exercise this fix's code path directly.
  Replace it with a dependency-injected mock that closes the stream generator (GeneratorExit) and asserts the response is closed, covering the third abnormal-exit path the finally block handles
* feat(proxy): serve Anthropic-native /v1/models for Claude Code gateway discovery (#30273)
* feat(proxy): serve Anthropic-native /v1/models for Claude Code gateway discovery
* refactor(proxy): move Anthropic model-list formatter into llms/anthropic/common_utils
* fix(proxy): make model_list request param optional for direct callers
* feat(dashscope): add Responses API support (#30286)
* feat(dashscope): add Responses API support
  DashScope's OpenAI-compatible endpoint serves /responses, so register a DashScopeResponsesAPIConfig that routes dashscope/* responses calls to {api_base}/responses without rewriting the upstream model id, instead of falling back to the chat-completions -> responses emulation pipeline.
  Closes #29780
* feat(dashscope): mark responses API as not supporting native websocket
  Matches the hosted_vllm/perplexity/openrouter responses configs, which all override supports_native_websocket() to False since the OpenAI-compatible endpoint has no native wss:// responses transport.
---------
Co-authored-by: Sameer Kankute <sameer@berri.ai>
* fix(spend-logs): preserve error_message on ProxyException failures (#30381)
* fix(spend-logs): preserve error_message on ProxyException failures
  `StandardLoggingPayloadSetup.get_error_information` used `str(original_exception)` to populate the human-readable error message stored in `spend_logs.metadata.error_information.error_message`.
  `ProxyException` (litellm/proxy/_types.py:3453) sets `self.message` in its constructor but does NOT call `super().__init__(message)` and does NOT define `__str__`. As a result, `str(ProxyException(...))` returns the empty string, and every auth/budget/quota rejection was landing in spend_logs with `error_message=""` despite a fully populated traceback.
  Operator impact: dashboard "LLM Failure" rows became untriageable - the only way to tell a 401 from a 429 was to manually unpack the traceback JSON via psql. Burst failure patterns (e.g. a UI session polling with a stale token) produced 20-30 indistinguishable `error_code=401` rows per second.
  Fix: prefer the `.message` attribute (set by ProxyException and every litellm.exceptions.* class) over `str(exc)`. The `str(exc)` fallback is retained for non-litellm exception types, preserving prior behavior.
  Test plan:
  - 2 new unit tests in tests/test_litellm/litellm_core_utils/test_litellm_logging.py:
    * test_get_error_information_prefers_message_attribute_over_str
    * test_get_error_information_falls_back_to_str_when_no_message_attr
  - Existing test_get_error_information_error_code_priority still passes
  - End-to-end verified: bad-key 401 now stores full "Authentication Error, Invalid proxy server token passed..." message in spend_logs.metadata.error_information.error_message
* fix(spend-logs): preserve explicit empty .message + drop dead reference
  Greptile P2 on #30381. The truthiness check `if message_attr:` silently skipped an explicit empty-string `.message` and fell through to `str(original_exception)`. For ProxyException-shaped objects both produce empty, so the bug was latent; for other exception types it would inject a different string into error_information.error_message and corrupt the signal. Use `is not None` so an empty string survives verbatim. Also drop the stale `See e2e/cases/11.` comment reference - that path does not exist anywhere in the repo and confuses future readers.
  Regression test added: an exception with `.message=""` and a non-empty `super().__init__()` arg must yield error_message == "".
* ci: retrigger workflows after base branch change to litellm_internal_staging
* fix(anthropic): strip LiteLLM-injected total_tokens from /v1/messages response (#30382)
* fix(anthropic): strip LiteLLM-injected total_tokens from /v1/messages response
  The non-streaming /v1/messages response carries a LiteLLM-injected usage.total_tokens = input_tokens + output_tokens that is not part of the Anthropic API spec. This caused three problems:
  1. Shape divergence with streaming on the same endpoint. message_delta.usage in the SSE path never carries total_tokens. Clients parsing both paths get two different schemas from one endpoint.
  2. Shape divergence with upstream. Direct calls to https://api.anthropic.com/v1/messages return no total_tokens field, so clients using the official Anthropic SDK couldn't rely on it, and clients that did rely on the LiteLLM-injected one broke when bypassing the proxy.
  3. Numerical misuse. total = input + output undercounts when cache_read_input_tokens and cache_creation_input_tokens are non-zero, because cache tokens are reported in their own fields. A 100k-token cached prompt with 1 non-cache input token + 200 output tokens reports total_tokens = 201, off by ~99.8% from any reasonable definition of "total."
  Fix: add _strip_total_tokens_from_anthropic_response in litellm/proxy/anthropic_endpoints/endpoints.py and invoke it in the success path of anthropic_response right before returning. Only mutates dict-shaped responses; streaming (which already lacks the field) is left untouched. spend_logs / Prometheus continue to compute total_tokens internally for billing - this fix only strips the field from the wire response.
  Scope: only the Anthropic passthrough endpoint /v1/messages. The OpenAI-shape /v1/chat/completions is unaffected.
* fix(anthropic): gate total_tokens strip behind flag + handle Pydantic .usage
  Two P1 greptile threads on #30382:
  P1 - **Backwards-incompatible removal without a feature flag**
  Stripping `usage.total_tokens` unconditionally breaks any client currently reading the LiteLLM-shaped non-streaming /v1/messages response. Per the codebase's policy (mirrors #30418), gate behind a new flag.
  - `litellm.strip_anthropic_total_tokens: bool = False` (default - backward-compat: clients keep seeing total_tokens).
  - Env override: `LITELLM_STRIP_ANTHROPIC_TOTAL_TOKENS=true`.
  - Docstring: planned to flip to True in a future major release; opt in early.
  P1 - **Silent no-op if `result` is a Pydantic model**
  `base_process_llm_request` may return a Pydantic-style object whose `.usage` is a plain dict (the most common shape - e.g. objects wrapping raw upstream JSON). The original `isinstance(response, dict)` guard skipped strip on those, so `total_tokens` would still hit the wire. Helper now also reads `getattr(response, "usage", None)` and strips when that's a dict. Strongly-typed Pydantic `Usage` sub-models with required `total_tokens` fields are still skipped - those impose type constraints the helper doesn't try to subvert.
  Tests:
  - `test_strips_total_tokens_on_pydantic_model_with_dict_usage`
  - `test_flag_defaults_off`
  8/8 pass locally.
* fix(anthropic): drop env var for strip flag (docs CI)
  Mirrors #30418's pattern (`expose_router_debug_in_errors: bool = True`, no `os.getenv`).
  The `LITELLM_STRIP_ANTHROPIC_TOTAL_TOKENS` env var introduced in the prior commit was flagged by `tests/documentation_tests/test_env_keys.py` because the documentation file `docs/my-website/docs/proxy/config_settings.md` lives in `BerriAI/litellm-docs` (separate repo) and registering a new env key requires a parallel docs PR - a friction we avoid here by exposing the flag only as a Python attribute + `litellm_settings` config key, both of which load through the existing proxy config plumbing without needing the env-var registry to be updated.
  No semantic change: default still False, behavior identical when set via `litellm.strip_anthropic_total_tokens = True` or `litellm_settings.strip_anthropic_total_tokens: true` in config.yaml.
  Verified locally: env scan no longer surfaces the key; 8/8 tests pass.
* ci: retrigger workflows after base branch change to litellm_internal_staging
* fix(pricing): correct swapped input/output token costs for command-r7b-12-2024 (#30413)
* fix(pricing): correct swapped input/output token costs for command-r7b-12-2024
* test: resolve model prices JSON relative to test file for pip installs
* fix(exception-mapping): map Gemini upstream-error body code 429 to RateLimitError (#30417)
* fix(exception-mapping): map Gemini upstream-error body code 429 to RateLimitError
  Some Gemini-compatible gateways (e.g. new-api) wrap a 429 rate-limit signal from upstream inside an HTTP 500/503 envelope, with the real code only surfaced in the JSON body:
  {"error":{"message":"...high demand...","type":"upstream_error", "param":"","code":429}}
  Previously LiteLLM only looked at the HTTP status and mapped this to InternalServerError, which Router treats as non-retryable for many configs - so users got hard 500s instead of fallback/retry. Now the Gemini/Vertex exception mapper parses error.code from the body and routes code 429 to RateLimitError before falling through to the HTTP-status branches. Other body codes fall through unchanged.
  Tests cover:
  - new-api gateway's `code:429` payload now maps to RateLimitError
  - Genuine 500-body responses stay InternalServerError
  - Non-JSON body strings fall through to status-code mapping unchanged
* fix(exception-mapping): scope body-code 429 promotion to 5xx envelopes
  Addresses greptile P1/P2 + @Sameerlite's review on #30417. The new elif branch was firing for any HTTP status, so a gateway response of HTTP 400 with body {"error":{"code":429,...}} would be incorrectly promoted to RateLimitError (retryable) instead of falling through to BadRequestError. Same trap for 401 -> AuthenticationError.
  Scoped the body-code 429 check to `500 <= status_code < 600` - covers 500/502/503/504 (gateways wrapping upstream 429 in any 5xx envelope) without inviting the 4xx misclassification.
  Tests: parametrized table now covers 5xx (500/502/503), 4xx (400/401), and the existing fall-through cases, asserting each maps to the exception type that matches the HTTP status code. 50/50 pass locally.
* ci: retrigger workflows after base branch change to litellm_internal_staging
* feat(router): add expose_router_debug_in_errors flag (default True) to redact internal model_group/fallback names (#30418)
* feat(router)!: redact internal model_group/fallback names from exception messages
  The Router was unconditionally appending internal config names onto exception.message:
  - "Received Model Group=..."
  - "Available Model Group Fallbacks=..."
  - "No fallback model group found... Fallbacks={...}"
  - "context_window_fallbacks={...}"
  - Deployment-timeout messages including model_group
  - Fallback failure detail listing fallback chain
  ProxyException forwards .message verbatim to clients, so gateways were leaking their model_name / fallback wiring in every failed call.
  Fix: gate all five mutation sites on a new `litellm.expose_router_debug_in_errors` flag (default False). Set to True to restore upstream debug behavior for local debugging.
  Why: matches the redaction posture this codebase already has for upstream model identifiers (cf. _litellm_returned_model_name) and removes the last common error-path leak of internal model_group names.
  Breaking change marker (!): if anything parses "Received Model Group=" out of client error messages, flip the flag on or migrate to the x-litellm-* response headers instead.
  Tests: 7 cases covering each of the 5 redaction sites + the flag-on inverse path, plus a "default off" sanity check.
* test(router): cover sites 1 + 3 of expose_router_debug_in_errors gate
  Addresses Greptile / codecov feedback on #30418: patch coverage was 55.6% with 4 lines uncovered in litellm/router.py. The existing tests exercised sites 2 (ContextWindowExceededError), 4 (no-fallback-found), and 5 (Received Model Group) - both default and flag-on. Sites 1 and 3 were declared in the PR description as covered by "site 5 also fires" but the gate body lines for each (the `e.message +=` inside the `if litellm.expose_router_debug_in_errors:` branch) only execute when the flag is on AND the specific exception path is taken, which neither existing test triggered.
  Added 4 new tests (default + flag-on × 2 sites):
  - test_default_does_not_leak_deployment_timeout_debug
  - test_flag_on_leaks_deployment_timeout_debug
  - test_default_does_not_leak_content_policy_fallback_hint
  - test_flag_on_leaks_content_policy_fallback_hint
  Trigger details:
  - Site 1 (litellm.Timeout in _acompletion) is reached via the Router-supported `mock_timeout=True` + `timeout=0.001` kwargs on `acompletion(...)`. Cannot embed a Timeout instance in model_list because Router.__init__ deep-copies it and Timeout.__reduce__ does not preserve the required positional args.
  - Site 3 (ContentPolicyViolationError without content_policy_fallbacks set, in async_function_with_fallbacks_common_utils) is reached by passing a `mock_response=litellm.ContentPolicyViolationError(...)` instance via the call-site kwarg - same deepcopy-avoidance reason.
  11/11 tests pass locally. Patch coverage on litellm/router.py for this PR's diff should now be 100%.
* chore(router): flip expose_router_debug_in_errors default to True
  Addresses @Sameerlite's review on #30418 - maintain backward compat on the wire. Redact becomes opt-in via setting the flag to False; the historical behavior (leak internal model_group / fallback wiring through exception messages) is preserved as the default.
  - litellm/__init__.py: default flipped to True, docstring rewritten with deprecation note pointing at a future flip to False (redact by default) in a major release.
  - tests/test_litellm/test_router_exception_redaction.py: fixture resets to True (was False); the "off" tests now explicitly set False; the "default_leaks_*" tests rely on the fixture default. test_flag_defaults_off -> test_flag_defaults_on.
  - No router.py change needed; the gate keys off the same flag, only the default changes.
  - PR title no longer needs the breaking-change `!` marker - no client sees a behavior change at default settings.
  11/11 pass locally.
* ci: retrigger workflows after base branch change to litellm_internal_staging
* feat(guardrails): integrate Repelloai Argus guardrail (#30465)
* feat(guardrails): add RepelloAI Argus guardrail integration (#1)
* feat(guardrails): add RepelloAI Argus guardrail integration
  Add a new guardrail hook backed by RepelloAI Argus, with dashboard-managed asset policies enforced via an asset_id and X-API-Key auth.
* fix(guardrails): harden RepelloAI Argus guardrail
  - scan streaming responses on output (was bypassing the guardrail)
  - log blocked verdicts as guardrail_intervened instead of success
  - treat auth/config errors (401/403/404/422) as misconfiguration that always blocks, not a fail-open-able unreachable error
  - default unreachable_fallback to fail_closed and read it directly; block on unknown/malformed verdicts so an API change can't silently disable enforcement
  - type unreachable_fallback as a Literal, drop the duplicate config model, expose unreachable_fallback in the config schema, and stop leaking the raw provider response / exception strings to the client
* fix(guardrails): address RepelloAI Argus review feedback
  - support ARGUS_API_KEY (with REPELLOAI_API_KEY fallback)
  - make asset_id required in the config model
  - normalize unreachable_fallback so only fail_open opens; block on 400 misconfig
  - correct the shared unreachable_fallback field description
* docs(guardrails): add RepelloAI Argus docs page and dashboard listing
  - add docs page covering config, env vars, modes, verdicts, failure semantics
  - list RepelloAI Argus in the Guardrail Garden with provider/logo mappings
  - add a regression test for the provider logo and display-name resolution
* fix(guardrails): keep RepelloAI asset_id optional in config model
  A required asset_id leaked onto the shared LitellmParams (which inherits RepelloAIGuardrailConfigModel), breaking validation for every other guardrail. Keep it optional like sibling models; the guardrail __init__ still raises when asset_id is missing, which is the real enforcement.
* Add comment for last user turn scanning
* feat(guardrails): harden repelloai scanning
* feat(guardrails): expand repelloai scanning to include tool definitions
  Add extraction of tool definitions and tool call arguments to the RepelloAI guardrail scanning. Improves detection coverage by including function schemas and parameters in the prompt sent to the guardrail service.
  Also captures detailed error responses in logs and adds guardrail header to streaming responses.
* refactor(guardrails): fix and harden repelloai schema text extraction
  - Fix duplicate text in _iter_schema_text: previously all dict values were re-queued onto the stack even after scalar/list keys were already extracted explicitly, causing names/descriptions to appear twice in the scanned prompt
  - Extract schema key frozensets to module-level constants so they are not reconstructed on every call
  - Change _iter_schema_text from @classmethod to @staticmethod (cls unused)
  - Narrow _call_analyze stage param from str to Literal["prompt", "response"]
  - Add HttpxResponse type annotation to _raise_for_config_error
  - Add LLMResponseTypes annotation to async_post_call_success_hook response param
* fix(guardrails): resolve pyright type errors in repelloai guardrail
  - Narrow async_handler.post return from Response|None to Response with explicit None guard before calling raise_for_status/json
  - Fix list comprehension returning str|None by switching to explicit loop with isinstance guard so pyright tracks the narrowing
  - Cast model_dump() result to Dict since hasattr does not narrow object type in pyright
* fix(guardrails/repello): include Responses API instructions field in prompt scan
  The /v1/responses top-level `instructions` field was not included in _extract_prompt_text, allowing a caller to bypass guardrail policy checks by putting blocked content in `instructions` while keeping `input` benign.
* feat: add api_key to config model and read prompt from data dict
* fix(guardrails/repello): plug input_text and tool-call response bypass gaps
  Responses API input content parts with type 'input_text' were silently dropped by build_inspection_messages (which only handles type='text'), allowing callers to send blocked content via that path without triggering the pre-call scan.
  Fix: add _extract_input_text_parts to RepelloAIGuardrail and call it when walking the Responses API input messages.

  Post-call scanning skipped responses whose choices contained only tool_calls or function_call (message.content=None), letting models put blocked output in function arguments undetected. Fix: _extract_chat_completion_text now calls _extract_tool_call_args_from_message on each choice message.

  Also replace typing.Dict/List with builtin dict/list to clear TID251 strict ruff violations introduced by this file.

* fix(guardrails/repello): scan Responses API function_call output arguments

  Output items with type 'function_call' in a /v1/responses response were skipped by _extract_responses_api_text; only 'message' items were walked. A model could return blocked content in function_call.arguments undetected. Now extract arguments from function_call output items before scanning.

* fix(anthropic): drop orphaned server_tool_use on multi-turn replay from generic OpenAI clients (#30486)

* fix(anthropic): drop orphaned server_tool_use on multi-turn replay from generic OpenAI clients

  When an Anthropic server-side tool (web_search, id `srvtoolu_...`) is used, its result is carried in `provider_specific_fields.web_search_results`; PRs #17746 / #17798 restore it for callers that round-trip provider_specific_fields. A generic OpenAI client that does NOT preserve provider_specific_fields (e.g. Open WebUI talking to a Vertex/Anthropic model over /chat/completions) drops it on replay and instead sends back an assistant `tool_call` + a `tool` message both keyed to the `srvtoolu_` id. The transform then produced a bare `server_tool_use` (with no following *_tool_result) plus a user `tool_result` for the same id, both invalid, so the next turn 400s:

    messages.N.content.0: unexpected `tool_use_id` found in `tool_result` blocks: srvtoolu_... Each `tool_result` block must have a corresponding `tool_use` block in the previous message.
  This is the commonly-reported vertex_ai symptom where Gemini works but Claude 400s on the 2nd turn of a web-search chat.

  Fix (litellm/litellm_core_utils/prompt_templates/factory.py):

  - convert_to_anthropic_tool_invoke: only emit a server_tool_use when its matching *_tool_result is available to pair with it; otherwise skip it (a bare server_tool_use is itself rejected).
  - anthropic_messages_pt: drop a replayed `tool`/`function` message whose tool_call_id starts with `srvtoolu_` (a server-executed tool produces no client result; a user tool_result for it is invalid).

  The existing reconstruction path (provider_specific_fields present, e.g. the litellm SDK) is unchanged, as is regular client tool_use/tool_result.

  Tests (tests/llm_translation/test_prompt_factory.py):

  - update test_convert_to_anthropic_tool_invoke_server_tool -> test_convert_to_anthropic_tool_invoke_server_tool_without_result_is_dropped
  - add test_anthropic_messages_pt_generic_client_drops_orphan_server_tool

  Follow-up to #17746 / #17798; addresses the generic-client (no provider_specific_fields) case of #17737.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(anthropic): cover the srvtoolu_ round-trip fix in the test_litellm unit suite

  The regression tests added in tests/llm_translation/test_prompt_factory.py aren't run by the coverage CI job (it runs tests/test_litellm), so the new factory.py branches showed as uncovered (codecov patch coverage). Add equivalent focused tests in the unit suite so both new branches are exercised there:

  - convert_to_anthropic_tool_invoke drops a srvtoolu_ server_tool_use when no matching *_tool_result is available.
  - anthropic_messages_pt drops the orphaned srvtoolu_ tool message a generic OpenAI client replays.
  Refs #17737

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(anthropic): cover the server_tool_use + result valid-pair path in unit suite

  Covers the remaining patch-coverage lines codecov flagged: convert_to_anthropic_tool_invoke emitting server_tool_use followed by its web_search_tool_result when the matching result is present (the litellm-SDK round-trip path).

  Refs #17737

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* style(anthropic): flatten srvtoolu_ tool-message guard to a negated if

  Addresses the Greptile style nit: replace the if-pass/else with a single negated `if not (...)` guard around the tool_result append. Behavior unchanged.

  Refs #17737

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

  ---------

  Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(proxy): require premium only when enabling premium metadata fields (#30285) (#30506)

  Co-authored-by: Sameer Kankute <sameer@berri.ai>

* fix(perplexity): stop double-billing reasoning tokens in manual cost fallback (#30488)

* fix(perplexity): stop double-billing reasoning tokens in manual cost fallback

  When perplexity_cost_per_token cannot use the API-provided usage.cost.total_cost short-circuit and falls back to manual calculation, it multiplies the full usage.completion_tokens by output_cost_per_token and then adds reasoning_tokens * output_cost_per_reasoning_token on top. Per the OpenAI/Perplexity usage convention codified for the central path in PR #18607, completion_tokens already INCLUDES reasoning_tokens, so the manual fallback double-bills reasoning at both the output and reasoning rate.
  Concrete impact on perplexity/sonar-deep-research (input 2e-6, output 8e-6, reasoning 3e-6): for the exact usage shape exercised by the live response fixture in tests/llm_translation/test_perplexity_reasoning.py (prompt_tokens=9, completion_tokens=20, reasoning_tokens=15) the current code charges 0.000223 vs the convention-correct 0.000103, a 2.165x overcharge. The bug is reachable whenever Perplexity omits the cost object (streaming chunks, fixture-driven paths, older API versions).

  Subtracts reasoning_tokens (clamped at zero) from completion_tokens before applying the output rate, mirroring how dashscope/cost_calculator.py and the central generic_cost_per_token already handle it. Preserves the existing fallback behaviour when output_cost_per_reasoning_token is unset (all completion_tokens stay at the output rate).

  Existing tests in tests/test_litellm/llms/perplexity/test_perplexity_cost_calculator.py asserted the buggy math and are updated to the convention-correct math. Adds a focused regression test using the exact usage shape from the live response fixture so this class of bug cannot be silently reintroduced.

* style(perplexity): drop redundant type annotation on else branch to satisfy mypy

  mypy [no-redef] flagged 'completion_cost' as declared in both if and else arms; keeping the annotation only on the first declaration matches existing patterns in this file.

* fix(perplexity): update integration test expected costs for non-double-billed math

  Three tests in test_perplexity_integration.py asserted the old buggy expectation that reasoning_tokens are billed in addition to the full completion_tokens count. After the fix in cost_per_token, reasoning_tokens are billed at the reasoning rate and the remaining (completion_tokens - reasoning_tokens) at the standard output rate, matching OpenAI/Perplexity convention (PR #18607).

  Updates: test_end_to_end_cost_calculation_with_transformation, test_main_cost_calculator_integration, test_high_volume_cost_calculation.
  The high-volume sanity threshold drops to 0.25 to reflect the corrected total.

* fix(ui): use dynamic proxy base URL in MCP usage examples (#30487)

  Replace hardcoded http://localhost:4000 with getProxyBaseUrl() in the MCP server usage example and copy-to-clipboard snippet so the generated configuration works for non-local deployments.

  Fixes #30466

* feat: add missing UK PII entity types to Presidio guardrail (#30537)

* feat: add missing UK PII entity types to Presidio guardrail

  Add UK_PASSPORT, UK_POSTCODE, and UK_VEHICLE_REGISTRATION to PiiEntityType enum and PII_ENTITY_CATEGORIES_MAP. These entity types are supported by Microsoft Presidio but were missing from litellm's type definitions, preventing users from configuring UK-specific PII detection.

* test: remove fragile hardcoded entity count test

  Remove test_uk_category_entity_count which hardcodes len() == 5. The test_uk_entities_match_presidio_recognizers test already verifies exact set equality, making the count test redundant and fragile to future Presidio additions.

* style: apply Black formatting to match CI requirements

* fix: route volcengine (Doubao) tiered-pricing models to the tiered cost handler (#30357)

  Volcengine (Doubao) models define `tiered_pricing` but no flat per-token cost, so cost_per_token fell through to generic_cost_per_token (which only reads flat costs) and tracked them at $0.

  Route custom_llm_provider == "volcengine" to the shared tiered-pricing handler in litellm/llms/dashscope/cost_calculator.py, which already computes graduated tier costs.
  Make that handler provider-agnostic by adding a custom_llm_provider argument (default "dashscope" preserves existing behavior) so get_model_info resolves the correct model map entry.

  Fixes #30346

* feat(mcp): make MCP gateway name and description configurable via env vars (#30473)

* feat(mcp): make MCP gateway name and description configurable via env vars

* Rename function _restore_env to _apply_env

* docs(mcp): document import-time capture of env-backed identity constants

  Address Greptile review feedback: clarify that LITELLM_MCP_SERVER_NAME and LITELLM_MCP_SERVER_DESCRIPTION are read once at import and require a module reload to observe env changes after import.

  Generated with AI assistance

  Co-Authored-By: Claude <noreply@anthropic.com>

  ---------

  Co-authored-by: Yevhen Luhovtsov <yevhen.luhovtsov@intapp.com>
  Co-authored-by: Claude <noreply@anthropic.com>

* fix(mcp): preserve native tools in semantic filter hook (#26650)

* fix(mcp): preserve native tools in semantic filter hook

  The SemanticToolFilterHook.async_pre_call_hook passed ALL tools (MCP + native) to filter_tools(), which only knows MCP-registered tool names. Native tools silently failed the name match in _get_tools_by_names() and were dropped from the request.

  Fix: partition tools into native and MCP-registered before filtering. Run the semantic filter only on MCP tools, then merge native tools back unconditionally.
  Changes:

  - Robust _is_mcp_tool() using shape-based detection for OpenAI-format dicts, safe regardless of future _extract_tool_info changes
  - Single-pass partition loop (no double _is_mcp_tool calls)
  - Preserve native tools in MCP expansion path (mixed requests)
  - Track MCP expansion to prevent expanded tools bypassing filtering
  - filter_stats reports MCP-only counts for accurate metrics
  - Extracted _emit_filter_metadata() helper
  - Skip spurious filter headers for all-native tool requests

  Closes #26212

* remove stale docstring note referencing tools_expanded_from_mcp

* fix: handle Responses API name collision and preserve tool ordering

  - Classify Responses API tools ({type: 'function', name: '...'}) as native to prevent name collisions with MCP canonical names
  - Preserve original request tool ordering using id()-based merge instead of naive native+mcp concatenation
  - Add 2 regression tests: name collision and ordering preservation

* style: apply black formatting

* fix(mcp): harden semantic filter: preserve all native tool formats, safe metadata access, graceful expansion failure, name-based merge

* lint: suppress PLR0915 on async_pre_call_hook (matches codebase convention)

* ci: retrigger checks after rebase onto litellm_internal_staging

* feat(fireworks): sync Fireworks AI model registry with current platform catalog (#30616)

  Adds 12 new Fireworks serverless models and updates 3 existing entries in model_prices_and_context_window.json and its bundled backup to match the current Fireworks platform model list.

  New direct models: glm-5p2, qwen3p7-plus, minimax-m3, minimax-m2p7, kimi-k2p7-code, kimi-k2p6, deepseek-v4-pro, deepseek-v4-flash.

  New router endpoints: glm-5p1-fast, kimi-k2p6-fast, kimi-k2p7-code-fast.
  Updated: glm-5p1, gpt-oss-120b, and gpt-oss-20b now carry correct output token caps, cache-read pricing, and explicit capability flags.

  max_tokens is set equal to max_output_tokens (not the full context window) for models whose generation cap is below their context window. This avoids the shared input+output budget path in get_modified_max_tokens, which would otherwise let callers request output sizes the model cannot produce. The same fix corrects the pre-existing glm-5p1, gpt-oss-120b, and gpt-oss-20b entries that had max_tokens equal to the full context window.

  Short-form aliases (fireworks_ai/<model>) are added for every direct accounts/fireworks/models/ entry so cost attribution works for callers using bare model names. Router endpoints get short-form aliases too, and transform_request now routes bare names ending in -fast to the accounts/fireworks/routers/ path instead of defaulting every bare name to models/. This keeps the kimi-k2p6-fast router from being misrouted to the nonexistent models/kimi-k2p6-fast endpoint.

  kimi-k2p6-turbo is intentionally excluded; kimi-k2p6-fast is its replacement. Context windows for deepseek-v4 and kimi models use the power-of-two values (1048576 and 262144) published on the Fireworks model pages, matching the convention already used by existing entries.

  Two regression tests in test_utils.py assert the exact per-token costs, token limits, capability flags, and short-form-to-long-form equality for all 15 models against both the main and backup cost maps.
  Two routing tests in test_fireworks_ai_chat_transformation.py verify bare -fast names route to routers/ and bare direct-model names route to models/.

* fix(bedrock): handle role:"system" inside the messages array on /v1/messages (#29698) (#30443)

* feat(anthropic): hoist leading in-array system to top-level (helper)

* test(anthropic): cover _system_content_to_blocks edge cases; deepcopy cache_control

* test(anthropic): mid-conversation system normalization cases

* feat: add supports_mid_conversation_system flag to Claude Opus 4.8

  Add supports_mid_conversation_system: true to all 9 claude-opus-4-8 cost-map entries (Anthropic-native, Bedrock, Vertex, Azure AI) in both the root cost map and the bundled package backup, since the runtime helper and tests read the backup in local/offline mode.

  Pin the mid-system passthrough regression test to the local cost map via the existing local_model_cost_map fixture so it reads the branch-local flag rather than the network-fetched main copy.

* fix(bedrock): normalize in-array system in /v1/messages handler (#29698)

  Wire normalize_system_messages_for_anthropic into anthropic_messages_handler so all Bedrock /v1/messages paths (Invoke / Mantle / ClaudePlatform / Converse-bridge) hoist leading in-array system entries (and demote mid-conversation ones on models lacking supports_mid_conversation_system) into the top-level system field. The normalized messages/system are written back into the local_vars snapshot the base_llm branch reads from, otherwise the Invoke/Mantle fix would silently no-op.

  Also fix the helper to resolve supports_mid_conversation_system through the prefix-aware AnthropicModelInfo._supports_model_capability resolver. The raw _supports_factory could not see the flag once get_llm_provider left the invoke/ prefix on the model id, which would have wrongly demoted mid-conversation system on a Bedrock invoke opus-4-8 path.
* fix(bedrock): resolve mid-conversation-system flag through mantle/invoke/converse route prefixes; drop unused param

* fix(types): widen system param to Union[str, List] for hoisted system blocks

* refactor(bedrock): drop dead local_vars messages writeback

* fix(bedrock/converse): translate in-array system in anthropic->openai adapter (#29698)

* fix(bedrock/converse): preserve cache_control on in-array system; test drop-empty

* fix(bedrock/converse): rename colliding local to satisfy mypy; test handler system-merge branches

* fix(types): register supports_mid_conversation_system in model-info schema

  The cost-map JSON-schema validation test (test_aaamodel_prices_and_context_window_json_is_valid) rejects unknown properties, so adding supports_mid_conversation_system to the opus-4-8 cost-map entries failed CI with 'Additional properties are not allowed'. Register the flag in the INTENDED_SCHEMA allow-list and in the ProviderSpecificModelInfo TypedDict so it is a typed, first-class capability flag alongside its peers (supports_output_config, etc.).

  ---------

  Co-authored-by: Sameer Kankute <sameer@berri.ai>

* fix(bedrock/agentcore): optionally forward multimodal content blocks in InvokeAgentRuntime payload (#28885)

* fix(bedrock/agentcore): optionally forward multimodal content blocks in InvokeAgentRuntime payload

  By default the agentcore provider flattens the last message to a text-only {"prompt": "..."} payload via convert_content_list_to_str, silently dropping OpenAI multimodal blocks (image_url, file, input_audio, ...). This adds an opt-in `forward_multimodal_content` litellm param. When truthy and the last message's content is a list containing a non-text block, the original OpenAI content list is forwarded verbatim under a new "content" field so an attachment-aware AgentCore agent can read it. Default off keeps the payload byte-identical to the legacy {"prompt": "..."} shape; existing agents are unaffected.
  The flag is read from optional_params (where other AgentCore params land) with a litellm_params fallback, and accepts a bool or a config/env string ('true', '1', ...). AgentCore Runtime is schemaless on the agent side: the agent's @app.entrypoint parses arbitrary JSON up to 100 MB (per https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/runtime-invoke-agent.html), so this is a purely upstream change; no AgentCore-side schema is asserted.

* fix(bedrock/agentcore): shallow-copy forwarded multimodal content list

  Address review feedback (Sameerlite): payload["content"] = last_content aliased the caller's mutable messages[-1]["content"] list. Harmless today because the payload is JSON-serialized immediately, but a latent footgun if a future caller mutates the returned payload before serialization. Forward list(last_content) so the payload owns its own list. Block dicts stay shared on purpose: a deep copy would clone potentially large base64 media on the request hot path, and the flagged risk was the shared list, not the blocks.

  Update the passthrough tests to assert equality + distinct identity, and add a regression test that mutating the payload list can't leak back into the original message content.

* Revert "fix(mcp): preserve native tools in semantic filter hook (#26650)"

  This reverts commit 438c825.

* Revert "feat(guardrails): integrate Repelloai Argus guardrail (#30465)"

  This reverts commit 54da785.

* Revert "feat(dashscope): add Responses API support (#30286)"

  This reverts commit 6766256.

* Revert "fix(bedrock): handle role:"system" inside the messages array on /v1/messages (#29698) (#30443)"

  This reverts commit b8a8083.

* Revert "fix(anthropic): drop orphaned server_tool_use on multi-turn replay from generic OpenAI clients (#30486)"

  This reverts commit 6e9c0b0.

* Revert "fix: route volcengine (Doubao) tiered-pricing models to the tiered cost handler (#30357)"

  This reverts commit 172e302.
* Revert "feat(proxy): serve Anthropic-native /v1/models for Claude Code gateway discovery (#30273)"

  This reverts commit 4e31885.

* fix: pass key_limit=None in team_member_update and patch model_cost in pricing test

  team_member_update called team_info without key_limit, so the fastapi.Query default object (not None) was passed through to get_data, which failed when serializing it. Pass key_limit=None explicitly to avoid this.

  test_get_model_info_costs patched litellm.model_cost from the local backup so the assertion holds before the PR is merged and the remote main URL is updated.

* fix(security): validate resolved model in /realtime/client_secrets for non-transcription sessions (#30710)

  Omitting both model and session.model caused the endpoint to default to gpt-4o-realtime-preview without running can_key_call_resolved_model, so any key could access that model regardless of its allowed-model list. The transcription path already called can_key_call_resolved_model; this adds the same call for the realtime path before returning.
* fix(lint): fix F821 undefined model_info and F841 unused metadata in create_model_info_response

* fix: black formatting and stub get_model_group_info in third team translation test

* fix: reformat utils.py with black 26.3.1 to match CI

* fix: replace Optional[X] with X | None to satisfy UP045 ruff strict gate

  ---------

  Co-authored-by: Habon Laszlo <habonlaci@users.noreply.github.com>
  Co-authored-by: habonlaci <4699494+habonlaci@users.noreply.github.com>
  Co-authored-by: Armaan Sandhu <74664101+Ar-maan05@users.noreply.github.com>
  Co-authored-by: santino18727-debug <santino18727@gmail.com>
  Co-authored-by: Eric (GabiDevFamily) <271972409+santino18727-debug@users.noreply.github.com>
  Co-authored-by: Nitish Agarwal <1592163+nitishagar@users.noreply.github.com>
  Co-authored-by: jho1-godaddy <171078705+jho1-godaddy@users.noreply.github.com>
  Co-authored-by: 安妮的心动录 <74543653+anneheartrecord@users.noreply.github.com>
  Co-authored-by: Harshith Gujjeti <153299927+Harshxth@users.noreply.github.com>
  Co-authored-by: Tomoya Tabuchi <t@tomoyat1.com>
  Co-authored-by: Vedant Agarwal <43557509+Vedant-Agarwal@users.noreply.github.com>
  Co-authored-by: Prathamesh Jadhav <55660103+lollinng@users.noreply.github.com>
  Co-authored-by: songkuan-zheng <252822057+songkuan-zheng@users.noreply.github.com>
  Co-authored-by: Kropiunig <48442031+Kropiunig@users.noreply.github.com>
  Co-authored-by: Lavish Bansal <lavish.bansal619@gmail.com>
  Co-authored-by: Shane Emmons <27679+semmons99@users.noreply.github.com>
  Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
  Co-authored-by: Anuj ojha <ojhaanuj224@gmail.com>
  Co-authored-by: Nahrin <nahrin@nahrinoda.com>
  Co-authored-by: Nbouyaa <67773915+FadelT@users.noreply.github.com>
  Co-authored-by: Vineeth Sai <vineethsai4444@gmail.com>
  Co-authored-by: Eugene Lugovtsov <34510252+EugeneLugovtsov@users.noreply.github.com>
  Co-authored-by: Yevhen Luhovtsov <yevhen.luhovtsov@intapp.com>
  Co-authored-by: Ayush Shekhar <106994833+ayushh0110@users.noreply.github.com>
  Co-authored-by: Ahmad Shahzad <107808273+shzdehmd@users.noreply.github.com>
  Co-authored-by: Kent <72616338+kingdoooo@users.noreply.github.com>
  Co-authored-by: Jón Levy <levy@apro.is>
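
The Perplexity double-billing figures quoted in the commit message above (0.000223 vs 0.000103 for prompt_tokens=9, completion_tokens=20, reasoning_tokens=15) can be checked with a few lines of arithmetic. This is a minimal sketch only: `manual_cost` and its parameters are hypothetical names chosen for illustration, not litellm's actual `perplexity_cost_per_token` signature. The rates and token counts are the ones stated in the commit message.

```python
import math
from typing import Optional


def manual_cost(
    prompt_tokens: int,
    completion_tokens: int,
    reasoning_tokens: int,
    input_rate: float,
    output_rate: float,
    reasoning_rate: Optional[float],
    double_bill: bool = False,
) -> float:
    """Hypothetical stand-in for the manual cost fallback (illustration only)."""
    prompt_cost = prompt_tokens * input_rate
    if reasoning_rate is None:
        # No reasoning rate configured: every completion token stays at the output rate.
        return prompt_cost + completion_tokens * output_rate
    if double_bill:
        # Old behaviour: full completion_tokens at the output rate, reasoning added on top.
        output_tokens = completion_tokens
    else:
        # Fixed behaviour: completion_tokens already includes reasoning_tokens,
        # so subtract them (clamped at zero) before applying the output rate.
        output_tokens = max(completion_tokens - reasoning_tokens, 0)
    return (
        prompt_cost
        + output_tokens * output_rate
        + reasoning_tokens * reasoning_rate
    )


# Rates and usage shape quoted in the commit message.
RATES = dict(input_rate=2e-6, output_rate=8e-6, reasoning_rate=3e-6)
buggy = manual_cost(9, 20, 15, double_bill=True, **RATES)
fixed = manual_cost(9, 20, 15, **RATES)

print(f"buggy={buggy:.6f} fixed={fixed:.6f} ratio={buggy / fixed:.3f}")
```

The ratio of the two totals reproduces the 2.165x overcharge cited in the message; the `reasoning_rate is None` branch mirrors the stated fallback where all completion tokens remain at the output rate.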

feat(proxy): serve Anthropic-native /v1/models for Claude Code gateway discovery (2ab056f)
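
The content negotiation this commit introduces, as described in the PR summary, can be sketched roughly as follows. This is an assumption-laden illustration, not the code from the PR: `to_anthropic_model_list` and `list_models` are hypothetical names, and the envelope fields follow only what the summary states (per-entry `type` / `display_name` / `created_at`, top-level `has_more` / `first_id` / `last_id`, `display_name` falling back to the model id, `created_at` as the ISO 8601 form of the OpenAI `created` timestamp).

```python
from datetime import datetime, timezone


def to_anthropic_model_list(openai_models: list) -> dict:
    """Hypothetical formatter: OpenAI-style model entries -> Anthropic-style envelope."""
    data = [
        {
            "type": "model",
            "id": m["id"],
            # The gateway has no better label, so fall back to the model id.
            "display_name": m["id"],
            # ISO 8601 rendering of the same Unix timestamp the OpenAI shape carries.
            "created_at": datetime.fromtimestamp(m["created"], tz=timezone.utc)
            .isoformat()
            .replace("+00:00", "Z"),
        }
        for m in openai_models
    ]
    return {
        "data": data,
        "has_more": False,
        "first_id": data[0]["id"] if data else None,
        "last_id": data[-1]["id"] if data else None,
    }


def list_models(headers: dict, openai_models: list) -> dict:
    """Pick the response shape from the presence of the anthropic-version header."""
    if any(name.lower() == "anthropic-version" for name in headers):
        return to_anthropic_model_list(openai_models)
    # No header: return the existing OpenAI-style list unchanged.
    return {"object": "list", "data": openai_models}


models = [{"id": "claude-proxy", "object": "model", "created": 0, "owned_by": "openai"}]
anthropic_view = list_models({"anthropic-version": "2023-06-01"}, models)
openai_view = list_models({}, models)
```

Keying on the header rather than a config flag is the design choice the summary argues for: OpenAI-compatible clients sharing the route never send `anthropic-version`, so they keep receiving the original shape.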

greptile-apps Bot reviewed Jun 12, 2026

Comment thread litellm/proxy/utils.py Outdated

refactor(proxy): move Anthropic model-list formatter into llms/anthropic/common_utils (3f4c6fe)

fix(proxy): make model_list request param optional for direct callers (fa4990e)

Sameerlite changed the base branch from litellm_internal_staging to litellm_oss_170626_1 June 17, 2026 11:23

Sameerlite approved these changes Jun 17, 2026

Sameerlite merged commit 4e31885 into BerriAI:litellm_oss_170626_1 Jun 17, 2026
73 checks passed

Sameerlite added a commit that referenced this pull request Jun 17, 2026

Revert "feat(proxy): serve Anthropic-native /v1/models for Claude Code gateway discovery (#30273)" (d14ee3f)

This reverts commit 4e31885.

Ar-maan05 deleted the feat-anthropic-native-v1-models branch June 19, 2026 09:58

Sameerlite merged 3 commits into BerriAI:litellm_oss_170626_1 from Ar-maan05:feat-anthropic-native-v1-models

Ar-maan05 commented Jun 12, 2026 (edited)

greptile-apps Bot commented Jun 12, 2026 (edited)

codecov Bot commented Jun 12, 2026 (edited)

Ar-maan05 commented Jun 12, 2026

Ar-maan05 commented Jun 16, 2026

Sameerlite commented Jun 17, 2026

Ar-maan05 commented Jun 17, 2026


2 participants


Greptile Summary

Confidence Score: 5/5
