Preserve x-anthropic-billing-header system blocks for first-party Anthropic by PigeonMark · Pull Request #29584 · BerriAI/litellm

PigeonMark · 2026-06-03T13:24:19Z

Relevant issues

Fixes #29572

Linear ticket

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

I have added meaningful tests
My PR passes all unit tests on make test-unit
My PR's scope is as isolated as possible; it only solves 1 specific problem
I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

50-55 passing tests: main is stable with minor issues.

45-49 passing tests: acceptable but needs attention

<= 40 passing tests: unstable; be careful with your merges and assess the risk.

Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:

Screenshots / Proof of Fix

Ran a LiteLLM proxy with an Anthropic model whose key is a Claude Max OAuth token, hitting the real Anthropic API, and sent a request shaped like Claude Code's auto-mode safety classifier: it carries the x-anthropic-billing-header system block but not the "You are Claude Code" identity block.

# config.yaml
model_list:
  - model_name: claude-sonnet-4-6
    litellm_params:
      model: anthropic/claude-sonnet-4-6
      api_key: os.environ/ANTHROPIC_API_KEY   # sk-ant-oat01-... (Max/Pro OAuth token)
general_settings:
  master_key: sk-1234

python litellm/proxy/proxy_cli.py --config config.yaml --port 4010

curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:4010/v1/messages \
  -H "x-api-key: sk-1234" -H "anthropic-version: 2023-06-01" -H "content-type: application/json" \
  -d '{"model":"claude-sonnet-4-6","max_tokens":16,
       "system":[{"type":"text","text":"x-anthropic-billing-header: cc_version=2.1.160.abc; cc_entrypoint=cli; cch=00000;"},
                 {"type":"text","text":"You are a security monitor for autonomous AI coding agents. Reply only with OK."}],
       "messages":[{"role":"user","content":"Is reading a file safe?"}]}'

On litellm_internal_staging (before the fix), the billing block is stripped and Anthropic does not recognize the request, so it returns 429:

HTTP 429
{"error":{"message":"{\"type\":\"error\",\"error\":{\"type\":\"rate_limit_error\",\"message\":\"Error\"},\"request_id\":\"req_011CbgP97nD5zSyZdqhRokb7\"}. Received Model Group=claude-sonnet-4-6 ...","code":"429"}}

On this branch, the billing block is preserved and the request succeeds:

HTTP 200
{"model":"claude-sonnet-4-6","type":"message","role":"assistant","content":[{"type":"text","text":"OK"}],"stop_reason":"end_turn", ...}
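The curl request above can also be built from Python with only the standard library, which is handy when scripting the before/after comparison. This is a sketch. It assumes the proxy from the config above is listening on 127.0.0.1:4010 with master key sk-1234. `build_classifier_request` is a hypothetical helper, and nothing is sent unless you uncomment the `urlopen` call.

```python
import json
import urllib.error
import urllib.request

PROXY_URL = "http://127.0.0.1:4010/v1/messages"  # proxy started with --port 4010
MASTER_KEY = "sk-1234"  # master_key from config.yaml

BILLING_BLOCK = (
    "x-anthropic-billing-header: cc_version=2.1.160.abc; "
    "cc_entrypoint=cli; cch=00000;"
)


def build_classifier_request() -> urllib.request.Request:
    """Build (but do not send) a request shaped like the auto-mode classifier:
    it carries the billing-header block but no "You are Claude Code" block."""
    body = {
        "model": "claude-sonnet-4-6",
        "max_tokens": 16,
        "system": [
            {"type": "text", "text": BILLING_BLOCK},
            {
                "type": "text",
                "text": "You are a security monitor for autonomous AI coding agents. "
                "Reply only with OK.",
            },
        ],
        "messages": [{"role": "user", "content": "Is reading a file safe?"}],
    }
    return urllib.request.Request(
        PROXY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": MASTER_KEY,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_classifier_request()
    print(req.get_method(), req.full_url)
    # Uncomment to actually hit the proxy:
    # try:
    #     with urllib.request.urlopen(req) as resp:
    #         print(resp.status)  # 200 on the fixed branch
    # except urllib.error.HTTPError as err:
    #     print(err.code)  # the pre-fix 429 surfaces here
```

`urlopen` raises `HTTPError` for non-2xx responses. As a result, the pre-fix 429 appears as an exception rather than as a status code on the response.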

Type

🐛 Bug Fix

Changes

AnthropicConfig.translate_system_message and AnthropicMessagesConfig.transform_anthropic_messages_request strip any system block beginning with x-anthropic-billing-header: before forwarding upstream (added in #20951). On the first-party Anthropic API, that block is how Claude Code subscription (OAuth) traffic is recognized. Dropping it therefore makes any request that carries only that block fail with a misleading 429 rate_limit_error. The most visible victim is Claude Code's auto-mode tool-safety classifier, which carries no other identifying block. Normal conversation turns keep working because they also include the "You are Claude Code" identity block. Recent Claude Code versions also run that classifier in plan mode and accept-edits mode, so for subscription users behind LiteLLM this breaks almost every workflow except "normal" mode.
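A toy model shows why only the classifier broke. It assumes, as described in this PR, that either the billing-header block or the identity block is enough for first-party recognition. The real check runs on Anthropic's side and is not reproduced here, and both helper names are hypothetical.

```python
from typing import Dict, List

BILLING_PREFIX = "x-anthropic-billing-header:"
IDENTITY_PREFIX = "You are Claude Code"


def strip_billing_blocks(system: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Pre-fix behavior: drop every text block that starts with the billing prefix."""
    return [b for b in system if not b.get("text", "").startswith(BILLING_PREFIX)]


def has_claude_code_signal(system: List[Dict[str, str]]) -> bool:
    """Simplified recognition model: either signal block is enough."""
    return any(
        b.get("text", "").startswith((BILLING_PREFIX, IDENTITY_PREFIX)) for b in system
    )


NORMAL_TURN = [
    {"type": "text", "text": "x-anthropic-billing-header: cc_entrypoint=cli;"},
    {"type": "text", "text": "You are Claude Code, Anthropic's official CLI for Claude."},
]
CLASSIFIER_CALL = [
    {"type": "text", "text": "x-anthropic-billing-header: cc_entrypoint=cli;"},
    {"type": "text", "text": "You are a security monitor for autonomous AI coding agents."},
]

print(has_claude_code_signal(strip_billing_blocks(NORMAL_TURN)))      # True
print(has_claude_code_signal(strip_billing_blocks(CLASSIFIER_CALL)))  # False
```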

The strip cannot simply be removed, because both methods are inherited by non-first-party providers (Bedrock platform, Vertex, Azure, Databricks, Minimax, DeepSeek) where the block is meaningless and, on Bedrock, rejected as a reserved keyword. This change gates the strip behind a new should_strip_billing_metadata() method that defaults to False on the first-party AnthropicConfig and AnthropicMessagesConfig, so the block is preserved there, and is overridden to True on the providers that reach these transforms and need the block removed: Bedrock platform, Vertex, and Azure on the chat path, and Minimax, Azure, and DeepSeek on the messages path. Behavior for those providers is unchanged; only first-party Anthropic now keeps the block.
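The gating pattern can be sketched in a few lines of plain Python. The class names below are hypothetical, simplified stand-ins for AnthropicConfig and a provider subclass such as Bedrock. `translate_system` is not LiteLLM's actual translate_system_message. The sketch only shows how a default-False hook, combined with per-provider overrides, decides whether the block survives.

```python
from typing import Dict, List

BILLING_PREFIX = "x-anthropic-billing-header:"


class FirstPartyConfig:
    """Stand-in for the first-party config: keeps billing blocks by default."""

    def should_strip_billing_metadata(self) -> bool:
        return False

    def translate_system(self, system: List[Dict[str, str]]) -> List[Dict[str, str]]:
        # The strip only runs when the (possibly overridden) hook asks for it.
        if not self.should_strip_billing_metadata():
            return list(system)
        return [b for b in system if not b.get("text", "").startswith(BILLING_PREFIX)]


class BedrockLikeConfig(FirstPartyConfig):
    """Stand-in for a provider that rejects the block (e.g. Bedrock)."""

    def should_strip_billing_metadata(self) -> bool:
        return True


SYSTEM = [
    {"type": "text", "text": BILLING_PREFIX + " cc_entrypoint=cli;"},
    {"type": "text", "text": "Reply only with OK."},
]

print(len(FirstPartyConfig().translate_system(SYSTEM)))   # 2
print(len(BedrockLikeConfig().translate_system(SYSTEM)))  # 1
```

Putting the decision in an overridable method rather than an isinstance check lets each provider opt in where it is defined. The cost is that a subclass which forgets the override silently inherits keep-by-default.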

Regression tests in test_anthropic_chat_transformation.py cover both transforms: first-party Anthropic keeps the billing block while Bedrock and Minimax strip it. The first-party assertions fail against the previous unconditional-strip behavior.

Preserve x-anthropic-billing-header system blocks for first-party Anthropic

PR #20951 strips system blocks beginning with "x-anthropic-billing-header:" for every Anthropic target. That block is how the first-party Anthropic API recognizes Claude Code subscription (OAuth) traffic, so dropping it makes requests that carry only that block, such as the auto-mode tool-safety classifier, fail with a misleading 429 rate_limit_error; normal turns still work because they also carry the "You are Claude Code" identity block. Gate the strip behind should_strip_billing_metadata(), defaulting to False on the first-party AnthropicConfig and AnthropicMessagesConfig so the block is kept, and overridden to True on the providers that reach these transforms and reject the block (Bedrock platform, Vertex, Azure for the chat path; Minimax, Azure, DeepSeek for the messages path). Behavior for those providers is unchanged.

greptile-apps · 2026-06-03T13:27:33Z

Greptile Summary

This PR fixes a regression where x-anthropic-billing-header system blocks were unconditionally stripped by AnthropicConfig.translate_system_message and AnthropicMessagesConfig.transform_anthropic_messages_request, causing Claude Code's auto-mode safety classifier requests (which carry only that block for attribution) to receive a 429 rate_limit_error from the first-party Anthropic API.

Introduces a should_strip_billing_metadata() hook on both base config classes that defaults to False, preserving the billing block for first-party Anthropic while all non-first-party subclasses (Bedrock, Vertex, Azure, Minimax, DeepSeek) override it to True to retain their existing strip behavior.
Adds six focused unit tests covering both transforms across all affected providers, with before/after evidence of HTTP 429 → 200 provided in the PR description.

Confidence Score: 5/5

Safe to merge — the change is narrowly scoped to the billing-header strip logic, all non-first-party providers explicitly opt back in to stripping, and the only provider subclass that inherits the new default resolves transform_request through the OpenAI-like path and never reaches translate_system_message.

The fix correctly gates an existing side-effect behind an overridable hook rather than removing it globally, every affected provider is updated and covered by tests, and the unit tests are all mock-only with no network calls.

No files require special attention.

Important Files Changed

Filename	Overview
litellm/llms/anthropic/chat/transformation.py	Adds `should_strip_billing_metadata()` hook returning `False` and gates the billing-header strip in `translate_system_message` behind it — preserves headers for first-party Anthropic.
litellm/llms/anthropic/experimental_pass_through/messages/transformation.py	Same hook added to `AnthropicMessagesConfig` and `transform_anthropic_messages_request` call guarded by it; first-party keeps billing headers.
litellm/llms/bedrock/chat/invoke_transformations/anthropic_claude3_transformation.py	Overrides `should_strip_billing_metadata()` to `True`; preserves existing Bedrock strip behavior.
litellm/llms/bedrock/claude_platform/transformation.py	Overrides `should_strip_billing_metadata()` to `True` for the Bedrock Claude platform path.
litellm/llms/vertex_ai/vertex_ai_partner_models/anthropic/transformation.py	Overrides `should_strip_billing_metadata()` to `True` for VertexAI Anthropic chat path.
litellm/llms/azure_ai/anthropic/transformation.py	Overrides `should_strip_billing_metadata()` to `True` for Azure AI Anthropic chat path.
litellm/llms/minimax/messages/transformation.py	Overrides `should_strip_billing_metadata()` to `True` for Minimax messages path.
litellm/llms/deepseek/messages/transformation.py	Overrides `should_strip_billing_metadata()` to `True` for DeepSeek messages path.
tests/test_litellm/llms/anthropic/chat/test_anthropic_chat_transformation.py	Adds 6 new unit tests covering both base classes and all overriding providers; all tests are mock-only with no network calls.

Reviews (2): Last reviewed commit: "Strip billing header on Bedrock invoke a..."

codecov · 2026-06-03T13:34:16Z

Codecov Report

✅ All modified and coverable lines are covered by tests.


Strip billing header on Bedrock invoke and Vertex messages pass-through

Two more subclasses reach the gated strip but inherited the keep-by-default behavior. AmazonAnthropicClaudeConfig (Bedrock invoke) calls AnthropicConfig.transform_request, which calls translate_system_message, and VertexAIPartnerModelsAnthropicMessagesConfig (Vertex messages pass-through) calls super().transform_anthropic_messages_request. Override should_strip_billing_metadata() to True on both. Add a parametrized test asserting the flag for every first-party base (False) and provider subclass (True), covering all overrides, plus a translate_system_message regression test for the Bedrock invoke path.
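One way to stop a flag table like this from drifting as providers are added is to enumerate subclasses at test time and require each one to appear in the expected-flag table. This is a hypothetical sketch with stand-in classes, not the test added in this PR. `expected_flags`, `all_subclasses`, `find_flag_mismatches`, and the class names are all illustrative.

```python
from typing import Dict, List


class BaseConfig:
    def should_strip_billing_metadata(self) -> bool:
        return False  # first-party default: keep the block


class BedrockInvokeConfig(BaseConfig):
    def should_strip_billing_metadata(self) -> bool:
        return True


class VertexMessagesConfig(BaseConfig):
    pass  # forgot the override: silently inherits keep-by-default


def all_subclasses(cls: type) -> List[type]:
    """Recursively collect every subclass of cls, in definition order."""
    found: List[type] = []
    for sub in cls.__subclasses__():
        found.append(sub)
        found.extend(all_subclasses(sub))
    return found


def find_flag_mismatches(base: type, expected: Dict[type, bool]) -> List[str]:
    """Return subclasses whose hook is missing from, or disagrees with, the table."""
    problems: List[str] = []
    for sub in all_subclasses(base):
        if sub not in expected:
            problems.append(f"{sub.__name__}: not listed")
        elif sub().should_strip_billing_metadata() != expected[sub]:
            problems.append(f"{sub.__name__}: wrong flag")
    return problems


expected_flags = {BedrockInvokeConfig: True, VertexMessagesConfig: True}
print(find_flag_mismatches(BaseConfig, expected_flags))
# ['VertexMessagesConfig: wrong flag']
```

A subclass that is never listed is reported as well. This catches the inherited-default case even when nobody remembered to add a row for it.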

veria-ai · 2026-06-03T13:48:03Z

+        The first-party Anthropic API uses these blocks for Claude Code attribution, so the
+        base config keeps them. Providers that reject them (e.g. Bedrock) override this to True.
+        """
+        return False


Medium: User-controlled billing attribution

messages and /v1/messages request bodies are client-controlled at the proxy boundary, so preserving this reserved x-anthropic-billing-header block by default lets any proxy client spoof Claude Code billing attribution sent to Anthropic under the proxy's API key. Keep the default stripping behavior and add an explicit trusted opt-in for requests where the proxy has verified that these attribution blocks should be forwarded; the same gate should be applied to AnthropicMessagesConfig.should_strip_billing_metadata().

This change restores behavior that predates #20951. The x-anthropic-billing-header system block was forwarded to Anthropic for a long time, and #20951 only recently began stripping it, which is what broke first-party OAuth requests. Preserving it again reverts to that prior default rather than opening a new path.

The block is also not the only client-controlled recognition signal. Anthropic treats a request as first-party Claude Code when the system prompt carries either the billing header or the "You are Claude Code, Anthropic's official CLI for Claude." identity block, and nothing in LiteLLM strips that identity block. A client who wanted their request handled as Claude Code subscription traffic could already achieve that through the identity block, so stripping only the billing header does not actually close that vector.

On impact, forwarding the block does not redirect billing. The upstream credential configured on the deployment (the OAuth token or API key) determines whose subscription is charged, and the fields seen in practice (cc_version, cc_entrypoint, cch) are attribution and telemetry, not a cost-routing control.
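To make the "attribution and telemetry" point concrete, the sample block from the repro can be split into its fields. The parsing rule here (semicolon-separated key=value pairs after the prefix) is inferred from that one sample and is an assumption, not a documented format. `parse_billing_header` is a hypothetical helper.

```python
from typing import Dict, Optional

BILLING_PREFIX = "x-anthropic-billing-header:"


def parse_billing_header(text: str) -> Optional[Dict[str, str]]:
    """Split a billing-header block into key/value fields, or return None if it is not one."""
    if not text.startswith(BILLING_PREFIX):
        return None
    fields: Dict[str, str] = {}
    for part in text[len(BILLING_PREFIX):].split(";"):
        key, sep, value = part.strip().partition("=")
        if sep:  # skip empty segments such as the one after the trailing ";"
            fields[key] = value
    return fields


sample = "x-anthropic-billing-header: cc_version=2.1.160.abc; cc_entrypoint=cli; cch=00000;"
print(parse_billing_header(sample))
# {'cc_version': '2.1.160.abc', 'cc_entrypoint': 'cli', 'cch': '00000'}
```

This is consistent with the comment above: none of the parsed fields carries a credential. The key that is charged is the api_key configured on the deployment.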

For those reasons I have left the default as preserve, so auto mode, plan mode, and accept-edits mode all work out of the box for the common OAuth passthrough case. If the LiteLLM team prefers secure-by-default, I can gate preservation behind an explicit opt-in (for example a forward_anthropic_billing_metadata setting that defaults to stripping) wired through the same should_strip_billing_metadata() method.
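If the secure-by-default route were taken, the opt-in proposed above could be wired through the same hook roughly as follows. `forward_anthropic_billing_metadata` is only the example name suggested in this thread, not an existing LiteLLM setting. Passing it as a constructor argument, and the class names, are assumptions made for illustration.

```python
class OptInFirstPartyConfig:
    """Hypothetical first-party config where forwarding the block is an explicit opt-in."""

    def __init__(self, forward_anthropic_billing_metadata: bool = False) -> None:
        # Defaults to False, so the block is stripped unless an operator opts in.
        self.forward_anthropic_billing_metadata = forward_anthropic_billing_metadata

    def should_strip_billing_metadata(self) -> bool:
        return not self.forward_anthropic_billing_metadata


class OptInProviderConfig(OptInFirstPartyConfig):
    """Providers that reject the block keep stripping whatever the setting says."""

    def should_strip_billing_metadata(self) -> bool:
        return True


default_cfg = OptInFirstPartyConfig()
opted_in_cfg = OptInFirstPartyConfig(forward_anthropic_billing_metadata=True)
provider_cfg = OptInProviderConfig(forward_anthropic_billing_metadata=True)

print(default_cfg.should_strip_billing_metadata())   # True
print(opted_in_cfg.should_strip_billing_metadata())  # False
print(provider_cfg.should_strip_billing_metadata())  # True
```

Because the setting sits behind the existing hook, the transforms themselves would not need to change. The trade-off is the one weighed in the comment: with stripping as the default, auto, plan, and accept-edits mode would not work out of the box for OAuth passthrough until an operator opts in.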

veria-ai · 2026-06-03T13:48:11Z

PR overview

This PR changes Anthropic chat/message transformation logic to preserve x-anthropic-billing-header system blocks for first-party Anthropic requests instead of stripping them. It focuses on how those billing metadata blocks are carried through /v1/messages-style request handling.

There is one open security concern: the preserved billing metadata can come from client-controlled proxy requests, allowing a proxy client to spoof billing attribution sent to Anthropic under the proxy’s API key. The requested mitigation is to keep stripping as the default and only forward these blocks behind an explicit trusted opt-in, including the related should_strip_billing_metadata() path. No issues have been addressed yet, so the PR still has a concrete but bounded attribution-integrity risk.

Open issues (1)

Medium: User-controlled billing attribution — litellm/llms/anthropic/chat/transformation.py:1630

Fixed/addressed: 0 · PR risk: 5/10

PigeonMark · 2026-06-03T14:00:24Z

@greptileai

…ader-first-party

Sameerlite

LGTM

* feat(bedrock_mantle): add SigV4/IAM auth to Responses API route (fixes #29665) (#29788)
* feat(responses): add default no-op sign_request to BaseResponsesAPIConfig
* feat(responses): call sign_request after body is final, send signed bytes when signed
* feat(bedrock_mantle): add SigV4 sign_request via composed BaseAWSLLM (bearer path)
* test(bedrock_mantle): cover SigV4 access-key, AssumeRole, body bytes, region/auth consistency
* feat(bedrock_mantle): defer auth to sign_request; validate_environment no longer requires bearer
* docs(bedrock_mantle): document SigV4 + Bearer auth on Responses route
* test(responses): cover fake-stream signing order and mantle bearer arg/env precedence
* fix(bedrock_mantle): wrap all botocore credential errors with both-paths guidance
* fix(bedrock_mantle): catch specific credential errors, not all BotoCoreError, so STS transport failures are not masked
* fix(bedrock_mantle): sign the compact Responses route too, not just create
* fix(github-copilot): route per-model on /v1/responses based on model info (#29747)
* feat(focus): add GCS destination for FOCUS export (#29751)
* test: add failing tests for FocusGCSDestination
* feat: add FocusGCSDestination reusing GCSBucketBase auth
* feat: register FocusGCSDestination in factory; export from __init__
* fix(focus): preserve GCS_PATH_SERVICE_ACCOUNT when service_account_json not in config
* style: apply Black formatting to gcs_destination and tests
* style: apply Black formatting to factory.py
* fix(bedrock): omit empty additionalModelRequestFields and system from Converse API payload (#29565)
  Amazon Nova Pro (and other strict Bedrock models) return 400 Malformed input request when additionalModelRequestFields: {} or system: [] are present in the payload. Both fields are optional in CommonRequestObject (total=False) and must be omitted rather than sent as empty structures.
  Co-authored-by: shin-berri <shin-laptop@berri.ai>
  Co-authored-by: yuneng-jiang <yuneng@berri.ai>
  Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(proxy): recognize *.cognitiveservices.azure.com as OpenAI-compatible in pass-through cost tracking (#29730)
* fix(proxy): recognize *.cognitiveservices.azure.com as OpenAI-compatible
  Azure OpenAI resources created via the newer "Azure AI Foundry" / Cognitive Services pathway live on `*.cognitiveservices.azure.com` subdomains, not the older `openai.azure.com`. Both are valid Azure OpenAI surfaces in production today.
  The OpenAI pass-through cost-tracking handler hard-codes only the older hostname in five places (four `is_openai_*_route` methods on OpenAIPassthroughLoggingHandler, plus is_openai_route on PassThroughEndpointLogging). As a result, calls from newer Azure deployments are silently classified as "not an OpenAI route", the dispatch into the cost-tracking handler is skipped, and tokens/cost never get extracted into LiteLLM_SpendLogs — the row gets written with prompt_tokens=0, completion_tokens=0, spend=0, model='unknown'.
  Reproduced 2026-06-04 against a real Azure OpenAI deployment on `*.cognitiveservices.azure.com` proxied through LiteLLM v1.88.0.
  Fix: factor the hostname check into a single helper `_is_openai_compatible_host` listing all three recognized surfaces (api.openai.com, openai.azure.com, cognitiveservices.azure.com), and have all five call sites delegate to it. Purely additive — never weakens recognition for the originally-supported hostnames.
  Adds a test `test_is_openai_route_recognizes_cognitiveservices_azure_com` that exercises all four `is_openai_*_route` static methods against `*.cognitiveservices.azure.com` URLs (positive cases per route + a small cross-route negative to confirm route-specific path matching still works on the new hostname).
  Out of scope for this PR (separate followup):
  - `openai_passthrough_handler` calls chat/completions `transform_response` on Responses API payloads (`output:` not `choices:`), which throws inside the dispatch and drops the SpendLogs row entirely. Recognized + tracked separately.
* ci: trigger fresh run
  Empty commit to re-run checks. The previous auth-and-jwt failure was a transient HuggingFace Hub 429 rate-limit hitting tokenizer downloads in tests/proxy_unit_tests/test_custom_tokenizer_bug.py — unrelated to this PR's scope (hostname recognition in pass-through cost tracking). No code change.
  ---------
  Co-authored-by: shin-berri <shin-laptop@berri.ai>
  Co-authored-by: yuneng-jiang <yuneng@berri.ai>
* fix(responses): preserve forced-function tool_choice name in Responses to Chat transform (#29812)
  The Responses API forces a specific function with a top-level name ({"type": "function", "name": "X"}), but _transform_tool_choice only handled the nested Chat Completions shape and fell through to returning "required" for the flat form, silently dropping the function name and degrading a forced function call to force-any-tool. Map the flat Responses shape to the nested Chat shape, keeping the "required" fallback when no name is present.
* Preserve x-anthropic-billing-header system blocks for first-party Anthropic (#29584)
* Preserve x-anthropic-billing-header system blocks for first-party Anthropic
  PR #20951 strips system blocks beginning with "x-anthropic-billing-header:" for every Anthropic target. That block is how the first-party Anthropic API recognizes Claude Code subscription (OAuth) traffic, so dropping it makes requests that carry only that block, such as the auto-mode tool-safety classifier, fail with a misleading 429 rate_limit_error; normal turns still work because they also carry the "You are Claude Code" identity block.
  Gate the strip behind should_strip_billing_metadata(), defaulting to False on the first-party AnthropicConfig and AnthropicMessagesConfig so the block is kept, and overridden to True on the providers that reach these transforms and reject the block (Bedrock platform, Vertex, Azure for the chat path; Minimax, Azure, DeepSeek for the messages path). Behavior for those providers is unchanged.
* Strip billing header on Bedrock invoke and Vertex messages pass-through
  Two more subclasses reach the gated strip but inherited keep-by-default. AmazonAnthropicClaudeConfig (Bedrock invoke) calls AnthropicConfig.transform_request, which calls translate_system_message, and VertexAIPartnerModelsAnthropicMessagesConfig (Vertex messages pass-through) calls super().transform_anthropic_messages_request. Override should_strip_billing_metadata() to True on both. Add a parametrized test asserting the flag for every first-party base (False) and provider subclass (True), covering all overrides, plus a translate_system_message regression test for the Bedrock invoke path.
* fix(cache): log hashed cache keys (#29890)
* fix(ui): save routing groups as list (#29889)
* Revert "fix(ui): save routing groups as list (#29889)" (#29928)
  This reverts commit 9b1f78f.
* feat(parasail): add Parasail as a JSON-configured OpenAI-compatible provider (#29842)
* feat(parasail): add Parasail as a JSON-configured OpenAI-compatible provider
  Registers parasail in the openai_like JSON provider loader with both /v1/chat/completions and /v1/responses support. Parasail's Responses API rejects store:true and any request that omits store, so the loader gains a force_store_false special_handling flag; the parasail entry sets it and the generated Responses config overrides store=false on every call. This keeps callers from hitting "State storage not supported" and matches what Parasail's docs require.
  Adds the PARASAIL enum value, listing under openai_compatible_providers, provider documentation at docs/my-website/docs/providers/parasail.md, and a focused unit test file under tests/test_litellm/llms/parasail/ that covers JSON registration, chat URL construction, Responses URL construction with PARASAIL_API_BASE override, and the force_store_false regression in both the caller-sent-store=true and caller-omitted cases.
* fix(parasail): register in provider_endpoints_support, drop in-repo docs
  Greptile review feedback. The provider doc belongs in the litellm-docs repo, not this one's docs/my-website tree; removing it here. Adds the parasail entry to provider_endpoints_support.json so the check_provider_folders_documented.py CI check passes (chat_completions and responses true; others false).
* fix: normalize Anthropic passthrough server tool usage (#29827)
* test(anthropic): cover server_tool_use dict cost tracking
* fix: normalize Anthropic server tool usage (cherry picked from commit 982f726)
* fix: keep server tool usage subscriptable (cherry picked from commit 70280b9)
  ---------
  Co-authored-by: Genmin <joey@joeyroth.com>
* fix(proxy): fix typo generic_role_mappoings -> generic_role_mappings in ui_sso.py (#29753)
  Co-authored-by: shin-berri <shin-laptop@berri.ai>
  Co-authored-by: yuneng-jiang <yuneng@berri.ai>
* feat(proxy): add disable_budget_reservation general setting (#27639) (#29493)
* feat(proxy): add disable_budget_reservation general setting (#27639)
* feat(proxy): register disable_budget_reservation in ConfigGeneralSettings (#27639)
* docs(proxy): document disable_budget_reservation concurrency tradeoff (#27639)
* ci: re-trigger flaky docker build (prisma generate ECONNRESET)
* fix(proxy): warn and document budget enforcement tradeoff when disable_budget_reservation is set (#27639)
* feat(gemini_tts): adding support to Gemini TTS languageCode parameters (#29623)
* Adding support to Gemini TTS Language Code parameters
* Mapping Gemini TTS languageCode param in Docstring
* Use snake_case for language_code input key
  Mapping Gemini TTS languageCode param in Docstring
* Restoring files modified under enterprise/litellm_enterprise due to lint/formatting checks
  ---------
  Co-authored-by: João Garrido <joaogarrido@google.com>
* feat(guardrails): capture user and model metadata in CrowdStrike AIDR (#29517)
* fix(proxy): require OpenAI path segment for shared Azure Cognitive Services domains
  Address Greptile review: the `*.cognitiveservices.azure.com` / `*.openai.azure.com` domains are shared by every Azure Cognitive Service (Speech, Vision, Language, ...), so a hostname-only substring match misclassified non-OpenAI Azure traffic as OpenAI routes.
  - Replace the substring host test with suffix matching (rejects look-alike domains like cognitiveservices.azure.com.attacker.example).
  - Add `_is_openai_compatible_url` that requires an OpenAI-style path marker (`/openai/` or `/v1/`) on the shared Azure domains, and use it in PassThroughEndpointLogging.is_openai_route (previously hostname-only).
  - Add negative tests for Azure Speech/Vision paths and look-alike domains.
  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix: support Responses input in Redis semantic cache (#29581)
* fix: support responses input in redis semantic cache
* test: cover redis semantic prompt extraction
* test: handle blank redis semantic text fallbacks
* chore: remove async cache dead statement
* test: cover redis semantic cache miss paths
* fix: filter sensitive cache lookup kwargs
* chore: rerun ci after huggingface rate limit
* chore(ui): regenerate dashboard API types (npm run gen:api)
  Sync src/lib/http/schema.d.ts with the proxy OpenAPI spec: adds the disable_budget_reservation general-settings field and picks up the RateLimitError docstring reindent. Fixes the gen:api CI drift check.
  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* test(bedrock): assert empty additionalModelRequestFields is omitted
  The Converse transformer now drops an empty additionalModelRequestFields block instead of sending it as `{}`. Update test_bedrock_top_k_param so models without top_k support (llama3) assert the key is absent rather than equal to an empty dict.
  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---------
Co-authored-by: Kent <72616338+kingdoooo@users.noreply.github.com>
Co-authored-by: codgician <15964984+codgician@users.noreply.github.com>
Co-authored-by: Praveen Ghuge <95286176+pghuge-cloudwiz@users.noreply.github.com>
Co-authored-by: Roi <roytev@gmail.com>
Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Liam Scott <liam@uilliam.com>
Co-authored-by: abhay23-AI <abhaytrivedi22@gmail.com>
Co-authored-by: Ceder Dens <cederdens@gmail.com>
Co-authored-by: 冯基魁 <56265583+fengjikui@users.noreply.github.com>
Co-authored-by: Kai Huang <kaihuang724@gmail.com>
Co-authored-by: rinto <54238243+ririnto@users.noreply.github.com>
Co-authored-by: Genmin <joey@joeyroth.com>
Co-authored-by: Arnav Bhilwariya <arnavbhilwariya0408@gmail.com>
Co-authored-by: Armaan Sandhu <74664101+Ar-maan05@users.noreply.github.com>
Co-authored-by: João Garrido <48538534+johngarrido@users.noreply.github.com>
Co-authored-by: João Garrido <joaogarrido@google.com>
Co-authored-by: Kenan Yildirim <kenan@kenany.me>
Co-authored-by: Dávid Balatoni <balcsida@gmail.com>
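This commit log describes two stricter Azure host rules. First, accept an exact hostname or a true subdomain only. Second, require an OpenAI-style path marker (/openai/ or /v1/) on the shared Azure domains. Both rules can be sketched with urllib.parse. This is an illustrative reimplementation of those stated rules, not the code from #29730. `is_openai_compatible_url` and `_host_matches` are hypothetical names modeled on the helper the commit mentions.

```python
from urllib.parse import urlparse

OPENAI_HOSTS = ("api.openai.com",)
SHARED_AZURE_HOSTS = ("openai.azure.com", "cognitiveservices.azure.com")
OPENAI_PATH_MARKERS = ("/openai/", "/v1/")


def _host_matches(host: str, domain: str) -> bool:
    # Exact match or a real subdomain; rejects look-alikes such as
    # cognitiveservices.azure.com.attacker.example
    return host == domain or host.endswith("." + domain)


def is_openai_compatible_url(url: str) -> bool:
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if any(_host_matches(host, d) for d in OPENAI_HOSTS):
        return True
    if any(_host_matches(host, d) for d in SHARED_AZURE_HOSTS):
        # These domains are shared with Speech, Vision, Language, ...:
        # only treat the URL as OpenAI traffic if the path looks like it.
        return any(marker in parsed.path for marker in OPENAI_PATH_MARKERS)
    return False


print(is_openai_compatible_url(
    "https://myres.cognitiveservices.azure.com/openai/deployments/gpt/chat/completions"
))  # True
print(is_openai_compatible_url(
    "https://cognitiveservices.azure.com.attacker.example/openai/x"
))  # False
```

A plain substring test such as `"cognitiveservices.azure.com" in host` would accept the look-alike domain in the second call. Anchoring the comparison at a label boundary with `"." + domain` is what rejects it.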

* feat(bedrock_mantle): add SigV4/IAM auth to Responses API route (fixes BerriAI#29665) (BerriAI#29788) * feat(responses): add default no-op sign_request to BaseResponsesAPIConfig * feat(responses): call sign_request after body is final, send signed bytes when signed * feat(bedrock_mantle): add SigV4 sign_request via composed BaseAWSLLM (bearer path) * test(bedrock_mantle): cover SigV4 access-key, AssumeRole, body bytes, region/auth consistency * feat(bedrock_mantle): defer auth to sign_request; validate_environment no longer requires bearer * docs(bedrock_mantle): document SigV4 + Bearer auth on Responses route * test(responses): cover fake-stream signing order and mantle bearer arg/env precedence * fix(bedrock_mantle): wrap all botocore credential errors with both-paths guidance * fix(bedrock_mantle): catch specific credential errors, not all BotoCoreError, so STS transport failures are not masked * fix(bedrock_mantle): sign the compact Responses route too, not just create * fix(github-copilot): route per-model on /v1/responses based on model info (BerriAI#29747) * feat(focus): add GCS destination for FOCUS export (BerriAI#29751) * test: add failing tests for FocusGCSDestination * feat: add FocusGCSDestination reusing GCSBucketBase auth * feat: register FocusGCSDestination in factory; export from __init__ * fix(focus): preserve GCS_PATH_SERVICE_ACCOUNT when service_account_json not in config * style: apply Black formatting to gcs_destination and tests * style: apply Black formatting to factory.py * fix(bedrock): omit empty additionalModelRequestFields and system from Converse API payload (BerriAI#29565) Amazon Nova Pro (and other strict Bedrock models) return 400 Malformed input request when additionalModelRequestFields: {} or system: [] are present in the payload. Both fields are optional in CommonRequestObject (total=False) and must be omitted rather than sent as empty structures. 
Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(proxy): recognize *.cognitiveservices.azure.com as OpenAI-compatible in pass-through cost tracking (BerriAI#29730) * fix(proxy): recognize *.cognitiveservices.azure.com as OpenAI-compatible Azure OpenAI resources created via the newer "Azure AI Foundry" / Cognitive Services pathway live on `*.cognitiveservices.azure.com` subdomains, not the older `openai.azure.com`. Both are valid Azure OpenAI surfaces in production today. The OpenAI pass-through cost-tracking handler hard-codes only the older hostname in five places (four `is_openai_*_route` methods on OpenAIPassthroughLoggingHandler, plus is_openai_route on PassThroughEndpointLogging). As a result, calls from newer Azure deployments are silently classified as "not an OpenAI route", the dispatch into the cost-tracking handler is skipped, and tokens/cost never get extracted into LiteLLM_SpendLogs — the row gets written with prompt_tokens=0, completion_tokens=0, spend=0, model='unknown'. Reproduced 2026-06-04 against a real Azure OpenAI deployment on `*.cognitiveservices.azure.com` proxied through LiteLLM v1.88.0. Fix: factor the hostname check into a single helper `_is_openai_compatible_host` listing all three recognized surfaces (api.openai.com, openai.azure.com, cognitiveservices.azure.com), and have all five call sites delegate to it. Purely additive — never weakens recognition for the originally-supported hostnames. Adds a test `test_is_openai_route_recognizes_cognitiveservices_azure_com` that exercises all four `is_openai_*_route` static methods against `*.cognitiveservices.azure.com` URLs (positive cases per route + a small cross-route negative to confirm route-specific path matching still works on the new hostname). 
Out of scope for this PR (separate followup): - `openai_passthrough_handler` calls chat/completions `transform_response` on Responses API payloads (`output:` not `choices:`), which throws inside the dispatch and drops the SpendLogs row entirely. Recognized + tracked separately. * ci: trigger fresh run Empty commit to re-run checks. The previous auth-and-jwt failure was a transient HuggingFace Hub 429 rate-limit hitting tokenizer downloads in tests/proxy_unit_tests/test_custom_tokenizer_bug.py — unrelated to this PR's scope (hostname recognition in pass-through cost tracking). No code change. --------- Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> * fix(responses): preserve forced-function tool_choice name in Responses to Chat transform (BerriAI#29812) The Responses API forces a specific function with a top-level name ({"type": "function", "name": "X"}), but _transform_tool_choice only handled the nested Chat Completions shape and fell through to returning "required" for the flat form, silently dropping the function name and degrading a forced function call to force-any-tool. Map the flat Responses shape to the nested Chat shape, keeping the "required" fallback when no name is present. * Preserve x-anthropic-billing-header system blocks for first-party Anthropic (BerriAI#29584) * Preserve x-anthropic-billing-header system blocks for first-party Anthropic PR BerriAI#20951 strips system blocks beginning with "x-anthropic-billing-header:" for every Anthropic target. That block is how the first-party Anthropic API recognizes Claude Code subscription (OAuth) traffic, so dropping it makes requests that carry only that block, such as the auto-mode tool-safety classifier, fail with a misleading 429 rate_limit_error; normal turns still work because they also carry the "You are Claude Code" identity block. 
Gate the strip behind should_strip_billing_metadata(), defaulting to False on the first-party AnthropicConfig and AnthropicMessagesConfig so the block is kept, and overridden to True on the providers that reach these transforms and reject the block (Bedrock platform, Vertex, Azure for the chat path; Minimax, Azure, DeepSeek for the messages path). Behavior for those providers is unchanged. * Strip billing header on Bedrock invoke and Vertex messages pass-through Two more subclasses reach the gated strip but inherited keep-by-default. AmazonAnthropicClaudeConfig (Bedrock invoke) calls AnthropicConfig.transform_request, which calls translate_system_message, and VertexAIPartnerModelsAnthropicMessagesConfig (Vertex messages pass-through) calls super().transform_anthropic_messages_request. Override should_strip_billing_metadata() to True on both. Add a parametrized test asserting the flag for every first-party base (False) and provider subclass (True), covering all overrides, plus a translate_system_message regression test for the Bedrock invoke path. * fix(cache): log hashed cache keys (BerriAI#29890) * fix(ui): save routing groups as list (BerriAI#29889) * Revert "fix(ui): save routing groups as list (BerriAI#29889)" (BerriAI#29928) This reverts commit 9b1f78f. * feat(parasail): add Parasail as a JSON-configured OpenAI-compatible provider (BerriAI#29842) * feat(parasail): add Parasail as a JSON-configured OpenAI-compatible provider Registers parasail in the openai_like JSON provider loader with both /v1/chat/completions and /v1/responses support. Parasail's Responses API rejects store:true and any request that omits store, so the loader gains a force_store_false special_handling flag; the parasail entry sets it and the generated Responses config overrides store=false on every call. This keeps callers from hitting "State storage not supported" and matches what Parasail's docs require. 
Adds the PARASAIL enum value, listing under openai_compatible_providers, provider documentation at docs/my-website/docs/providers/parasail.md, and a focused unit test file under tests/test_litellm/llms/parasail/ that covers JSON registration, chat URL construction, Responses URL construction with PARASAIL_API_BASE override, and the force_store_false regression in both the caller-sent-store=true and caller-omitted cases.

* fix(parasail): register in provider_endpoints_support, drop in-repo docs

Greptile review feedback. The provider doc belongs in the litellm-docs repo, not this one's docs/my-website tree; removing it here. Adds the parasail entry to provider_endpoints_support.json so the check_provider_folders_documented.py CI check passes (chat_completions and responses true; others false).

* fix: normalize Anthropic passthrough server tool usage (BerriAI#29827)

* test(anthropic): cover server_tool_use dict cost tracking

* fix: normalize Anthropic server tool usage

(cherry picked from commit 982f726)

* fix: keep server tool usage subscriptable

(cherry picked from commit 70280b9)

---------

Co-authored-by: Genmin <joey@joeyroth.com>

* fix(proxy): fix typo generic_role_mappoings -> generic_role_mappings in ui_sso.py (BerriAI#29753)

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>

* feat(proxy): add disable_budget_reservation general setting (BerriAI#27639) (BerriAI#29493)

* feat(proxy): add disable_budget_reservation general setting (BerriAI#27639)

* feat(proxy): register disable_budget_reservation in ConfigGeneralSettings (BerriAI#27639)

* docs(proxy): document disable_budget_reservation concurrency tradeoff (BerriAI#27639)

* ci: re-trigger flaky docker build (prisma generate ECONNRESET)

* fix(proxy): warn and document budget enforcement tradeoff when disable_budget_reservation is set (BerriAI#27639)

* feat(gemini_tts): adding support to Gemini TTS languageCode parameters (BerriAI#29623)

* Adding support to Gemini TTS Language Code parameters

* Mapping Gemini TTS languageCode param in Docstring

* Use snake_case for language_code input key

Mapping Gemini TTS languageCode param in Docstring

* Restoring files modified under enterprise/litellm_enterprise due to lint/formatting checks

---------

Co-authored-by: João Garrido <joaogarrido@google.com>

* feat(guardrails): capture user and model metadata in CrowdStrike AIDR (BerriAI#29517)

* fix(proxy): require OpenAI path segment for shared Azure Cognitive Services domains

Address Greptile review: the `*.cognitiveservices.azure.com` / `*.openai.azure.com` domains are shared by every Azure Cognitive Service (Speech, Vision, Language, ...), so a hostname-only substring match misclassified non-OpenAI Azure traffic as OpenAI routes.

- Replace the substring host test with suffix matching (rejects look-alike domains like cognitiveservices.azure.com.attacker.example).
- Add `_is_openai_compatible_url` that requires an OpenAI-style path marker (`/openai/` or `/v1/`) on the shared Azure domains, and use it in PassThroughEndpointLogging.is_openai_route (previously hostname-only).
- Add negative tests for Azure Speech/Vision paths and look-alike domains.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix: support Responses input in Redis semantic cache (BerriAI#29581)

* fix: support responses input in redis semantic cache

* test: cover redis semantic prompt extraction

* test: handle blank redis semantic text fallbacks

* chore: remove async cache dead statement

* test: cover redis semantic cache miss paths

* fix: filter sensitive cache lookup kwargs

* chore: rerun ci after huggingface rate limit

* chore(ui): regenerate dashboard API types (npm run gen:api)

Sync src/lib/http/schema.d.ts with the proxy OpenAPI spec: adds the disable_budget_reservation general-settings field and picks up the RateLimitError docstring reindent. Fixes the gen:api CI drift check.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test(bedrock): assert empty additionalModelRequestFields is omitted

The Converse transformer now drops an empty additionalModelRequestFields block instead of sending it as `{}`. Update test_bedrock_top_k_param so models without top_k support (llama3) assert the key is absent rather than equal to an empty dict.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Kent <72616338+kingdoooo@users.noreply.github.com>
Co-authored-by: codgician <15964984+codgician@users.noreply.github.com>
Co-authored-by: Praveen Ghuge <95286176+pghuge-cloudwiz@users.noreply.github.com>
Co-authored-by: Roi <roytev@gmail.com>
Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Liam Scott <liam@uilliam.com>
Co-authored-by: abhay23-AI <abhaytrivedi22@gmail.com>
Co-authored-by: Ceder Dens <cederdens@gmail.com>
Co-authored-by: 冯基魁 <56265583+fengjikui@users.noreply.github.com>
Co-authored-by: Kai Huang <kaihuang724@gmail.com>
Co-authored-by: rinto <54238243+ririnto@users.noreply.github.com>
Co-authored-by: Genmin <joey@joeyroth.com>
Co-authored-by: Arnav Bhilwariya <arnavbhilwariya0408@gmail.com>
Co-authored-by: Armaan Sandhu <74664101+Ar-maan05@users.noreply.github.com>
Co-authored-by: João Garrido <48538534+johngarrido@users.noreply.github.com>
Co-authored-by: João Garrido <joaogarrido@google.com>
Co-authored-by: Kenan Yildirim <kenan@kenany.me>
Co-authored-by: Dávid Balatoni <balcsida@gmail.com>
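As a rough illustration of the keep-by-default gate described in the billing-header commit above, here is a minimal Python sketch. Only `should_strip_billing_metadata()`, `translate_system_message`, and the `x-anthropic-billing-header:` prefix come from the text; the `*Sketch` classes, the simplified list-in/list-out signature, and the sample classifier prompt are hypothetical and do not reproduce LiteLLM's actual code.

```python
from typing import Any, Dict, List

# Prefix named in the commit message; system blocks starting with it carry billing info.
BILLING_HEADER_PREFIX = "x-anthropic-billing-header:"


class AnthropicConfigSketch:
    """Hypothetical stand-in for the first-party Anthropic config."""

    def should_strip_billing_metadata(self) -> bool:
        # Keep by default: first-party Anthropic uses the block to recognize
        # Claude Code subscription (OAuth) traffic.
        return False

    def translate_system_message(
        self, system_blocks: List[Dict[str, Any]]
    ) -> List[Dict[str, Any]]:
        # Simplified (assumed) signature: system blocks in, system blocks out.
        if not self.should_strip_billing_metadata():
            return list(system_blocks)
        return [
            block
            for block in system_blocks
            if not str(block.get("text", "")).startswith(BILLING_HEADER_PREFIX)
        ]


class BedrockInvokeConfigSketch(AnthropicConfigSketch):
    """Hypothetical stand-in for a provider subclass that rejects the block."""

    def should_strip_billing_metadata(self) -> bool:
        return True


system = [
    {"type": "text", "text": "x-anthropic-billing-header: cc_version=2.1.160.abc; cc_entrypoint=cli; cch=00000;"},
    {"type": "text", "text": "Decide whether the pending tool call is safe."},  # illustrative prompt
]
print(len(AnthropicConfigSketch().translate_system_message(system)))      # 2
print(len(BedrockInvokeConfigSketch().translate_system_message(system)))  # 1
```

Putting the decision in an overridable method keeps the strip logic in one place on the shared base, so each provider that rejects the block opts in with a one-line override. It also explains why subclasses that reach the strip through an inherited path (Bedrock invoke, Vertex messages pass-through) need their own explicit override once the default is "keep".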
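The `fix(responses)` commit squashed into the same message describes a small shape mapping for `tool_choice`. A minimal sketch, assuming a free function named `transform_tool_choice` (hypothetical; LiteLLM's private `_transform_tool_choice` may differ in signature and in how it treats other inputs): the flat Responses form `{"type": "function", "name": "X"}` becomes the nested Chat Completions form `{"type": "function", "function": {"name": "X"}}`, and a function choice with no name falls back to `"required"`.

```python
from typing import Any, Dict, Union

ToolChoice = Union[str, Dict[str, Any]]


def transform_tool_choice(tool_choice: ToolChoice) -> ToolChoice:
    """Map a Responses API tool_choice onto the Chat Completions shape (sketch)."""
    # String modes ("auto", "none", "required") and non-function dicts pass
    # through unchanged; that pass-through is an assumption of this sketch.
    if not isinstance(tool_choice, dict) or tool_choice.get("type") != "function":
        return tool_choice
    # Nested Chat Completions shape: {"type": "function", "function": {"name": ...}}
    nested = tool_choice.get("function")
    if isinstance(nested, dict) and nested.get("name"):
        return {"type": "function", "function": {"name": nested["name"]}}
    # Flat Responses shape: {"type": "function", "name": ...}
    if tool_choice.get("name"):
        return {"type": "function", "function": {"name": tool_choice["name"]}}
    # No function name anywhere: fall back to forcing any tool.
    return "required"


print(transform_tool_choice({"type": "function", "name": "get_weather"}))
# {'type': 'function', 'function': {'name': 'get_weather'}}
print(transform_tool_choice({"type": "function"}))  # required
```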
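The Azure routing fix in the same squash message pairs a host suffix check with a path check. The sketch below models only that shared-Azure-domain rule using `urllib.parse`. `is_azure_openai_route` is a hypothetical name; the real `_is_openai_compatible_url` / `PassThroughEndpointLogging.is_openai_route` also has to handle non-Azure hosts, which this sketch simply reports as False.

```python
from urllib.parse import urlparse

# Host suffixes shared by every Azure Cognitive Service (Speech, Vision, Language, ...).
SHARED_AZURE_SUFFIXES = (".cognitiveservices.azure.com", ".openai.azure.com")
# OpenAI-style path markers named in the commit message.
OPENAI_PATH_MARKERS = ("/openai/", "/v1/")


def is_azure_openai_route(url: str) -> bool:
    """Classify a shared-Azure-domain URL as an OpenAI route (hypothetical helper)."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Suffix match instead of substring match, so look-alike hosts such as
    # "x.cognitiveservices.azure.com.attacker.example" are rejected.
    if not host.endswith(SHARED_AZURE_SUFFIXES):
        return False
    # Append a trailing slash so a path ending in "/openai" still matches "/openai/".
    path = parsed.path if parsed.path.endswith("/") else parsed.path + "/"
    return any(marker in path for marker in OPENAI_PATH_MARKERS)


for url in (
    "https://my-res.openai.azure.com/openai/deployments/gpt-4o/chat/completions",
    "https://my-res.cognitiveservices.azure.com/speechtotext/v3.1/transcriptions",
    "https://my-res.cognitiveservices.azure.com.attacker.example/openai/chat",
):
    print(url, is_azure_openai_route(url))
```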

veria-ai Bot reviewed Jun 3, 2026

Merge branch 'litellm_internal_staging' into fix-anthropic-billing-header-first-party (08c9662)

Sameerlite approved these changes Jun 8, 2026

Sameerlite changed the base branch from litellm_internal_staging to litellm_oss_staging_080626 June 8, 2026 12:12

Sameerlite merged commit 164734c into BerriAI:litellm_oss_staging_080626 Jun 8, 2026
69 of 70 checks passed

Preserve x-anthropic-billing-header system blocks for first-party Anthropic #29584

Sameerlite merged 3 commits into BerriAI:litellm_oss_staging_080626 from PigeonMark:fix-anthropic-billing-header-first-party

Conversation

PigeonMark commented Jun 3, 2026

greptile-apps Bot commented Jun 3, 2026

Greptile Summary

Confidence Score: 5/5

Important Files Changed

codecov Bot commented Jun 3, 2026

Codecov Report

veria-ai Bot Jun 3, 2026

PigeonMark Jun 4, 2026

veria-ai Bot commented Jun 3, 2026

PR overview

Open issues (1)

PigeonMark commented Jun 3, 2026

Sameerlite left a comment

2 participants
