fix: normalize Anthropic passthrough server tool usage #29827

Merged
Sameerlite merged 3 commits into BerriAI:litellm_oss_staging_080626 from ririnto:fix-anthropic-server-tool-use-passthrough-cost
Jun 8, 2026

Conversation

ririnto commented Jun 6, 2026

Contributor

Relevant issues

Fixes #26749

Related: #26153, #27346, #28980, and Genmin/#26904.

Linear ticket

N/A

Pre-Submission checklist

  • I have added meaningful tests
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible; it only solves 1 specific problem

Delays in PR merge?

N/A

CI (LiteLLM team)

  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Screenshots / Proof of Fix

Latest local verification after the follow-up cherry-pick:

  • Failed CI test + regressions: 5 passed
  • Expanded relevant tests: 75 passed, 1 warning
  • uv run --extra proxy ruff check --ignore PLR0915 litellm/types/utils.py: passed
  • git diff HEAD~3..HEAD --check: passed

Type

Bug Fix

Changes

This PR fixes Anthropic passthrough cost logging when usage.server_tool_use arrives as a raw dict.

The branch includes:

This PR does not claim to supersede #26904 or fix every possible server_tool_use dict path. The related streaming and broader defensive-access work in #26153, #27346, and #28980 remains separate context.

Reviewer Focus

Please focus review on:

  • Usage.__init__ normalizing dict server_tool_use values into ServerToolUse
  • ServerToolUse.__getitem__ preserving existing subscript access such as usage.server_tool_use["web_search_requests"]
  • Passthrough logging regression coverage proving response_cost is set for Anthropic-compatible responses with dict server_tool_use
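
The first two review points can be illustrated with a minimal sketch. It uses plain-Python stand-ins rather than the real litellm classes: the class bodies, the dataclass base, and the single `web_search_requests` field are assumptions made for illustration, and only the names `Usage`, `ServerToolUse`, and `web_search_requests` come from this PR.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional, Union


@dataclass
class ServerToolUse:
    """Stand-in for the typed usage object (hypothetical, not the litellm class)."""

    web_search_requests: Optional[int] = None

    def __getitem__(self, key: str) -> Any:
        # Keep subscript access working, and raise KeyError (not AttributeError)
        # for names that are not declared fields.
        if key not in {f.name for f in fields(self)}:
            raise KeyError(key)
        return getattr(self, key)


class Usage:
    """Stand-in for the usage container; only the coercion step is shown."""

    def __init__(
        self,
        server_tool_use: Optional[Union[ServerToolUse, dict]] = None,
    ) -> None:
        # Normalize a raw dict into the typed object at construction time,
        # so later attribute access does not hit a plain dict.
        if isinstance(server_tool_use, dict):
            server_tool_use = ServerToolUse(**server_tool_use)
        self.server_tool_use = server_tool_use


usage = Usage(server_tool_use={"web_search_requests": 2})
```

With this shape, both `usage.server_tool_use.web_search_requests` and `usage.server_tool_use["web_search_requests"]` read the same value, which is the compatibility property the review points ask about.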

ririnto commented Jun 6, 2026

Contributor Author

@greptileai

greptile-apps Bot commented Jun 6, 2026

Contributor

Greptile Summary

This PR fixes a cost-tracking bug in the Anthropic passthrough handler where usage.server_tool_use arrived as a raw dict instead of a ServerToolUse instance, causing downstream attribute access (usage.server_tool_use.web_search_requests) to fail silently and response_cost to stay zero.

  • Usage.__init__ now normalises any dict passed as server_tool_use into a ServerToolUse object, matching the existing pattern used for prompt_tokens_details and completion_tokens_details.
  • ServerToolUse.__getitem__ is added so that existing subscript access (usage.server_tool_use["web_search_requests"]) continues to work, raising KeyError for unknown keys rather than AttributeError.
  • Three new unit tests and one integration-style passthrough test cover dict coercion, subscript semantics, model_dump round-trips, and the passthrough response_cost scenario.

Confidence Score: 5/5

The change is a narrow, additive normalisation inside Usage.__init__ with no risk of breaking existing callers.

The fix mirrors the well-established dict-coercion pattern already used for prompt_tokens_details and completion_tokens_details, is guarded by an isinstance check, and is covered by four new tests including a round-trip model_dump test and an end-to-end passthrough cost test. No existing behaviour is removed or altered.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| litellm/types/utils.py | Adds dict-to-ServerToolUse coercion in Usage.__init__ and a __getitem__ on ServerToolUse; logic is correct and consistent with existing dict-normalization patterns for prompt/completion_tokens_details. |
| tests/test_litellm/types/test_types_utils.py | Adds three new tests covering dict coercion, subscript access, KeyError on unknown keys, and model_dump round-trips; removes unused imports. |
| tests/test_litellm/litellm_core_utils/llm_cost_calc/test_tool_call_cost_tracking.py | Adds a focused regression test verifying that Usage built from a dict server_tool_use is recognized by StandardBuiltInToolCostTracking; removes unused imports. |
| tests/test_litellm/proxy/pass_through_endpoints/llm_provider_handlers/test_anthropic_passthrough_logging_handler.py | Adds an end-to-end passthrough test confirming response_cost is positive when usage contains a raw dict server_tool_use; covers the primary bug scenario. |

Reviews (3): Last reviewed commit: "fix: keep server tool usage subscriptabl..."

greptile-apps Bot commented Jun 6, 2026

Contributor

Greptile Summary

This PR fixes a bug where Anthropic passthrough responses could fail cost tracking because usage.server_tool_use arrived as a raw dict instead of a typed ServerToolUse instance, causing .web_search_requests attribute access to fail and response_cost to be omitted.

  • Usage.__init__ now accepts server_tool_use as Optional[Union[ServerToolUse, dict]] and coerces any dict to a ServerToolUse instance before assignment, following the same pattern already used for completion_tokens_details and prompt_tokens_details.
  • Three new test suites cover dict coercion, round-trip serialisation via model_dump, and the full passthrough logging path asserting a positive response_cost is emitted.

Confidence Score: 5/5

The change is a one-line normalisation inside Usage.__init__ that is fully consistent with the existing dict-coercion pattern for completion_tokens_details and prompt_tokens_details; the class-level field annotation is intentionally left as Optional[ServerToolUse] since the invariant holds after __init__ runs.

The production change is minimal and targeted, the three new test functions directly exercise the coercion, round-trip, and end-to-end passthrough cost paths, and no existing tests were weakened or removed.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| litellm/types/utils.py | Adds dict → ServerToolUse coercion inside Usage.__init__ before attribute assignment; consistent with the existing pattern used for completion_tokens_details and prompt_tokens_details. |
| tests/test_litellm/types/test_types_utils.py | Adds two new unit tests covering dict coercion to ServerToolUse and round-trip serialisation via model_dump; also removes several unused imports (asyncio, json, Optional, AsyncMock, patch). |
| tests/test_litellm/litellm_core_utils/llm_cost_calc/test_tool_call_cost_tracking.py | Adds a focused regression test for the passthrough cost-tracking path with a dict server_tool_use; removes unused imports (json, MagicMock, TestClient, ModelInfo). |
| tests/test_litellm/proxy/pass_through_endpoints/llm_provider_handlers/test_anthropic_passthrough_logging_handler.py | Adds an end-to-end integration test that constructs ModelResponse with a dict usage payload containing server_tool_use and asserts that response_cost is present and positive. |

Reviews (2): Last reviewed commit: "fix: normalize Anthropic server tool usa..."

codecov Bot commented Jun 6, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.

ririnto commented Jun 6, 2026

Contributor Author

@greptile-apps

@Sameerlite Sameerlite changed the base branch from litellm_internal_staging to litellm_oss_staging_080626 June 8, 2026 12:26
@Sameerlite Sameerlite merged commit 070fd5b into BerriAI:litellm_oss_staging_080626 Jun 8, 2026
71 checks passed
mateo-berri pushed a commit that referenced this pull request Jun 8, 2026
* feat(bedrock_mantle): add SigV4/IAM auth to Responses API route (fixes #29665) (#29788)

* feat(responses): add default no-op sign_request to BaseResponsesAPIConfig

* feat(responses): call sign_request after body is final, send signed bytes when signed

* feat(bedrock_mantle): add SigV4 sign_request via composed BaseAWSLLM (bearer path)

* test(bedrock_mantle): cover SigV4 access-key, AssumeRole, body bytes, region/auth consistency

* feat(bedrock_mantle): defer auth to sign_request; validate_environment no longer requires bearer

* docs(bedrock_mantle): document SigV4 + Bearer auth on Responses route

* test(responses): cover fake-stream signing order and mantle bearer arg/env precedence

* fix(bedrock_mantle): wrap all botocore credential errors with both-paths guidance

* fix(bedrock_mantle): catch specific credential errors, not all BotoCoreError, so STS transport failures are not masked

* fix(bedrock_mantle): sign the compact Responses route too, not just create

* fix(github-copilot): route per-model on /v1/responses based on model info (#29747)

* feat(focus): add GCS destination for FOCUS export (#29751)

* test: add failing tests for FocusGCSDestination

* feat: add FocusGCSDestination reusing GCSBucketBase auth

* feat: register FocusGCSDestination in factory; export from __init__

* fix(focus): preserve GCS_PATH_SERVICE_ACCOUNT when service_account_json not in config

* style: apply Black formatting to gcs_destination and tests

* style: apply Black formatting to factory.py

* fix(bedrock): omit empty additionalModelRequestFields and system from Converse API payload (#29565)

Amazon Nova Pro (and other strict Bedrock models) return 400 Malformed input
request when additionalModelRequestFields: {} or system: [] are present in the
payload. Both fields are optional in CommonRequestObject (total=False) and must
be omitted rather than sent as empty structures.

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
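
The omit-when-empty rule from the Bedrock commit above can be sketched as follows. This is an illustration only: the helper name `drop_empty_optional_fields` and the `OMIT_WHEN_EMPTY` tuple are hypothetical, and the real transformer may build the payload differently.

```python
from typing import Any, Dict

# Keys named in the commit message. Treating exactly these two as
# "omit when empty" is an assumption of this sketch.
OMIT_WHEN_EMPTY = ("additionalModelRequestFields", "system")


def drop_empty_optional_fields(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the payload without empty optional structures."""
    cleaned = dict(payload)
    for key in OMIT_WHEN_EMPTY:
        # An empty dict or list is falsy; a populated one is kept.
        if key in cleaned and not cleaned[key]:
            del cleaned[key]
    return cleaned
```

A payload such as `{"messages": [...], "system": [], "additionalModelRequestFields": {}}` would then be sent with only the `messages` key.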

* fix(proxy): recognize *.cognitiveservices.azure.com as OpenAI-compatible in pass-through cost tracking (#29730)

* fix(proxy): recognize *.cognitiveservices.azure.com as OpenAI-compatible

Azure OpenAI resources created via the newer "Azure AI Foundry" /
Cognitive Services pathway live on `*.cognitiveservices.azure.com`
subdomains, not the older `openai.azure.com`. Both are valid Azure
OpenAI surfaces in production today.

The OpenAI pass-through cost-tracking handler hard-codes only the older
hostname in five places (four `is_openai_*_route` methods on
OpenAIPassthroughLoggingHandler, plus is_openai_route on
PassThroughEndpointLogging). As a result, calls from newer Azure
deployments are silently classified as "not an OpenAI route", the
dispatch into the cost-tracking handler is skipped, and tokens/cost
never get extracted into LiteLLM_SpendLogs — the row gets written with
prompt_tokens=0, completion_tokens=0, spend=0, model='unknown'.

Reproduced 2026-06-04 against a real Azure OpenAI deployment on
`*.cognitiveservices.azure.com` proxied through LiteLLM v1.88.0.

Fix: factor the hostname check into a single helper
`_is_openai_compatible_host` listing all three recognized surfaces
(api.openai.com, openai.azure.com, cognitiveservices.azure.com), and
have all five call sites delegate to it. Purely additive — never
weakens recognition for the originally-supported hostnames.

Adds a test
`test_is_openai_route_recognizes_cognitiveservices_azure_com` that
exercises all four `is_openai_*_route` static methods against
`*.cognitiveservices.azure.com` URLs (positive cases per route + a
small cross-route negative to confirm route-specific path matching
still works on the new hostname).

Out of scope for this PR (separate followup):
  - `openai_passthrough_handler` calls chat/completions
    `transform_response` on Responses API payloads (`output:` not
    `choices:`), which throws inside the dispatch and drops the
    SpendLogs row entirely. Recognized + tracked separately.

* ci: trigger fresh run

Empty commit to re-run checks. The previous auth-and-jwt failure was
a transient HuggingFace Hub 429 rate-limit hitting tokenizer downloads
in tests/proxy_unit_tests/test_custom_tokenizer_bug.py — unrelated to
this PR's scope (hostname recognition in pass-through cost tracking).
No code change.

---------

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>

* fix(responses): preserve forced-function tool_choice name in Responses to Chat transform (#29812)

The Responses API forces a specific function with a top-level name
({"type": "function", "name": "X"}), but _transform_tool_choice only handled the
nested Chat Completions shape and fell through to returning "required" for the flat
form, silently dropping the function name and degrading a forced function call to
force-any-tool. Map the flat Responses shape to the nested Chat shape, keeping the
"required" fallback when no name is present.
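
A minimal sketch of the mapping this commit describes, assuming a free-standing function rather than the real `_transform_tool_choice` method. The function name and the pass-through of string values are assumptions made for illustration.

```python
from typing import Any, Dict, Union

ToolChoice = Union[str, Dict[str, Any]]


def transform_tool_choice(tool_choice: ToolChoice) -> ToolChoice:
    """Map a Responses-style tool_choice to the Chat Completions shape."""
    if isinstance(tool_choice, str):
        # Assumption: plain string values such as "auto" pass through unchanged.
        return tool_choice
    if tool_choice.get("type") == "function":
        # Already in the nested Chat Completions shape: keep it as is.
        if isinstance(tool_choice.get("function"), dict):
            return tool_choice
        # Flat Responses shape: lift the top-level name into the nested form.
        name = tool_choice.get("name")
        if name:
            return {"type": "function", "function": {"name": name}}
    # No usable function name: fall back to forcing any tool.
    return "required"
```

For example, `{"type": "function", "name": "X"}` becomes `{"type": "function", "function": {"name": "X"}}`, while `{"type": "function"}` still yields `"required"`.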

* Preserve x-anthropic-billing-header system blocks for first-party Anthropic (#29584)

* Preserve x-anthropic-billing-header system blocks for first-party Anthropic

PR #20951 strips system blocks beginning with "x-anthropic-billing-header:" for
every Anthropic target. That block is how the first-party Anthropic API recognizes
Claude Code subscription (OAuth) traffic, so dropping it makes requests that carry
only that block, such as the auto-mode tool-safety classifier, fail with a
misleading 429 rate_limit_error; normal turns still work because they also carry
the "You are Claude Code" identity block.

Gate the strip behind should_strip_billing_metadata(), defaulting to False on the
first-party AnthropicConfig and AnthropicMessagesConfig so the block is kept, and
overridden to True on the providers that reach these transforms and reject the
block (Bedrock platform, Vertex, Azure for the chat path; Minimax, Azure, DeepSeek
for the messages path). Behavior for those providers is unchanged.

* Strip billing header on Bedrock invoke and Vertex messages pass-through

Two more subclasses reach the gated strip but inherited keep-by-default.
AmazonAnthropicClaudeConfig (Bedrock invoke) calls AnthropicConfig.transform_request,
which calls translate_system_message, and VertexAIPartnerModelsAnthropicMessagesConfig
(Vertex messages pass-through) calls super().transform_anthropic_messages_request.
Override should_strip_billing_metadata() to True on both.

Add a parametrized test asserting the flag for every first-party base (False) and
provider subclass (True), covering all overrides, plus a translate_system_message
regression test for the Bedrock invoke path.
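
The gating pattern in these two commits can be sketched with a base class that keeps the block and a subclass that opts into stripping. Only `should_strip_billing_metadata` and the header prefix come from the commit text; the class names `FirstPartyConfig` and `ThirdPartyConfig`, the `translate_system_blocks` method, and the block shape are hypothetical.

```python
from typing import Dict, List

BILLING_PREFIX = "x-anthropic-billing-header:"


class FirstPartyConfig:
    """Hypothetical base config: keeps the billing block by default."""

    def should_strip_billing_metadata(self) -> bool:
        return False

    def translate_system_blocks(
        self, blocks: List[Dict[str, str]]
    ) -> List[Dict[str, str]]:
        if not self.should_strip_billing_metadata():
            return list(blocks)
        # Drop only the blocks whose text starts with the billing prefix.
        return [
            block
            for block in blocks
            if not block.get("text", "").startswith(BILLING_PREFIX)
        ]


class ThirdPartyConfig(FirstPartyConfig):
    """Hypothetical provider subclass that rejects the billing block."""

    def should_strip_billing_metadata(self) -> bool:
        return True
```

Putting the decision in an overridable method means a new provider subclass inherits keep-by-default unless it overrides the hook, which is the situation the follow-up commit corrects for two subclasses.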

* fix(cache): log hashed cache keys (#29890)

* fix(ui): save routing groups as list (#29889)

* Revert "fix(ui): save routing groups as list (#29889)" (#29928)

This reverts commit 9b1f78f.

* feat(parasail): add Parasail as a JSON-configured OpenAI-compatible provider (#29842)

* feat(parasail): add Parasail as a JSON-configured OpenAI-compatible provider

Registers parasail in the openai_like JSON provider loader with both
/v1/chat/completions and /v1/responses support. Parasail's Responses API
rejects store:true and any request that omits store, so the loader gains a
force_store_false special_handling flag; the parasail entry sets it and
the generated Responses config overrides store=false on every call. This
keeps callers from hitting "State storage not supported" and matches what
Parasail's docs require.

Adds the PARASAIL enum value, listing under openai_compatible_providers,
provider documentation at docs/my-website/docs/providers/parasail.md, and
a focused unit test file under tests/test_litellm/llms/parasail/ that
covers JSON registration, chat URL construction, Responses URL
construction with PARASAIL_API_BASE override, and the force_store_false
regression in both the caller-sent-store=true and caller-omitted cases.
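
As a rough illustration of the `force_store_false` behavior described above, the override can be thought of as a final step applied to the outgoing Responses request body. The function name `apply_special_handling` and the dict-based flag lookup are assumptions, not the loader's actual code.

```python
from typing import Any, Dict


def apply_special_handling(
    request: Dict[str, Any], special_handling: Dict[str, bool]
) -> Dict[str, Any]:
    """Return a copy of the request with provider-specific overrides applied."""
    body = dict(request)
    if special_handling.get("force_store_false"):
        # Overrides store=True and also fills in store when the caller omitted it.
        body["store"] = False
    return body
```

Both regression cases named in the commit map onto this: a caller that sends `store=True` and a caller that omits `store` end up with `store` set to `False`.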

* fix(parasail): register in provider_endpoints_support, drop in-repo docs

Greptile review feedback. The provider doc belongs in the litellm-docs
repo, not this one's docs/my-website tree; removing it here. Adds the
parasail entry to provider_endpoints_support.json so the
check_provider_folders_documented.py CI check passes (chat_completions
and responses true; others false).

* fix: normalize Anthropic passthrough server tool usage (#29827)

* test(anthropic): cover server_tool_use dict cost tracking

* fix: normalize Anthropic server tool usage

(cherry picked from commit 982f726)

* fix: keep server tool usage subscriptable

(cherry picked from commit 70280b9)

---------

Co-authored-by: Genmin <joey@joeyroth.com>

* fix(proxy): fix typo generic_role_mappoings -> generic_role_mappings in ui_sso.py (#29753)

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>

* feat(proxy): add disable_budget_reservation general setting (#27639) (#29493)

* feat(proxy): add disable_budget_reservation general setting (#27639)

* feat(proxy): register disable_budget_reservation in ConfigGeneralSettings (#27639)

* docs(proxy): document disable_budget_reservation concurrency tradeoff (#27639)

* ci: re-trigger flaky docker build (prisma generate ECONNRESET)

* fix(proxy): warn and document budget enforcement tradeoff when disable_budget_reservation is set (#27639)

* feat(gemini_tts): adding support to Gemini TTS languageCode parameters (#29623)

* Adding support to Gemini TTS Language Code parameters

* Mapping Gemini TTS languageCode param in Docstring

* Use snake_case for language_code input keyMapping Gemini TTS languageCode param in Docstring

* Restoring files modified under enterprise/litellm_enterprise due to lint/formatting checks

---------

Co-authored-by: João Garrido <joaogarrido@google.com>

* feat(guardrails): capture user and model metadata in CrowdStrike AIDR (#29517)

* fix(proxy): require OpenAI path segment for shared Azure Cognitive Services domains

Address Greptile review: the `*.cognitiveservices.azure.com` /
`*.openai.azure.com` domains are shared by every Azure Cognitive Service
(Speech, Vision, Language, ...), so a hostname-only substring match
misclassified non-OpenAI Azure traffic as OpenAI routes.

- Replace the substring host test with suffix matching (rejects look-alike
  domains like cognitiveservices.azure.com.attacker.example).
- Add `_is_openai_compatible_url` that requires an OpenAI-style path marker
  (`/openai/` or `/v1/`) on the shared Azure domains, and use it in
  PassThroughEndpointLogging.is_openai_route (previously hostname-only).
- Add negative tests for Azure Speech/Vision paths and look-alike domains.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
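
The suffix-plus-path check described in this commit can be sketched with the standard library. The function and constant names below are hypothetical, and treating `api.openai.com` as recognized without a path marker is an assumption of the sketch; only the hostnames and the `/openai/` and `/v1/` markers come from the commit text.

```python
from urllib.parse import urlparse

OPENAI_HOSTS = ("api.openai.com",)
SHARED_AZURE_SUFFIXES = ("openai.azure.com", "cognitiveservices.azure.com")
OPENAI_PATH_MARKERS = ("/openai/", "/v1/")


def _matches_suffix(host: str, suffix: str) -> bool:
    # True for the bare domain or any subdomain of it, but not for
    # look-alikes that merely contain the suffix somewhere in the name.
    return host == suffix or host.endswith("." + suffix)


def is_openai_compatible_url(url: str) -> bool:
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host in OPENAI_HOSTS:
        return True
    if any(_matches_suffix(host, suffix) for suffix in SHARED_AZURE_SUFFIXES):
        # Shared Azure domains also host non-OpenAI services, so require
        # an OpenAI-style path marker as well.
        return any(marker in parsed.path for marker in OPENAI_PATH_MARKERS)
    return False
```

Matching on a dot-prefixed suffix is what rejects a host such as `cognitiveservices.azure.com.attacker.example`, which a substring test would accept.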

* fix: support Responses input in Redis semantic cache (#29581)

* fix: support responses input in redis semantic cache

* test: cover redis semantic prompt extraction

* test: handle blank redis semantic text fallbacks

* chore: remove async cache dead statement

* test: cover redis semantic cache miss paths

* fix: filter sensitive cache lookup kwargs

* chore: rerun ci after huggingface rate limit

* chore(ui): regenerate dashboard API types (npm run gen:api)

Sync src/lib/http/schema.d.ts with the proxy OpenAPI spec: adds the
disable_budget_reservation general-settings field and picks up the
RateLimitError docstring reindent. Fixes the gen:api CI drift check.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test(bedrock): assert empty additionalModelRequestFields is omitted

The Converse transformer now drops an empty additionalModelRequestFields
block instead of sending it as `{}`. Update test_bedrock_top_k_param so
models without top_k support (llama3) assert the key is absent rather than
equal to an empty dict.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Kent <72616338+kingdoooo@users.noreply.github.com>
Co-authored-by: codgician <15964984+codgician@users.noreply.github.com>
Co-authored-by: Praveen Ghuge <95286176+pghuge-cloudwiz@users.noreply.github.com>
Co-authored-by: Roi <roytev@gmail.com>
Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Liam Scott <liam@uilliam.com>
Co-authored-by: abhay23-AI <abhaytrivedi22@gmail.com>
Co-authored-by: Ceder Dens <cederdens@gmail.com>
Co-authored-by: 冯基魁 <56265583+fengjikui@users.noreply.github.com>
Co-authored-by: Kai Huang <kaihuang724@gmail.com>
Co-authored-by: rinto <54238243+ririnto@users.noreply.github.com>
Co-authored-by: Genmin <joey@joeyroth.com>
Co-authored-by: Arnav Bhilwariya <arnavbhilwariya0408@gmail.com>
Co-authored-by: Armaan Sandhu <74664101+Ar-maan05@users.noreply.github.com>
Co-authored-by: João Garrido <48538534+johngarrido@users.noreply.github.com>
Co-authored-by: João Garrido <joaogarrido@google.com>
Co-authored-by: Kenan Yildirim <kenan@kenany.me>
Co-authored-by: Dávid Balatoni <balcsida@gmail.com>
utarn added a commit to utarn/litellm that referenced this pull request Jun 16, 2026
…assthrough cost crash

Anthropic passthrough cost logging builds Usage from a raw payload where
server_tool_use arrives as a plain dict. The built-in-tool cost tracker then
accessed usage.server_tool_use.web_search_requests as an attribute, raising
AttributeError: 'dict' object has no attribute 'web_search_requests' and
preventing response_cost from being recorded.

Normalize dict server_tool_use into ServerToolUse at construction time in
Usage.__init__, and add ServerToolUse.__getitem__ to preserve existing
subscript access. Mirrors upstream PR BerriAI#29827.

Co-Authored-By: Claude <noreply@anthropic.com>
michaelxer pushed a commit to michaelxer/litellm that referenced this pull request Jun 17, 2026
Co-authored-by: Liam Scott <liam@uilliam.com>
Co-authored-by: abhay23-AI <abhaytrivedi22@gmail.com>
Co-authored-by: Ceder Dens <cederdens@gmail.com>
Co-authored-by: 冯基魁 <56265583+fengjikui@users.noreply.github.com>
Co-authored-by: Kai Huang <kaihuang724@gmail.com>
Co-authored-by: rinto <54238243+ririnto@users.noreply.github.com>
Co-authored-by: Genmin <joey@joeyroth.com>
Co-authored-by: Arnav Bhilwariya <arnavbhilwariya0408@gmail.com>
Co-authored-by: Armaan Sandhu <74664101+Ar-maan05@users.noreply.github.com>
Co-authored-by: João Garrido <48538534+johngarrido@users.noreply.github.com>
Co-authored-by: João Garrido <joaogarrido@google.com>
Co-authored-by: Kenan Yildirim <kenan@kenany.me>
Co-authored-by: Dávid Balatoni <balcsida@gmail.com>
michaelxer pushed a commit to michaelxer/litellm that referenced this pull request Jun 17, 2026
* feat(bedrock_mantle): add SigV4/IAM auth to Responses API route (fixes BerriAI#29665) (BerriAI#29788)

* feat(responses): add default no-op sign_request to BaseResponsesAPIConfig

* feat(responses): call sign_request after body is final, send signed bytes when signed

* feat(bedrock_mantle): add SigV4 sign_request via composed BaseAWSLLM (bearer path)

* test(bedrock_mantle): cover SigV4 access-key, AssumeRole, body bytes, region/auth consistency

* feat(bedrock_mantle): defer auth to sign_request; validate_environment no longer requires bearer

* docs(bedrock_mantle): document SigV4 + Bearer auth on Responses route

* test(responses): cover fake-stream signing order and mantle bearer arg/env precedence

* fix(bedrock_mantle): wrap all botocore credential errors with both-paths guidance

* fix(bedrock_mantle): catch specific credential errors, not all BotoCoreError, so STS transport failures are not masked

* fix(bedrock_mantle): sign the compact Responses route too, not just create

* fix(github-copilot): route per-model on /v1/responses based on model info (BerriAI#29747)

* feat(focus): add GCS destination for FOCUS export (BerriAI#29751)

* test: add failing tests for FocusGCSDestination

* feat: add FocusGCSDestination reusing GCSBucketBase auth

* feat: register FocusGCSDestination in factory; export from __init__

* fix(focus): preserve GCS_PATH_SERVICE_ACCOUNT when service_account_json not in config

* style: apply Black formatting to gcs_destination and tests

* style: apply Black formatting to factory.py

* fix(bedrock): omit empty additionalModelRequestFields and system from Converse API payload (BerriAI#29565)

Amazon Nova Pro (and other strict Bedrock models) return 400 Malformed input
request when additionalModelRequestFields: {} or system: [] are present in the
payload. Both fields are optional in CommonRequestObject (total=False) and must
be omitted rather than sent as empty structures.
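
The omission rule above can be sketched as follows. This is a minimal illustration only: `build_converse_payload` is a hypothetical helper, not the actual LiteLLM Converse transformer, and it assumes the only requirement is that empty optional fields are left out of the payload.

```python
from typing import Any, Dict, List, Optional


def build_converse_payload(
    messages: List[Dict[str, Any]],
    system: Optional[List[Dict[str, Any]]] = None,
    additional_model_request_fields: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build a Converse-style payload without empty optional keys."""
    payload: Dict[str, Any] = {"messages": messages}
    # Optional fields are added only when they carry content, so an empty
    # list or dict is omitted instead of being sent as [] or {}.
    if system:
        payload["system"] = system
    if additional_model_request_fields:
        payload["additionalModelRequestFields"] = additional_model_request_fields
    return payload
```

Checking truthiness rather than `is not None` is what drops both `None` and the empty structures in one step.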

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(proxy): recognize *.cognitiveservices.azure.com as OpenAI-compatible in pass-through cost tracking (BerriAI#29730)

* fix(proxy): recognize *.cognitiveservices.azure.com as OpenAI-compatible

Azure OpenAI resources created via the newer "Azure AI Foundry" /
Cognitive Services pathway live on `*.cognitiveservices.azure.com`
subdomains, not the older `openai.azure.com`. Both are valid Azure
OpenAI surfaces in production today.

The OpenAI pass-through cost-tracking handler hard-codes only the older
hostname in five places (four `is_openai_*_route` methods on
OpenAIPassthroughLoggingHandler, plus is_openai_route on
PassThroughEndpointLogging). As a result, calls from newer Azure
deployments are silently classified as "not an OpenAI route", the
dispatch into the cost-tracking handler is skipped, and tokens/cost
never get extracted into LiteLLM_SpendLogs — the row gets written with
prompt_tokens=0, completion_tokens=0, spend=0, model='unknown'.

Reproduced 2026-06-04 against a real Azure OpenAI deployment on
`*.cognitiveservices.azure.com` proxied through LiteLLM v1.88.0.

Fix: factor the hostname check into a single helper
`_is_openai_compatible_host` listing all three recognized surfaces
(api.openai.com, openai.azure.com, cognitiveservices.azure.com), and
have all five call sites delegate to it. Purely additive — never
weakens recognition for the originally-supported hostnames.

Adds a test
`test_is_openai_route_recognizes_cognitiveservices_azure_com` that
exercises all four `is_openai_*_route` static methods against
`*.cognitiveservices.azure.com` URLs (positive cases per route + a
small cross-route negative to confirm route-specific path matching
still works on the new hostname).

Out of scope for this PR (separate followup):
  - `openai_passthrough_handler` calls chat/completions
    `transform_response` on Responses API payloads (`output:` not
    `choices:`), which throws inside the dispatch and drops the
    SpendLogs row entirely. Recognized + tracked separately.

* ci: trigger fresh run

Empty commit to re-run checks. The previous auth-and-jwt failure was
a transient HuggingFace Hub 429 rate-limit hitting tokenizer downloads
in tests/proxy_unit_tests/test_custom_tokenizer_bug.py — unrelated to
this PR's scope (hostname recognition in pass-through cost tracking).
No code change.

---------

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>

* fix(responses): preserve forced-function tool_choice name in Responses to Chat transform (BerriAI#29812)

The Responses API forces a specific function with a top-level name
({"type": "function", "name": "X"}), but _transform_tool_choice only handled the
nested Chat Completions shape and fell through to returning "required" for the flat
form, silently dropping the function name and degrading a forced function call to
force-any-tool. Map the flat Responses shape to the nested Chat shape, keeping the
"required" fallback when no name is present.
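
A minimal sketch of the mapping described above, assuming a hypothetical standalone function `transform_tool_choice_sketch` rather than the real `_transform_tool_choice`. Passing string values through unchanged is an assumption of this sketch, not something the commit message states.

```python
from typing import Any, Dict, Union

ToolChoice = Union[str, Dict[str, Any]]


def transform_tool_choice_sketch(tool_choice: ToolChoice) -> ToolChoice:
    """Hypothetical stand-in: map a Responses tool_choice to the Chat Completions shape."""
    if isinstance(tool_choice, str):
        # Assumption: plain string modes such as "auto" pass through unchanged.
        return tool_choice
    if tool_choice.get("type") == "function":
        # Nested Chat shape: {"type": "function", "function": {"name": "X"}}
        function = tool_choice.get("function")
        if isinstance(function, dict) and function.get("name"):
            return {"type": "function", "function": {"name": function["name"]}}
        # Flat Responses shape: {"type": "function", "name": "X"}
        name = tool_choice.get("name")
        if name:
            return {"type": "function", "function": {"name": name}}
    # No function name available: keep the "required" fallback.
    return "required"
```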

* Preserve x-anthropic-billing-header system blocks for first-party Anthropic (BerriAI#29584)

* Preserve x-anthropic-billing-header system blocks for first-party Anthropic

PR BerriAI#20951 strips system blocks beginning with "x-anthropic-billing-header:" for
every Anthropic target. That block is how the first-party Anthropic API recognizes
Claude Code subscription (OAuth) traffic, so dropping it makes requests that carry
only that block, such as the auto-mode tool-safety classifier, fail with a
misleading 429 rate_limit_error; normal turns still work because they also carry
the "You are Claude Code" identity block.

Gate the strip behind should_strip_billing_metadata(), defaulting to False on the
first-party AnthropicConfig and AnthropicMessagesConfig so the block is kept, and
overridden to True on the providers that reach these transforms and reject the
block (Bedrock platform, Vertex, Azure for the chat path; Minimax, Azure, DeepSeek
for the messages path). Behavior for those providers is unchanged.

* Strip billing header on Bedrock invoke and Vertex messages pass-through

Two more subclasses reach the gated strip but inherited keep-by-default.
AmazonAnthropicClaudeConfig (Bedrock invoke) calls AnthropicConfig.transform_request,
which calls translate_system_message, and VertexAIPartnerModelsAnthropicMessagesConfig
(Vertex messages pass-through) calls super().transform_anthropic_messages_request.
Override should_strip_billing_metadata() to True on both.

Add a parametrized test asserting the flag for every first-party base (False) and
provider subclass (True), covering all overrides, plus a translate_system_message
regression test for the Bedrock invoke path.
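
The gating pattern above could look roughly like the sketch below. The class names `FirstPartyConfigSketch` and `ProviderConfigSketch` and the method `translate_system_blocks` are hypothetical stand-ins; only `should_strip_billing_metadata()` and the header prefix come from the commit message.

```python
from typing import Dict, List

BILLING_PREFIX = "x-anthropic-billing-header:"


class FirstPartyConfigSketch:
    """Hypothetical first-party config: keeps the billing block by default."""

    def should_strip_billing_metadata(self) -> bool:
        return False

    def translate_system_blocks(
        self, blocks: List[Dict[str, str]]
    ) -> List[Dict[str, str]]:
        # The strip only runs when the config opts in.
        if not self.should_strip_billing_metadata():
            return list(blocks)
        return [
            block
            for block in blocks
            if not block.get("text", "").startswith(BILLING_PREFIX)
        ]


class ProviderConfigSketch(FirstPartyConfigSketch):
    """Hypothetical provider subclass that rejects the block, so it opts in."""

    def should_strip_billing_metadata(self) -> bool:
        return True
```

Because subclasses inherit the keep-by-default behavior, any provider that reaches the shared transform has to override the method explicitly, which is the gap the second commit closes.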

* fix(cache): log hashed cache keys (BerriAI#29890)

* fix(ui): save routing groups as list (BerriAI#29889)

* Revert "fix(ui): save routing groups as list (BerriAI#29889)" (BerriAI#29928)

This reverts commit 9b1f78f.

* feat(parasail): add Parasail as a JSON-configured OpenAI-compatible provider (BerriAI#29842)

* feat(parasail): add Parasail as a JSON-configured OpenAI-compatible provider

Registers parasail in the openai_like JSON provider loader with both
/v1/chat/completions and /v1/responses support. Parasail's Responses API
rejects store:true and any request that omits store, so the loader gains a
force_store_false special_handling flag; the parasail entry sets it and
the generated Responses config overrides store=false on every call. This
keeps callers from hitting "State storage not supported" and matches what
Parasail's docs require.
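
As a rough illustration of the `force_store_false` behavior described above, the sketch below uses a hypothetical function `apply_store_override`; the real loader and generated Responses config are not shown in this message, so the shape here is an assumption.

```python
from typing import Any, Dict


def apply_store_override(
    request_body: Dict[str, Any], force_store_false: bool
) -> Dict[str, Any]:
    """Hypothetical helper: force store=False when the provider flag is set."""
    body = dict(request_body)  # copy so the caller's dict is not mutated
    if force_store_false:
        # Overrides both a caller-sent store=True and a missing store key.
        body["store"] = False
    return body
```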

Adds the PARASAIL enum value, listing under openai_compatible_providers,
provider documentation at docs/my-website/docs/providers/parasail.md, and
a focused unit test file under tests/test_litellm/llms/parasail/ that
covers JSON registration, chat URL construction, Responses URL
construction with PARASAIL_API_BASE override, and the force_store_false
regression in both the caller-sent-store=true and caller-omitted cases.

* fix(parasail): register in provider_endpoints_support, drop in-repo docs

Greptile review feedback. The provider doc belongs in the litellm-docs
repo, not this one's docs/my-website tree; removing it here. Adds the
parasail entry to provider_endpoints_support.json so the
check_provider_folders_documented.py CI check passes (chat_completions
and responses true; others false).

* fix: normalize Anthropic passthrough server tool usage (BerriAI#29827)

* test(anthropic): cover server_tool_use dict cost tracking

* fix: normalize Anthropic server tool usage

(cherry picked from commit 982f726)

* fix: keep server tool usage subscriptable

(cherry picked from commit 70280b9)
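
The two fixes in this group (normalizing a dict `server_tool_use` and keeping it subscriptable) can be sketched with stand-in classes. `ServerToolUseSketch` and `UsageSketch` are hypothetical and much smaller than the real `ServerToolUse` and `Usage` types in `litellm/types/utils.py`; only the `web_search_requests` field name is taken from the PR description.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Union


@dataclass
class ServerToolUseSketch:
    """Hypothetical stand-in for the typed server tool usage object."""

    web_search_requests: Optional[int] = None

    def __getitem__(self, key: str) -> Any:
        # Keep dict-style access working, e.g. obj["web_search_requests"].
        try:
            return getattr(self, key)
        except AttributeError:
            raise KeyError(key) from None


class UsageSketch:
    """Hypothetical stand-in for a usage object that normalizes its input."""

    def __init__(
        self,
        server_tool_use: Union[ServerToolUseSketch, Dict[str, Any], None] = None,
    ) -> None:
        # A raw dict (as a passthrough response may supply) is converted to
        # the typed object so attribute access works downstream.
        if isinstance(server_tool_use, dict):
            server_tool_use = ServerToolUseSketch(
                web_search_requests=server_tool_use.get("web_search_requests")
            )
        self.server_tool_use = server_tool_use
```

With this shape, cost code that reads `usage.server_tool_use.web_search_requests` and older code that reads `usage.server_tool_use["web_search_requests"]` both see the same value.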

---------

Co-authored-by: Genmin <joey@joeyroth.com>

* fix(proxy): fix typo generic_role_mappoings -> generic_role_mappings in ui_sso.py (BerriAI#29753)

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>

* feat(proxy): add disable_budget_reservation general setting (BerriAI#27639) (BerriAI#29493)

* feat(proxy): add disable_budget_reservation general setting (BerriAI#27639)

* feat(proxy): register disable_budget_reservation in ConfigGeneralSettings (BerriAI#27639)

* docs(proxy): document disable_budget_reservation concurrency tradeoff (BerriAI#27639)

* ci: re-trigger flaky docker build (prisma generate ECONNRESET)

* fix(proxy): warn and document budget enforcement tradeoff when disable_budget_reservation is set (BerriAI#27639)

* feat(gemini_tts): adding support to Gemini TTS languageCode parameters (BerriAI#29623)

* Adding support to Gemini TTS Language Code parameters

* Mapping Gemini TTS languageCode param in Docstring

* Use snake_case for language_code input key

Mapping Gemini TTS languageCode param in Docstring

* Restoring files modified under enterprise/litellm_enterprise due to lint/formatting checks

---------

Co-authored-by: João Garrido <joaogarrido@google.com>

* feat(guardrails): capture user and model metadata in CrowdStrike AIDR (BerriAI#29517)

* fix(proxy): require OpenAI path segment for shared Azure Cognitive Services domains

Address Greptile review: the `*.cognitiveservices.azure.com` /
`*.openai.azure.com` domains are shared by every Azure Cognitive Service
(Speech, Vision, Language, ...), so a hostname-only substring match
misclassified non-OpenAI Azure traffic as OpenAI routes.

- Replace the substring host test with suffix matching (rejects look-alike
  domains like cognitiveservices.azure.com.attacker.example).
- Add `_is_openai_compatible_url` that requires an OpenAI-style path marker
  (`/openai/` or `/v1/`) on the shared Azure domains, and use it in
  PassThroughEndpointLogging.is_openai_route (previously hostname-only).
- Add negative tests for Azure Speech/Vision paths and look-alike domains.
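
The checks listed above might be sketched as follows. The function name `is_openai_compatible_url_sketch` and the constants are hypothetical; the real `_is_openai_compatible_url` may differ in detail. The sketch assumes `api.openai.com` is matched exactly and that the path marker is required only on the shared Azure domains.

```python
from urllib.parse import urlparse

OPENAI_HOST = "api.openai.com"
AZURE_HOST_SUFFIXES = (".openai.azure.com", ".cognitiveservices.azure.com")
OPENAI_PATH_MARKERS = ("/openai/", "/v1/")


def is_openai_compatible_url_sketch(url: str) -> bool:
    """Hypothetical check: suffix-match the host, then require an OpenAI-style path."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == OPENAI_HOST:
        return True
    # Suffix matching rejects look-alike hosts that merely contain the domain.
    if host.endswith(AZURE_HOST_SUFFIXES):
        # Shared Azure domains also serve Speech, Vision, etc., so the path
        # must carry an OpenAI-style marker.
        return any(marker in parsed.path for marker in OPENAI_PATH_MARKERS)
    return False
```

Matching on the parsed hostname instead of the raw URL string is what keeps a domain such as `cognitiveservices.azure.com.attacker.example` from passing.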

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix: support Responses input in Redis semantic cache (BerriAI#29581)

* fix: support responses input in redis semantic cache

* test: cover redis semantic prompt extraction

* test: handle blank redis semantic text fallbacks

* chore: remove async cache dead statement

* test: cover redis semantic cache miss paths

* fix: filter sensitive cache lookup kwargs

* chore: rerun ci after huggingface rate limit

* chore(ui): regenerate dashboard API types (npm run gen:api)

Sync src/lib/http/schema.d.ts with the proxy OpenAPI spec: adds the
disable_budget_reservation general-settings field and picks up the
RateLimitError docstring reindent. Fixes the gen:api CI drift check.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test(bedrock): assert empty additionalModelRequestFields is omitted

The Converse transformer now drops an empty additionalModelRequestFields
block instead of sending it as `{}`. Update test_bedrock_top_k_param so
models without top_k support (llama3) assert the key is absent rather than
equal to an empty dict.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Kent <72616338+kingdoooo@users.noreply.github.com>
Co-authored-by: codgician <15964984+codgician@users.noreply.github.com>
Co-authored-by: Praveen Ghuge <95286176+pghuge-cloudwiz@users.noreply.github.com>
Co-authored-by: Roi <roytev@gmail.com>
Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Liam Scott <liam@uilliam.com>
Co-authored-by: abhay23-AI <abhaytrivedi22@gmail.com>
Co-authored-by: Ceder Dens <cederdens@gmail.com>
Co-authored-by: 冯基魁 <56265583+fengjikui@users.noreply.github.com>
Co-authored-by: Kai Huang <kaihuang724@gmail.com>
Co-authored-by: rinto <54238243+ririnto@users.noreply.github.com>
Co-authored-by: Genmin <joey@joeyroth.com>
Co-authored-by: Arnav Bhilwariya <arnavbhilwariya0408@gmail.com>
Co-authored-by: Armaan Sandhu <74664101+Ar-maan05@users.noreply.github.com>
Co-authored-by: João Garrido <48538534+johngarrido@users.noreply.github.com>
Co-authored-by: João Garrido <joaogarrido@google.com>
Co-authored-by: Kenan Yildirim <kenan@kenany.me>
Co-authored-by: Dávid Balatoni <balcsida@gmail.com>
koladefaj pushed a commit to koladefaj/litellm that referenced this pull request Jun 17, 2026
factnn pushed a commit to factnn/litellm that referenced this pull request Jun 18, 2026
* fix(responses): preserve forced-function tool_choice name in Responses to Chat transform (BerriAI#29812)

The Responses API forces a specific function with a top-level name
({"type": "function", "name": "X"}), but _transform_tool_choice only handled the
nested Chat Completions shape and fell through to returning "required" for the flat
form, silently dropping the function name and degrading a forced function call to
force-any-tool. Map the flat Responses shape to the nested Chat shape, keeping the
"required" fallback when no name is present.
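
The mapping described above can be sketched roughly as follows. The function name `to_chat_tool_choice` is hypothetical, and passing string values through unchanged is an assumption of this sketch, not a statement about `_transform_tool_choice`.

```python
from typing import Any, Dict, Union

ToolChoice = Union[str, Dict[str, Any]]


def to_chat_tool_choice(tool_choice: ToolChoice) -> ToolChoice:
    """Hypothetical stand-in: map a Responses tool_choice to the Chat shape."""
    if isinstance(tool_choice, str):
        # Assumption: plain string modes ("auto", "none", "required") pass through.
        return tool_choice
    if tool_choice.get("type") == "function":
        nested = tool_choice.get("function")
        if isinstance(nested, dict) and nested.get("name"):
            # Already in the nested Chat Completions shape.
            return {"type": "function", "function": {"name": nested["name"]}}
        name = tool_choice.get("name")
        if name:
            # Flat Responses shape: lift the top-level name into "function".
            return {"type": "function", "function": {"name": name}}
    # No usable function name: keep the "required" fallback.
    return "required"


forced = to_chat_tool_choice({"type": "function", "name": "X"})
unnamed = to_chat_tool_choice({"type": "function"})
```

With this sketch, `forced` keeps the function name `X` in the nested shape, and `unnamed` falls back to `"required"`.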

* Preserve x-anthropic-billing-header system blocks for first-party Anthropic (BerriAI#29584)

* Preserve x-anthropic-billing-header system blocks for first-party Anthropic

PR BerriAI#20951 strips system blocks beginning with "x-anthropic-billing-header:" for
every Anthropic target. That block is how the first-party Anthropic API recognizes
Claude Code subscription (OAuth) traffic, so dropping it makes requests that carry
only that block, such as the auto-mode tool-safety classifier, fail with a
misleading 429 rate_limit_error; normal turns still work because they also carry
the "You are Claude Code" identity block.

Gate the strip behind should_strip_billing_metadata(), defaulting to False on the
first-party AnthropicConfig and AnthropicMessagesConfig so the block is kept, and
overridden to True on the providers that reach these transforms and reject the
block (Bedrock platform, Vertex, Azure for the chat path; Minimax, Azure, DeepSeek
for the messages path). Behavior for those providers is unchanged.

* Strip billing header on Bedrock invoke and Vertex messages pass-through

Two more subclasses reach the gated strip but inherited keep-by-default.
AmazonAnthropicClaudeConfig (Bedrock invoke) calls AnthropicConfig.transform_request,
which calls translate_system_message, and VertexAIPartnerModelsAnthropicMessagesConfig
(Vertex messages pass-through) calls super().transform_anthropic_messages_request.
Override should_strip_billing_metadata() to True on both.

Add a parametrized test asserting the flag for every first-party base (False) and
provider subclass (True), covering all overrides, plus a translate_system_message
regression test for the Bedrock invoke path.
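
A minimal sketch of the gating pattern described in the two commits above, assuming a simplified list-of-blocks system prompt. Only the method name `should_strip_billing_metadata` comes from the commit message; `FirstPartyConfigSketch`, `StrictProviderConfigSketch`, and `translate_system_blocks` are hypothetical stand-ins for the real config hierarchy.

```python
from typing import Any, Dict, List

BILLING_PREFIX = "x-anthropic-billing-header:"


class FirstPartyConfigSketch:
    """Hypothetical first-party config: keeps the billing block by default."""

    def should_strip_billing_metadata(self) -> bool:
        return False

    def translate_system_blocks(self, blocks: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        if not self.should_strip_billing_metadata():
            return list(blocks)
        # Drop only the blocks whose text starts with the billing prefix.
        return [
            block
            for block in blocks
            if not str(block.get("text", "")).startswith(BILLING_PREFIX)
        ]


class StrictProviderConfigSketch(FirstPartyConfigSketch):
    """Hypothetical provider subclass that rejects the billing block."""

    def should_strip_billing_metadata(self) -> bool:
        return True


blocks = [
    {"type": "text", "text": "x-anthropic-billing-header: example"},
    {"type": "text", "text": "You are Claude Code"},
]
kept = FirstPartyConfigSketch().translate_system_blocks(blocks)
stripped = StrictProviderConfigSketch().translate_system_blocks(blocks)
```

Putting the decision in an overridable method means a new provider subclass opts in to stripping with a one-line override instead of a change to the shared transform.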

* fix(cache): log hashed cache keys (BerriAI#29890)

* fix(ui): save routing groups as list (BerriAI#29889)

* Revert "fix(ui): save routing groups as list (BerriAI#29889)" (BerriAI#29928)

This reverts commit 9b1f78f.

* feat(parasail): add Parasail as a JSON-configured OpenAI-compatible provider (BerriAI#29842)

* feat(parasail): add Parasail as a JSON-configured OpenAI-compatible provider

Registers parasail in the openai_like JSON provider loader with both
/v1/chat/completions and /v1/responses support. Parasail's Responses API
rejects store:true and any request that omits store, so the loader gains a
force_store_false special_handling flag; the parasail entry sets it and
the generated Responses config overrides store=false on every call. This
keeps callers from hitting "State storage not supported" and matches what
Parasail's docs require.
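
The effect of the `force_store_false` flag can be sketched as below. The function `apply_special_handling` and the dict-based flag lookup are assumptions for illustration; the actual loader may wire the flag differently.

```python
from typing import Any, Dict


def apply_special_handling(
    request: Dict[str, Any], special_handling: Dict[str, Any]
) -> Dict[str, Any]:
    """Hypothetical helper: force store=False when the provider entry asks for it."""
    out = dict(request)  # copy so the caller's dict is not mutated
    if special_handling.get("force_store_false"):
        # Override regardless of whether the caller sent store=True or omitted it.
        out["store"] = False
    return out


flags = {"force_store_false": True}
sent_true = apply_special_handling({"model": "example-model", "store": True}, flags)
omitted = apply_special_handling({"model": "example-model"}, flags)
```

Both the caller-sent `store=True` case and the caller-omitted case end up with `store` set to `False`, matching the two regression cases listed in the commit.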

Adds the PARASAIL enum value, listing under openai_compatible_providers,
provider documentation at docs/my-website/docs/providers/parasail.md, and
a focused unit test file under tests/test_litellm/llms/parasail/ that
covers JSON registration, chat URL construction, Responses URL
construction with PARASAIL_API_BASE override, and the force_store_false
regression in both the caller-sent-store=true and caller-omitted cases.

* fix(parasail): register in provider_endpoints_support, drop in-repo docs

Greptile review feedback. The provider doc belongs in the litellm-docs
repo, not this one's docs/my-website tree; removing it here. Adds the
parasail entry to provider_endpoints_support.json so the
check_provider_folders_documented.py CI check passes (chat_completions
and responses true; others false).

* fix: normalize Anthropic passthrough server tool usage (BerriAI#29827)

* test(anthropic): cover server_tool_use dict cost tracking

* fix: normalize Anthropic server tool usage

(cherry picked from commit 982f726)

* fix: keep server tool usage subscriptable

(cherry picked from commit 70280b9)
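
A simplified sketch of what the three commits above describe: a raw dict `server_tool_use` is normalized into a typed object, and subscript reads such as `usage.server_tool_use["web_search_requests"]` keep working. `ServerToolUseSketch` and `UsageSketch` are hypothetical stand-ins written for this example, not the classes in `litellm/types/utils.py`.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Union


@dataclass
class ServerToolUseSketch:
    """Hypothetical stand-in for a typed server tool usage object."""

    web_search_requests: Optional[int] = None

    def __getitem__(self, key: str) -> Any:
        # Keep dict-style reads working for existing callers.
        if key in self.__dataclass_fields__:
            return getattr(self, key)
        raise KeyError(key)


class UsageSketch:
    """Hypothetical stand-in for a usage object that normalizes its input."""

    def __init__(
        self,
        server_tool_use: Union[ServerToolUseSketch, Dict[str, Any], None] = None,
    ) -> None:
        if isinstance(server_tool_use, dict):
            # Assumption: unknown keys are dropped rather than raising.
            known = {
                key: value
                for key, value in server_tool_use.items()
                if key in ServerToolUseSketch.__dataclass_fields__
            }
            server_tool_use = ServerToolUseSketch(**known)
        self.server_tool_use = server_tool_use


usage = UsageSketch(server_tool_use={"web_search_requests": 2})
```

After construction, `usage.server_tool_use` is a `ServerToolUseSketch`, so attribute access and subscript access both return the same count.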

---------

Co-authored-by: Genmin <joey@joeyroth.com>

* fix(proxy): fix typo generic_role_mappoings -> generic_role_mappings in ui_sso.py (BerriAI#29753)

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>

* feat(proxy): add disable_budget_reservation general setting (BerriAI#27639) (BerriAI#29493)

* feat(proxy): add disable_budget_reservation general setting (BerriAI#27639)

* feat(proxy): register disable_budget_reservation in ConfigGeneralSettings (BerriAI#27639)

* docs(proxy): document disable_budget_reservation concurrency tradeoff (BerriAI#27639)

* ci: re-trigger flaky docker build (prisma generate ECONNRESET)

* fix(proxy): warn and document budget enforcement tradeoff when disable_budget_reservation is set (BerriAI#27639)

* feat(gemini_tts): adding support to Gemini TTS languageCode parameters (BerriAI#29623)

* Adding support to Gemini TTS Language Code parameters

* Mapping Gemini TTS languageCode param in Docstring

* Use snake_case for language_code input key

* Restoring files modified under enterprise/litellm_enterprise due to lint/formatting checks

---------

Co-authored-by: João Garrido <joaogarrido@google.com>

* feat(guardrails): capture user and model metadata in CrowdStrike AIDR (BerriAI#29517)

* fix(proxy): require OpenAI path segment for shared Azure Cognitive Services domains

Address Greptile review: the `*.cognitiveservices.azure.com` /
`*.openai.azure.com` domains are shared by every Azure Cognitive Service
(Speech, Vision, Language, ...), so a hostname-only substring match
misclassified non-OpenAI Azure traffic as OpenAI routes.

- Replace the substring host test with suffix matching (rejects look-alike
  domains like cognitiveservices.azure.com.attacker.example).
- Add `_is_openai_compatible_url` that requires an OpenAI-style path marker
  (`/openai/` or `/v1/`) on the shared Azure domains, and use it in
  PassThroughEndpointLogging.is_openai_route (previously hostname-only).
- Add negative tests for Azure Speech/Vision paths and look-alike domains.
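
The suffix-plus-path check described in the list above could look roughly like this. The function name `is_openai_compatible_url_sketch` and the constants are assumptions for illustration and may differ from the `_is_openai_compatible_url` helper the commit adds.

```python
from urllib.parse import urlparse

OPENAI_HOST = "api.openai.com"
# Leading dots make this a true suffix match on subdomains.
SHARED_AZURE_SUFFIXES = (".openai.azure.com", ".cognitiveservices.azure.com")
OPENAI_PATH_MARKERS = ("/openai/", "/v1/")


def is_openai_compatible_url_sketch(url: str) -> bool:
    """Hypothetical check: suffix-match the host, then require an OpenAI path marker."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == OPENAI_HOST:
        return True
    if host.endswith(SHARED_AZURE_SUFFIXES):
        # Shared Azure domains also serve Speech, Vision, and so on,
        # so the path must look like an OpenAI route.
        return any(marker in parsed.path for marker in OPENAI_PATH_MARKERS)
    return False


azure_openai = is_openai_compatible_url_sketch(
    "https://myres.cognitiveservices.azure.com/openai/deployments/d/chat/completions"
)
azure_vision = is_openai_compatible_url_sketch(
    "https://myres.cognitiveservices.azure.com/vision/v3.2/analyze"
)
look_alike = is_openai_compatible_url_sketch(
    "https://cognitiveservices.azure.com.attacker.example/openai/x"
)
```

In this sketch only `azure_openai` is accepted: the Vision URL fails the path requirement and the look-alike host fails the suffix match.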

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix: support Responses input in Redis semantic cache (BerriAI#29581)

* fix: support responses input in redis semantic cache

* test: cover redis semantic prompt extraction

* test: handle blank redis semantic text fallbacks

* chore: remove async cache dead statement

* test: cover redis semantic cache miss paths

* fix: filter sensitive cache lookup kwargs

* chore: rerun ci after huggingface rate limit

* chore(ui): regenerate dashboard API types (npm run gen:api)

Sync src/lib/http/schema.d.ts with the proxy OpenAPI spec: adds the
disable_budget_reservation general-settings field and picks up the
RateLimitError docstring reindent. Fixes the gen:api CI drift check.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test(bedrock): assert empty additionalModelRequestFields is omitted

The Converse transformer now drops an empty additionalModelRequestFields
block instead of sending it as `{}`. Update test_bedrock_top_k_param so
models without top_k support (llama3) assert the key is absent rather than
equal to an empty dict.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Kent <72616338+kingdoooo@users.noreply.github.com>
Co-authored-by: codgician <15964984+codgician@users.noreply.github.com>
Co-authored-by: Praveen Ghuge <95286176+pghuge-cloudwiz@users.noreply.github.com>
Co-authored-by: Roi <roytev@gmail.com>
Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Liam Scott <liam@uilliam.com>
Co-authored-by: abhay23-AI <abhaytrivedi22@gmail.com>
Co-authored-by: Ceder Dens <cederdens@gmail.com>
Co-authored-by: 冯基魁 <56265583+fengjikui@users.noreply.github.com>
Co-authored-by: Kai Huang <kaihuang724@gmail.com>
Co-authored-by: rinto <54238243+ririnto@users.noreply.github.com>
Co-authored-by: Genmin <joey@joeyroth.com>
Co-authored-by: Arnav Bhilwariya <arnavbhilwariya0408@gmail.com>
Co-authored-by: Armaan Sandhu <74664101+Ar-maan05@users.noreply.github.com>
Co-authored-by: João Garrido <48538534+johngarrido@users.noreply.github.com>
Co-authored-by: João Garrido <joaogarrido@google.com>
Co-authored-by: Kenan Yildirim <kenan@kenany.me>
Co-authored-by: Dávid Balatoni <balcsida@gmail.com>

Development

Successfully merging this pull request may close these issues.

Anthropic passthrough: server_tool_use parsed as dict instead of ServerToolUse in usage

3 participants