Skip to content

fix(thinking): handle None thinking param in is_thinking_enabled#28598

Merged
oss-pr-review-agent-shin[bot] merged 4 commits into
BerriAI:shin_agent_oss_staging_05_22_2026from
Terrajlz:fix/is-thinking-enabled-none-crash
May 22, 2026
Merged

fix(thinking): handle None thinking param in is_thinking_enabled#28598
oss-pr-review-agent-shin[bot] merged 4 commits into
BerriAI:shin_agent_oss_staging_05_22_2026from
Terrajlz:fix/is-thinking-enabled-none-crash

Conversation

@Terrajlz

Copy link
Copy Markdown
Contributor

Summary

  • Fix AttributeError: 'NoneType' object has no attribute 'get' when non_default_params["thinking"] is explicitly None
  • Change non_default_params.get("thinking", {}).get("type") to (non_default_params.get("thinking") or {}).get("type") — handles both missing key AND explicit None
  • Add unit tests covering: None, {"type": "enabled"}, missing key, empty dict, and {"type": "disabled"}

Fixes #28576

Test plan

  • py -3.12 -m pytest tests/test_litellm/test_thinking_enabled.py -v — all 5 tests pass
  • Verify with Anthropic-backed model via proxy with thinking: null in request

🤖 Generated with Claude Code + Cynthia

@CLAassistant

Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 3 committers have signed the CLA.

✅ yuneng-berri
❌ Cynthia Code
❌ shin-berri


Cynthia Code seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@greptile-apps

greptile-apps Bot commented May 22, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR fixes an AttributeError crash in is_thinking_enabled when the thinking key exists in non_default_params with an explicit None value — a case not handled by the previous .get("thinking", {}) pattern, which only substitutes the default when the key is absent.

  • transformation.py: Replaces non_default_params.get("thinking", {}).get("type") with (non_default_params.get("thinking") or {}).get("type"), correctly short-circuiting any falsy value (None, False, 0, "") to an empty dict before calling .get("type").
  • test_thinking_enabled.py: Adds 9 parameterized unit tests covering None, {"type": "enabled"}, missing key, empty dict, {"type": "disabled"}, reasoning_effort, combined params, and other falsy values — all exercised without real network calls.

Confidence Score: 5/5

Safe to merge — the change is a minimal, targeted guard for a well-understood crash path with no behavioral changes for existing valid inputs.

The one-line fix correctly handles the None case without touching any other logic, and the new tests cover the full range of falsy inputs as well as the happy path. No existing tests were modified, no network calls are made in the test suite, and the change is confined to a single non-critical utility method.

No files require special attention.

Important Files Changed

Filename Overview
litellm/llms/base_llm/chat/transformation.py One-line fix: replaces .get("thinking", {}) with (... or {}) so an explicit None value no longer causes AttributeError
tests/test_litellm/test_thinking_enabled.py New unit test file covering 9 parameterized cases (None, enabled, missing, empty dict, disabled, reasoning_effort, combined, and falsy values); no real network calls

Reviews (1): Last reviewed commit: "fix(thinking): handle None thinking para..." | Re-trigger Greptile

@codecov

codecov Bot commented May 22, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@oss-pr-review-agent-shin oss-pr-review-agent-shin Bot changed the base branch from litellm_internal_staging to shin_agent_oss_staging_05_22_2026 May 22, 2026 13:45
@oss-pr-review-agent-shin oss-pr-review-agent-shin Bot merged commit a5939c8 into BerriAI:shin_agent_oss_staging_05_22_2026 May 22, 2026
43 of 44 checks passed
@oss-pr-review-agent-shin

Copy link
Copy Markdown
Contributor

🤖 litellm-agent: Squash-merged into staging branch shin_agent_oss_staging_05_22_2026. Staging PR: #28601


Triage Summary
Guards is_thinking_enabled in BaseConfig against an AttributeError crash when non_default_params["thinking"] is explicitly None. Replaces .get("thinking", {}) with (... or {}) so any falsy value short-circuits to an empty dict before .get("type") is called. Adds a new test file with 9 parameterized cases covering None, enabled, missing key, empty dict, disabled, reasoning_effort, and other falsy values.

Merge Confidence: 5/5 ✅ READY
⚠️ 1 check failing: lint
Ready to ship.

Greptile 5/5, no blocking pattern findings, no CircleCI runs (OSS-typical). 1 check failing but unrelated to this diff: lint. 1 check also red on neighboring PRs (lint) — infra-wide noise, no penalty.

Sameerlite pushed a commit that referenced this pull request May 25, 2026
)

Squash-merged by litellm-agent from Terrajlz's PR.
mateo-berri added a commit that referenced this pull request May 26, 2026
* fix(mcp): handle OAuth IdP error responses in /callback (LIT-2750)

Per RFC 6749 section 4.1.2.1, when the IdP rejects an OAuth authorization
request it redirects back to the client with ?error=...&error_description=...
and no code. The MCP /callback handler declared code and state as required
query params, so FastAPI rejected such error responses with a 422 before
the handler ran -- stranding the MCP client waiting on the loopback.

This change:
- Makes code and state optional and accepts the RFC-defined error,
  error_description, and error_uri params.
- When state decodes to a trusted client redirect_uri, propagates the
  error params back to that URI with the client's original (un-wrapped)
  state preserved, so the client's OAuth library can surface the failure.
- When state is missing/undecryptable or the encoded redirect_uri is no
  longer trusted, renders a 400 HTML page with the (HTML-escaped) error
  details instead of leaking to an attacker-controlled redirect.
- Preserves the existing success path (code + state -> 302 to validated
  client redirect_uri with original state).

Fixes LIT-2750.

* test(mcp): regression tests for /callback handling IdP error responses (LIT-2750)

Adds a new test module covering the LIT-2750 fix: the MCP OAuth /callback
endpoint must accept IdP error responses (e.g. ?error=access_denied) per
RFC 6749 section 4.1.2.1 instead of returning a 422 because ``code`` is missing.

Coverage:
- IdP error with no state -> 400 HTML page surfacing the error.
- HTML escaping of user-controlled error / error_description fields.
- IdP error with a trusted (loopback) state -> 302 propagating
  error / error_description / original client state to the client.
- IdP error with an untrusted redirect_uri encoded in state -> 400 inline
  (no open-redirect to attacker-controlled origin).
- IdP error with an undecryptable state -> 400 HTML fallback.
- Bare GET /callback with no params -> 400 HTML (not Pydantic 422).
- Success path (code + state) still 302 to validated client redirect_uri
  with the original (un-wrapped) state preserved.

* refactor(mcp): drop unused _OAUTH_ERROR_PARAMS constant (Greptile P2)

The tuple was leftover scaffolding from an earlier draft of the LIT-2750
fix; nothing references it. The explanatory RFC 6749 §4.1.2.1 comment block
above the callback handler covers the same intent.

* fix(mcp/oauth): preserve empty original_state and clarify missing-param error in /callback

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(thinking): handle None thinking param in is_thinking_enabled (#28598)

Squash-merged by litellm-agent from Terrajlz's PR.

* feat(helm): support tpl rendering in podAnnotations (#28609)

Squash-merged by litellm-agent from devauxbr's PR.

* fix: apply black formatting to base_llm chat transformation

Fix CI black --check failure on is_thinking_enabled return formatting.

Co-authored-by: Cursor <cursoragent@cursor.com>

* merge main (#28836)

* fix(proxy): Bedrock Knowledge Base pass-through: preserve SigV4 headers and signed request body (#27526)

* Fix Bedrock KB pass-through SigV4 headers and signed body

Coerce botocore HeadersDict to a dict for pass-through routes. When
forward_headers is true, drop request headers that collide case-insensitively
with signed headers so client Bearer auth does not shadow AWS SigV4.
Send prepped.body as raw content so the outbound payload matches the
signature after logging hooks mutate the parsed dict.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Simplify pass-through raw body handling

Read the SigV4-signed bytes directly from request.state inside
pass_through_request instead of threading a custom_raw_body argument
through three functions. Helper methods are restored to their original
signatures, and the new branch lives in one place at each httpx call site.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Harden pass-through raw body read from request.state

Guard missing request.state (test fixtures) and ignore non-bytes/str
values so MagicMock does not trigger the SigV4 raw-body path.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Test pass_through_request state_raw_body uses httpx content=

Cover non-streaming (async_client.request) and streaming (build_request)
paths so SigV4 bytes on request.state are not replaced by json= of a
hook-mutated dict.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>

* chore(tests): migrate Bedrock CI to AWS account 941277531214 (#28728)

* chore(tests): migrate Bedrock CI from AWS account 888602223428 to 941277531214

The original account (888602223428) was put under a security restriction by
AWS after a root access key leaked in a PR comment. While that account works
its way through the AWS Support unlock process, Bedrock-touching CI tests have
been migrated to a fresh account (941277531214).

Changes:
  - Replace 26 hardcoded references to 888602223428 with 941277531214 across
    8 files (provisioned-model ARNs, imported-model ARNs, AgentCore runtime
    ARNs, batch execution role ARN, and example proxy config).
  - The provisioned-model and imported-model ARNs are referenced only from
    mocked unit tests — no AWS resources to recreate.
  - The batch execution IAM role has been recreated in the new account with
    the same name and equivalent permissions.
  - The two AgentCore runtimes (hosted_agent_r9jvp-3ySZuRHjLC,
    hosted_agent_13sf6-cALnp38iZD) are being recreated in the new account
    under the same names — see tools/agentcore-deploy/ in a follow-up.

CircleCI env vars AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_REGION_NAME
were updated separately via the CircleCI API to point at the new account.

Smoke-tested locally against the new account:
  aws bedrock-runtime converse --region us-west-2 \
    --model-id us.anthropic.claude-sonnet-4-5-20250929-v1:0 \
    --messages '[{"role":"user","content":[{"text":"ping"}]}]'
  → 200, model returned 'pong'

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(tests): refresh AgentCore ARN suffixes to match newly-deployed runtimes

The first migration commit replaced just the account ID, but AgentCore
auto-assigns a random 10-char suffix to every runtime on creation — we
can't reuse the original suffixes (`3ySZuRHjLC`, `cALnp38iZD`) in the
new account. Updated the AgentCore-runtime ARNs in the three files that
reference real runtime IDs (not the mock-based unit-test ARNs).

Deployed runtimes:
  arn:aws:bedrock-agentcore:us-west-2:941277531214:runtime/hosted_agent_r9jvp-Rq79QFC2fp
  arn:aws:bedrock-agentcore:us-west-2:941277531214:runtime/hosted_agent_13sf6-4046UzHSwy

Both runtimes are status=READY and pass a smoke invoke:
  $ aws bedrock-agentcore invoke-agent-runtime --agent-runtime-arn ... --payload '{"prompt":"ping"}'
  → 200, {"result": "echo: ping"}

The agent is a minimal echo (see /tmp/agentcore_deploy/agent.py for the
deploy artifacts). Tests that only verify the SDK wiring will pass; if any
test asserts on agent output content, swap the echo for the real agent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(tests): point Bedrock batch tests at new-account S3 bucket

The account migration (888602223428 -> 941277531214) was a flat
account-ID swap, which only rewrites ARNs that embed the account
number. S3 bucket names carry no account ID, so the live Bedrock
batch tests still uploaded to `litellm-proxy` — a bucket that lives
in the old account. S3 names are globally unique, and the old account
still holds that name, so it can't be recreated in the new account.

Rename to `litellm-proxy-941277531214` (account-ID suffix guarantees
global uniqueness). The bucket must be created in 941277531214 and the
batch execution role granted s3:GetObject/PutObject/ListBucket on it
before this job is run in CI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(tests): point live S3 logging test at new-account bucket

Same account-ID-free blind spot as the batch bucket: `load-testing-oct`
lives in the old account and its name can't be reused globally. The
`logging_testing` CI job is wired into the workflow and runs
test_basic_s3_logging, which uploads to this bucket with the CI env
creds, then lists and deletes objects — a live dependency.

Rename to `load-testing-oct-941277531214`. The bucket must exist in the
new account with the CI IAM principal granted
s3:PutObject/GetObject/ListBucket/DeleteObject before this job runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(tests): repoint Bedrock guardrail IDs to new-account guardrails

The migration left guardrail IDs untouched (no account ID in them), so
all live guardrail tests failed with "guardrail identifier or version
does not exist" against 941277531214. Recreated both guardrails in the
new account and updated the hardcoded IDs:
  - wf0hkdb5x07f -> zgkmukebruil (PII mask: PHONE + CREDIT_DEBIT_CARD,
    with explicit inputAction=ANONYMIZE so masking applies to INPUT,
    which is the source litellm's moderation hook sends)
  - ff6ujrregl1q -> 4w3d1di3snt5 (blocks "coffee"; blocked message set
    to the exact string the tests assert on)

Updated test_bedrock_guardrails.py, otel_test_config.yaml, and the
guardrailConfig in test_bedrock_completion.py. Verified locally: the 5
previously-failing guardrail tests now pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(bedrock): migrate legacy models to current inference profiles

The new CI account (941277531214) cannot invoke legacy Bedrock models
(AWS gates them: "marked by provider as Legacy... not actively using in
the last 30 days"). Migrated the live-call tests:
  - anthropic.claude-3-sonnet-20240229    -> us.anthropic.claude-sonnet-4-5-20250929-v1:0
  - anthropic.claude-3-haiku-20240307     -> us.anthropic.claude-haiku-4-5-20251001-v1:0
Current Claude models on Bedrock require the us. inference-profile prefix
(bare on-demand ids are rejected).

cohere.command-r-plus has no working replacement (all Cohere is legacy-
gated in the new account): swapped to claude-haiku-4-5 in provider-
agnostic param lists. amazon.titan-image-generator skipped (no working
replacement). Mocked/transformation/cost tests that reference the legacy
strings are intentionally left unchanged. Verified live against the new
account.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(bedrock): repoint SageMaker + Knowledge Base to new-account resources

These referenced account-scoped resources by hardcoded id that only
existed in the old account, so the migration's account-ID swap missed
them. Recreated in 941277531214 and repointed:
  - SageMaker endpoint jumpstart-dft-hf-textgeneration1-mp-20240815-185614
    -> litellm-ci-textgen (gpt2 on a TGI container, ml.g5.xlarge)
  - Bedrock Knowledge Base T37J8R4WTM -> LCYXFBR2TU (OpenSearch Serverless
    vector store + titan-embed-text-v2, seeded with a LiteLLM doc)
Verified live: test_sagemaker.py (12 passed) and
test_bedrock_knowledgebase_hook.py (12 passed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(reasoning_effort_grid): skip bedrock claude-opus-4-7 cells (not entitled on 941277531214)

claude-opus-4-7 is listed in the new Bedrock CI account's foundation
models but invoke is denied (AccessDeniedException: "not available for
this account"). Bedrock access to the flagship Opus requires an AWS
Sales request, not the self-serve model-access toggle, so it can't be
enabled inline with the rest of the account migration.

Add an optional `skip_reason` to ModelEntry and set it on the
bedrock-claude-opus-4-7 entry; the grid test honors it via pytest.skip.
Cell count (231) and route coverage are unchanged, so the structural
asserts still pass. Restore coverage by deleting the one skip_reason
line once access is granted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(bedrock): swap/skip legacy-gated models unavailable on new CI account

The migrated AWS account (941277531214) cannot access several models that
the old account could, so the remaining red CI jobs were hitting real
Bedrock "Access denied / Legacy" and "account not authorized" errors:

- image_gen: skip both Nova Canvas test classes (amazon.nova-canvas-v1:0 is
  legacy-gated), matching the existing titan skip.
- batches: skip test_async_file_and_batch (Bedrock batch inference is not
  authorized on the new account; requires an AWS support case).
- litellm_overhead: swap legacy claude-3-5-haiku for the active
  us.anthropic.claude-haiku-4-5 inference profile.
- test_completion_claude_3_function_call: swap legacy claude-3-sonnet for the
  active us.anthropic.claude-sonnet-4-5 inference profile.

https://claude.ai/code/session_01Y7zgHYu9GX29YRwV4yiWAa

* test(bedrock): fix remaining e2e legacy-model + batch failures on new CI account

- e2e_openai_endpoints: skip test_bedrock_batches_api (Bedrock batch inference
  is not authorized on account 941277531214) and migrate the missed
  s3_bucket_name in oai_misc_config.yaml to litellm-proxy-941277531214.
- build_and_test: swap legacy bedrock claude-3-sonnet for the active
  us.anthropic.claude-sonnet-4-5 inference profile in the proxy structured
  output e2e test.

https://claude.ai/code/session_01Y7zgHYu9GX29YRwV4yiWAa

* test(bedrock): make opus-4-7 + batch cells fail loudly and mock image-gen (#28791)

Replace the silent skips added for the new CI account with noisier behavior:
- reasoning-effort grid: opus-4-7 cells now fail (when AWS creds are present)
  instead of skipping, so the missing entitlement stays visible in CI; they
  still skip when AWS creds are absent (local dev)
- Bedrock batch inference tests: drop the skip so they run and fail until
  batch access is granted
- Titan + Nova Canvas image-gen tests: mock the Bedrock HTTP call so the
  transform + cost-tracking path stays under test without live model access

https://claude.ai/code/session_01MT7SWDnXUjv6e6EPG7BDjT

Co-authored-by: Claude <noreply@anthropic.com>

* test(bedrock): use pytest.xfail for known-failing opus-4-7 cells

Replace pytest.fail with pytest.xfail when a model has a fail_reason,
so known-broken cells stay visible as XFAIL without keeping CI red.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

---------

Co-authored-by: Mateo <mateo@Mateos-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(otel): export SERVER span on management-endpoint success without http_request (#28794)

Co-authored-by: Yassin Kortam <yassinkortam@Yassins-MacBook-Pro.local>

* chore(ci): merge dev branch (#28801)

* chore(proxy): route path-dependent call sites through get_request_route

Replace direct ``request.url.path`` reads in auth, ACL, routing, and
audit-log decisions with ``get_request_route(request)`` — the helper
already added in ``auth/auth_utils.py`` that returns the ASGI
``scope["path"]`` with ``root_path`` stripped. Starlette reconstructs
``url.path`` from the Host header; ``scope["path"]`` is uvicorn's
parse of the request line and matches what FastAPI dispatches on, so
it's the authoritative route for any decision that should agree with
the actual handler.

Sites:
- _experimental/mcp_server/auth/user_api_key_auth_mcp.py
- management_endpoints/mcp_management_endpoints.py
- vector_store_endpoints/utils.py
- pass_through_endpoints/pass_through_endpoints.py
- auth/route_checks.py
- litellm_pre_call_utils.py
- spend_tracking/spend_management_endpoints.py
- common_utils/http_parsing_utils.py
- management_helpers/utils.py
- health_endpoints/_health_endpoints.py

Adds regression tests in tests/proxy_unit_tests/test_proxy_routes.py
that construct a Request with scope["path"] set to a benign route and
the Host header crafted so url.path would resolve differently; each
site's decision is asserted against scope["path"].

* chore(proxy): make get_request_route imports lazy at call sites

Move the ``from litellm.proxy.auth.auth_utils import get_request_route``
imports added in the prior commit back to the function bodies that use
them. The module-level form participates in a long-standing import
cycle through ``auth_utils -> _types -> ...`` and was flagged by CodeQL
on the PR; the lazy form matches the pattern the proxy already uses
for ``user_api_key_auth`` and related helpers elsewhere in these files.

Also drop the ``RouteChecks._is_assistants_api_request`` delegation in
``_get_metadata_variable_name`` introduced in the prior commit — the
delegation pulled ``RouteChecks`` into the same cycle, and the call
site reuses the resolved route for its other branches, so inlining
the substring check is both cycle-free and avoids a redundant second
``get_request_route`` call.

Comment in test_proxy_routes.py acknowledges that the two MCP table
entries exercise ``get_request_route`` directly rather than the full
production handler (which needs ASGI scope + MCP state to invoke).

---------

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: user <70670632+stuxf@users.noreply.github.com>

* chore(ci): merge dev branch (#28657)

* feat(dashboard): navbar hierarchy + Agent Platform notifications (#27543)

* feat(dashboard): refine navbar zones and Agent Platform notice

Restructure the admin navbar for production users: clear product vs community
vs personal columns with vertical dividers, icon-only Slack/GitHub in a
shared chip, and Docs/Blog typography aligned on an 8px rhythm.

Add a notifications bell with popover linking to the LiteLLM Agent Platform
repo and optional mark-as-read persistence.

Promote the account control with initials avatar, single-line display name,
and navDisplayName mapping for placeholder user ids (e.g. default_user_id).

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(dashboard): address PR review — AntD buttons, public page guard, dedupe regex

- Replace raw <button> with AntD Button in BlogDropdown, NotificationsBell, UserDropdown, and test mock
- Guard NotificationsBell + container behind !isPublicPage to avoid rendering on public pages
- Remove redundant equality checks in navDisplayName (regex already covers them)
- Remove unused `lower` variable after simplification

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>

* fix(dashboard): drop dead useHealthReadiness import in navbar

The module was removed in #27896 (replaced by useHealthReadinessDetails),
but the import survived the rebase. The symbol is unused — only
useHealthReadinessDetails is consumed in the file. Removing the dead
import unblocks the UI TypeScript build.

* fix(dashboard): align CommunityEngagementButtons test with icon-only aria-labels

The component was refactored to an icon-only chip with aria-label='LiteLLM
on GitHub' (squash #27543), but the test still asserted /star us on
github/i. Update the query to match the rendered accessible name.

* refactor(dashboard): drop unused props from NavbarProps

The navbar refactor moved user identity + dark-mode state to internal
hooks (useAuthorized, useWorker), but the NavbarProps interface still
declared userID, userEmail, userRole, premiumUser, isDarkMode, and
toggleDarkMode as required, forcing every caller to thread them through.

Drop them from the interface and all four call sites (page.tsx,
(dashboard)/layout.tsx, public_model_hub.tsx, navbar.test.tsx). Also
shrinks the destructure in layout.tsx so the now-unused locals stop
being pulled out of useAuthorized().

* refactor(dashboard): use useSyncExternalStore for NotificationsBell dismiss flag

Reads/writes of the litellmHideAgentPlatformBanner key were done
directly inside NotificationsBell via a useEffect + useState pair.
Every other localStorage-backed flag in the dashboard (Disable
ShowPrompts, DisableBouncingIcon, DisableShowNewBadge,
DisableUsageIndicator, DisableBlogPosts) is wrapped in a
useSyncExternalStore hook over localStorageUtils so all mounted
components stay in sync.

Extract useHideAgentPlatformBanner to follow the same shape, swap
NotificationsBell to consume it, and add a regression test that
two sibling bells stay in sync without a remount when one is
dismissed.

* refactor: mask credential fields in proxy settings GET responses (#28682)

* refactor: mask credential fields in proxy settings GET responses

Brings SSO settings, cache settings, and the email/Slack alerting view in
/get/config/callbacks in line with the HashiCorp Vault config-override
pattern, so persisted credentials are not transported back to the UI in
plaintext.

* refactor: harden short-value masking and hoist alerting var constant

Closes two review observations:

- mask_sensitive_keys now replaces short values (below the visible
  prefix+suffix length) with an all-mask string instead of returning them
  unchanged, so a 1-7 character credential is no longer round-tripped
  verbatim.
- _ALERTING_SENSITIVE_VARS is moved out of get_config() to a module-level
  constant, matching the analogous _SSO_SENSITIVE_FIELDS and
  _CACHE_SENSITIVE_FIELDS in the SSO and cache endpoint files.

---------

Co-authored-by: Krrish Dholakia <krrish+github@berri.ai>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ui): show 2-decimal precision for max_budget on key overview (#28809)

The Key Info Overview tab's Spend card truncated sub-dollar budgets to
"$0" because formatNumberWithCommas defaults to 0 decimals. The Settings
tab passes 2; align the overview so a $0.10 budget renders as "$0.10".

Resolves LIT-2845

* feat(proxy): allow `llm_api_routes` virtual keys to list MCP servers (#28442)

* feat(proxy): allow llm_api_routes virtual keys to list MCP servers

Add a new `mcp_discovery_routes` group (GET /v1/mcp/server and GET
/v1/mcp/server/{server_id}) and include it in `llm_api_routes` so that
virtual keys configured with `allowed_routes=["llm_api_routes"]` can
discover the MCP servers they have access to. Previously these calls
failed with 'Virtual key is not allowed to call this route. Only allowed
to call routes: [llm_api_routes]'.

The GET handlers already sanitize the response for restricted virtual
keys via `_sanitize_mcp_server_list_for_virtual_key`, stripping
credential-bearing fields (url, headers, env). Write methods
(POST/PUT/DELETE) on the same paths remain gated by the existing
handler-level admin role checks.

The new discovery list is intentionally kept OUT of
`mcp_inference_routes`, so `is_llm_api_route()` still returns False
for these paths — this preserves the existing contract that
DISABLE_LLM_API_ENDPOINTS must not block the Admin UI from listing MCP
servers.

Co-authored-by: ryan-crabbe-berri <ryan-crabbe-berri@users.noreply.github.com>

* refactor(proxy): make MCP discovery carve-out method-aware

Replace the `mcp_discovery_routes` group in `llm_api_routes` with a
method-aware special case inside `is_virtual_key_allowed_to_call_route`.
Virtual keys with allowed_routes=["llm_api_routes"] are now permitted
to call only GET /v1/mcp/server and GET /v1/mcp/server/{server_id} —
non-GET methods and multi-segment admin sub-paths fall through to the
existing 403. This keeps the general llm_api_routes list free of
management paths and avoids accidentally exposing POST/PUT/DELETE
writes through the route-check layer.

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: ryan-crabbe-berri <ryan-crabbe-berri@users.noreply.github.com>

* chore(ci): merge dev branch (#28807)

* chore(proxy): route path-dependent call sites through get_request_route

Replace direct ``request.url.path`` reads in auth, ACL, routing, and
audit-log decisions with ``get_request_route(request)`` — the helper
already added in ``auth/auth_utils.py`` that returns the ASGI
``scope["path"]`` with ``root_path`` stripped. Starlette reconstructs
``url.path`` from the Host header; ``scope["path"]`` is uvicorn's
parse of the request line and matches what FastAPI dispatches on, so
it's the authoritative route for any decision that should agree with
the actual handler.

Sites:
- _experimental/mcp_server/auth/user_api_key_auth_mcp.py
- management_endpoints/mcp_management_endpoints.py
- vector_store_endpoints/utils.py
- pass_through_endpoints/pass_through_endpoints.py
- auth/route_checks.py
- litellm_pre_call_utils.py
- spend_tracking/spend_management_endpoints.py
- common_utils/http_parsing_utils.py
- management_helpers/utils.py
- health_endpoints/_health_endpoints.py

Adds regression tests in tests/proxy_unit_tests/test_proxy_routes.py
that construct a Request with scope["path"] set to a benign route and
the Host header crafted so url.path would resolve differently; each
site's decision is asserted against scope["path"].

* chore(proxy): make get_request_route imports lazy at call sites

Move the ``from litellm.proxy.auth.auth_utils import get_request_route``
imports added in the prior commit back to the function bodies that use
them. The module-level form participates in a long-standing import
cycle through ``auth_utils -> _types -> ...`` and was flagged by CodeQL
on the PR; the lazy form matches the pattern the proxy already uses
for ``user_api_key_auth`` and related helpers elsewhere in these files.

Also drop the ``RouteChecks._is_assistants_api_request`` delegation in
``_get_metadata_variable_name`` introduced in the prior commit — the
delegation pulled ``RouteChecks`` into the same cycle, and the call
site reuses the resolved route for its other branches, so inlining
the substring check is both cycle-free and avoids a redundant second
``get_request_route`` call.

Comment in test_proxy_routes.py acknowledges that the two MCP table
entries exercise ``get_request_route`` directly rather than the full
production handler (which needs ASGI scope + MCP state to invoke).

---------

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: user <70670632+stuxf@users.noreply.github.com>

* fix(team): keep team_alias cache in sync on _cache_team_object writes (#28737)

* fix(team): keep team_alias cache in sync on _cache_team_object writes

_cache_team_object wrote only to the team_id:<id> cache key, but the
JWT auth path that uses team_alias_jwt_field reads from a separate
team_alias:<alias> key (get_team_object_by_alias caches under both
keys on miss, but reads only the alias-keyed one). After any
team-mutation endpoint (team_model_add, team_model_delete,
update_team, the two access-group writes) the team_id cache was
refreshed but the team_alias cache stayed stale until TTL — JWT
callers using team_alias_jwt_field kept seeing the pre-mutation
team for the full cache window.

Mirror the write under the alias key inside _cache_team_object so
every existing caller stays in sync without further changes. Skip
the alias write when team_alias is None/empty so we don't collide
across alias-less teams.

Surfaced testing the LIT-3244 cherry-pick on patch/1.86.0: the
LIT-3244 fix correctly invalidated the team_id cache but the
customer's JWT used team_alias_jwt_field, so they kept hitting the
stale alias-keyed entry.

* fix(team): delete (not overwrite) team_alias cache on _cache_team_object

The prior shape of this PR wrote both team_id:<id> AND team_alias:<alias>
from _cache_team_object. team_alias is NOT unique in the schema
(no @unique on LiteLLM_TeamTable.team_alias), and get_team_object_by_alias
enforces uniqueness on its own DB-fetch path (len(teams) > 1 raises).
Writing the alias-keyed cache from the generic refresh path bypassed
that check: a team admin renaming their team to collide with another
team's alias could silently overwrite the cached team for JWT-by-alias
auth, swapping the resolved team under that alias for the cache window.

Switch the alias-keyed operation from a write to a delete (mirroring
the dual-cache delete pattern in _delete_cache_key_object). After every
team write, the next JWT-by-alias reader cache-misses and falls through
to get_team_object_by_alias, which (a) re-fetches the fresh team from
DB, closing the LIT-3244 staleness gap that motivated this PR, and
(b) enforces alias uniqueness before populating either cache key.

team_id:<id> writes are unchanged — team_id is the table PK and is
guaranteed unique.

Surfaced in veria-ai review on #28739.

* fix(managed-files): anchor model_id regex so it doesn't match llm_output_file_model_id

extract_model_id_from_unified_id used `re.search(r"model_id,([^;]+)", ...)`
which substring-matches the `model_id,` inside the file-ID encoding's
`llm_output_file_model_id,<deployment_uuid>` field. parse_unified_id
then fed that deployment UUID back into the auth path as a model
candidate via _extract_models_from_managed_resource_id, and every
team-BYOK file attach 403'd with:

    team not allowed to access model. This team can only access
    models=['openai/*']. Tried to access <deployment-uuid>

The team's models list correctly contains the public name (`openai/*`)
that target_model_names matches, but the bogus UUID candidate fails
the wildcard check first.

Anchor the regex to a field boundary (`(?:^|;)model_id,`) so it
matches the legitimate top-level `model_id,<value>` field on
vector_store unified IDs and skips substring matches inside other
fields. File-IDs (which have no top-level `model_id` field) now
return None and contribute no spurious UUID candidate.

Surfaced reproducing LIT-3244 on patch/1.86.0 with the customer's
exact flow: team with openai/* BYOK deployment, JWT-scoped user,
POST /v1/vector_stores/{id}/files attaching a file uploaded with
target_model_names=openai/gpt-4o.

* fix(proxy): hydrate wildcard discovery credentials (#28284) (#28822)

* fix(proxy): hydrate wildcard discovery credentials

* fix(proxy): constrain wildcard credential hydration

Co-authored-by: Dibyo Mukherjee <dibyo@adobe.com>

* ci: add daily oss-agent-shin branch creation workflow (#28829)

Creates litellm_oss_agent_shin_MM_DD_YYYY from main every day at 00:00 UTC.
Lets us retarget oss-agent-shin fork PRs onto a canonical branch so CircleCI runs with secrets, without granting the agent write access.

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>

* test(proxy): add harness for proxy_server.py behavior-pinning (#28827)

* test(proxy): add harness for proxy_server.py behavior-pinning

Creates tests/test_litellm/proxy/proxy_server/ with:
- conftest.py: 11 shared fixtures (app, client, mock_prisma, auth_as,
  mock_router with parametrized response builders, normalize, etc.)
- _coverage_check.py: per-PR coverage gate (line + branch) against a
  baseline, self-selects target by inspecting which placeholder files
  have been filled
- _pin_check.py: AST-based gate that verifies every pin-list item has
  >=1 happy + >=1 error test with a real assertion (no status-only)
- test_harness_smoke.py: 19 smoke tests covering every fixture +
  both scripts end-to-end
- 26 placeholder test files (one docstring each) reserved for
  follow-up PRs per the directory ownership in the Notion plan
- .coverage_baseline pinned at 0% so future PRs measure deltas
  against new-tests-only and aren't entangled with the broader
  scattered test suite

Adds a dedicated proxy-server job to test-unit-proxy-endpoints.yml
so this directory's runtime + coverage are tracked independently.

Plan: https://www.notion.so/36c43b8acdab81ee845fd5365128a2fc

* ci(proxy-endpoints): allow workflow_dispatch

Lets the workflow be triggered manually on a branch via
`gh workflow run`, which is needed for the verify-first
flow on workflow changes before opening a PR.

* test(proxy): address review feedback on proxy_server harness

- conftest.py: anchor sys.path insert to __file__ (Path(__file__).resolve().parents[4])
  instead of CWD-relative os.path.abspath("../../../../") which resolved
  to the wrong directory when pytest is launched from the repo root.
- _coverage_check.py: actually read .coverage_baseline and use it as
  the floor (line_min = max(target, baseline)). Closes the gap between
  the PR description's "delta semantics" and what the script was doing.
  With baseline=0.0 today this is a no-op; future PRs that update the
  baseline cause regressions (test deletions etc.) to trip the gate
  even if the static PR target is still met.
- _pin_check.py: drop unreachable startswith("_") guard
  (test_*.py glob never yields underscore-prefixed names) and read
  each test file once instead of twice.

* feat(openai): apply regional-processing cost uplift for EU/US data residency (#28626)

* feat(openai): apply regional-processing cost uplift for EU/US data residency

OpenAI charges a 10% uplift on the latest GPT models when requests are
served from a regionalized hostname (eu./us.api.openai.com).  Infer the
region from `api_base`, expose it on `kwargs["litellm_params"]["data_residency"]`,
and multiply the computed cost by a per-model
`regional_processing_uplift_multiplier_<region>` field.

https://claude.ai/code/session_012ebH44s7ohYxjoix5CXzTW

* test: allow regional_processing_uplift_multiplier_{eu,us} in model_prices schema

* fix(cost): tighten data_residency inference and restore model_cost in tests

- Only infer OpenAI data_residency when custom_llm_provider == "openai";
  drop the implicit None fallback so non-OpenAI callers can't accidentally
  pick up a regional tag from a stray OpenAI hostname.
- _local_model_cost_map fixture now snapshots and restores
  litellm.model_cost and LITELLM_LOCAL_MODEL_COST_MAP so tests don't leak
  state across the session.

* refactor(openai): move data_residency helper under llms/openai

* fix: thread data_residency through realtime stream cost calculation

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(cost): thread data_residency through batch_cost_calculator

Apply the OpenAI regional-processing uplift multiplier to retrieve_batch
cost paths so Batch API requests served via eu./us.api.openai.com are
priced at the same uplifted token rates as completions/transcriptions.

* refactor(openai): encapsulate provider check inside infer_openai_data_residency

Move the custom_llm_provider == "openai" guard from get_litellm_params
into the helper itself so the core utility no longer carries
provider-specific dispatch logic. Callers pass through the provider
unconditionally; the helper returns None for any non-OpenAI provider.

* fix(responses): thread data_residency through Responses logging params

The Responses API paths build their logging litellm_params dict after
provider resolution but did not include data_residency, so cost calc
saw None even when the effective api_base was a regional OpenAI host.

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>

---------

Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Mateo Wang <277851410+mateo-berri@users.noreply.github.com>
Co-authored-by: Mateo <mateo@Mateos-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
Co-authored-by: Yassin Kortam <yassinkortam@Yassins-MacBook-Pro.local>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: user <70670632+stuxf@users.noreply.github.com>
Co-authored-by: Krrish Dholakia <krrish+github@berri.ai>
Co-authored-by: ryan-crabbe-berri <ryan@berri.ai>
Co-authored-by: ryan-crabbe-berri <ryan-crabbe-berri@users.noreply.github.com>
Co-authored-by: Dibyo Mukherjee <dibyo@adobe.com>
Co-authored-by: ishaan-berri <155045088+ishaan-berri@users.noreply.github.com>
Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>

* fix: preserve OTEL response payload and remove duplicate constant

- _emit_management_endpoint_otel_span now passes result as response on success
- remove duplicate _CREDENTIAL_LITELLM_PARAM_FIELDS assignment in model_checks

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix: address bug detection findings

- pass_through_endpoints: use request.method instead of hardcoded POST
  in streaming SigV4-signed request path for consistency with the
  non-streaming branch
- llm_cost_calc/utils: hoist DataResidency value set to a module-level
  frozenset to avoid rebuilding it on every cost calculation
- example_config_yaml/oai_misc_config: replace real-looking AWS account
  ID with placeholder 123456789012 in example bucket and role ARN

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* chore(github_copilot): refresh model catalog from upstream /models API (#28055)

Aligns the github_copilot catalog with values returned by Copilot's
public /models endpoint (capabilities.limits + capabilities.supports +
model.supported_endpoints).

- Adds 10 new model entries: claude-opus-4.7, claude-sonnet-4.6,
  gemini-3-flash-preview, gemini-3.1-pro-preview, gpt-4-0125-preview,
  gpt-5.2-codex, gpt-5.4, gpt-5.4-mini, gpt-5.5, oswe-vscode-prime.
- Updates max_input_tokens for existing entries to reflect each
  model's true context window (e.g. gpt-4o-mini 64000 -> 128000,
  gpt-5-mini 128000 -> 264000, gpt-5.3-codex 128000 -> 400000,
  claude-haiku-4.5 128000 -> 200000).
- Adds supports_reasoning, supports_response_schema,
  supports_function_calling, supports_parallel_function_calling,
  supports_vision based on capabilities.supports.
- Declares supported_endpoints for entries missing it
  (e.g. gpt-3.5-turbo, gpt-4o, embeddings).
- For responses-only models (gpt-5.2-codex, gpt-5.4, gpt-5.4-mini,
  gpt-5.5), sets mode to 'responses'.
- gpt-41-copilot.mode changes from 'completion' to 'chat' because
  Copilot reports capabilities.type = 'chat'. Revertible on request.

Pricing fields and other manually-curated values are preserved.

* feat(datadog): emit litellm.overhead.latency as a standalone Datadog metric (#28831)

Adds a new `litellm.overhead.latency` gauge metric to `DatadogMetricsLogger`
(the `/api/v2/series` path). The value is sourced from
`hidden_params["litellm_overhead_time_ms"]` already computed in
`ResponseMetadata` and exposed in `StandardLoggingPayload`.

Matches the Prometheus integration which exposes the same value via
`litellm_overhead_latency_metric`. Emitted in seconds (ms ÷ 1000) for
consistency with the other latency series.

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: Shin <shin@litellm.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: ishaan-berri <155045088+ishaan-berri@users.noreply.github.com>

* feat(arize): route Phoenix traces via per-project TracerProviders (#28876)

Use LRU-cached TracerProviders with project-scoped OTEL Resources so team/key
metadata routes traces correctly. On the proxy, project selection is limited to
server-controlled user_api_key_auth_metadata; client metadata fields stay banned.

* fix(arize_phoenix): skip _emit_semantic_logs on failure path

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(arize_phoenix): skip raw request logging and metrics on failure path

Restores pre-refactor behavior: _handle_failure no longer emits raw-request
sub-spans or records OTEL metrics, matching the original _handle_failure
that did not call these helpers.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(security): close two medium telemetry trust-boundary issues

Issue 1 (arize_phoenix.py — caller-controlled telemetry routing):
- _is_proxy_request no longer detects proxy mode by checking
  user_api_key_auth_metadata in request metadata.  That field is
  user-supplied, so an authenticated caller could fake proxy-mode
  detection and have _project_from_metadata_dict read their own dict
  for project selection, routing telemetry to arbitrary Arize/Phoenix
  projects.  Proxy mode is now determined solely by the server-set
  proxy_server_request field in litellm_params.
- auth_utils.py adds user_api_key_auth_metadata to the banned request
  body params list so the proxy rejects any attempt to supply the field
  at the HTTP layer.  The field is server-reserved: it is written
  exclusively by add_user_api_key_auth_to_request_metadata from the
  authenticated key's database record after the ban check runs.

Issue 2 (management_helpers/utils.py — API key in OTEL span):
- _emit_management_endpoint_otel_span stripped plaintext credential
  fields (key, token, api_key, secret, …) from the response dict before
  passing it to the OTEL success hook.  dict(result) on a Pydantic
  GenerateKeyResponse includes the freshly-generated key field, which
  would previously be written as a span attribute to every configured
  OTEL collector/backend.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: oss-agent-shin <ext-agent-shin@berri.ai>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
Co-authored-by: Terrajlz <info@jouleselectrictech.com>
Co-authored-by: Bruno Devaux <devaux.br@gmail.com>
Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: Mateo Wang <277851410+mateo-berri@users.noreply.github.com>
Co-authored-by: Mateo <mateo@Mateos-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Yassin Kortam <yassinkortam@Yassins-MacBook-Pro.local>
Co-authored-by: user <70670632+stuxf@users.noreply.github.com>
Co-authored-by: Krrish Dholakia <krrish+github@berri.ai>
Co-authored-by: ryan-crabbe-berri <ryan@berri.ai>
Co-authored-by: ryan-crabbe-berri <ryan-crabbe-berri@users.noreply.github.com>
Co-authored-by: Dibyo Mukherjee <dibyo@adobe.com>
Co-authored-by: ishaan-berri <155045088+ishaan-berri@users.noreply.github.com>
Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>
Co-authored-by: rinto <54238243+ririnto@users.noreply.github.com>
Co-authored-by: Shin <shin@litellm.ai>
Co-authored-by: mubashir1osmani <mubashir.osmani777@gmail.com>
Sameerlite pushed a commit that referenced this pull request Jun 1, 2026
#28875)

* fix(thinking): handle None thinking param in is_thinking_enabled (#28598)

Squash-merged by litellm-agent from Terrajlz's PR.

* feat(helm): support tpl rendering in podAnnotations (#28609)

Squash-merged by litellm-agent from devauxbr's PR.

* feat(bedrock-batch): route /v1/embeddings JSONL to Titan v2 modelInput

Changes vs main:
- BedrockFilesConfig now detects OpenAI batch JSONL lines whose `url`
  is /v1/embeddings (with body-shape fallback) and routes them through
  a new `_map_openai_embedding_to_bedrock_params` helper instead of the
  chat-completion transformer that silently produces an invalid body.
- The embedding helper currently supports Amazon Titan Text Embeddings V2
  only. Other embed models (Titan G1, Titan Multimodal, Cohere Embed,
  Nova Multimodal Embeddings) raise NotImplementedError with a clear
  message; each will get a dedicated branch + tests in follow-up PRs to
  keep schema-specific risks isolated.
- Validation refuses pre-tokenized inputs (List[int], List[List[int]])
  and multi-element string lists with explicit errors so callers emit
  one JSONL line per embedding instead of relying on us to fan out.
- Titan v2 model id match tolerates "bedrock/" prefix, cross-region
  inference profile prefix ("us.", "eu.", etc.), and ARN forms; the
  marker boundary check rejects lookalikes like "titan-embed-text-v20".
- Tests cover happy path (fixtures), dimensions/encoding_format mapping,
  body-shape fallback, single-element list unwrap, error paths
  (missing input, multi-element list, unsupported model, pre-tokenized),
  mixed chat+embedding batch, and the model-id boundary check.

* fix(bedrock-batch): trust explicit url over body shape in record routing

Greptile-flagged gap in `_is_embedding_record`: when an OpenAI batch
JSONL line carries an explicit `url` pointing to a non-embedding
endpoint (e.g. `/v1/chat/completions`) AND its body happens to have
`input` without `messages`, the body-shape fallback would mis-route
that record to the embedding transformer and corrupt the modelInput.

Changes vs previous commit:
- `_is_embedding_record` now short-circuits to NOT-embedding whenever
  `url` is non-empty and not equal to `/v1/embeddings`. The body-shape
  fallback only runs when `url` is missing or empty. Docstring updated
  to spell out the precedence rules.
- Two new tests cover the case: direct helper assertion that an
  explicit chat url plus an input-bearing body returns False, plus an
  end-to-end check that the resulting modelInput contains no `inputText`
  key. A second test asserts the same short-circuit for arbitrary
  non-embeddings urls (`/v1/completions`, `/v1/responses`).

28/28 tests pass (was 26/26 before this commit + 2 new).

* refactor(bedrock-batch): extract embedding input normalization helper

Splits the input-shape validation out of
`_map_openai_embedding_to_bedrock_params` into a new static helper
`_coerce_embedding_input_to_string`. Same semantics; the goal is to make
the validation testable in isolation and to give future
embedding-provider branches (Titan G1, Cohere) a reusable shaping
function instead of duplicating type checks.

- Helper accepts `str`, single-element `list[str]`, and raises
  `ValueError` / `NotImplementedError` with actionable messages for
  None, multi-element lists, pre-tokenized inputs (`list[int]` /
  `list[list[int]]`), and other unsupported types.
- New unit test exercises the helper directly across happy paths,
  None / missing input, multi-element string list, multi-element int
  list (caught as 'one input per JSONL record' since we can't
  disambiguate from 'multiple strings' without more context),
  pre-tokenized single-element list-of-list, single-element list of
  bare int, and dict input.

29/29 tests in the file still pass.

* fix(bedrock-batch): layer registry mode check + drop unused provider param

Greptile-flagged issues on Titan v2 model detection:

1. Hardcoded substring without registry consultation conflicts with the
   project convention of treating `model_prices_and_context_window.json`
   as the source of truth for model capability flags.

   Fix: `_is_titan_v2_embed_model` now layers a registry mode check on
   top of the existing marker boundary. When `get_model_info` resolves
   the id, we additionally require `mode == "embedding"` so a malformed
   id whose path-component matches the marker but whose registered mode
   is "chat" doesn't slip through. Registry silence (cross-region
   inference profile prefixes like `us.amazon.titan-embed-text-v2:0`,
   ARN forms) keeps the substring-only behavior because the registry
   genuinely can't normalize those ids today. The substring is still
   needed because `mode == "embedding"` alone doesn't distinguish
   Titan v2's InvokeModel schema from Cohere Embed, Nova Multimodal,
   or Titan G1 - all also embedding mode, all with incompatible bodies.

2. `provider` parameter on `_map_openai_embedding_to_bedrock_params`
   was accepted but never used.

   Fix: dropped from the signature and the call site.

Also adds `_lookup_registry_mode` static helper (mirrors the one in the
sibling batches transformer) so the registry try/except shape lives in
one place instead of being inlined into the detector.

5 new tests pin the layered behavior:
- registry mode=chat overrides the marker match (rejected)
- registry mode=embedding + marker match (accepted)
- registry silent + marker match for cross-region and ARN ids (accepted)
- direct `_lookup_registry_mode` coverage across all return paths

33/33 tests in the file pass.

* fix(bedrock-batch): drive Titan v2 detection from registry provider_specific_entry

Addresses Greptile's remaining policy concern on PR #28875: the hardcoded
`_TITAN_V2_EMBED_MODEL_MARKER` substring required a code change to
register new Titan v2 variants. Now the registry is the source of truth.

Changes:
- model_prices_and_context_window.json + model_prices_and_context_window_backup.json
  Adds `provider_specific_entry.bedrock_invocation_schema = "titan_v2"`
  to the `amazon.titan-embed-text-v2:0` entry. Uses the existing
  `provider_specific_entry` escape-hatch field (already surfaced by
  `get_model_info` via `ModelInfo.provider_specific_entry`) rather than
  introducing a new top-level field that would need a corresponding
  change in `utils.py::get_model_info` to flow through.

- BedrockFilesConfig._is_titan_v2_embed_model now reads
  `get_model_info(model).provider_specific_entry.bedrock_invocation_schema`
  first. When the registry resolves the id we trust the field;
  registered ids without (or with a different) schema value are rejected
  outright - no substring second-chance for registered ids. The
  substring fallback only runs when `get_model_info` raises, which
  covers cross-region inference profile prefixes
  (`us.amazon.titan-embed-text-v2:0`) and Bedrock ARN forms - neither of
  which the registry normalizes today.

- _lookup_registry_mode helper renamed to _lookup_provider_specific_field
  and generalized: takes a field name, reads
  `provider_specific_entry[field]` defensively. Future embed-schema
  branches (Titan G1, Cohere, Nova Multimodal) will share this helper.

- New module-level constants: _BEDROCK_INVOCATION_SCHEMA_FIELD names the
  registry key; _TITAN_V2_INVOCATION_SCHEMA names the value.

- _map_openai_embedding_to_bedrock_params: dropped the unused `provider`
  parameter (separate Greptile flag).

Tests updated for the new nested-field semantics. 34/34 tests pass; one
end-to-end integration check confirms that with
LITELLM_LOCAL_MODEL_COST_MAP=true, the registry returns the new field
and all 7 representative model ids classify correctly.

* chore(lint): apply Black formatting to is_thinking_enabled

Pre-existing Black violation on the shin_agent_oss_staging_05_22_2026
base branch - the lint CI check on this PR fails on
`litellm/llms/base_llm/chat/transformation.py` even with zero changes
from our side. Applying the formatter's preferred line-break inside
`is_thinking_enabled` unblocks the lint check without touching the
method's logic.

---------

Co-authored-by: Terrajlz <info@jouleselectrictech.com>
Co-authored-by: Bruno Devaux <devaux.br@gmail.com>
Sameerlite added a commit that referenced this pull request Jun 1, 2026
* fix(thinking): handle None thinking param in is_thinking_enabled (#28598)

Squash-merged by litellm-agent from Terrajlz's PR.

* feat(helm): support tpl rendering in podAnnotations (#28609)

Squash-merged by litellm-agent from devauxbr's PR.

* Add 1-hour cache write pricing for us-gov Haiku 4.5

AWS Bedrock GovCloud applies a +20% premium over global Anthropic
rates. Global Haiku 4.5 5m/1h cache write is $1.25 / $2.00 per MTok;
us-gov is therefore $1.50 / $2.40 per MTok (the 5m rate was already
correct in litellm; the 1h field was missing).

Adds `cache_creation_input_token_cost_above_1hr: 2.4e-06` to the
two us-gov Haiku 4.5 entries in both pricing JSON files:

  - bedrock/us-gov-east-1/anthropic.claude-haiku-4-5-20251001-v1:0
  - bedrock/us-gov-west-1/anthropic.claude-haiku-4-5-20251001-v1:0

New parametrized regression test
tests/test_litellm/test_bedrock_usgov_haiku_1hr_cache.py pins both
entries and enforces the 1.6x 5m-to-1h ratio invariant matching the
pattern used by the existing Bedrock and Vertex 1h-cache tests.

Companion to the us-gov Sonnet 4.5 pricing fix.

* chore: trigger shin-agent re-eval on retargeted staging base

* chore: trigger shin-agent re-eval against updated Greptile state

---------

Co-authored-by: Terrajlz <info@jouleselectrictech.com>
Co-authored-by: Bruno Devaux <devaux.br@gmail.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>
Sameerlite added a commit that referenced this pull request Jun 1, 2026
* fix(thinking): handle None thinking param in is_thinking_enabled (#28598)

Squash-merged by litellm-agent from Terrajlz's PR.

* feat(helm): support tpl rendering in podAnnotations (#28609)

Squash-merged by litellm-agent from devauxbr's PR.

* Fix us-gov-{east,west}-1 Bedrock Sonnet 4.5 pricing (Fixes #27120)

AWS GovCloud Bedrock pricing carries a +20% premium over the global
Anthropic rates, not the +10% premium that applies to commercial US
regions. Litellm's five us-gov Claude Sonnet 4.5 entries mirror the
commercial-US prices (3.3e-06 input, 1.65e-05 output, 4.125e-06 cache
write), undercharging customers by ~9%.

Corrects all five affected entries in both
model_prices_and_context_window.json and the bundled
model_prices_and_context_window_backup.json to match AWS's
published GovCloud rates (per https://aws.amazon.com/bedrock/pricing/):

  - input_cost_per_token              3.3e-06  -> 3.6e-06   ($3.60/MTok)
  - output_cost_per_token             1.65e-05 -> 1.8e-05   ($18.00/MTok)
  - cache_creation_input_token_cost   4.125e-06 -> 4.5e-06  ($4.50/MTok)
  - cache_read_input_token_cost       3.3e-07  -> 3.6e-07   ($0.36/MTok)
  - cache_creation_input_token_cost_above_1hr  (added)      ($7.20/MTok)

Affected entries:
  - bedrock/us-gov-east-1/anthropic.claude-sonnet-4-5-20250929-v1:0
  - bedrock/us-gov-west-1/anthropic.claude-sonnet-4-5-20250929-v1:0
  - bedrock/us-gov-east-1/claude-sonnet-4-5-20250929-v1:0
  - bedrock/us-gov-west-1/claude-sonnet-4-5-20250929-v1:0
  - us-gov.anthropic.claude-sonnet-4-5-20250929-v1:0

New regression test
tests/test_litellm/test_bedrock_usgov_pricing.py pins the corrected
rates across all five entries and asserts the 1.2x global-to-us-gov
ratio invariant, matching the pattern used by the existing Vertex
and EU/AU/JP Bedrock pricing tests.

* chore: trigger shin-agent re-eval on retargeted staging base

* Apply us-gov 200k-tier GovCloud premium to cross-region sonnet-4-5

The us-gov.anthropic.claude-sonnet-4-5-20250929-v1:0 cross-region
inference profile carried _above_200k_tokens fields at the +10%
commercial-US rate while the base tier was correctly at +20%. AWS
GovCloud pricing applies the same +20% uplift across all tiers, so
long-context requests through this profile were undercharging by ~9%.

Updates four existing fields (input/output/cache_creation/cache_read
_above_200k_tokens) and adds cache_creation_input_token_cost_above_1hr
_above_200k_tokens for parity with the us. cross-region entry.

Extends test_bedrock_usgov_pricing.py with five parametrized field
assertions plus a 1.2x ratio invariant across all 200k-tier fields,
so any future regression on either dimension is caught.

* chore: trigger shin-agent re-eval against updated Greptile state

---------

Co-authored-by: Terrajlz <info@jouleselectrictech.com>
Co-authored-by: Bruno Devaux <devaux.br@gmail.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>
Sameerlite added a commit that referenced this pull request Jun 3, 2026
…28572)

* fix(thinking): handle None thinking param in is_thinking_enabled (#28598)

Squash-merged by litellm-agent from Terrajlz's PR.

* feat(helm): support tpl rendering in podAnnotations (#28609)

Squash-merged by litellm-agent from devauxbr's PR.

* Forward custom_llm_provider through the Responses API bridge (Fixes #28505) (#28575)

* Forward custom_llm_provider through the Responses API bridge (Fixes #28505)

When a Chat Completions request to a GPT-5.4+ model contains both
`tools` and `reasoning_effort`, `completion()` auto-routes through
`responses_api_bridge`. The bridge handler called
`litellm.responses()` / `litellm.aresponses()` without forwarding the
already-resolved `custom_llm_provider`, so the downstream call
re-invoked `get_llm_provider()` with `custom_llm_provider=None` and
stripped a second provider prefix from a `provider/provider/model`
deployment string.

For a deployment configured as `openai/openai/openai/gpt-5.5`,
the bridge flow sent `openai/gpt-5.5` to the upstream API instead of
the correct `openai/openai/gpt-5.5`. Upstream APIs that enforce
model-name allow-lists rejected this as `key_model_access_denied`.

Fix: pass the locally-resolved `custom_llm_provider` into both the
sync `responses()` and async `aresponses()` calls so the downstream
`_resolve_model_provider_for_responses` sees an explicit provider
and skips the second prefix-strip.

New regression test
`tests/test_litellm/completion_extras/test_responses_bridge_provider_propagation.py`
pins both call sites: each must forward `custom_llm_provider`.

* fix(28505): set custom_llm_provider on request_data instead of as duplicate kwarg

Greptile flagged that the previous patch passed custom_llm_provider as an
explicit kwarg to responses()/aresponses() while request_data already
carried it via the spread of sanitized_litellm_params, which would raise
TypeError: got multiple values for keyword argument on every real bridge
call.

Switches to assigning request_data['custom_llm_provider'] before the call
so the resolved provider wins over whatever sanitized_litellm_params spread
in, without duplicating the kwarg.

Updates the regression test to seed request_data with a sentinel
custom_llm_provider so it actually exercises the overwrite path (the
previous test mocked transform_request with a minimal dict and never hit
the conflict).

* chore: trigger shin-agent re-eval on retargeted staging base

* chore: trigger shin-agent re-eval against updated Greptile state

* Add 1-hour cache write pricing for EU/AU/JP Bedrock Anthropic models

The 1-hour prompt-cache write tier
(`cache_creation_input_token_cost_above_1hr`) was added to the
us./global. variants of the Claude 4.5/4.6/4.7 family on Bedrock, but
the eu./au./jp. cross-region inference profiles were left without it.
AWS Bedrock pricing applies the same +10% regional premium across all
geo profiles, so eu./au./jp. should carry the same 1-hour rates as
us. (1.6x the 5-minute regional rate).

Without these fields, cost tracking on EU/AU/JP Bedrock 1-hour-TTL
prompt caching falls back to the 5-minute write rate and undercounts
spend by ~60% for European, Australian, and Japanese tenants.

Adds the 1-hour tier (and Sonnet 4.5's long-context >200K tier where
AWS publishes one) to 14 regional Bedrock entries in both
`model_prices_and_context_window.json` and the bundled
`model_prices_and_context_window_backup.json`:

  - eu./au.   Opus 4.6     ($11.00 / MTok)
  - eu./au.   Opus 4.7     ($11.00 / MTok)
  - eu./au./jp. Sonnet 4.6 ($6.60 / MTok)
  - eu./au./jp. Sonnet 4.5 ($6.60 / MTok regular, $13.20 / MTok LC)
  - eu./au./jp. Haiku 4.5  ($2.20 / MTok)

Also extends `tests/test_litellm/test_bedrock_anthropic_1hr_cache_pricing.py`
with a `REGIONAL_EXPECTED` parametrized block covering all 13 new
entries plus the existing 1.6x ratio invariant.

Note: `eu.anthropic.claude-opus-4-5-20251101-v1:0` carries the
wrong 5m rate today (base 6.25e-06 instead of regional 6.875e-06),
which would break the 1.6x ratio check. It is intentionally left out
of this PR so the scope stays "1-hour cache tier addition" — a
separate follow-up should correct the EU 5m rates for Opus 4.5.

---------

Co-authored-by: Terrajlz <info@jouleselectrictech.com>
Co-authored-by: Bruno Devaux <devaux.br@gmail.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>
Sameerlite added a commit that referenced this pull request Jun 3, 2026
…28569)

* fix(thinking): handle None thinking param in is_thinking_enabled (#28598)

Squash-merged by litellm-agent from Terrajlz's PR.

* feat(helm): support tpl rendering in podAnnotations (#28609)

Squash-merged by litellm-agent from devauxbr's PR.

* Forward custom_llm_provider through the Responses API bridge (Fixes #28505) (#28575)

* Forward custom_llm_provider through the Responses API bridge (Fixes #28505)

When a Chat Completions request to a GPT-5.4+ model contains both
`tools` and `reasoning_effort`, `completion()` auto-routes through
`responses_api_bridge`. The bridge handler called
`litellm.responses()` / `litellm.aresponses()` without forwarding the
already-resolved `custom_llm_provider`, so the downstream call
re-invoked `get_llm_provider()` with `custom_llm_provider=None` and
stripped a second provider prefix from a `provider/provider/model`
deployment string.

For a deployment configured as `openai/openai/openai/gpt-5.5`,
the bridge flow sent `openai/gpt-5.5` to the upstream API instead of
the correct `openai/openai/gpt-5.5`. Upstream APIs that enforce
model-name allow-lists rejected this as `key_model_access_denied`.

Fix: pass the locally-resolved `custom_llm_provider` into both the
sync `responses()` and async `aresponses()` calls so the downstream
`_resolve_model_provider_for_responses` sees an explicit provider
and skips the second prefix-strip.

New regression test
`tests/test_litellm/completion_extras/test_responses_bridge_provider_propagation.py`
pins both call sites: each must forward `custom_llm_provider`.

* fix(28505): set custom_llm_provider on request_data instead of as duplicate kwarg

Greptile flagged that the previous patch passed custom_llm_provider as an
explicit kwarg to responses()/aresponses() while request_data already
carried it via the spread of sanitized_litellm_params, which would raise
TypeError: got multiple values for keyword argument on every real bridge
call.

Switches to assigning request_data['custom_llm_provider'] before the call
so the resolved provider wins over whatever sanitized_litellm_params spread
in, without duplicating the kwarg.

Updates the regression test to seed request_data with a sentinel
custom_llm_provider so it actually exercises the overwrite path (the
previous test mocked transform_request with a minimal dict and never hit
the conflict).

* chore: trigger shin-agent re-eval on retargeted staging base

* chore: trigger shin-agent re-eval against updated Greptile state

* Add 1-hour cache write pricing tier for Vertex AI Anthropic models

GCP Vertex AI publishes a separate 1-hour cache write column for the
Claude family (1.6x the 5-minute write rate, matching the documented
Bedrock ratio). LiteLLM's Vertex AI Anthropic entries only carry the
5-minute tier, so any request that uses `cache_control: {"ttl": "1h"}`
on Vertex AI Claude is undercounted in cost tracking by ~60%.

The runtime side already supports the 1-hour tier — `VertexAIAnthropicConfig`
extends `AnthropicConfig`, populating `ephemeral_1h_input_tokens`, and
`_calculate_cache_creation_cost` reads `cache_creation_input_token_cost_above_1hr`.
Only the price registry was missing data.

Adds the field to 19 vertex_ai/claude-* entries across both
`model_prices_and_context_window.json` and the bundled
`model_prices_and_context_window_backup.json`:

  - Haiku 4.5 ($1.25 -> $2.00 / MTok)
  - Sonnet 3.7 / 4 / 4.5 / 4.6 ($3.75 -> $6.00 / MTok)
  - Opus 4.5 / 4.6 / 4.7 ($6.25 -> $10.00 / MTok)
  - Opus 4 / 4.1 ($18.75 -> $30.00 / MTok)

Adds `tests/test_litellm/test_vertex_anthropic_1hr_cache_pricing.py`
mirroring the Bedrock equivalent — pins each (5m, 1h) pair per model
and asserts the 1.6x ratio across the family.

Fixes #27781.

---------

Co-authored-by: Terrajlz <info@jouleselectrictech.com>
Co-authored-by: Bruno Devaux <devaux.br@gmail.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>
mateo-berri added a commit that referenced this pull request Jun 3, 2026
* Fix incorrect agent API request example payload structure (#29556)

* fix(otel): add litellm_metadata fallback in _get_span_context and _end_proxy_span_from_kwargs (#29427)

* fix(otel): add litellm_metadata fallback in _get_span_context and _end_proxy_span_from_kwargs

On /v1/messages and other LITELLM_METADATA_ROUTES, the parent OTel span
is stored in litellm_params['litellm_metadata'] instead of
litellm_params['metadata']. When the request body contains a native
'metadata' field (e.g. Anthropic's {"user_id": "..."}),
litellm_params['metadata'] gets overwritten and the parent span is lost,
producing orphan root spans with a different trace_id.

Add fallback checks to litellm_metadata in:
- _get_span_context(): so child spans find the correct parent
- _end_proxy_span_from_kwargs(): so the proxy span gets closed

Fixes: #27934

* test(otel): tighten assertions per Greptile review

- test_span_context_metadata_takes_priority: assert litellm_metadata
  span is never accessed, proving metadata takes priority
- test_span_context_no_parent_when_neither_has_span: assert both ctx
  and detected_span are None

---------

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: Aneesh-Fiddler <aneeshfiddler@gmail.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>

* fix: remove premature end-user budget check from get_end_user_object (#29420)

* fix(proxy): remove premature end-user budget check from get_end_user_object

Problem:
- `_check_end_user_budget()` was called inside `get_end_user_object()`
- This caused budget checks to run BEFORE `skip_budget_checks` could be evaluated
- Zero-cost models (e.g., local vLLM) were incorrectly blocked when
  end-users exceeded their budget, even though they should bypass budget checks

Solution:
- Remove `_check_end_user_budget()` calls from `get_end_user_object()`
- Budget enforcement now happens exclusively in `common_checks()` where
  `skip_budget_checks` context is available
- `get_end_user_object()` keeps `route` as optional in function parameter for backwards compatibility and future implementation.

* refactor(tests): update budget enforcement tests to reflect changes in get_end_user_object

- test_get_end_user_object() verifies data fetching
- test_check_end_user_budget() verifies enforcement
- test_budget_enforcement_blocks_over_budget_users() integrates _check_end_user_budget()
- test_resolve_end_user_reraises_budget_exceeded() is now test_resolve_end_user since no budget exceeded is thrown in get_end_user_object()

* Gemini /images/generate and /images/edits billing fixes + add support for size and aspect ratio params (#29534)

* Fix Gemini image config mapping

* Address Gemini image config review

* Format Gemini image generation transform

* Fix Gemini image token usage logging

* Share Gemini image request helpers

* Fix Gemini Imagen model routing

* Fixes as per self code review

* Fixes per internal code review

* Stop gating Imagen imageSize forwarding

* Document Gemini image size mapping source

* chore: retrigger lint

* Clarify Gemini candidate count precedence

* Add Inception provider (#29522)

* add inception as provider (chat, fim)

* linting

* seperate test suite for chat and fim

* fix test coverage

* fix: model hub custom pricing model info (#29293)

* Opik user auth key metadata extractors (#28397)

* fix: enhance Opik metadata extraction to include user API key auth context fixed after refactoring to extractor logic

* test: add unit tests for OPik metadata extraction logic

* fix: enhance extract_opik_metadata function to prioritize metadata sources for improved accuracy

* fix(ci): clarified comments and edited unit tests

* test: add unit tests for OPik metadata extraction with auth and requester overrides

* fix(ui): replace fixed favicon.ico with current api get /get_favicon (#29532)

Signed-off-by: José Luis Di Biase <josx@interorganic.com.ar>

* fix(vertex/gemini): keep tool_call reference when a text-only assistant message follows (#29561)

`_gemini_convert_messages_with_history` tracks `last_message_with_tool_calls`
so a following tool result can be matched back to its tool call. The assignment
was inside a branch guarded by
`assistant_msg.get("tool_calls", []) is not None`, which is also True for a
text-only assistant message (an empty list is not None). As a result, an
assistant message with no tool calls that appears between a tool call and its
tool result overwrote the reference, and conversion failed with:

    Exception: Missing corresponding tool call for tool response message.

This shape is common: a model emits a short narration/assistant message after a
tool call before the tool result is appended.

Only update `last_message_with_tool_calls` when the assistant message actually
carries tool_calls (or a function_call). Adds a regression test.

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

* Add 1-hour cache write pricing for EU/AU/JP Bedrock Anthropic models (#28572)

* fix(thinking): handle None thinking param in is_thinking_enabled (#28598)

Squash-merged by litellm-agent from Terrajlz's PR.

* feat(helm): support tpl rendering in podAnnotations (#28609)

Squash-merged by litellm-agent from devauxbr's PR.

* Forward custom_llm_provider through the Responses API bridge (Fixes #28505) (#28575)

* Forward custom_llm_provider through the Responses API bridge (Fixes #28505)

When a Chat Completions request to a GPT-5.4+ model contains both
`tools` and `reasoning_effort`, `completion()` auto-routes through
`responses_api_bridge`. The bridge handler called
`litellm.responses()` / `litellm.aresponses()` without forwarding the
already-resolved `custom_llm_provider`, so the downstream call
re-invoked `get_llm_provider()` with `custom_llm_provider=None` and
stripped a second provider prefix from a `provider/provider/model`
deployment string.

For a deployment configured as `openai/openai/openai/gpt-5.5`,
the bridge flow sent `openai/gpt-5.5` to the upstream API instead of
the correct `openai/openai/gpt-5.5`. Upstream APIs that enforce
model-name allow-lists rejected this as `key_model_access_denied`.

Fix: pass the locally-resolved `custom_llm_provider` into both the
sync `responses()` and async `aresponses()` calls so the downstream
`_resolve_model_provider_for_responses` sees an explicit provider
and skips the second prefix-strip.

New regression test
`tests/test_litellm/completion_extras/test_responses_bridge_provider_propagation.py`
pins both call sites: each must forward `custom_llm_provider`.

* fix(28505): set custom_llm_provider on request_data instead of as duplicate kwarg

Greptile flagged that the previous patch passed custom_llm_provider as an
explicit kwarg to responses()/aresponses() while request_data already
carried it via the spread of sanitized_litellm_params, which would raise
TypeError: got multiple values for keyword argument on every real bridge
call.

Switches to assigning request_data['custom_llm_provider'] before the call
so the resolved provider wins over whatever sanitized_litellm_params spread
in, without duplicating the kwarg.

Updates the regression test to seed request_data with a sentinel
custom_llm_provider so it actually exercises the overwrite path (the
previous test mocked transform_request with a minimal dict and never hit
the conflict).

* chore: trigger shin-agent re-eval on retargeted staging base

* chore: trigger shin-agent re-eval against updated Greptile state

* Add 1-hour cache write pricing for EU/AU/JP Bedrock Anthropic models

The 1-hour prompt-cache write tier
(`cache_creation_input_token_cost_above_1hr`) was added to the
us./global. variants of the Claude 4.5/4.6/4.7 family on Bedrock, but
the eu./au./jp. cross-region inference profiles were left without it.
AWS Bedrock pricing applies the same +10% regional premium across all
geo profiles, so eu./au./jp. should carry the same 1-hour rates as
us. (1.6x the 5-minute regional rate).

Without these fields, cost tracking on EU/AU/JP Bedrock 1-hour-TTL
prompt caching falls back to the 5-minute write rate and undercounts
spend by ~60% for European, Australian, and Japanese tenants.

Adds the 1-hour tier (and Sonnet 4.5's long-context >200K tier where
AWS publishes one) to 14 regional Bedrock entries in both
`model_prices_and_context_window.json` and the bundled
`model_prices_and_context_window_backup.json`:

  - eu./au.   Opus 4.6     ($11.00 / MTok)
  - eu./au.   Opus 4.7     ($11.00 / MTok)
  - eu./au./jp. Sonnet 4.6 ($6.60 / MTok)
  - eu./au./jp. Sonnet 4.5 ($6.60 / MTok regular, $13.20 / MTok LC)
  - eu./au./jp. Haiku 4.5  ($2.20 / MTok)

Also extends `tests/test_litellm/test_bedrock_anthropic_1hr_cache_pricing.py`
with a `REGIONAL_EXPECTED` parametrized block covering all 13 new
entries plus the existing 1.6x ratio invariant.

Note: `eu.anthropic.claude-opus-4-5-20251101-v1:0` carries the
wrong 5m rate today (base 6.25e-06 instead of regional 6.875e-06),
which would break the 1.6x ratio check. It is intentionally left out
of this PR so the scope stays "1-hour cache tier addition" — a
separate follow-up should correct the EU 5m rates for Opus 4.5.

---------

Co-authored-by: Terrajlz <info@jouleselectrictech.com>
Co-authored-by: Bruno Devaux <devaux.br@gmail.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>

* Add 1-hour cache write pricing tier for Vertex AI Anthropic models (#28569)

* fix(thinking): handle None thinking param in is_thinking_enabled (#28598)

Squash-merged by litellm-agent from Terrajlz's PR.

* feat(helm): support tpl rendering in podAnnotations (#28609)

Squash-merged by litellm-agent from devauxbr's PR.

* Forward custom_llm_provider through the Responses API bridge (Fixes #28505) (#28575)

* Forward custom_llm_provider through the Responses API bridge (Fixes #28505)

When a Chat Completions request to a GPT-5.4+ model contains both
`tools` and `reasoning_effort`, `completion()` auto-routes through
`responses_api_bridge`. The bridge handler called
`litellm.responses()` / `litellm.aresponses()` without forwarding the
already-resolved `custom_llm_provider`, so the downstream call
re-invoked `get_llm_provider()` with `custom_llm_provider=None` and
stripped a second provider prefix from a `provider/provider/model`
deployment string.

For a deployment configured as `openai/openai/openai/gpt-5.5`,
the bridge flow sent `openai/gpt-5.5` to the upstream API instead of
the correct `openai/openai/gpt-5.5`. Upstream APIs that enforce
model-name allow-lists rejected this as `key_model_access_denied`.

Fix: pass the locally-resolved `custom_llm_provider` into both the
sync `responses()` and async `aresponses()` calls so the downstream
`_resolve_model_provider_for_responses` sees an explicit provider
and skips the second prefix-strip.

New regression test
`tests/test_litellm/completion_extras/test_responses_bridge_provider_propagation.py`
pins both call sites: each must forward `custom_llm_provider`.

* fix(28505): set custom_llm_provider on request_data instead of as duplicate kwarg

Greptile flagged that the previous patch passed custom_llm_provider as an
explicit kwarg to responses()/aresponses() while request_data already
carried it via the spread of sanitized_litellm_params, which would raise
TypeError: got multiple values for keyword argument on every real bridge
call.

Switches to assigning request_data['custom_llm_provider'] before the call
so the resolved provider wins over whatever sanitized_litellm_params spread
in, without duplicating the kwarg.

Updates the regression test to seed request_data with a sentinel
custom_llm_provider so it actually exercises the overwrite path (the
previous test mocked transform_request with a minimal dict and never hit
the conflict).

* chore: trigger shin-agent re-eval on retargeted staging base

* chore: trigger shin-agent re-eval against updated Greptile state

* Add 1-hour cache write pricing tier for Vertex AI Anthropic models

GCP Vertex AI publishes a separate 1-hour cache write column for the
Claude family (1.6x the 5-minute write rate, matching the documented
Bedrock ratio). LiteLLM's Vertex AI Anthropic entries only carry the
5-minute tier, so any request that uses `cache_control: {"ttl": "1h"}`
on Vertex AI Claude is undercounted in cost tracking by ~60%.

The runtime side already supports the 1-hour tier — `VertexAIAnthropicConfig`
extends `AnthropicConfig`, populating `ephemeral_1h_input_tokens`, and
`_calculate_cache_creation_cost` reads `cache_creation_input_token_cost_above_1hr`.
Only the price registry was missing data.

Adds the field to 19 vertex_ai/claude-* entries across both
`model_prices_and_context_window.json` and the bundled
`model_prices_and_context_window_backup.json`:

  - Haiku 4.5 ($1.25 -> $2.00 / MTok)
  - Sonnet 3.7 / 4 / 4.5 / 4.6 ($3.75 -> $6.00 / MTok)
  - Opus 4.5 / 4.6 / 4.7 ($6.25 -> $10.00 / MTok)
  - Opus 4 / 4.1 ($18.75 -> $30.00 / MTok)

Adds `tests/test_litellm/test_vertex_anthropic_1hr_cache_pricing.py`
mirroring the Bedrock equivalent — pins each (5m, 1h) pair per model
and asserts the 1.6x ratio across the family.

Fixes #27781.

---------

Co-authored-by: Terrajlz <info@jouleselectrictech.com>
Co-authored-by: Bruno Devaux <devaux.br@gmail.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>

* Fix Gemini multimodal function responses (#29325)

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>

* address greptile review: add _transform_image_usage method and model-map supports_image_size flag

- Add _transform_image_usage instance method to GoogleImageGenConfig that
  delegates to transform_gemini_image_usage, fixing the regression test
- Replace hardcoded "2.5-flash" string check in supports_gemini_image_size
  with a get_model_info lookup on supports_image_size (default true)
- Add supports_image_size: false to all gemini-2.5-flash model entries in
  model_prices_and_context_window.json so capability is controlled via the
  model map rather than embedded in code

* fix test failures: schema validation, mypy type, model info plumbing, pricing test

- Add supports_image_size to ModelInfoBase TypedDict so get_model_info surfaces it
- Pass supports_image_size through _get_model_info_helper constructor call
- Fix supports_gemini_image_size to use value is not False (None means unset, defaults to True)
- Add supports_image_size to JSON schema in test_aaamodel_prices_and_context_window_json_is_valid
- Correct gemini-3.1-flash-lite pricing assertions in test to match JSON values

* Add Azure AI Kimi K2.6 metadata (#27052)

* Add Azure AI Kimi K2.6 metadata

* Scope Kimi metadata test cost map setup

* fall back to substring check for models not in model_prices_and_context_window.json

Models like gemini-2.5-flash-image-preview are not in the pricing JSON,
so get_model_info raises. Fall back to "2.5-flash" not in model when the
JSON has no explicit supports_image_size entry for the model.

* fix(inception): don't forward global litellm.api_key to Inception FIM

Match the Inception chat config: resolve only an Inception-specific key
(param, litellm.inception_key, or INCEPTION_API_KEY) for the text-completion
FIM path. The global litellm.api_key (often an OpenAI key) was both leaking
to api.inceptionlabs.ai and taking precedence over the configured Inception
key when set.

* fix(auth): enforce end-user budget on custom-auth path that skips common_checks

get_end_user_object() no longer raises BudgetExceededError, so custom-auth
deployments with custom_auth_run_common_checks unset (which skip the
centralized common_checks gate) stopped enforcing the end-user budget,
letting an over-budget end user keep making requests. Re-enforce the
budget in _run_post_custom_auth_checks on that path.

---------

Signed-off-by: José Luis Di Biase <josx@interorganic.com.ar>
Co-authored-by: Isha <72744901+IshaMeera@users.noreply.github.com>
Co-authored-by: aneeshsangvikar <aneeshsangvikar@fiddler.ai>
Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: Aneesh-Fiddler <aneeshfiddler@gmail.com>
Co-authored-by: Suleiman Elkhoury <108065141+suleimanelkhoury@users.noreply.github.com>
Co-authored-by: Dmitriy Alergant <93501479+DmitriyAlergant@users.noreply.github.com>
Co-authored-by: Yanis Miraoui <yanis.miraoui19@imperial.ac.uk>
Co-authored-by: Lovro Seder <vrovro@gmail.com>
Co-authored-by: Thomas Mildner <12685945+Thomas-Mildner@users.noreply.github.com>
Co-authored-by: José Luis Di Biase <josx@interorganic.com.ar>
Co-authored-by: Lai Quang Huy <64073540+1qh@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Filippo Menghi <113345637+Cyberfilo@users.noreply.github.com>
Co-authored-by: Terrajlz <info@jouleselectrictech.com>
Co-authored-by: Bruno Devaux <devaux.br@gmail.com>
Co-authored-by: ZHONG Ziwen <67355585+zzw-math@users.noreply.github.com>
Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
Sameerlite added a commit that referenced this pull request Jun 5, 2026
…28567)

* fix(thinking): handle None thinking param in is_thinking_enabled (#28598)

Squash-merged by litellm-agent from Terrajlz's PR.

* feat(helm): support tpl rendering in podAnnotations (#28609)

Squash-merged by litellm-agent from devauxbr's PR.

* Forward custom_llm_provider through the Responses API bridge (Fixes #28505) (#28575)

* Forward custom_llm_provider through the Responses API bridge (Fixes #28505)

When a Chat Completions request to a GPT-5.4+ model contains both
`tools` and `reasoning_effort`, `completion()` auto-routes through
`responses_api_bridge`. The bridge handler called
`litellm.responses()` / `litellm.aresponses()` without forwarding the
already-resolved `custom_llm_provider`, so the downstream call
re-invoked `get_llm_provider()` with `custom_llm_provider=None` and
stripped a second provider prefix from a `provider/provider/model`
deployment string.

For a deployment configured as `openai/openai/openai/gpt-5.5`,
the bridge flow sent `openai/gpt-5.5` to the upstream API instead of
the correct `openai/openai/gpt-5.5`. Upstream APIs that enforce
model-name allow-lists rejected this as `key_model_access_denied`.

Fix: pass the locally-resolved `custom_llm_provider` into both the
sync `responses()` and async `aresponses()` calls so the downstream
`_resolve_model_provider_for_responses` sees an explicit provider
and skips the second prefix-strip.

New regression test
`tests/test_litellm/completion_extras/test_responses_bridge_provider_propagation.py`
pins both call sites: each must forward `custom_llm_provider`.

* fix(28505): set custom_llm_provider on request_data instead of as duplicate kwarg

Greptile flagged that the previous patch passed custom_llm_provider as an
explicit kwarg to responses()/aresponses() while request_data already
carried it via the spread of sanitized_litellm_params, which would raise
TypeError: got multiple values for keyword argument on every real bridge
call.

Switches to assigning request_data['custom_llm_provider'] before the call
so the resolved provider wins over whatever sanitized_litellm_params spread
in, without duplicating the kwarg.

Updates the regression test to seed request_data with a sentinel
custom_llm_provider so it actually exercises the overwrite path (the
previous test mocked transform_request with a minimal dict and never hit
the conflict).

* chore: trigger shin-agent re-eval on retargeted staging base

* chore: trigger shin-agent re-eval against updated Greptile state

* Add jp. Bedrock cross-region inference profile for claude-opus-4-7

AWS Bedrock documents jp.anthropic.claude-opus-4-7 alongside the
existing us./eu./au./global. profiles for Claude Opus 4.7
(ap-northeast-1 Tokyo / ap-northeast-3 Osaka), but the entry is
missing from model_prices_and_context_window.json. Tokyo-region
users currently get an "unknown model" error when routing through
the JP geo profile.

Adds the entry to both the canonical file and the bundled backup,
mirroring the recent pattern for sonnet-4-6 (#27831). Pricing matches
the other regional profiles (10% premium over base/global).

Regression test pins all six documented profiles (base, global, us, eu,
au, jp) and asserts pricing parity between jp. and au. variants.

Source: https://docs.aws.amazon.com/bedrock/latest/userguide/model-card-anthropic-claude-opus-4-7.html

---------

Co-authored-by: Terrajlz <info@jouleselectrictech.com>
Co-authored-by: Bruno Devaux <devaux.br@gmail.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>
Sameerlite added a commit that referenced this pull request Jun 5, 2026
…28567)

* fix(thinking): handle None thinking param in is_thinking_enabled (#28598)

Squash-merged by litellm-agent from Terrajlz's PR.

* feat(helm): support tpl rendering in podAnnotations (#28609)

Squash-merged by litellm-agent from devauxbr's PR.

* Forward custom_llm_provider through the Responses API bridge (Fixes #28505) (#28575)

* Forward custom_llm_provider through the Responses API bridge (Fixes #28505)

When a Chat Completions request to a GPT-5.4+ model contains both
`tools` and `reasoning_effort`, `completion()` auto-routes through
`responses_api_bridge`. The bridge handler called
`litellm.responses()` / `litellm.aresponses()` without forwarding the
already-resolved `custom_llm_provider`, so the downstream call
re-invoked `get_llm_provider()` with `custom_llm_provider=None` and
stripped a second provider prefix from a `provider/provider/model`
deployment string.

For a deployment configured as `openai/openai/openai/gpt-5.5`,
the bridge flow sent `openai/gpt-5.5` to the upstream API instead of
the correct `openai/openai/gpt-5.5`. Upstream APIs that enforce
model-name allow-lists rejected this as `key_model_access_denied`.

Fix: pass the locally-resolved `custom_llm_provider` into both the
sync `responses()` and async `aresponses()` calls so the downstream
`_resolve_model_provider_for_responses` sees an explicit provider
and skips the second prefix-strip.

New regression test
`tests/test_litellm/completion_extras/test_responses_bridge_provider_propagation.py`
pins both call sites: each must forward `custom_llm_provider`.

* fix(28505): set custom_llm_provider on request_data instead of as duplicate kwarg

Greptile flagged that the previous patch passed custom_llm_provider as an
explicit kwarg to responses()/aresponses() while request_data already
carried it via the spread of sanitized_litellm_params, which would raise
TypeError: got multiple values for keyword argument on every real bridge
call.

Switches to assigning request_data['custom_llm_provider'] before the call
so the resolved provider wins over whatever sanitized_litellm_params spread
in, without duplicating the kwarg.

Updates the regression test to seed request_data with a sentinel
custom_llm_provider so it actually exercises the overwrite path (the
previous test mocked transform_request with a minimal dict and never hit
the conflict).

* chore: trigger shin-agent re-eval on retargeted staging base

* chore: trigger shin-agent re-eval against updated Greptile state

* Add jp. Bedrock cross-region inference profile for claude-opus-4-7

AWS Bedrock documents jp.anthropic.claude-opus-4-7 alongside the
existing us./eu./au./global. profiles for Claude Opus 4.7
(ap-northeast-1 Tokyo / ap-northeast-3 Osaka), but the entry is
missing from model_prices_and_context_window.json. Tokyo-region
users currently get an "unknown model" error when routing through
the JP geo profile.

Adds the entry to both the canonical file and the bundled backup,
mirroring the recent pattern for sonnet-4-6 (#27831). Pricing matches
the other regional profiles (10% premium over base/global).

Regression test pins all six documented profiles (base, global, us, eu,
au, jp) and asserts pricing parity between jp. and au. variants.

Source: https://docs.aws.amazon.com/bedrock/latest/userguide/model-card-anthropic-claude-opus-4-7.html

---------

Co-authored-by: Terrajlz <info@jouleselectrictech.com>
Co-authored-by: Bruno Devaux <devaux.br@gmail.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>
mateo-berri added a commit that referenced this pull request Jun 5, 2026
* Mark xAI models retiring on 2026-05-15 (#28788)

Per https://docs.x.ai/developers/migration/may-15-retirement, xAI is
retiring the following slugs on 2026-05-15 (auto-redirect to grok-4.3
with various reasoning efforts; callers continuing to use the old slugs
will be billed at grok-4.3 pricing):

  grok-4-1-fast-reasoning{,-latest}      -> grok-4.3 (low effort)
  grok-4-1-fast-non-reasoning{,-latest}  -> grok-4.3 (none)
  grok-4-fast-reasoning                  -> grok-4.3 (low effort)
  grok-4-fast-non-reasoning              -> grok-4.3 (none)
  grok-4-0709                            -> grok-4.3 (low effort)
  grok-code-fast-1{,-0825}               -> grok-build-0.1
  grok-3                                 -> grok-4.3 (none)

Only the direct xai/ slugs are tagged; third-party hosts (azure_ai,
oci, vercel_ai_gateway, perplexity/xai) run their own schedules. The
grok-3 retirement list explicitly names only the base grok-3 slug — the
-mini / -fast / -beta / -latest variants are not listed, so they remain
untouched.

* feat(moonshot): advertise json_schema response support on live models (#29683)

litellm.responses() already routes Moonshot through the responses->chat-completions
bridge, and Moonshot honors response_format json_schema on chat completions. The
cost-map entries left supports_response_schema unset, so discovery layers that gate
on that flag dropped Moonshot from structured-output / responses listings even though
the capability works end to end.

Set supports_response_schema on the nine models currently live on api.moonshot.ai:
kimi-k2.5, kimi-k2.6, the moonshot-v1 8k/32k/128k text and vision-preview variants,
and moonshot-v1-auto. Verified against the live API that each honors json_schema and
that litellm.responses() returns schema-valid structured output through the bridge.

* chore(moonshot): mark models retired from api.moonshot.ai as deprecated (#29685)

Thirteen Moonshot/Kimi models in the cost map no longer resolve on
api.moonshot.ai (all return 404). Stamp each with its deprecation_date from
platform.kimi.ai/docs/models rather than deleting the entries, so historical
cost calculation keeps resolving the names while tooling can surface the
retirement.

Dates: kimi-thinking-preview 2025-11-11; kimi-latest and its 8k/32k/128k context
variants 2026-01-28; the kimi-k2 preview/turbo/thinking series 2026-05-25; the
moonshot-v1 -0430 snapshots use their own 2024-04-30 snapshot date (Moonshot
publishes no discontinuation date for them).

* fix(moonshot): drop temperature for reasoning models (kimi-k2.5/k2.6) (#29687)

Kimi reasoning models reject every temperature except 1; a request with
temperature=0.2 returns "invalid temperature: only 1 is allowed for this model".
litellm only clamped temperature into [0.3, 1], so any value below 1 still 400'd.

Drop the temperature param entirely for reasoning models (gated on
supports_reasoning, the same signal transform_request already uses) so the model
default is used; the non-reasoning moonshot-v1 models keep the existing clamp.

Co-authored-by: Sameer Kankute <sameer@berri.ai>

* feat(mcp): add per-server timeout configuration (#29672)

* feat(mcp): add per-server timeout configuration

* fix(mcp): address timeout field review comments

- use is not None guard instead of or for 0.0 edge case
- copy timeout in both LiteLLM_MCPServerTable constructions (health check path + _build_mcp_server_table)
- add timeout Float? column to all three schema.prisma files
- extend round-trip test to cover _build_mcp_server_table direction
- add test for zero timeout not treated as falsy

* fix(mcp): forward timeout in _build_temporary_mcp_server_record

* fix(mcp): return 504 instead of 500 when per-server timeout fires

* test(mcp): add 504 timeout regression test; fix black formatting

* Add jp. Bedrock cross-region inference profile for claude-opus-4-7 (#28567)

* fix(thinking): handle None thinking param in is_thinking_enabled (#28598)

Squash-merged by litellm-agent from Terrajlz's PR.

* feat(helm): support tpl rendering in podAnnotations (#28609)

Squash-merged by litellm-agent from devauxbr's PR.

* Forward custom_llm_provider through the Responses API bridge (Fixes #28505) (#28575)

* Forward custom_llm_provider through the Responses API bridge (Fixes #28505)

When a Chat Completions request to a GPT-5.4+ model contains both
`tools` and `reasoning_effort`, `completion()` auto-routes through
`responses_api_bridge`. The bridge handler called
`litellm.responses()` / `litellm.aresponses()` without forwarding the
already-resolved `custom_llm_provider`, so the downstream call
re-invoked `get_llm_provider()` with `custom_llm_provider=None` and
stripped a second provider prefix from a `provider/provider/model`
deployment string.

For a deployment configured as `openai/openai/openai/gpt-5.5`,
the bridge flow sent `openai/gpt-5.5` to the upstream API instead of
the correct `openai/openai/gpt-5.5`. Upstream APIs that enforce
model-name allow-lists rejected this as `key_model_access_denied`.

Fix: pass the locally-resolved `custom_llm_provider` into both the
sync `responses()` and async `aresponses()` calls so the downstream
`_resolve_model_provider_for_responses` sees an explicit provider
and skips the second prefix-strip.

New regression test
`tests/test_litellm/completion_extras/test_responses_bridge_provider_propagation.py`
pins both call sites: each must forward `custom_llm_provider`.

* fix(28505): set custom_llm_provider on request_data instead of as duplicate kwarg

Greptile flagged that the previous patch passed custom_llm_provider as an
explicit kwarg to responses()/aresponses() while request_data already
carried it via the spread of sanitized_litellm_params, which would raise
TypeError: got multiple values for keyword argument on every real bridge
call.

Switches to assigning request_data['custom_llm_provider'] before the call
so the resolved provider wins over whatever sanitized_litellm_params spread
in, without duplicating the kwarg.

Updates the regression test to seed request_data with a sentinel
custom_llm_provider so it actually exercises the overwrite path (the
previous test mocked transform_request with a minimal dict and never hit
the conflict).

* chore: trigger shin-agent re-eval on retargeted staging base

* chore: trigger shin-agent re-eval against updated Greptile state

* Add jp. Bedrock cross-region inference profile for claude-opus-4-7

AWS Bedrock documents jp.anthropic.claude-opus-4-7 alongside the
existing us./eu./au./global. profiles for Claude Opus 4.7
(ap-northeast-1 Tokyo / ap-northeast-3 Osaka), but the entry is
missing from model_prices_and_context_window.json. Tokyo-region
users currently get an "unknown model" error when routing through
the JP geo profile.

Adds the entry to both the canonical file and the bundled backup,
mirroring the recent pattern for sonnet-4-6 (#27831). Pricing matches
the other regional profiles (10% premium over base/global).

Regression test pins all six documented profiles (base, global, us, eu,
au, jp) and asserts pricing parity between jp. and au. variants.

Source: https://docs.aws.amazon.com/bedrock/latest/userguide/model-card-anthropic-claude-opus-4-7.html

---------

Co-authored-by: Terrajlz <info@jouleselectrictech.com>
Co-authored-by: Bruno Devaux <devaux.br@gmail.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>

* feat(soniox): add soniox audio transcription integration (#29508)

* feat(openmeter): add OPENMETER_TRUST_REQUEST_USER to prevent forged attribution (#29650)

The OpenMeter callback resolves the CloudEvent subject from kwargs["user"]
first, then falls back to the key-bound user_api_key_user_id. For
multi-tenant proxy deployments, a client can set `"user": "..."` in the
request body and cause their usage to be attributed to that arbitrary
string — a billing-attribution forgery risk.

Adds OPENMETER_TRUST_REQUEST_USER env var (default "true" for backward
compatibility). When set to "false", the request-supplied `user` field is
ignored and the subject is resolved solely from user_api_key_user_id.

Matches the existing env-var-driven config pattern in this file
(OPENMETER_API_KEY, OPENMETER_API_ENDPOINT, OPENMETER_EVENT_TYPE).

* feat(search): add you_com as a search provider (#28370)

* feat(search): add you_com as a search provider

Registers You.com Search API as a first-class `search_provider` in the
`search_tools` registry, alongside Tavily, Exa, Perplexity, etc.

- New adapter: litellm/llms/you_com/search/transformation.py
  - POSTs to https://ydc-index.io/v1/search
  - Auth: X-API-Key from YOUCOM_API_KEY (or explicit api_key)
  - Maps Perplexity unified spec: max_results -> count,
    search_domain_filter -> include_domains, country -> country
  - Flattens results.web + results.news into a single SearchResult list;
    snippet prefers snippets[0], falls back to description; page_age -> date
- Registry: SearchProviders.YOU_COM in litellm/types/utils.py and wired
  into ProviderConfigManager.get_provider_search_config()
- Pricing entry: model_prices_and_context_window.json (placeholder $0.0;
  happy to adjust to maintainers' preferred public number)
- Docs: example router config snippet and example proxy yaml updated
- Tests: tests/search_tests/test_you_com_search.py - 5 mocked tests
  (payload shape, domain filter mapping, snippet fallback, news flattening,
  missing-api-key error)

Refs upstream expansion signal: #15942

* review fixups: normalize api_base, lowercase country, scope env-var to test

Addresses Greptile inline review comments on #28370:

- get_complete_url: strip trailing slashes from api_base *before* the
  endswith("/v1/search") check, so a custom base like ".../v1/search/"
  doesn't become ".../v1/search/v1/search".
- transform_search_request: .lower() country before sending, matching
  Tavily's convention so callers using the unified spec form ("US") get
  consistent behavior across providers.
- Tests: replace direct os.environ writes with an autouse monkeypatch
  fixture so YOUCOM_API_KEY is set per-test and removed afterwards.
  The missing-key test now uses monkeypatch.delenv. New test asserts the
  trailing-slash normalization above.

Reverts the ARCHITECTURE.md / example yaml edits per the reviewer note
that documentation changes belong in the litellm-docs repo.

* support keyless free tier (api.you.com/v1/agents/search) as default

You.com offers an IP-throttled keyless endpoint that returns the same
response shape as the keyed one (~100 queries/day, no signup). This is a
significant onboarding lever - mirrors the keyless DuckDuckGo/SearXNG
providers already in the search_tools registry.

Behavior:
- YOUCOM_API_KEY set        -> keyed:  POST https://ydc-index.io/v1/search
                                       (X-API-Key header)
- no key                    -> free:   POST https://api.you.com/v1/agents/search
                                       (no auth)
- YOUCOM_API_BASE override  -> honored as-is

Tests:
- New: test_you_com_search_keyless_free_tier - asserts URL + absence of
  X-API-Key when no key is configured.
- New: test_you_com_search_validate_environment_keyless - asserts the
  config no longer raises when the key is absent.
- Removed: test_you_com_search_raises_without_api_key (the precondition
  no longer holds).
- Existing payload/domain-filter/etc tests still cover keyed mode via
  the autouse YOUCOM_API_KEY fixture.

Verified both endpoints accept POST + return identical JSON shape:
  results.web[] / results.news[] with title, url, snippets, description,
  page_age.

* register you_com in provider_endpoints_support.json

Adding `litellm/llms/you_com/` requires a corresponding entry in
provider_endpoints_support.json or the
code-quality/check_provider_folders_documented CI check fails.

Follows the compact tavily/serper pattern - endpoints: { search: true }.
Local run of the check now reports "All 114 provider folders are documented".

* move tests under tests/test_litellm/llms/ so CI exercises them

The litellm CI workflows scope unit tests to `tests/test_litellm/...`
(see test-unit-llm-providers.yml: `tests/test_litellm/llms` path), so
tests living under `tests/search_tests/` are never run in CI - which is
why codecov reports 0% patch coverage for the new adapter even though
the unit tests exist and pass locally.

Move test_you_com_search.py into `tests/test_litellm/llms/you_com/` so
the test-unit-llm-providers job picks it up. 7/7 tests still pass at
the new location.

(Sibling search-only providers - tavily, exa_ai, brave, etc. - still
live only in `tests/search_tests/` and would benefit from the same
move, but that is out of scope for this PR.)

* fix(you_com): pin Accept-Encoding: identity to dodge keyless gzip bug

The keyless free-tier endpoint (api.you.com/v1/agents/search) advertises
Content-Encoding: gzip but returns a body that httpx's decoder rejects
with `zlib.error: Error -3 while decompressing data: incorrect header
check`, surfacing as litellm.APIConnectionError in user code. curl works
because it doesn't request compression by default.

Pin Accept-Encoding: identity in validate_environment so the upstream
server skips compression entirely. Harmless on the keyed endpoint
(ydc-index.io/v1/search) which negotiates content-encoding correctly.

The header uses setdefault so a caller-supplied Accept-Encoding still
takes precedence. (Server-side bug has been flagged to the You.com team
separately - once fixed there, this workaround can be removed.)

New unit test: test_you_com_search_pins_identity_accept_encoding.

---------

Co-authored-by: Sameer Kankute <sameer@berri.ai>

* docs: fix README typo (#29419)

Correct clear spelling mistakes in documentation without changing behavior.

Confidence: high
Scope-risk: narrow
Tested: git diff --check; uvx codespell on changed files
Not-tested: Full docs build not run; text-only changes

* Fix(langfuse): pass httpx_client to Langfuse in langfuse_prompt_management to respect SSL_VERIFY (#29480)

* fix(langfuse): pass ssl_verify to Langfuse httpx client

* fix_langfuse_

* add unit tests

* addressed comments

---------

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>

* feat(models): add minimax/MiniMax-M3 to model cost map (#29412)

Add MiniMax's new flagship MiniMax-M3 to the native minimax provider:
512K context, 128K max output, native multimodal (supports_vision),
reasoning, prompt caching. Pricing (USD/M tokens): input 0.6 / output
2.4 / cache read 0.12. M3 has no active prompt-cache-write tier, so
cache_creation_input_token_cost is omitted.

Updated both the root model_prices_and_context_window.json (remote
source) and the bundled litellm/model_prices_and_context_window_backup.json
(local fallback), keeping them in sync.

* fix(logging): handle ResponseCompletedEvent in anthropic_messages streaming spend log (#29394)

* fix(logging): handle ResponseCompletedEvent in anthropic_messages streaming spend log

* fix(logging): extend terminal event handling to ResponseIncompleteEvent and ResponseFailedEvent; fix return type annotation

* feat(provider): Add Neosantara provider as OpenAI Compatible (#29646)

* Add Neosantara provider

* Register Neosantara provider enum

* Address Neosantara provider review feedback

* Add Neosantara packaged endpoint support

---------

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>

* fix: address greptile and veria review feedback

- langfuse: guard httpx_client injection behind version check (>= 2.7.3)
- soniox: propagate audio_transcription_duration in _hidden_params for spend tracking
- soniox: give SONIOX_API_BASE env var priority over caller-supplied api_base
- mcp: replace CancelledError catch with asyncio.wait_for + TimeoutError

* chore(mcp): add migration for per-server timeout column

* fix(test): add tool_use_system_prompt_tokens to model prices schema validator

* fix: mcp timeout test uses real asyncio.wait_for timeout; you_com get_complete_url respects resolved api_key

* fix: forward resolved api_key into you_com endpoint selection and apply timeout to soniox polling GETs

The search flow resolves api_key in validate_environment but never passed it
into get_complete_url, so a programmatic api_key (with no YOUCOM_API_KEY in the
env) set the X-API-Key header yet still selected the keyless free-tier endpoint.
Forward api_key through both the search entrypoint and the http handler so the
keyed endpoint is chosen.

HTTPHandler.get/AsyncHTTPHandler.get had no timeout parameter, so the Soniox
poll and transcript-fetch GETs silently used the client global default instead
of the caller timeout. Add a per-request timeout to get() and forward the
configured timeout from the Soniox handler.

* fix(soniox): price stt-async-v4 per second so transcriptions are billed

The handler stores audio_transcription_duration in _hidden_params, but the
model carried only token cost fields and the response has no token usage, so
the transcription cost path fell through to cost_per_second and returned $0.
An authenticated caller could transcribe Soniox audio without decrementing
their budget. Switch the entry to output_cost_per_second at Soniox's published
$0.10/hour async rate so the stored duration produces a real charge.

* fix(langfuse): use a dedicated httpx client for the SDK injection

The httpx_client handed to the Langfuse SDK came from _get_httpx_client(),
which returns LiteLLM's globally cached HTTPHandler. If Langfuse closed that
client on teardown it would invalidate the shared client used by every other
LiteLLM HTTP call. Build a dedicated httpx.Client instead, still resolving SSL
verification and client certificate from LiteLLM's configuration.

* fix(soniox): prefer caller-supplied api_base over SONIOX_API_BASE env var

* fix(cohere): support max_completion_tokens on cohere v2 chat (default route) (#29779)

* fix(cohere): support max_completion_tokens on cohere v2 chat

The default cohere_chat route resolves to CohereV2ChatConfig, which did not
list or map max_completion_tokens, so get_optional_params raised
UnsupportedParamsError for the standard OpenAI parameter (the modern
replacement for the deprecated max_tokens). The v1 config already maps it to
cohere's max_tokens; mirror that in v2 and add v2 regression tests.

* fix(cohere): make max_completion_tokens take precedence over max_tokens on v2

When both max_tokens and max_completion_tokens are supplied, prefer
max_completion_tokens explicitly rather than relying on dict iteration order,
and cover both orderings with a regression test.

---------

Co-authored-by: Daniel Yudelevich <4537920+yudelevi@users.noreply.github.com>
Co-authored-by: hectorc98 <hector.chamorroalvarez@adyen.com>
Co-authored-by: Filippo Menghi <113345637+Cyberfilo@users.noreply.github.com>
Co-authored-by: Terrajlz <info@jouleselectrictech.com>
Co-authored-by: Bruno Devaux <devaux.br@gmail.com>
Co-authored-by: Dan Lemon <dan@danlemon.com>
Co-authored-by: Saswat <saswatds@users.noreply.github.com>
Co-authored-by: Brian Sparker <brainsparker@users.noreply.github.com>
Co-authored-by: Zhao73 <156770117+Zhao73@users.noreply.github.com>
Co-authored-by: Urain Ahmad Shah <60431964+urainshah@users.noreply.github.com>
Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: kape <168134658+kapelame@users.noreply.github.com>
Co-authored-by: danisalvaa <159898202+danisalvaa@users.noreply.github.com>
Co-authored-by: Just R <remixingmagelang@gmail.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
Co-authored-by: abhay23-AI <abhaytrivedi22@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: is_thinking_enabled crashes with AttributeError when non_default_params["thinking"] is None

4 participants