chore(ci): promote internal staging to main #29243

Merged
yuneng-berri merged 54 commits into main from litellm_internal_staging
May 29, 2026

Conversation

@yuneng-berri

Collaborator

Relevant issues

Linear ticket

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/test_litellm/ directory. Adding at least 1 test is a hard requirement - see details
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable but needs attention
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Screenshots / Proof of Fix

Type

🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test

Changes

yuneng-berri and others added 30 commits May 23, 2026 16:41
…8683)

* fix(team): refresh team cache on team_model_add/delete (LIT-3244)

team_model_add and team_model_delete wrote to the DB but did not
invalidate the in-memory LiteLLM_TeamTableCachedObj used by
common_checks. After the v1.83.14 common_checks centralization made
team.models authoritative on /v1/files and /v1/vector_stores/*,
adding a Team-BYOK model silently failed to grant the new public
model name to team members until the cache TTL expired (and a
removed model kept working until then on the symmetric path).

Extract the cache-refresh snippet from update_team into a small
helper and apply it consistently at all three team-write sites.

* test: also assert updated models in team-cache-refresh pin

Strengthens the LIT-3244 regression test to also assert
`call_kwargs["team_table"].models` matches the updated row,
not just `team_id`. Both `existing_team` and `updated_team`
share `team_id` in the test setup, so the previous assertion
would have passed even if the implementation accidentally cached
the pre-mutation row.

Greptile review feedback.

* fix(team): hydrate object_permission on cache-refreshing team updates

The Prisma update calls in update_team, team_model_add, and
team_model_delete returned a team row with object_permission_id set
but object_permission=None (the relation was not requested via
include=). _refresh_cached_team then wrote that to the in-memory
LiteLLM_TeamTableCachedObj, and the cache-hit path in get_team_object
returns the cached object without re-hydrating. Downstream consumers
(validate_key_search_tools_against_team, the MCP/agent authz paths)
treat a missing object_permission as no team-level restriction, so
a team-write op silently dropped object-permission enforcement until
the cache TTL expired or a DB-fetch path re-hydrated it.

Add include={"object_permission": True} to all three updates so the
refresh writes a complete cached team. Extend the LIT-3244 regression
test to pin both the cached object_permission and the include shape
on the Prisma call.

Surfaced in PR review of LIT-3244.
… Anthropic (#28723)

`getProviderModels()` matched a model into a provider's dropdown when the
model's `litellm_provider` string *contained* the provider key as a
substring. The intent was to admit suffix variants (e.g. `anthropic_text`,
`bedrock_converse`), but the substring check is too loose: it also pulls in
unrelated providers whose name happens to contain the key, most visibly
`vertex_ai-anthropic_models` matching `anthropic` and `vertex_ai-openai_models`
matching `openai`.

Replace `.includes()` with separator-anchored prefix matching
(`startsWith(provider + "_")` / `startsWith(provider + "-")`). All legitimate
variants in `model_prices_and_context_window.json` still match
(`anthropic_text`, `azure_text`, `azure_ai`, `bedrock_converse`,
`bedrock_mantle`, `cohere_chat`, `fireworks_ai-embedding-models`,
`vertex_ai-*`, `vertex_ai_beta`), and the cross-provider leak is closed.

Tests: update one assertion that pinned the buggy substring behavior
(`custom_openai_endpoint` matching `openai` — not a real provider value);
add 6 new tests covering the leak regressions and the variant-preservation
contract for vertex_ai/bedrock/fireworks.
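
The matching rule above can be sketched in Python as follows. This is an illustration of separator-anchored prefix matching, not the dashboard's TypeScript code; `matches_provider` is a hypothetical name, and the exact-equality branch is an assumption about how a bare provider value is admitted.

```python
def matches_provider(litellm_provider: str, provider: str) -> bool:
    """Return True for the provider itself or a separator-anchored variant.

    Hypothetical helper: mirrors the startsWith(provider + "_") /
    startsWith(provider + "-") rule from the commit message. The equality
    check is an assumption added for the bare provider name.
    """
    return (
        litellm_provider == provider
        or litellm_provider.startswith(provider + "_")
        or litellm_provider.startswith(provider + "-")
    )


# Suffix variants still land in their provider's dropdown.
variant_ok = matches_provider("anthropic_text", "anthropic")
# The substring leak is closed: the key appears mid-string, not as a prefix.
leak_closed = not matches_provider("vertex_ai-anthropic_models", "anthropic")
```

A plain substring test would have returned True for the second case, which is the cross-provider leak the commit describes.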
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: ryan-crabbe-berri <ryan-crabbe-berri@users.noreply.github.com>
…rs and signed request body (#27526)

* Fix Bedrock KB pass-through SigV4 headers and signed body

Coerce botocore HeadersDict to a dict for pass-through routes. When
forward_headers is true, drop request headers that collide case-insensitively
with signed headers so client Bearer auth does not shadow AWS SigV4.
Send prepped.body as raw content so the outbound payload matches the
signature after logging hooks mutate the parsed dict.
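
A minimal sketch of the header handling described above, assuming both header sets are available as mappings. `merge_forwarded_headers` is a hypothetical name, not a function from the pass-through code.

```python
from typing import Dict, Mapping


def merge_forwarded_headers(
    client_headers: Mapping[str, str],
    signed_headers: Mapping[str, str],
) -> Dict[str, str]:
    """Combine forwarded client headers with SigV4-signed headers.

    Hypothetical helper. Client headers whose names collide
    case-insensitively with a signed header are dropped, so the signed
    value is the only one sent.
    """
    # Coerce any mapping-like object (for example a botocore header
    # container) into a plain dict.
    signed = dict(signed_headers)
    signed_names = {name.lower() for name in signed}
    merged = {
        name: value
        for name, value in client_headers.items()
        if name.lower() not in signed_names
    }
    merged.update(signed)
    return merged
```

With a client `authorization: Bearer ...` header and a signed `Authorization: AWS4-HMAC-SHA256 ...` header, only the signed one survives, while unrelated client headers pass through unchanged.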

Co-authored-by: Cursor <cursoragent@cursor.com>

* Simplify pass-through raw body handling

Read the SigV4-signed bytes directly from request.state inside
pass_through_request instead of threading a custom_raw_body argument
through three functions. Helper methods are restored to their original
signatures, and the new branch lives in one place at each httpx call site.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Harden pass-through raw body read from request.state

Guard missing request.state (test fixtures) and ignore non-bytes/str
values so MagicMock does not trigger the SigV4 raw-body path.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Test pass_through_request state_raw_body uses httpx content=

Cover non-streaming (async_client.request) and streaming (build_request)
paths so SigV4 bytes on request.state are not replaced by json= of a
hook-mutated dict.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
* chore(tests): migrate Bedrock CI from AWS account 888602223428 to 941277531214

The original account (888602223428) was put under a security restriction by
AWS after a root access key leaked in a PR comment. While that account works
its way through the AWS Support unlock process, Bedrock-touching CI tests have
been migrated to a fresh account (941277531214).

Changes:
  - Replace 26 hardcoded references to 888602223428 with 941277531214 across
    8 files (provisioned-model ARNs, imported-model ARNs, AgentCore runtime
    ARNs, batch execution role ARN, and example proxy config).
  - The provisioned-model and imported-model ARNs are referenced only from
    mocked unit tests — no AWS resources to recreate.
  - The batch execution IAM role has been recreated in the new account with
    the same name and equivalent permissions.
  - The two AgentCore runtimes (hosted_agent_r9jvp-3ySZuRHjLC,
    hosted_agent_13sf6-cALnp38iZD) are being recreated in the new account
    under the same names — see tools/agentcore-deploy/ in a follow-up.

CircleCI env vars AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_REGION_NAME
were updated separately via the CircleCI API to point at the new account.

Smoke-tested locally against the new account:
  aws bedrock-runtime converse --region us-west-2 \
    --model-id us.anthropic.claude-sonnet-4-5-20250929-v1:0 \
    --messages '[{"role":"user","content":[{"text":"ping"}]}]'
  → 200, model returned 'pong'

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(tests): refresh AgentCore ARN suffixes to match newly-deployed runtimes

The first migration commit replaced just the account ID, but AgentCore
auto-assigns a random 10-char suffix to every runtime on creation — we
can't reuse the original suffixes (`3ySZuRHjLC`, `cALnp38iZD`) in the
new account. Updated the AgentCore-runtime ARNs in the three files that
reference real runtime IDs (not the mock-based unit-test ARNs).

Deployed runtimes:
  arn:aws:bedrock-agentcore:us-west-2:941277531214:runtime/hosted_agent_r9jvp-Rq79QFC2fp
  arn:aws:bedrock-agentcore:us-west-2:941277531214:runtime/hosted_agent_13sf6-4046UzHSwy

Both runtimes are status=READY and pass a smoke invoke:
  $ aws bedrock-agentcore invoke-agent-runtime --agent-runtime-arn ... --payload '{"prompt":"ping"}'
  → 200, {"result": "echo: ping"}

The agent is a minimal echo (see /tmp/agentcore_deploy/agent.py for the
deploy artifacts). Tests that only verify the SDK wiring will pass; if any
test asserts on agent output content, swap the echo for the real agent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(tests): point Bedrock batch tests at new-account S3 bucket

The account migration (888602223428 -> 941277531214) was a flat
account-ID swap, which only rewrites ARNs that embed the account
number. S3 bucket names carry no account ID, so the live Bedrock
batch tests still uploaded to `litellm-proxy` — a bucket that lives
in the old account. S3 names are globally unique, and the old account
still holds that name, so it can't be recreated in the new account.

Rename to `litellm-proxy-941277531214` (account-ID suffix guarantees
global uniqueness). The bucket must be created in 941277531214 and the
batch execution role granted s3:GetObject/PutObject/ListBucket on it
before this job is run in CI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(tests): point live S3 logging test at new-account bucket

Same account-ID-free blind spot as the batch bucket: `load-testing-oct`
lives in the old account and its name can't be reused globally. The
`logging_testing` CI job is wired into the workflow and runs
test_basic_s3_logging, which uploads to this bucket with the CI env
creds, then lists and deletes objects — a live dependency.

Rename to `load-testing-oct-941277531214`. The bucket must exist in the
new account with the CI IAM principal granted
s3:PutObject/GetObject/ListBucket/DeleteObject before this job runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(tests): repoint Bedrock guardrail IDs to new-account guardrails

The migration left guardrail IDs untouched (no account ID in them), so
all live guardrail tests failed with "guardrail identifier or version
does not exist" against 941277531214. Recreated both guardrails in the
new account and updated the hardcoded IDs:
  - wf0hkdb5x07f -> zgkmukebruil (PII mask: PHONE + CREDIT_DEBIT_CARD,
    with explicit inputAction=ANONYMIZE so masking applies to INPUT,
    which is the source litellm's moderation hook sends)
  - ff6ujrregl1q -> 4w3d1di3snt5 (blocks "coffee"; blocked message set
    to the exact string the tests assert on)

Updated test_bedrock_guardrails.py, otel_test_config.yaml, and the
guardrailConfig in test_bedrock_completion.py. Verified locally: the 5
previously-failing guardrail tests now pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(bedrock): migrate legacy models to current inference profiles

The new CI account (941277531214) cannot invoke legacy Bedrock models
(AWS gates them: "marked by provider as Legacy... not actively using in
the last 30 days"). Migrated the live-call tests:
  - anthropic.claude-3-sonnet-20240229    -> us.anthropic.claude-sonnet-4-5-20250929-v1:0
  - anthropic.claude-3-haiku-20240307     -> us.anthropic.claude-haiku-4-5-20251001-v1:0
Current Claude models on Bedrock require the us. inference-profile prefix
(bare on-demand ids are rejected).

cohere.command-r-plus has no working replacement (all Cohere is legacy-
gated in the new account): swapped to claude-haiku-4-5 in provider-
agnostic param lists. amazon.titan-image-generator skipped (no working
replacement). Mocked/transformation/cost tests that reference the legacy
strings are intentionally left unchanged. Verified live against the new
account.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(bedrock): repoint SageMaker + Knowledge Base to new-account resources

These referenced account-scoped resources by hardcoded id that only
existed in the old account, so the migration's account-ID swap missed
them. Recreated in 941277531214 and repointed:
  - SageMaker endpoint jumpstart-dft-hf-textgeneration1-mp-20240815-185614
    -> litellm-ci-textgen (gpt2 on a TGI container, ml.g5.xlarge)
  - Bedrock Knowledge Base T37J8R4WTM -> LCYXFBR2TU (OpenSearch Serverless
    vector store + titan-embed-text-v2, seeded with a LiteLLM doc)
Verified live: test_sagemaker.py (12 passed) and
test_bedrock_knowledgebase_hook.py (12 passed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(reasoning_effort_grid): skip bedrock claude-opus-4-7 cells (not entitled on 941277531214)

claude-opus-4-7 is listed in the new Bedrock CI account's foundation
models but invoke is denied (AccessDeniedException: "not available for
this account"). Bedrock access to the flagship Opus requires an AWS
Sales request, not the self-serve model-access toggle, so it can't be
enabled inline with the rest of the account migration.

Add an optional `skip_reason` to ModelEntry and set it on the
bedrock-claude-opus-4-7 entry; the grid test honors it via pytest.skip.
Cell count (231) and route coverage are unchanged, so the structural
asserts still pass. Restore coverage by deleting the one skip_reason
line once access is granted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(bedrock): swap/skip legacy-gated models unavailable on new CI account

The migrated AWS account (941277531214) cannot access several models that
the old account could, so the remaining red CI jobs were hitting real
Bedrock "Access denied / Legacy" and "account not authorized" errors:

- image_gen: skip both Nova Canvas test classes (amazon.nova-canvas-v1:0 is
  legacy-gated), matching the existing titan skip.
- batches: skip test_async_file_and_batch (Bedrock batch inference is not
  authorized on the new account; requires an AWS support case).
- litellm_overhead: swap legacy claude-3-5-haiku for the active
  us.anthropic.claude-haiku-4-5 inference profile.
- test_completion_claude_3_function_call: swap legacy claude-3-sonnet for the
  active us.anthropic.claude-sonnet-4-5 inference profile.

https://claude.ai/code/session_01Y7zgHYu9GX29YRwV4yiWAa

* test(bedrock): fix remaining e2e legacy-model + batch failures on new CI account

- e2e_openai_endpoints: skip test_bedrock_batches_api (Bedrock batch inference
  is not authorized on account 941277531214) and migrate the missed
  s3_bucket_name in oai_misc_config.yaml to litellm-proxy-941277531214.
- build_and_test: swap legacy bedrock claude-3-sonnet for the active
  us.anthropic.claude-sonnet-4-5 inference profile in the proxy structured
  output e2e test.

https://claude.ai/code/session_01Y7zgHYu9GX29YRwV4yiWAa

* test(bedrock): make opus-4-7 + batch cells fail loudly and mock image-gen (#28791)

Replace the silent skips added for the new CI account with noisier behavior:
- reasoning-effort grid: opus-4-7 cells now fail (when AWS creds are present)
  instead of skipping, so the missing entitlement stays visible in CI; they
  still skip when AWS creds are absent (local dev)
- Bedrock batch inference tests: drop the skip so they run and fail until
  batch access is granted
- Titan + Nova Canvas image-gen tests: mock the Bedrock HTTP call so the
  transform + cost-tracking path stays under test without live model access

https://claude.ai/code/session_01MT7SWDnXUjv6e6EPG7BDjT

Co-authored-by: Claude <noreply@anthropic.com>

* test(bedrock): use pytest.xfail for known-failing opus-4-7 cells

Replace pytest.fail with pytest.xfail when a model has a fail_reason,
so known-broken cells stay visible as XFAIL without keeping CI red.
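
The resulting behavior for a flagged cell can be sketched as a small decision function. This is an interpretation of the two commits above, not the grid test's code: `ModelEntry` here is a hypothetical stand-in carrying only the fields the sketch needs, and `planned_outcome` is an invented name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelEntry:
    # Hypothetical stand-in for the grid's ModelEntry.
    name: str
    fail_reason: Optional[str] = None


def planned_outcome(entry: ModelEntry, aws_creds_present: bool) -> str:
    """Return "run", "skip", or "xfail" for one grid cell.

    Assumption: cells without a fail_reason are simply run; the sketch
    does not model any other skip conditions the real grid may have.
    """
    if entry.fail_reason is None:
        return "run"
    if not aws_creds_present:
        # Local development without AWS credentials still skips.
        return "skip"
    # In CI the known-broken cell stays visible as XFAIL.
    return "xfail"
```

In a test body the "skip" and "xfail" outcomes would map to `pytest.skip(reason)` and `pytest.xfail(reason)` respectively.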

Co-authored-by: Yassin Kortam <yassin@berri.ai>

---------

Co-authored-by: Mateo <mateo@Mateos-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
…http_request (#28794)

Co-authored-by: Yassin Kortam <yassinkortam@Yassins-MacBook-Pro.local>
* chore(proxy): route path-dependent call sites through get_request_route

Replace direct ``request.url.path`` reads in auth, ACL, routing, and
audit-log decisions with ``get_request_route(request)`` — the helper
already added in ``auth/auth_utils.py`` that returns the ASGI
``scope["path"]`` with ``root_path`` stripped. Starlette reconstructs
``url.path`` from the Host header; ``scope["path"]`` is uvicorn's
parse of the request line and matches what FastAPI dispatches on, so
it's the authoritative route for any decision that should agree with
the actual handler.

Sites:
- _experimental/mcp_server/auth/user_api_key_auth_mcp.py
- management_endpoints/mcp_management_endpoints.py
- vector_store_endpoints/utils.py
- pass_through_endpoints/pass_through_endpoints.py
- auth/route_checks.py
- litellm_pre_call_utils.py
- spend_tracking/spend_management_endpoints.py
- common_utils/http_parsing_utils.py
- management_helpers/utils.py
- health_endpoints/_health_endpoints.py

Adds regression tests in tests/proxy_unit_tests/test_proxy_routes.py
that construct a Request with scope["path"] set to a benign route and
the Host header crafted so url.path would resolve differently; each
site's decision is asserted against scope["path"].
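
The route resolution described above can be sketched roughly as follows, assuming the ASGI scope is available as a plain dict. `route_from_scope` is a hypothetical name and not the `get_request_route` helper itself; whether `scope["path"]` includes `root_path` depends on the server, so the prefix is only stripped when it is actually present.

```python
from typing import Any, Dict


def route_from_scope(scope: Dict[str, Any]) -> str:
    """Return scope["path"] with a leading root_path removed.

    Hypothetical sketch. It reads only the ASGI scope and never consults
    the Host header or a reconstructed URL.
    """
    path = scope.get("path", "")
    root_path = scope.get("root_path", "").rstrip("/")
    if not root_path:
        return path
    if path == root_path:
        return "/"
    # Strip only on a full path-segment match, so "/proxy" does not
    # swallow the start of "/proxyx/...".
    if path.startswith(root_path + "/"):
        return path[len(root_path):]
    return path
```

Because the function depends only on the scope, a test can set `scope["path"]` to one route and craft any Host header it likes without changing the result.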

* chore(proxy): make get_request_route imports lazy at call sites

Move the ``from litellm.proxy.auth.auth_utils import get_request_route``
imports added in the prior commit back to the function bodies that use
them. The module-level form participates in a long-standing import
cycle through ``auth_utils -> _types -> ...`` and was flagged by CodeQL
on the PR; the lazy form matches the pattern the proxy already uses
for ``user_api_key_auth`` and related helpers elsewhere in these files.

Also drop the ``RouteChecks._is_assistants_api_request`` delegation in
``_get_metadata_variable_name`` introduced in the prior commit — the
delegation pulled ``RouteChecks`` into the same cycle, and the call
site reuses the resolved route for its other branches, so inlining
the substring check is both cycle-free and avoids a redundant second
``get_request_route`` call.

Comment in test_proxy_routes.py acknowledges that the two MCP table
entries exercise ``get_request_route`` directly rather than the full
production handler (which needs ASGI scope + MCP state to invoke).

---------

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: user <70670632+stuxf@users.noreply.github.com>
* feat(dashboard): navbar hierarchy + Agent Platform notifications (#27543)

* feat(dashboard): refine navbar zones and Agent Platform notice

Restructure the admin navbar for production users: clear product vs community
vs personal columns with vertical dividers, icon-only Slack/GitHub in a
shared chip, and Docs/Blog typography aligned on an 8px rhythm.

Add a notifications bell with popover linking to the LiteLLM Agent Platform
repo and optional mark-as-read persistence.

Promote the account control with initials avatar, single-line display name,
and navDisplayName mapping for placeholder user ids (e.g. default_user_id).

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(dashboard): address PR review — AntD buttons, public page guard, dedupe regex

- Replace raw <button> with AntD Button in BlogDropdown, NotificationsBell, UserDropdown, and test mock
- Guard NotificationsBell + container behind !isPublicPage to avoid rendering on public pages
- Remove redundant equality checks in navDisplayName (regex already covers them)
- Remove unused `lower` variable after simplification

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>

* fix(dashboard): drop dead useHealthReadiness import in navbar

The module was removed in #27896 (replaced by useHealthReadinessDetails),
but the import survived the rebase. The symbol is unused — only
useHealthReadinessDetails is consumed in the file. Removing the dead
import unblocks the UI TypeScript build.

* fix(dashboard): align CommunityEngagementButtons test with icon-only aria-labels

The component was refactored to an icon-only chip with aria-label='LiteLLM
on GitHub' (squash #27543), but the test still asserted /star us on
github/i. Update the query to match the rendered accessible name.

* refactor(dashboard): drop unused props from NavbarProps

The navbar refactor moved user identity + dark-mode state to internal
hooks (useAuthorized, useWorker), but the NavbarProps interface still
declared userID, userEmail, userRole, premiumUser, isDarkMode, and
toggleDarkMode as required, forcing every caller to thread them through.

Drop them from the interface and all four call sites (page.tsx,
(dashboard)/layout.tsx, public_model_hub.tsx, navbar.test.tsx). Also
shrinks the destructure in layout.tsx so the now-unused locals stop
being pulled out of useAuthorized().

* refactor(dashboard): use useSyncExternalStore for NotificationsBell dismiss flag

Reads/writes of the litellmHideAgentPlatformBanner key were done
directly inside NotificationsBell via a useEffect + useState pair.
Every other localStorage-backed flag in the dashboard
(DisableShowPrompts, DisableBouncingIcon, DisableShowNewBadge,
DisableUsageIndicator, DisableBlogPosts) is wrapped in a
useSyncExternalStore hook over localStorageUtils so all mounted
components stay in sync.

Extract useHideAgentPlatformBanner to follow the same shape, swap
NotificationsBell to consume it, and add a regression test that
two sibling bells stay in sync without a remount when one is
dismissed.

* refactor: mask credential fields in proxy settings GET responses (#28682)

* refactor: mask credential fields in proxy settings GET responses

Brings SSO settings, cache settings, and the email/Slack alerting view in
/get/config/callbacks in line with the HashiCorp Vault config-override
pattern, so persisted credentials are not transported back to the UI in
plaintext.

* refactor: harden short-value masking and hoist alerting var constant

Closes two review observations:

- mask_sensitive_keys now replaces short values (below the visible
  prefix+suffix length) with an all-mask string instead of returning them
  unchanged, so a 1-7 character credential is no longer round-tripped
  verbatim.
- _ALERTING_SENSITIVE_VARS is moved out of get_config() to a module-level
  constant, matching the analogous _SSO_SENSITIVE_FIELDS and
  _CACHE_SENSITIVE_FIELDS in the SSO and cache endpoint files.
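
The short-value rule can be illustrated with a minimal sketch. `mask_value` is a hypothetical function, not the signature of `mask_sensitive_keys`; the 4-character prefix and suffix are assumptions chosen to match the "1-7 character" example, and the sketch also fully masks a value whose length equals prefix plus suffix, which is a choice of the sketch rather than something the commit states.

```python
def mask_value(
    value: str,
    visible_prefix: int = 4,
    visible_suffix: int = 4,
    mask_char: str = "*",
) -> str:
    """Mask the middle of a credential; fully mask short values.

    Hypothetical sketch with assumed default lengths.
    """
    if len(value) <= visible_prefix + visible_suffix:
        # Too short to reveal anything safely: return an all-mask string.
        return mask_char * len(value)
    hidden = len(value) - visible_prefix - visible_suffix
    # Slice from an explicit index so visible_suffix=0 shows no suffix.
    suffix = value[len(value) - visible_suffix:]
    return value[:visible_prefix] + mask_char * hidden + suffix
```

Before the fix described above, a value shorter than the visible window would have come back unchanged; here it comes back as mask characters only.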

---------

Co-authored-by: Krrish Dholakia <krrish+github@berri.ai>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
)

The Key Info Overview tab's Spend card truncated sub-dollar budgets to
"$0" because formatNumberWithCommas defaults to 0 decimals. The Settings
tab passes 2; align the overview so a $0.10 budget renders as "$0.10".

Resolves LIT-2845
…28442)

* feat(proxy): allow llm_api_routes virtual keys to list MCP servers

Add a new `mcp_discovery_routes` group (GET /v1/mcp/server and GET
/v1/mcp/server/{server_id}) and include it in `llm_api_routes` so that
virtual keys configured with `allowed_routes=["llm_api_routes"]` can
discover the MCP servers they have access to. Previously these calls
failed with 'Virtual key is not allowed to call this route. Only allowed
to call routes: [llm_api_routes]'.

The GET handlers already sanitize the response for restricted virtual
keys via `_sanitize_mcp_server_list_for_virtual_key`, stripping
credential-bearing fields (url, headers, env). Write methods
(POST/PUT/DELETE) on the same paths remain gated by the existing
handler-level admin role checks.

The new discovery list is intentionally kept OUT of
`mcp_inference_routes`, so `is_llm_api_route()` still returns False
for these paths — this preserves the existing contract that
DISABLE_LLM_API_ENDPOINTS must not block the Admin UI from listing MCP
servers.

Co-authored-by: ryan-crabbe-berri <ryan-crabbe-berri@users.noreply.github.com>

* refactor(proxy): make MCP discovery carve-out method-aware

Replace the `mcp_discovery_routes` group in `llm_api_routes` with a
method-aware special case inside `is_virtual_key_allowed_to_call_route`.
Virtual keys with allowed_routes=["llm_api_routes"] are now permitted
to call only GET /v1/mcp/server and GET /v1/mcp/server/{server_id} —
non-GET methods and multi-segment admin sub-paths fall through to the
existing 403. This keeps the general llm_api_routes list free of
management paths and avoids accidentally exposing POST/PUT/DELETE
writes through the route-check layer.
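
A rough sketch of such a method-aware check, under the assumption that the route has already been resolved to a plain path string. `is_mcp_discovery_request` and the regular expression are illustrative and are not taken from `is_virtual_key_allowed_to_call_route`.

```python
import re

# Matches /v1/mcp/server and /v1/mcp/server/{server_id}, nothing deeper.
_MCP_DISCOVERY_PATH = re.compile(r"/v1/mcp/server(?:/[^/]+)?")


def is_mcp_discovery_request(method: str, route: str) -> bool:
    """Return True only for GET on the two discovery paths.

    Hypothetical helper: non-GET methods and multi-segment sub-paths
    return False so they fall through to the existing denial.
    """
    if method.upper() != "GET":
        return False
    return _MCP_DISCOVERY_PATH.fullmatch(route) is not None
```

Keying the carve-out on both method and path shape is what keeps POST/PUT/DELETE on the same URLs out of reach of keys restricted to `llm_api_routes`.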

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: ryan-crabbe-berri <ryan-crabbe-berri@users.noreply.github.com>
…#28737)

* fix(team): keep team_alias cache in sync on _cache_team_object writes

_cache_team_object wrote only to the team_id:<id> cache key, but the
JWT auth path that uses team_alias_jwt_field reads from a separate
team_alias:<alias> key (get_team_object_by_alias caches under both
keys on miss, but reads only the alias-keyed one). After any
team-mutation endpoint (team_model_add, team_model_delete,
update_team, the two access-group writes) the team_id cache was
refreshed but the team_alias cache stayed stale until TTL — JWT
callers using team_alias_jwt_field kept seeing the pre-mutation
team for the full cache window.

Mirror the write under the alias key inside _cache_team_object so
every existing caller stays in sync without further changes. Skip
the alias write when team_alias is None/empty so we don't collide
across alias-less teams.

Surfaced testing the LIT-3244 cherry-pick on patch/1.86.0: the
LIT-3244 fix correctly invalidated the team_id cache but the
customer's JWT used team_alias_jwt_field, so they kept hitting the
stale alias-keyed entry.

* fix(team): delete (not overwrite) team_alias cache on _cache_team_object

The prior shape of this PR wrote both team_id:<id> AND team_alias:<alias>
from _cache_team_object. team_alias is NOT unique in the schema
(no @unique on LiteLLM_TeamTable.team_alias), and get_team_object_by_alias
enforces uniqueness on its own DB-fetch path (len(teams) > 1 raises).
Writing the alias-keyed cache from the generic refresh path bypassed
that check: a team admin renaming their team to collide with another
team's alias could silently overwrite the cached team for JWT-by-alias
auth, swapping the resolved team under that alias for the cache window.

Switch the alias-keyed operation from a write to a delete (mirroring
the dual-cache delete pattern in _delete_cache_key_object). After every
team write, the next JWT-by-alias reader cache-misses and falls through
to get_team_object_by_alias, which (a) re-fetches the fresh team from
DB, closing the LIT-3244 staleness gap that motivated this PR, and
(b) enforces alias uniqueness before populating either cache key.

team_id:<id> writes are unchanged — team_id is the table PK and is
guaranteed unique.

Surfaced in veria-ai review on #28739.
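
The write-by-id, delete-by-alias pattern can be sketched with a plain in-memory dict. `TeamCache` is a hypothetical class for illustration, not the proxy's cache or `_cache_team_object`; the key formats follow the `team_id:<id>` and `team_alias:<alias>` names used above.

```python
from typing import Any, Dict, Optional


class TeamCache:
    """Hypothetical in-memory stand-in for the team cache."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    def get(self, key: str) -> Optional[Any]:
        return self._store.get(key)

    def set(self, key: str, value: Any) -> None:
        self._store[key] = value

    def cache_team(self, team_id: str, team_alias: Optional[str], team: Any) -> None:
        # team_id is unique, so overwriting this key is safe.
        self._store[f"team_id:{team_id}"] = team
        # team_alias is not unique: evict instead of overwriting, so the
        # next alias lookup misses and goes back to the database path
        # that enforces uniqueness.
        if team_alias:
            self._store.pop(f"team_alias:{team_alias}", None)
```

After `cache_team` runs, a reader keyed on the alias sees a cache miss rather than a value that a colliding rename could have planted.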

* fix(managed-files): anchor model_id regex so it doesn't match llm_output_file_model_id

extract_model_id_from_unified_id used `re.search(r"model_id,([^;]+)", ...)`
which substring-matches the `model_id,` inside the file-ID encoding's
`llm_output_file_model_id,<deployment_uuid>` field. parse_unified_id
then fed that deployment UUID back into the auth path as a model
candidate via _extract_models_from_managed_resource_id, and every
team-BYOK file attach 403'd with:

    team not allowed to access model. This team can only access
    models=['openai/*']. Tried to access <deployment-uuid>

The team's models list correctly contains the public name (`openai/*`)
that target_model_names matches, but the bogus UUID candidate fails
the wildcard check first.

Anchor the regex to a field boundary (`(?:^|;)model_id,`) so it
matches the legitimate top-level `model_id,<value>` field on
vector_store unified IDs and skips substring matches inside other
fields. File-IDs (which have no top-level `model_id` field) now
return None and contribute no spurious UUID candidate.

Surfaced reproducing LIT-3244 on patch/1.86.0 with the customer's
exact flow: team with openai/* BYOK deployment, JWT-scoped user,
POST /v1/vector_stores/{id}/files attaching a file uploaded with
target_model_names=openai/gpt-4o.
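
The difference between the two patterns can be reproduced in a few lines. The two regular expressions are the ones quoted above; the sample ID strings and the `extract_model_id` name are hypothetical and only approximate the `field,value;field,value` layout the commit describes.

```python
import re
from typing import Optional

LOOSE = re.compile(r"model_id,([^;]+)")
ANCHORED = re.compile(r"(?:^|;)model_id,([^;]+)")

# Hypothetical decoded IDs, for illustration only.
FILE_ID = "litellm_proxy:application/pdf;unified_id,abc;llm_output_file_model_id,dep-uuid-123"
VECTOR_STORE_ID = "litellm_proxy:vector_store;model_id,my-deployment;resource_id,vs_1"


def extract_model_id(decoded_unified_id: str) -> Optional[str]:
    """Return the top-level model_id field, or None when there is none."""
    match = ANCHORED.search(decoded_unified_id)
    return match.group(1) if match else None


# The loose pattern matches inside llm_output_file_model_id.
loose_hit = LOOSE.search(FILE_ID)
# The anchored pattern requires a field boundary before model_id.
anchored_file = extract_model_id(FILE_ID)
anchored_vs = extract_model_id(VECTOR_STORE_ID)
```

On the sample file ID the loose pattern yields the deployment UUID while the anchored one yields nothing, and the vector-store ID still resolves to its `model_id` value.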
* fix(proxy): hydrate wildcard discovery credentials

* fix(proxy): constrain wildcard credential hydration

Co-authored-by: Dibyo Mukherjee <dibyo@adobe.com>
Creates litellm_oss_agent_shin_MM_DD_YYYY from main every day at 00:00 UTC.
Lets us retarget oss-agent-shin fork PRs onto a canonical branch so CircleCI runs with secrets, without granting the agent write access.

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>
* test(proxy): add harness for proxy_server.py behavior-pinning

Creates tests/test_litellm/proxy/proxy_server/ with:
- conftest.py: 11 shared fixtures (app, client, mock_prisma, auth_as,
  mock_router with parametrized response builders, normalize, etc.)
- _coverage_check.py: per-PR coverage gate (line + branch) against a
  baseline, self-selects target by inspecting which placeholder files
  have been filled
- _pin_check.py: AST-based gate that verifies every pin-list item has
  >=1 happy + >=1 error test with a real assertion (no status-only)
- test_harness_smoke.py: 19 smoke tests covering every fixture +
  both scripts end-to-end
- 26 placeholder test files (one docstring each) reserved for
  follow-up PRs per the directory ownership in the Notion plan
- .coverage_baseline pinned at 0% so future PRs measure deltas
  against new-tests-only and aren't entangled with the broader
  scattered test suite

Adds a dedicated proxy-server job to test-unit-proxy-endpoints.yml
so this directory's runtime + coverage are tracked independently.

Plan: https://www.notion.so/36c43b8acdab81ee845fd5365128a2fc

* ci(proxy-endpoints): allow workflow_dispatch

Lets the workflow be triggered manually on a branch via
`gh workflow run`, which is needed for the verify-first
flow on workflow changes before opening a PR.

* test(proxy): address review feedback on proxy_server harness

- conftest.py: anchor sys.path insert to __file__ (Path(__file__).resolve().parents[4])
  instead of CWD-relative os.path.abspath("../../../../") which resolved
  to the wrong directory when pytest is launched from the repo root.
- _coverage_check.py: actually read .coverage_baseline and use it as
  the floor (line_min = max(target, baseline)). Closes the gap between
  the PR description's "delta semantics" and what the script was doing.
  With baseline=0.0 today this is a no-op; future PRs that update the
  baseline cause regressions (test deletions etc.) to trip the gate
  even if the static PR target is still met.
- _pin_check.py: drop unreachable startswith("_") guard
  (test_*.py glob never yields underscore-prefixed names) and read
  each test file once instead of twice.
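
The baseline-as-floor rule from the _coverage_check.py item can be sketched as follows. The helper names are hypothetical, and treating a missing or unparsable baseline file as 0.0 is an assumption of this sketch, not confirmed behavior of the script.

```python
from pathlib import Path


def read_baseline(path: Path) -> float:
    """Read the pinned coverage percentage; fall back to 0.0 (assumed default)."""
    try:
        return float(path.read_text().strip())
    except (FileNotFoundError, ValueError):
        return 0.0


def line_floor(target: float, baseline: float) -> float:
    """The gate uses whichever is higher: the static PR target or the baseline."""
    return max(target, baseline)


def passes_gate(measured: float, target: float, baseline: float) -> bool:
    """True when measured line coverage meets the effective floor."""
    return measured >= line_floor(target, baseline)
```

With a baseline of 0.0 the floor equals the static target, which matches the "no-op today" note above. Once the baseline is raised, a drop below it fails the gate even if the target is still met.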
…sidency (#28626)

* feat(openai): apply regional-processing cost uplift for EU/US data residency

OpenAI charges a 10% uplift on the latest GPT models when requests are
served from a regionalized hostname (eu./us.api.openai.com).  Infer the
region from `api_base`, expose it on `kwargs["litellm_params"]["data_residency"]`,
and multiply the computed cost by a per-model
`regional_processing_uplift_multiplier_<region>` field.

https://claude.ai/code/session_012ebH44s7ohYxjoix5CXzTW

* test: allow regional_processing_uplift_multiplier_{eu,us} in model_prices schema

* fix(cost): tighten data_residency inference and restore model_cost in tests

- Only infer OpenAI data_residency when custom_llm_provider == "openai";
  drop the implicit None fallback so non-OpenAI callers can't accidentally
  pick up a regional tag from a stray OpenAI hostname.
- _local_model_cost_map fixture now snapshots and restores
  litellm.model_cost and LITELLM_LOCAL_MODEL_COST_MAP so tests don't leak
  state across the session.

* refactor(openai): move data_residency helper under llms/openai

* fix: thread data_residency through realtime stream cost calculation

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(cost): thread data_residency through batch_cost_calculator

Apply the OpenAI regional-processing uplift multiplier to retrieve_batch
cost paths so Batch API requests served via eu./us.api.openai.com are
priced at the same uplifted token rates as completions/transcriptions.

* refactor(openai): encapsulate provider check inside infer_openai_data_residency

Move the custom_llm_provider == "openai" guard from get_litellm_params
into the helper itself so the core utility no longer carries
provider-specific dispatch logic. Callers pass through the provider
unconditionally; the helper returns None for any non-OpenAI provider.
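
A rough sketch of the helper shape described above, assuming exact hostname matching on eu.api.openai.com and us.api.openai.com. The function names and the `model_info` dict are hypothetical stand-ins and do not reproduce the real signatures.

```python
from typing import Optional
from urllib.parse import urlparse

# Assumed mapping from regional hostnames to region tags.
_REGIONAL_HOSTS = {"eu.api.openai.com": "eu", "us.api.openai.com": "us"}


def infer_data_residency(
    api_base: Optional[str], custom_llm_provider: Optional[str]
) -> Optional[str]:
    """Return "eu"/"us" for regional OpenAI hosts, None for everything else."""
    if custom_llm_provider != "openai" or not api_base:
        return None  # the provider guard lives inside the helper
    host = urlparse(api_base).hostname or ""
    return _REGIONAL_HOSTS.get(host)


def apply_regional_uplift(cost: float, model_info: dict, region: Optional[str]) -> float:
    """Multiply cost by the per-model multiplier for the region, if one is set."""
    if region is None:
        return cost
    key = f"regional_processing_uplift_multiplier_{region}"
    return cost * model_info.get(key, 1.0)
```

Callers can pass the provider through unconditionally. A non-OpenAI provider pointed at a regional OpenAI hostname gets None and therefore no uplift.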

* fix(responses): thread data_residency through Responses logging params

The Responses API paths build their logging litellm_params dict after
provider resolution but did not include data_residency, so cost calc
saw None even when the effective api_base was a regional OpenAI host.

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
…28112)

* chore(admin-ui): regenerate static export with trailingSlash: true

Rebuilds litellm/proxy/_experimental/out/ from ui/litellm-dashboard with
`trailingSlash: true` enabled in next.config.mjs. Next.js now emits every
route as <dir>/index.html (e.g. mcp/oauth/callback/index.html) instead of
<dir>.html with a sibling metadata-only directory, which fixes the 404 on
extensionless URLs served through FastAPI's StaticFiles(html=True) mount.

This is the build artifact half of the fix; the config change, Dockerfile
cleanup, and regression test live in the follow-up source PR that stacks
on top of this branch.

* fix(admin-ui): emit nested routes as <dir>/index.html (#28106)

Linear and other OAuth providers redirect the user back to
/ui/mcp/oauth/callback?code=...&state=... after the consent step. The
packaged Next.js static export only produced /ui/mcp/oauth/callback.html,
so FastAPI's StaticFiles served a 404 on the extensionless URL and the
OAuth handshake never completed.

The Dockerfile.non_root build step tried to paper over this at image-build
time with `for html_file in *.html; do ...`, but that shell glob does not
recurse, so nested routes like mcp/oauth/callback.html were left stranded
next to an empty mcp/oauth/callback/ directory containing only Next.js
metadata. The runtime restructure step in proxy_server.py was then skipped
because the .litellm_ui_ready marker had already been dropped.

Set trailingSlash: true in the dashboard's Next.js config so the export
emits every nested route as <dir>/index.html natively. The Dockerfile loop
is now a no-op for the bundled UI and has been removed; the
.litellm_ui_ready marker is still written so the proxy keeps skipping the
redundant Python restructure step at startup. Stacks on top of the static
export regeneration in the parent branch.

* chore: restore origin/litellm_internal_staging out files
* fix(azure): preserve AD token refresh in v1 OpenAI client path

The /openai/v1/ code path (api_version in {"v1", "latest", "preview"})
constructs a plain OpenAI/AsyncOpenAI client, but only forwarded
`api_key` from `azure_client_params`. When `enable_azure_ad_token_refresh`
is set (or any AD-only auth), `api_key` is None and the client
constructor raised "The api_key client option must be set...", breaking
every Azure call with a v1 api_version.

The OpenAI SDK (>=2.20.0) accepts a callable for `api_key` and re-invokes
it on every request via `_refresh_api_key`, so we now forward
`azure_ad_token_provider` directly — preserving the per-request token
refresh behavior of the regular AzureOpenAI client and avoiding the
expiry hole that resolving the token once at client-creation time would
introduce. Static `azure_ad_token` strings fall through to `api_key`.

For the async path we wrap the sync provider returned by azure-identity
in an async function since AsyncOpenAI expects `Callable[[], Awaitable[str]]`.

Fixes #27945

https://claude.ai/code/session_01UnzrDSFUUgp5T2wRoPMxq5

* fix(azure): offload sync token provider to thread in v1 async wrapper
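
One way to express the wrapper is shown below, as a sketch only. `to_async_token_provider` is a hypothetical name, and using `asyncio.to_thread` (Python 3.9+) for the offload is an assumption about how it could be done rather than a copy of the actual change.

```python
import asyncio
from typing import Awaitable, Callable


def to_async_token_provider(
    sync_provider: Callable[[], str],
) -> Callable[[], Awaitable[str]]:
    """Wrap a blocking token provider so it can be awaited without blocking the loop."""

    async def _provider() -> str:
        # Run the blocking call on a worker thread; it is re-invoked on every
        # await, so token refresh still happens per request.
        return await asyncio.to_thread(sync_provider)

    return _provider
```

Because the wrapper calls the sync provider on each await, it keeps the per-request refresh behavior instead of resolving the token once at client creation.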

* fix(azure): include AD credential identity in v1 client cache key

---------

Co-authored-by: Claude <noreply@anthropic.com>
* fix(ui): route API Reference back to query-param page

The path-based /ui/api-reference route was broken in practice — the
page-local useProxySettings hook didn't match what the root page passes
down. Remove api_ref from the migration maps (LEGACY_REDIRECTS in
app/page.tsx, MIGRATED_PAGES in leftnav.tsx and (dashboard)/layout.tsx),
point the leftnav item back at page="api_ref", and restore the api_ref
render branch in the root page. The path-based page.tsx and the
useProxySettings hook stay in place unchanged; only api_ref is moved
back to query-param routing while the migration infrastructure is
preserved for future page moves.

* fix(ui): alias ?page=api-reference to api_ref branch

Handles bookmarks of the hyphen-form query param that was live during
the brief path-based migration window, so they render the working
APIReferenceView instead of falling through to the default page.
…8719)

* fix(model-edit): allow clearing custom input/output cost on wildcard deployments

A user-set pricing override on a `/model/*` wildcard deployment could not
be removed: clearing the Input/Output Cost fields in the UI succeeded
visually, but the next read still showed the old values because both
`litellm_params` and `model_info` (mirrored via `SPECIAL_MODEL_INFO_PARAMS`)
retained the original rates.

UI: when the pricing field is touched but left empty, send `null` instead
of dropping it from the payload so the backend sees the clear intent. The
cache-read-cost fallback now guards against `null` as well as `undefined`
so a cleared input cost cannot silently wipe the cache-read override.

Backend: `update_db_model` honors explicit-null clears, but ONLY for
`SPECIAL_MODEL_INFO_PARAMS` (the 4 pricing fields). Restricting the
null-clear path prevents a team-scoped caller from using this codepath to
null out privileged fields like `team_id` or access groups.

Tests cover both clear paths (`litellm_params` and `model_info`), the
SPECIAL_MODEL_INFO_PARAMS mirror, PATCH semantics for omitted fields, and
the security guard that non-pricing nulls don't reach the merged dict.

Resolves LIT-3250

* fix(model-edit): run null-clears after both merges, not interleaved

The previous version cleared `model_info` from inside the litellm_params
merge block, but the subsequent `model_info.update(...)` re-injected the
old pricing because the UI's PATCH carries the full model_info blob with
the stale values still in it. Move the explicit-null clear pass to after
both merges so a model_info passthrough cannot resurrect cleared fields.
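
The ordering issue can be illustrated with plain dicts. This is a simplified sketch: `apply_model_patch` is hypothetical, and `CLEARABLE_KEYS` lists only two assumed pricing field names in place of the real SPECIAL_MODEL_INFO_PARAMS.

```python
from typing import Any, Dict, Tuple

# Assumed allowlist for this sketch; the real list lives in SPECIAL_MODEL_INFO_PARAMS.
CLEARABLE_KEYS = ("input_cost_per_token", "output_cost_per_token")


def apply_model_patch(
    existing_params: Dict[str, Any],
    existing_info: Dict[str, Any],
    patch_params: Dict[str, Any],
    patch_info: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Merge a PATCH, then clear allowlisted keys that were explicitly nulled."""
    # Step 1: ordinary PATCH merges; None values never overwrite anything here.
    params = {**existing_params, **{k: v for k, v in patch_params.items() if v is not None}}
    info = {**existing_info, **{k: v for k, v in patch_info.items() if v is not None}}
    # Step 2: the clear pass runs after BOTH merges, so stale values carried
    # in patch_info cannot re-inject a field that was just cleared.
    for key in CLEARABLE_KEYS:
        if key in patch_params and patch_params[key] is None:
            params.pop(key, None)
            info.pop(key, None)
    return params, info
```

A null sent for a key outside the allowlist (for example `team_id`) is ignored by both steps, so the existing value survives.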

Adds a regression test for the realistic UI submit shape (both blobs in
the patch, model_info still holding the old pricing).

* test(e2e): clear-custom-pricing flow with create/delete cleanup

Covers the dashboard model edit form's pricing-clear flow end-to-end:
seeds a deployment with custom input/output pricing, drives the UI to
clear both fields, asserts the outgoing PATCH sends explicit nulls,
and confirms via /v2/model/info that the override is gone from both
litellm_params and model_info.

The dashboard DB persists across this suite, so beforeEach creates a
uniquely-named deployment and afterEach POSTs /model/delete to leave
the DB clean regardless of test outcome.

* fix(model-edit): extend pricing clear to cache_read and cache_write costs

Pre-existing parallel of the wildcard input/output cost bug: cleared
cache_read_input_token_cost and cache_creation_input_token_cost overrides
silently persisted because the UI omitted the key (delete or fallback) and
the backend null-clear allowlist did not cover them.

- types/router.py: add cache_read_input_token_cost and
  cache_creation_input_token_cost to SPECIAL_MODEL_INFO_PARAMS, so they are
  mirrored between litellm_params and model_info by Deployment.__init__ and
  honoured by the null-clear loop in update_db_model.
- model_info_view.tsx: emit explicit null for touched-but-empty cache_read
  and cache_write fields. Preserve the input_cost->cache_read mirror only
  when cache_read itself was not touched.
- model_management_endpoints.py: update the allowlist comment.
- Tests: three new unit tests for cache clear paths and a preserve check;
  the e2e spec now seeds, clears, and asserts null PATCH + key-absence for
  all four pricing fields.
…ero VCR misses on consecutive runs) (#28826)

* test(vcr): make Redis-backed cassettes replay deterministically across runs

- Pin LITELLM_LOCAL_MODEL_COST_MAP=True in the shared VCR harness so the
  per-test importlib.reload(litellm) no longer fetches the model cost map
  from raw.githubusercontent.com. That live fetch was being recorded into
  cassettes. For tests that subsequently skip, it was the only recorded
  episode, so the persister refused to save it (skipped tests don't persist)
  and the test re-recorded it live every run (MISS:NOT_PERSISTED).

- Add compare-time symmetric matcher tolerance for Google OAuth (ya29.*)
  tokens, observability/telemetry payloads, credential-exchange bodies, and
  volatile UUID/timestamp tokens, so existing cassettes select a recorded
  episode instead of growing past the 50-episode cap and re-recording live.

- Don't record fire-and-forget telemetry (langfuse/arize/otel/...) into
  non-telemetry tests' cassettes. Several modules set litellm.success_callback
  at import time, so observability logging is globally enabled and an async
  flush from the background logging worker lands in an unrelated test's VCR
  window, saved as a spurious MISS:RECORDED (observed: a Langfuse batch from
  another completion landing on test_lowest_latency_routing_buffer). Such a
  request now passes through live (telemetry hosts aren't real-spend hosts);
  tests that actually assert on telemetry keep recording it.

- Dedupe + cap the VCR diagnostic dump so the classification summary survives
  CircleCI's ~400KB step-output truncation.

- Stabilize a non-deterministic rate-limit test body; mark AWS Secrets Manager
  lifecycle tests VCR-incompatible (uniquely-named secrets can't be replayed).

- Mark test_router_text_completion_client VCR-incompatible: it fires 300
  identical requests to verify async-client reuse, but vcrpy patches the HTTP
  transport so replay never exercises the real connection pool the test
  validates, and recording 300 near-identical episodes overflows the
  50-episode cap (MISS:OVERFLOW every run). It hits a free mock endpoint.

- Mark the Vertex AI MaaS Mistral OCR tests (vertex_ai/mistral-ocr-2505)
  VCR-incompatible: the MaaS model is not provisioned in the CI GCP project,
  so the live :rawPredict call fails and the test skips every run, leaving no
  cassette to record (MISS:NOT_PERSISTED every run). Sibling direct-Mistral
  and Azure OCR tests are unaffected and still replay from cache.

* fix(tests/vcr): refresh cassette TTL on read so replayed cassettes don't expire

The Redis VCR persister loaded cassettes with a plain GET, which does not
touch the key's TTL. A cassette that is only ever replayed (HIT/NOOP, never
re-recorded) therefore expired exactly 24h after its last *write*, no matter
how often it was read. Whichever CI run happened to cross that boundary
re-recorded the cassette live and surfaced a spurious VCR MISS on otherwise
deterministic cassettes — the residual per-run flakiness floor (a different
random subset of read-only cassettes expiring each run).

Slide the expiry forward on every successful load (best-effort EXPIRE), so
any cassette used at least once per TTL window stays alive indefinitely and
the 2nd/3rd run of a day replays cleanly.
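
The sliding-expiry load can be sketched against any client exposing `get` and `expire`, which redis-py clients do. `load_cassette`, the `KV` protocol, and `FakeKV` are hypothetical names for this sketch; the 24h TTL value comes from the description above.

```python
from typing import Dict, Optional, Protocol

TTL_SECONDS = 24 * 60 * 60  # the 24h cassette lifetime described above


class KV(Protocol):
    def get(self, key: str) -> Optional[bytes]: ...
    def expire(self, key: str, seconds: int) -> bool: ...


def load_cassette(client: KV, key: str) -> Optional[bytes]:
    """GET the cassette and, on a hit, push its expiry forward (best-effort)."""
    data = client.get(key)
    if data is None:
        return None  # genuinely absent: the caller treats this as "not found"
    try:
        client.expire(key, TTL_SECONDS)
    except Exception:
        pass  # a failed refresh must never fail the read itself
    return data


class FakeKV:
    """In-memory stand-in so the sketch runs without a Redis server."""

    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {}
        self.ttl: Dict[str, int] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self.data.get(key)

    def expire(self, key: str, seconds: int) -> bool:
        if key not in self.data:
            return False
        self.ttl[key] = seconds
        return True
```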

* fix(tests/vcr): recover from spurious GET-None for existing cassette keys

Under concurrent CI load, the persister's load GET was observed returning
None for a cassette key that demonstrably existed on the (single, non-
clustered) Redis master — an external monitor saw the key present with a
healthy TTL at the same instant the in-process client read None. Because
None is a valid GET result (not a RedisError), the retry-on-error client
config never engaged, so the cassette re-recorded live (a phantom
MISS:RECORDED); for flaky/networked tests the failed live call then
triggered a pytest rerun, which is why a rotating subset of otherwise
deterministic tests missed each run.

On a None result, re-check EXISTS and re-read once. If the key really
exists, use the recovered value and log [vcr-transient-miss-recovered]
(also counted in cassette_cache_health). A genuinely absent key (a new
cassette) still falls through to CassetteNotFoundError.

* chore(tests/vcr): TEMP diagnostic for persistent-miss cassette load path

Logs GET/EXISTS at load time for the three cassettes that re-record every
run despite being present in Redis, to capture what the in-process client
sees. To be reverted before merge.

* chore(tests/vcr): write load diagnostic to Redis (truncation-proof)

CI stdout truncates to the last ~400KB, dropping the early loaddbg lines
for the alphabetically-first failing test. Push the load probe to a Redis
list instead so it survives. To be reverted before merge.

* fix(tests/vcr): don't drop stored telemetry episodes during cassette load

Root cause of the residual per-run misses on present cassettes: vcrpy's
Cassette._load() replays each *stored* interaction through Cassette.append(),
which runs before_record_request on it — and a None return there silently
drops that episode. The telemetry-leak suppressor (_should_drop_telemetry_record)
returns None for telemetry requests, so when a non-telemetry-named test (or the
alphabetically-first test in a worker, whose _current_test_nodeid is still empty)
loaded a cassette containing a Langfuse ingestion episode, the episode was
dropped on read — forcing an endless live re-record (a phantom MISS:RECORDED on
a cassette that was demonstrably present in Redis). Verified by reproducing
Cassette._load() against the real cassette: empty/non-telemetry nodeid -> 0
episodes survive; with the guard -> 1 survives.

Fix: guard the suppressor with a thread-local set around Cassette._load (via a
small idempotent monkeypatch), so the drop only ever stops *new* incidental
telemetry from being recorded and never filters the existing cassette on read.
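
The guard idea can be sketched with a thread-local flag and a context manager. All names here are hypothetical, and the real change wraps vcrpy's Cassette._load through a monkeypatch, which this sketch does not attempt to reproduce.

```python
import threading
from contextlib import contextmanager
from typing import Callable, Iterator, Optional, TypeVar

R = TypeVar("R")
_state = threading.local()


@contextmanager
def loading_cassette() -> Iterator[None]:
    """Mark the current thread as replaying stored episodes from a cassette."""
    previous = getattr(_state, "loading", False)
    _state.loading = True
    try:
        yield
    finally:
        _state.loading = previous  # restore, so nested use stays correct


def before_record_request(request: R, is_telemetry: Callable[[R], bool]) -> Optional[R]:
    """Drop new telemetry requests, but never filter episodes during a load."""
    if getattr(_state, "loading", False):
        return request  # stored episode being replayed: keep it
    return None if is_telemetry(request) else request
```

Because the flag is thread-local, a load on one worker thread does not disable the suppressor for requests recorded on another thread.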

Also drops the speculative GET-None recovery + its diagnostics from the previous
commits: the load diagnostic showed GET returns the cassette bytes fine
(get=1440B), so the persister never returned a spurious None — the loss happened
later in vcrpy's append. The proven TTL-refresh-on-read fix is retained.

* fix(tests/vcr): drop incidental telemetry export POSTs to stop rotating async-flush misses

litellm's observability loggers flush on a background thread, so a Langfuse
ingestion POST scheduled by one telemetry test can fire mid-way through a
*later* telemetry-named test (after that test's own httpx mock has exited) and
be recorded by VCR as a phantom episode — a non-deterministic MISS:RECORDED /
PARTIAL that rotates onto a different telemetry test from run to run.

Telemetry export POSTs are fire-and-forget; no test asserts on a *recorded*
export response except the pass-through proxy test (which forwards a client POST
to Langfuse ingestion and replays its 207). So _should_drop_telemetry_record now
drops incidental export POSTs for every test except that one. Dropping returns
None (live fire-and-forget, never stored), so it can only turn a phantom miss
into a harmless live call, never the reverse; recorded read-back GETs that
telemetry tests assert on are matched by method and left untouched.

* fix(tests/vcr): restore assertion in test_banner_silent_when_vcr_disabled

The assertion that the banner is suppressed when VCR is disabled was
inadvertently moved into test_diagnostic_log_silent_when_no_dir when
the diagnostic-log tests were added, leaving the disabled-VCR test
verifying nothing.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
…28425)

* fix(proxy): strip LiteLLM policy tracking from OpenAI batch metadata

Batch create was failing with `Invalid type for 'metadata.applied_policies':
expected a string, but got an array instead` whenever a policy attachment
matched the request. The policy engine helpers wrote `applied_policies`,
`applied_guardrails`, and `policy_sources` into `data["metadata"]`
unconditionally, and `/v1/batches` forwarded that dict straight to OpenAI,
which only accepts string values.

- Route proxy-internal tracking into `litellm_metadata` for batch/file
  routes via a shared `_get_or_create_proxy_metadata_bucket` helper.
- Sanitize `data["metadata"]` in `create_batch` to drop known internal
  keys and non-string values before building the OpenAI request.
- Cover both behaviors with unit + endpoint tests.
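
A minimal sketch of the sanitizing step, using the three internal key names listed above. `sanitize_batch_metadata` is a hypothetical name, and the real helper may treat more keys as internal.

```python
from typing import Any, Dict, Optional

# Internal tracking keys named in the description above.
INTERNAL_KEYS = frozenset({"applied_policies", "applied_guardrails", "policy_sources"})


def sanitize_batch_metadata(metadata: Optional[Dict[str, Any]]) -> Dict[str, str]:
    """Keep only user metadata that is safe to forward: string values, no internal keys."""
    if not metadata:
        return {}
    return {
        key: value
        for key, value in metadata.items()
        if key not in INTERNAL_KEYS and isinstance(value, str)
    }
```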

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(proxy): merge metadata buckets for batch policy response headers

Ensure get_logging_caching_headers reads both metadata and litellm_metadata so policy/guardrail headers are emitted on batch routes with user metadata, and log dropped non-string OpenAI metadata at debug level.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
* Fix token cost lookup when deployment ids repeat the provider prefix.

Router configs may expose models like openai/openai/<model>; normalize those
strings before joining provider/model so model_cost resolves correctly.

Co-authored-by: Cursor <cursoragent@cursor.com>

* scope duplicate-prefix cost fix to explicit providers; isolate test patch state

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(mypy): narrow custom_llm_provider after resolution in cost_per_token

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(cost_calculator): guard provider-prefix dedup against non-string model

The provider-prefix dedup loop assumed `model` is always a string. When a
non-string is passed (e.g. a MagicMock from a mocked transport in router
tests), `model.startswith(...)` is always truthy and each slice returns a new
object, so the loop never terminates — it spins and OOM-kills the test worker
(observed as the litellm_router_testing CI regression, e.g.
test_router_pattern_match_e2e). Only run the string-based dedup and prefix-join
when `model` is actually a str, preserving the previous graceful behavior for
non-string inputs.
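
The guarded normalization might look roughly like this. `join_provider_model` is a hypothetical helper and the actual cost_per_token code is structured differently; the point is only the isinstance check placed before the loop.

```python
from unittest.mock import MagicMock


def join_provider_model(provider: str, model: object) -> object:
    """Return "provider/model" with repeated provider prefixes collapsed."""
    if not isinstance(model, str):
        # Non-string input (e.g. a MagicMock): skip the string logic entirely,
        # since startswith() on a mock is truthy and slicing never shrinks it.
        return model
    prefix = f"{provider}/"
    while model.startswith(prefix):
        model = model[len(prefix):]
    return prefix + model


normalized = join_provider_model("openai", "openai/openai/gpt-4o")
mock_model = MagicMock()
passthrough = join_provider_model("openai", mock_model)
```

For a real string the loop terminates because each iteration shortens the string by the prefix length.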

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
* fix(mcp): handle OAuth IdP error responses in /callback (LIT-2750)

Per RFC 6749 section 4.1.2.1, when the IdP rejects an OAuth authorization
request it redirects back to the client with ?error=...&error_description=...
and no code. The MCP /callback handler declared code and state as required
query params, so FastAPI rejected such error responses with a 422 before
the handler ran -- stranding the MCP client waiting on the loopback.

This change:
- Makes code and state optional and accepts the RFC-defined error,
  error_description, and error_uri params.
- When state decodes to a trusted client redirect_uri, propagates the
  error params back to that URI with the client's original (un-wrapped)
  state preserved, so the client's OAuth library can surface the failure.
- When state is missing/undecryptable or the encoded redirect_uri is no
  longer trusted, renders a 400 HTML page with the (HTML-escaped) error
  details instead of leaking to an attacker-controlled redirect.
- Preserves the existing success path (code + state -> 302 to validated
  client redirect_uri with original state).
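
The error branch can be sketched with the standard library alone. `handle_idp_error` is hypothetical: it assumes the caller has already decoded state and passes `client_redirect_uri=None` whenever state is missing, undecryptable, or no longer trusted.

```python
import html
from typing import Optional, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def handle_idp_error(
    error: str,
    error_description: Optional[str],
    client_redirect_uri: Optional[str],
    original_state: Optional[str],
) -> Tuple[int, str]:
    """Return (status, location) for a trusted client, else (400, html_body)."""
    if client_redirect_uri is None:
        # No trusted destination: render inline, escaping IdP-supplied text.
        body = "<h1>OAuth error</h1><p>{}</p><p>{}</p>".format(
            html.escape(error), html.escape(error_description or "")
        )
        return 400, body
    parts = urlsplit(client_redirect_uri)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("error", error))
    if error_description:
        query.append(("error_description", error_description))
    if original_state is not None:  # an empty state string is still preserved
        query.append(("state", original_state))
    return 302, urlunsplit(parts._replace(query=urlencode(query)))
```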

Fixes LIT-2750.

* test(mcp): regression tests for /callback handling IdP error responses (LIT-2750)

Adds a new test module covering the LIT-2750 fix: the MCP OAuth /callback
endpoint must accept IdP error responses (e.g. ?error=access_denied) per
RFC 6749 section 4.1.2.1 instead of returning a 422 because ``code`` is missing.

Coverage:
- IdP error with no state -> 400 HTML page surfacing the error.
- HTML escaping of user-controlled error / error_description fields.
- IdP error with a trusted (loopback) state -> 302 propagating
  error / error_description / original client state to the client.
- IdP error with an untrusted redirect_uri encoded in state -> 400 inline
  (no open-redirect to attacker-controlled origin).
- IdP error with an undecryptable state -> 400 HTML fallback.
- Bare GET /callback with no params -> 400 HTML (not Pydantic 422).
- Success path (code + state) still 302 to validated client redirect_uri
  with the original (un-wrapped) state preserved.

* refactor(mcp): drop unused _OAUTH_ERROR_PARAMS constant (Greptile P2)

The tuple was leftover scaffolding from an earlier draft of the LIT-2750
fix; nothing references it. The explanatory RFC 6749 §4.1.2.1 comment block
above the callback handler covers the same intent.

* fix(mcp/oauth): preserve empty original_state and clarify missing-param error in /callback

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(thinking): handle None thinking param in is_thinking_enabled (#28598)

Squash-merged by litellm-agent from Terrajlz's PR.

* feat(helm): support tpl rendering in podAnnotations (#28609)

Squash-merged by litellm-agent from devauxbr's PR.

* fix: apply black formatting to base_llm chat transformation

Fix CI black --check failure on is_thinking_enabled return formatting.

Co-authored-by: Cursor <cursoragent@cursor.com>

* merge main (#28836)

* fix(proxy): Bedrock Knowledge Base pass-through: preserve SigV4 headers and signed request body (#27526)

* Fix Bedrock KB pass-through SigV4 headers and signed body

Coerce botocore HeadersDict to a dict for pass-through routes. When
forward_headers is true, drop request headers that collide case-insensitively
with signed headers so client Bearer auth does not shadow AWS SigV4.
Send prepped.body as raw content so the outbound payload matches the
signature after logging hooks mutate the parsed dict.
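
The header handling can be approximated as below. `merge_signed_headers` is a hypothetical helper, and plain dicts stand in for both the incoming request headers and the botocore HeadersDict.

```python
from typing import Dict, Mapping


def merge_signed_headers(
    request_headers: Mapping[str, str], signed_headers: Mapping[str, str]
) -> Dict[str, str]:
    """Forward client headers, dropping any that collide with signed ones."""
    signed_lower = {name.lower() for name in signed_headers}
    # Drop forwarded headers whose name matches a signed header, ignoring case.
    merged = {
        name: value
        for name, value in request_headers.items()
        if name.lower() not in signed_lower
    }
    merged.update(dict(signed_headers))  # coerce the mapping to a plain dict
    return merged
```

Comparing names case-insensitively matters because a client `authorization` header and a signed `Authorization` header are distinct dict keys but the same HTTP header.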

Co-authored-by: Cursor <cursoragent@cursor.com>

* Simplify pass-through raw body handling

Read the SigV4-signed bytes directly from request.state inside
pass_through_request instead of threading a custom_raw_body argument
through three functions. Helper methods are restored to their original
signatures, and the new branch lives in one place at each httpx call site.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Harden pass-through raw body read from request.state

Guard missing request.state (test fixtures) and ignore non-bytes/str
values so MagicMock does not trigger the SigV4 raw-body path.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Test pass_through_request state_raw_body uses httpx content=

Cover non-streaming (async_client.request) and streaming (build_request)
paths so SigV4 bytes on request.state are not replaced by json= of a
hook-mutated dict.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>

* chore(tests): migrate Bedrock CI to AWS account 941277531214 (#28728)

* chore(tests): migrate Bedrock CI from AWS account 888602223428 to 941277531214

The original account (888602223428) was put under a security restriction by
AWS after a root access key leaked in a PR comment. While that account works
its way through the AWS Support unlock process, Bedrock-touching CI tests have
been migrated to a fresh account (941277531214).

Changes:
  - Replace 26 hardcoded references to 888602223428 with 941277531214 across
    8 files (provisioned-model ARNs, imported-model ARNs, AgentCore runtime
    ARNs, batch execution role ARN, and example proxy config).
  - The provisioned-model and imported-model ARNs are referenced only from
    mocked unit tests — no AWS resources to recreate.
  - The batch execution IAM role has been recreated in the new account with
    the same name and equivalent permissions.
  - The two AgentCore runtimes (hosted_agent_r9jvp-3ySZuRHjLC,
    hosted_agent_13sf6-cALnp38iZD) are being recreated in the new account
    under the same names — see tools/agentcore-deploy/ in a follow-up.

CircleCI env vars AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_REGION_NAME
were updated separately via the CircleCI API to point at the new account.

Smoke-tested locally against the new account:
  aws bedrock-runtime converse --region us-west-2 \
    --model-id us.anthropic.claude-sonnet-4-5-20250929-v1:0 \
    --messages '[{"role":"user","content":[{"text":"ping"}]}]'
  → 200, model returned 'pong'

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(tests): refresh AgentCore ARN suffixes to match newly-deployed runtimes

The first migration commit replaced just the account ID, but AgentCore
auto-assigns a random 10-char suffix to every runtime on creation — we
can't reuse the original suffixes (`3ySZuRHjLC`, `cALnp38iZD`) in the
new account. Updated the AgentCore-runtime ARNs in the three files that
reference real runtime IDs (not the mock-based unit-test ARNs).

Deployed runtimes:
  arn:aws:bedrock-agentcore:us-west-2:941277531214:runtime/hosted_agent_r9jvp-Rq79QFC2fp
  arn:aws:bedrock-agentcore:us-west-2:941277531214:runtime/hosted_agent_13sf6-4046UzHSwy

Both runtimes are status=READY and pass a smoke invoke:
  $ aws bedrock-agentcore invoke-agent-runtime --agent-runtime-arn ... --payload '{"prompt":"ping"}'
  → 200, {"result": "echo: ping"}

The agent is a minimal echo (see /tmp/agentcore_deploy/agent.py for the
deploy artifacts). Tests that only verify the SDK wiring will pass; if any
test asserts on agent output content, swap the echo for the real agent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(tests): point Bedrock batch tests at new-account S3 bucket

The account migration (888602223428 -> 941277531214) was a flat
account-ID swap, which only rewrites ARNs that embed the account
number. S3 bucket names carry no account ID, so the live Bedrock
batch tests still uploaded to `litellm-proxy` — a bucket that lives
in the old account. S3 names are globally unique, and the old account
still holds that name, so it can't be recreated in the new account.

Rename to `litellm-proxy-941277531214` (account-ID suffix guarantees
global uniqueness). The bucket must be created in 941277531214 and the
batch execution role granted s3:GetObject/PutObject/ListBucket on it
before this job is run in CI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(tests): point live S3 logging test at new-account bucket

Same account-ID-free blind spot as the batch bucket: `load-testing-oct`
lives in the old account and its name can't be reused globally. The
`logging_testing` CI job is wired into the workflow and runs
test_basic_s3_logging, which uploads to this bucket with the CI env
creds, then lists and deletes objects — a live dependency.

Rename to `load-testing-oct-941277531214`. The bucket must exist in the
new account with the CI IAM principal granted
s3:PutObject/GetObject/ListBucket/DeleteObject before this job runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(tests): repoint Bedrock guardrail IDs to new-account guardrails

The migration left guardrail IDs untouched (no account ID in them), so
all live guardrail tests failed with "guardrail identifier or version
does not exist" against 941277531214. Recreated both guardrails in the
new account and updated the hardcoded IDs:
  - wf0hkdb5x07f -> zgkmukebruil (PII mask: PHONE + CREDIT_DEBIT_CARD,
    with explicit inputAction=ANONYMIZE so masking applies to INPUT,
    which is the source litellm's moderation hook sends)
  - ff6ujrregl1q -> 4w3d1di3snt5 (blocks "coffee"; blocked message set
    to the exact string the tests assert on)

Updated test_bedrock_guardrails.py, otel_test_config.yaml, and the
guardrailConfig in test_bedrock_completion.py. Verified locally: the 5
previously-failing guardrail tests now pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(bedrock): migrate legacy models to current inference profiles

The new CI account (941277531214) cannot invoke legacy Bedrock models
(AWS gates them: "marked by provider as Legacy... not actively using in
the last 30 days"). Migrated the live-call tests:
  - anthropic.claude-3-sonnet-20240229    -> us.anthropic.claude-sonnet-4-5-20250929-v1:0
  - anthropic.claude-3-haiku-20240307     -> us.anthropic.claude-haiku-4-5-20251001-v1:0
Current Claude models on Bedrock require the us. inference-profile prefix
(bare on-demand ids are rejected).

cohere.command-r-plus has no working replacement (all Cohere models are
legacy-gated in the new account), so it is swapped for claude-haiku-4-5 in
provider-agnostic param lists. amazon.titan-image-generator is skipped (no
working replacement). Mocked/transformation/cost tests that reference the
legacy strings are intentionally left unchanged. Verified live against the
new account.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(bedrock): repoint SageMaker + Knowledge Base to new-account resources

These referenced account-scoped resources by hardcoded id that only
existed in the old account, so the migration's account-ID swap missed
them. Recreated in 941277531214 and repointed:
  - SageMaker endpoint jumpstart-dft-hf-textgeneration1-mp-20240815-185614
    -> litellm-ci-textgen (gpt2 on a TGI container, ml.g5.xlarge)
  - Bedrock Knowledge Base T37J8R4WTM -> LCYXFBR2TU (OpenSearch Serverless
    vector store + titan-embed-text-v2, seeded with a LiteLLM doc)
Verified live: test_sagemaker.py (12 passed) and
test_bedrock_knowledgebase_hook.py (12 passed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(reasoning_effort_grid): skip bedrock claude-opus-4-7 cells (not entitled on 941277531214)

claude-opus-4-7 is listed in the new Bedrock CI account's foundation
models but invoke is denied (AccessDeniedException: "not available for
this account"). Bedrock access to the flagship Opus requires an AWS
Sales request, not the self-serve model-access toggle, so it can't be
enabled inline with the rest of the account migration.

Add an optional `skip_reason` to ModelEntry and set it on the
bedrock-claude-opus-4-7 entry; the grid test honors it via pytest.skip.
Cell count (231) and route coverage are unchanged, so the structural
asserts still pass. Restore coverage by deleting the one skip_reason
line once access is granted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(bedrock): swap/skip legacy-gated models unavailable on new CI account

The migrated AWS account (941277531214) cannot access several models that
the old account could, so the remaining red CI jobs were hitting real
Bedrock "Access denied / Legacy" and "account not authorized" errors:

- image_gen: skip both Nova Canvas test classes (amazon.nova-canvas-v1:0 is
  legacy-gated), matching the existing titan skip.
- batches: skip test_async_file_and_batch (Bedrock batch inference is not
  authorized on the new account; requires an AWS support case).
- litellm_overhead: swap legacy claude-3-5-haiku for the active
  us.anthropic.claude-haiku-4-5 inference profile.
- test_completion_claude_3_function_call: swap legacy claude-3-sonnet for the
  active us.anthropic.claude-sonnet-4-5 inference profile.

https://claude.ai/code/session_01Y7zgHYu9GX29YRwV4yiWAa

* test(bedrock): fix remaining e2e legacy-model + batch failures on new CI account

- e2e_openai_endpoints: skip test_bedrock_batches_api (Bedrock batch inference
  is not authorized on account 941277531214) and migrate the missed
  s3_bucket_name in oai_misc_config.yaml to litellm-proxy-941277531214.
- build_and_test: swap legacy bedrock claude-3-sonnet for the active
  us.anthropic.claude-sonnet-4-5 inference profile in the proxy structured
  output e2e test.

https://claude.ai/code/session_01Y7zgHYu9GX29YRwV4yiWAa

* test(bedrock): make opus-4-7 + batch cells fail loudly and mock image-gen (#28791)

Replace the silent skips added for the new CI account with noisier behavior:
- reasoning-effort grid: opus-4-7 cells now fail (when AWS creds are present)
  instead of skipping, so the missing entitlement stays visible in CI; they
  still skip when AWS creds are absent (local dev)
- Bedrock batch inference tests: drop the skip so they run and fail until
  batch access is granted
- Titan + Nova Canvas image-gen tests: mock the Bedrock HTTP call so the
  transform + cost-tracking path stays under test without live model access

https://claude.ai/code/session_01MT7SWDnXUjv6e6EPG7BDjT

Co-authored-by: Claude <noreply@anthropic.com>

* test(bedrock): use pytest.xfail for known-failing opus-4-7 cells

Replace pytest.fail with pytest.xfail when a model has a fail_reason,
so known-broken cells stay visible as XFAIL without keeping CI red.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

---------

Co-authored-by: Mateo <mateo@Mateos-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(otel): export SERVER span on management-endpoint success without http_request (#28794)

Co-authored-by: Yassin Kortam <yassinkortam@Yassins-MacBook-Pro.local>

* chore(ci): merge dev branch (#28801)

* chore(proxy): route path-dependent call sites through get_request_route

Replace direct ``request.url.path`` reads in auth, ACL, routing, and
audit-log decisions with ``get_request_route(request)`` — the helper
already added in ``auth/auth_utils.py`` that returns the ASGI
``scope["path"]`` with ``root_path`` stripped. Starlette reconstructs
``url.path`` from the Host header; ``scope["path"]`` is uvicorn's
parse of the request line and matches what FastAPI dispatches on, so
it's the authoritative route for any decision that should agree with
the actual handler.

Sites:
- _experimental/mcp_server/auth/user_api_key_auth_mcp.py
- management_endpoints/mcp_management_endpoints.py
- vector_store_endpoints/utils.py
- pass_through_endpoints/pass_through_endpoints.py
- auth/route_checks.py
- litellm_pre_call_utils.py
- spend_tracking/spend_management_endpoints.py
- common_utils/http_parsing_utils.py
- management_helpers/utils.py
- health_endpoints/_health_endpoints.py

Adds regression tests in tests/proxy_unit_tests/test_proxy_routes.py
that construct a Request with scope["path"] set to a benign route and
the Host header crafted so url.path would resolve differently; each
site's decision is asserted against scope["path"].
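
A minimal sketch of the path resolution described above, assuming a plain ASGI scope dict. `route_from_scope` is a hypothetical stand-in for illustration; the real `get_request_route` takes a request object and may differ in detail:

```python
def route_from_scope(scope: dict) -> str:
    """Return the dispatched path with any root_path prefix removed."""
    path = scope.get("path", "")
    root_path = scope.get("root_path", "").rstrip("/")
    # Strip root_path only when it is a true prefix ending at a segment boundary.
    if root_path and (path == root_path or path.startswith(root_path + "/")):
        path = path[len(root_path):] or "/"
    return path


# Hypothetical scope for a proxy mounted under /proxy.
scope = {"path": "/proxy/v1/chat/completions", "root_path": "/proxy"}
print(route_from_scope(scope))
```

Because the function never consults the Host header, a crafted header cannot change the route it returns.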

* chore(proxy): make get_request_route imports lazy at call sites

Move the ``from litellm.proxy.auth.auth_utils import get_request_route``
imports added in the prior commit back to the function bodies that use
them. The module-level form participates in a long-standing import
cycle through ``auth_utils -> _types -> ...`` and was flagged by CodeQL
on the PR; the lazy form matches the pattern the proxy already uses
for ``user_api_key_auth`` and related helpers elsewhere in these files.

Also drop the ``RouteChecks._is_assistants_api_request`` delegation in
``_get_metadata_variable_name`` introduced in the prior commit — the
delegation pulled ``RouteChecks`` into the same cycle, and the call
site reuses the resolved route for its other branches, so inlining
the substring check is both cycle-free and avoids a redundant second
``get_request_route`` call.

Comment in test_proxy_routes.py acknowledges that the two MCP table
entries exercise ``get_request_route`` directly rather than the full
production handler (which needs ASGI scope + MCP state to invoke).

---------

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: user <70670632+stuxf@users.noreply.github.com>

* chore(ci): merge dev branch (#28657)

* feat(dashboard): navbar hierarchy + Agent Platform notifications (#27543)

* feat(dashboard): refine navbar zones and Agent Platform notice

Restructure the admin navbar for production users: clear product vs community
vs personal columns with vertical dividers, icon-only Slack/GitHub in a
shared chip, and Docs/Blog typography aligned on an 8px rhythm.

Add a notifications bell with popover linking to the LiteLLM Agent Platform
repo and optional mark-as-read persistence.

Promote the account control with initials avatar, single-line display name,
and navDisplayName mapping for placeholder user ids (e.g. default_user_id).

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(dashboard): address PR review — AntD buttons, public page guard, dedupe regex

- Replace raw <button> with AntD Button in BlogDropdown, NotificationsBell, UserDropdown, and test mock
- Guard NotificationsBell + container behind !isPublicPage to avoid rendering on public pages
- Remove redundant equality checks in navDisplayName (regex already covers them)
- Remove unused `lower` variable after simplification

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>

* fix(dashboard): drop dead useHealthReadiness import in navbar

The module was removed in #27896 (replaced by useHealthReadinessDetails),
but the import survived the rebase. The symbol is unused — only
useHealthReadinessDetails is consumed in the file. Removing the dead
import unblocks the UI TypeScript build.

* fix(dashboard): align CommunityEngagementButtons test with icon-only aria-labels

The component was refactored to an icon-only chip with aria-label='LiteLLM
on GitHub' (squash #27543), but the test still asserted /star us on
github/i. Update the query to match the rendered accessible name.

* refactor(dashboard): drop unused props from NavbarProps

The navbar refactor moved user identity + dark-mode state to internal
hooks (useAuthorized, useWorker), but the NavbarProps interface still
declared userID, userEmail, userRole, premiumUser, isDarkMode, and
toggleDarkMode as required, forcing every caller to thread them through.

Drop them from the interface and all four call sites (page.tsx,
(dashboard)/layout.tsx, public_model_hub.tsx, navbar.test.tsx). Also
shrinks the destructure in layout.tsx so the now-unused locals stop
being pulled out of useAuthorized().

* refactor(dashboard): use useSyncExternalStore for NotificationsBell dismiss flag

Reads/writes of the litellmHideAgentPlatformBanner key were done
directly inside NotificationsBell via a useEffect + useState pair.
Every other localStorage-backed flag in the dashboard
(DisableShowPrompts, DisableBouncingIcon, DisableShowNewBadge,
DisableUsageIndicator, DisableBlogPosts) is wrapped in a
useSyncExternalStore hook over localStorageUtils so all mounted
components stay in sync.

Extract useHideAgentPlatformBanner to follow the same shape, swap
NotificationsBell to consume it, and add a regression test that
two sibling bells stay in sync without a remount when one is
dismissed.

* refactor: mask credential fields in proxy settings GET responses (#28682)

* refactor: mask credential fields in proxy settings GET responses

Brings SSO settings, cache settings, and the email/Slack alerting view in
/get/config/callbacks in line with the HashiCorp Vault config-override
pattern, so persisted credentials are not transported back to the UI in
plaintext.

* refactor: harden short-value masking and hoist alerting var constant

Closes two review observations:

- mask_sensitive_keys now replaces short values (below the visible
  prefix+suffix length) with an all-mask string instead of returning them
  unchanged, so a 1-7 character credential is no longer round-tripped
  verbatim.
- _ALERTING_SENSITIVE_VARS is moved out of get_config() to a module-level
  constant, matching the analogous _SSO_SENSITIVE_FIELDS and
  _CACHE_SENSITIVE_FIELDS in the SSO and cache endpoint files.
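
The short-value rule can be sketched as follows. This is an illustration only: `mask_value` is a hypothetical helper, and the 4+4 visible split is an assumption inferred from the "1-7 character" wording, not the actual `mask_sensitive_keys` signature:

```python
def mask_value(value: str, visible_prefix: int = 4, visible_suffix: int = 4) -> str:
    """Mask the middle of a credential; fully mask values too short to split."""
    if len(value) <= visible_prefix + visible_suffix:
        # Showing a prefix and suffix here would reveal the whole value.
        return "*" * len(value)
    hidden = len(value) - visible_prefix - visible_suffix
    suffix = value[len(value) - visible_suffix:]
    return value[:visible_prefix] + "*" * hidden + suffix


print(mask_value("abc"))
print(mask_value("sk-1234567890abcd"))
```

The sketch also masks values whose length equals the visible window, which is slightly stricter than the "below" threshold the commit describes.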

---------

Co-authored-by: Krrish Dholakia <krrish+github@berri.ai>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ui): show 2-decimal precision for max_budget on key overview (#28809)

The Key Info Overview tab's Spend card truncated sub-dollar budgets to
"$0" because formatNumberWithCommas defaults to 0 decimals. The Settings
tab passes 2; align the overview so a $0.10 budget renders as "$0.10".
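
For illustration, a Python analogue of the formatting behavior (the dashboard helper itself is TypeScript; this `format_number_with_commas` is a hypothetical stand-in, not its actual code):

```python
def format_number_with_commas(value: float, decimals: int = 0) -> str:
    """Format with thousands separators and a fixed number of decimals."""
    return f"{value:,.{decimals}f}"


print("$" + format_number_with_commas(0.10))     # default of 0 decimals
print("$" + format_number_with_commas(0.10, 2))  # what the overview passes after the fix
```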

Resolves LIT-2845

* feat(proxy): allow `llm_api_routes` virtual keys to list MCP servers (#28442)

* feat(proxy): allow llm_api_routes virtual keys to list MCP servers

Add a new `mcp_discovery_routes` group (GET /v1/mcp/server and GET
/v1/mcp/server/{server_id}) and include it in `llm_api_routes` so that
virtual keys configured with `allowed_routes=["llm_api_routes"]` can
discover the MCP servers they have access to. Previously these calls
failed with 'Virtual key is not allowed to call this route. Only allowed
to call routes: [llm_api_routes]'.

The GET handlers already sanitize the response for restricted virtual
keys via `_sanitize_mcp_server_list_for_virtual_key`, stripping
credential-bearing fields (url, headers, env). Write methods
(POST/PUT/DELETE) on the same paths remain gated by the existing
handler-level admin role checks.

The new discovery list is intentionally kept OUT of
`mcp_inference_routes`, so `is_llm_api_route()` still returns False
for these paths — this preserves the existing contract that
DISABLE_LLM_API_ENDPOINTS must not block the Admin UI from listing MCP
servers.

Co-authored-by: ryan-crabbe-berri <ryan-crabbe-berri@users.noreply.github.com>

* refactor(proxy): make MCP discovery carve-out method-aware

Replace the `mcp_discovery_routes` group in `llm_api_routes` with a
method-aware special case inside `is_virtual_key_allowed_to_call_route`.
Virtual keys with allowed_routes=["llm_api_routes"] are now permitted
to call only GET /v1/mcp/server and GET /v1/mcp/server/{server_id} —
non-GET methods and multi-segment admin sub-paths fall through to the
existing 403. This keeps the general llm_api_routes list free of
management paths and avoids accidentally exposing POST/PUT/DELETE
writes through the route-check layer.
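
A rough sketch of such a method-aware check, assuming the route has already been resolved to a path string. `is_mcp_discovery_request` and the regex are hypothetical and only illustrate the shape of the carve-out:

```python
import re

# Matches /v1/mcp/server and /v1/mcp/server/{server_id}, but no deeper sub-paths.
_MCP_DISCOVERY_PATH = re.compile(r"/v1/mcp/server(?:/[^/]+)?")


def is_mcp_discovery_request(method: str, route: str) -> bool:
    """True only for GET on the list route or a single-segment server route."""
    return method.upper() == "GET" and _MCP_DISCOVERY_PATH.fullmatch(route) is not None


print(is_mcp_discovery_request("GET", "/v1/mcp/server/abc"))
print(is_mcp_discovery_request("POST", "/v1/mcp/server"))
```

Anything that returns False here would fall through to the existing 403 path.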

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: ryan-crabbe-berri <ryan-crabbe-berri@users.noreply.github.com>

* chore(ci): merge dev branch (#28807)

* chore(proxy): route path-dependent call sites through get_request_route

Replace direct ``request.url.path`` reads in auth, ACL, routing, and
audit-log decisions with ``get_request_route(request)`` — the helper
already added in ``auth/auth_utils.py`` that returns the ASGI
``scope["path"]`` with ``root_path`` stripped. Starlette reconstructs
``url.path`` from the Host header; ``scope["path"]`` is uvicorn's
parse of the request line and matches what FastAPI dispatches on, so
it's the authoritative route for any decision that should agree with
the actual handler.

Sites:
- _experimental/mcp_server/auth/user_api_key_auth_mcp.py
- management_endpoints/mcp_management_endpoints.py
- vector_store_endpoints/utils.py
- pass_through_endpoints/pass_through_endpoints.py
- auth/route_checks.py
- litellm_pre_call_utils.py
- spend_tracking/spend_management_endpoints.py
- common_utils/http_parsing_utils.py
- management_helpers/utils.py
- health_endpoints/_health_endpoints.py

Adds regression tests in tests/proxy_unit_tests/test_proxy_routes.py
that construct a Request with scope["path"] set to a benign route and
the Host header crafted so url.path would resolve differently; each
site's decision is asserted against scope["path"].

* chore(proxy): make get_request_route imports lazy at call sites

Move the ``from litellm.proxy.auth.auth_utils import get_request_route``
imports added in the prior commit back to the function bodies that use
them. The module-level form participates in a long-standing import
cycle through ``auth_utils -> _types -> ...`` and was flagged by CodeQL
on the PR; the lazy form matches the pattern the proxy already uses
for ``user_api_key_auth`` and related helpers elsewhere in these files.

Also drop the ``RouteChecks._is_assistants_api_request`` delegation in
``_get_metadata_variable_name`` introduced in the prior commit — the
delegation pulled ``RouteChecks`` into the same cycle, and the call
site reuses the resolved route for its other branches, so inlining
the substring check is both cycle-free and avoids a redundant second
``get_request_route`` call.

Comment in test_proxy_routes.py acknowledges that the two MCP table
entries exercise ``get_request_route`` directly rather than the full
production handler (which needs ASGI scope + MCP state to invoke).

---------

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: user <70670632+stuxf@users.noreply.github.com>

* fix(team): keep team_alias cache in sync on _cache_team_object writes (#28737)

* fix(team): keep team_alias cache in sync on _cache_team_object writes

_cache_team_object wrote only to the team_id:<id> cache key, but the
JWT auth path that uses team_alias_jwt_field reads from a separate
team_alias:<alias> key (get_team_object_by_alias caches under both
keys on miss, but reads only the alias-keyed one). After any
team-mutation endpoint (team_model_add, team_model_delete,
update_team, the two access-group writes) the team_id cache was
refreshed but the team_alias cache stayed stale until TTL — JWT
callers using team_alias_jwt_field kept seeing the pre-mutation
team for the full cache window.

Mirror the write under the alias key inside _cache_team_object so
every existing caller stays in sync without further changes. Skip
the alias write when team_alias is None/empty so we don't collide
across alias-less teams.

Surfaced testing the LIT-3244 cherry-pick on patch/1.86.0: the
LIT-3244 fix correctly invalidated the team_id cache but the
customer's JWT used team_alias_jwt_field, so they kept hitting the
stale alias-keyed entry.

* fix(team): delete (not overwrite) team_alias cache on _cache_team_object

The prior shape of this PR wrote both team_id:<id> AND team_alias:<alias>
from _cache_team_object. team_alias is NOT unique in the schema
(no @unique on LiteLLM_TeamTable.team_alias), and get_team_object_by_alias
enforces uniqueness on its own DB-fetch path (len(teams) > 1 raises).
Writing the alias-keyed cache from the generic refresh path bypassed
that check: a team admin renaming their team to collide with another
team's alias could silently overwrite the cached team for JWT-by-alias
auth, swapping the resolved team under that alias for the cache window.

Switch the alias-keyed operation from a write to a delete (mirroring
the dual-cache delete pattern in _delete_cache_key_object). After every
team write, the next JWT-by-alias reader cache-misses and falls through
to get_team_object_by_alias, which (a) re-fetches the fresh team from
DB, closing the LIT-3244 staleness gap that motivated this PR, and
(b) enforces alias uniqueness before populating either cache key.

team_id:<id> writes are unchanged — team_id is the table PK and is
guaranteed unique.
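
The write-by-id, delete-by-alias shape can be sketched with an in-memory dict. `TeamCache` is a hypothetical stand-in for the proxy's cache, and only the key formats are taken from the description above:

```python
class TeamCache:
    """Toy cache illustrating the dual-key behavior on team writes."""

    def __init__(self) -> None:
        self._store = {}

    def cache_team_object(self, team: dict) -> None:
        # team_id is unique, so overwriting this key is always safe.
        self._store[f"team_id:{team['team_id']}"] = team
        alias = team.get("team_alias")
        if alias:
            # Drop (never overwrite) the alias key so the next by-alias read
            # misses and goes back through the uniqueness-checked DB path.
            self._store.pop(f"team_alias:{alias}", None)


cache = TeamCache()
cache._store["team_alias:eng"] = {"team_id": "t1", "team_alias": "eng", "models": []}
cache.cache_team_object({"team_id": "t1", "team_alias": "eng", "models": ["gpt-4o"]})
print(sorted(cache._store))
```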

Surfaced in veria-ai review on #28739.

* fix(managed-files): anchor model_id regex so it doesn't match llm_output_file_model_id

extract_model_id_from_unified_id used `re.search(r"model_id,([^;]+)", ...)`
which substring-matches the `model_id,` inside the file-ID encoding's
`llm_output_file_model_id,<deployment_uuid>` field. parse_unified_id
then fed that deployment UUID back into the auth path as a model
candidate via _extract_models_from_managed_resource_id, and every
team-BYOK file attach 403'd with:

    team not allowed to access model. This team can only access
    models=['openai/*']. Tried to access <deployment-uuid>

The team's models list correctly contains the public name (`openai/*`)
that target_model_names matches, but the bogus UUID candidate fails
the wildcard check first.

Anchor the regex to a field boundary (`(?:^|;)model_id,`) so it
matches the legitimate top-level `model_id,<value>` field on
vector_store unified IDs and skips substring matches inside other
fields. File-IDs (which have no top-level `model_id` field) now
return None and contribute no spurious UUID candidate.

Surfaced reproducing LIT-3244 on patch/1.86.0 with the customer's
exact flow: team with openai/* BYOK deployment, JWT-scoped user,
POST /v1/vector_stores/{id}/files attaching a file uploaded with
target_model_names=openai/gpt-4o.

* fix(proxy): hydrate wildcard discovery credentials (#28284) (#28822)

* fix(proxy): hydrate wildcard discovery credentials

* fix(proxy): constrain wildcard credential hydration

Co-authored-by: Dibyo Mukherjee <dibyo@adobe.com>

* ci: add daily oss-agent-shin branch creation workflow (#28829)

Creates litellm_oss_agent_shin_MM_DD_YYYY from main every day at 00:00 UTC.
Lets us retarget oss-agent-shin fork PRs onto a canonical branch so CircleCI runs with secrets, without granting the agent write access.

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>

* test(proxy): add harness for proxy_server.py behavior-pinning (#28827)

* test(proxy): add harness for proxy_server.py behavior-pinning

Creates tests/test_litellm/proxy/proxy_server/ with:
- conftest.py: 11 shared fixtures (app, client, mock_prisma, auth_as,
  mock_router with parametrized response builders, normalize, etc.)
- _coverage_check.py: per-PR coverage gate (line + branch) against a
  baseline, self-selects target by inspecting which placeholder files
  have been filled
- _pin_check.py: AST-based gate that verifies every pin-list item has
  >=1 happy + >=1 error test with a real assertion (no status-only)
- test_harness_smoke.py: 19 smoke tests covering every fixture +
  both scripts end-to-end
- 26 placeholder test files (one docstring each) reserved for
  follow-up PRs per the directory ownership in the Notion plan
- .coverage_baseline pinned at 0% so future PRs measure deltas
  against new-tests-only and aren't entangled with the broader
  scattered test suite

Adds a dedicated proxy-server job to test-unit-proxy-endpoints.yml
so this directory's runtime + coverage are tracked independently.

Plan: https://www.notion.so/36c43b8acdab81ee845fd5365128a2fc

* ci(proxy-endpoints): allow workflow_dispatch

Lets the workflow be triggered manually on a branch via
`gh workflow run`, which is needed for the verify-first
flow on workflow changes before opening a PR.

* test(proxy): address review feedback on proxy_server harness

- conftest.py: anchor sys.path insert to __file__ (Path(__file__).resolve().parents[4])
  instead of CWD-relative os.path.abspath("../../../../") which resolved
  to the wrong directory when pytest is launched from the repo root.
- _coverage_check.py: actually read .coverage_baseline and use it as
  the floor (line_min = max(target, baseline)). Closes the gap between
  the PR description's "delta semantics" and what the script was doing.
  With baseline=0.0 today this is a no-op; future PRs that update the
  baseline cause regressions (test deletions etc.) to trip the gate
  even if the static PR target is still met.
- _pin_check.py: drop unreachable startswith("_") guard
  (test_*.py glob never yields underscore-prefixed names) and read
  each test file once instead of twice.
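
The baseline-as-floor logic amounts to the following sketch. The helper names are hypothetical, and the assumption that `.coverage_baseline` holds a single number is mine, not confirmed by the commit:

```python
from pathlib import Path


def read_baseline(path: Path) -> float:
    """Return the recorded baseline percentage, or 0.0 if missing or unparsable."""
    try:
        return float(path.read_text().strip())
    except (FileNotFoundError, ValueError):
        return 0.0


def line_floor(target: float, baseline: float) -> float:
    """The gate must satisfy both the static PR target and the recorded baseline."""
    return max(target, baseline)


print(line_floor(40.0, 0.0))   # baseline of 0.0 leaves the target in charge
print(line_floor(40.0, 55.5))  # a higher baseline raises the floor
```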

* feat(openai): apply regional-processing cost uplift for EU/US data residency (#28626)

* feat(openai): apply regional-processing cost uplift for EU/US data residency

OpenAI charges a 10% uplift on the latest GPT models when requests are
served from a regionalized hostname (eu./us.api.openai.com).  Infer the
region from `api_base`, expose it on `kwargs["litellm_params"]["data_residency"]`,
and multiply the computed cost by a per-model
`regional_processing_uplift_multiplier_<region>` field.
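
As a rough illustration of the flow, assuming exact hostname matching and a plain dict of model pricing fields. `infer_region_from_api_base` and `apply_regional_uplift` are hypothetical names, and the 1.1 multiplier is just the 10% figure from the description:

```python
from typing import Optional
from urllib.parse import urlparse

_REGIONAL_HOSTS = {"eu.api.openai.com": "eu", "us.api.openai.com": "us"}


def infer_region_from_api_base(api_base: Optional[str]) -> Optional[str]:
    """Map a regionalized OpenAI hostname to a region tag, else None."""
    host = urlparse(api_base or "").hostname or ""
    return _REGIONAL_HOSTS.get(host)


def apply_regional_uplift(cost: float, model_info: dict, region: Optional[str]) -> float:
    """Multiply cost by the per-model regional multiplier when one is defined."""
    if region is None:
        return cost
    key = f"regional_processing_uplift_multiplier_{region}"
    return cost * model_info.get(key, 1.0)


model_info = {"regional_processing_uplift_multiplier_eu": 1.1}
region = infer_region_from_api_base("https://eu.api.openai.com/v1")
print(region, apply_regional_uplift(2.0, model_info, region))
```

Models without the multiplier field are left at their base cost in this sketch.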

https://claude.ai/code/session_012ebH44s7ohYxjoix5CXzTW

* test: allow regional_processing_uplift_multiplier_{eu,us} in model_prices schema

* fix(cost): tighten data_residency inference and restore model_cost in tests

- Only infer OpenAI data_residency when custom_llm_provider == "openai";
  drop the implicit None fallback so non-OpenAI callers can't accidentally
  pick up a regional tag from a stray OpenAI hostname.
- _local_model_cost_map fixture now snapshots and restores
  litellm.model_cost and LITELLM_LOCAL_MODEL_COST_MAP so tests don't leak
  state across the session.

* refactor(openai): move data_residency helper under llms/openai

* fix: thread data_residency through realtime stream cost calculation

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(cost): thread data_residency through batch_cost_calculator

Apply the OpenAI regional-processing uplift multiplier to retrieve_batch
cost paths so Batch API requests served via eu./us.api.openai.com are
priced at the same uplifted token rates as completions/transcriptions.

* refactor(openai): encapsulate provider check inside infer_openai_data_residency

Move the custom_llm_provider == "openai" guard from get_litellm_params
into the helper itself so the core utility no longer carries
provider-specific dispatch logic. Callers pass through the provider
unconditionally; the helper returns None for any non-OpenAI provider.

* fix(responses): thread data_residency through Responses logging params

The Responses API paths build their logging litellm_params dict after
provider resolution but did not include data_residency, so cost calc
saw None even when the effective api_base was a regional OpenAI host.

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>

---------

Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Mateo Wang <277851410+mateo-berri@users.noreply.github.com>
Co-authored-by: Mateo <mateo@Mateos-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
Co-authored-by: Yassin Kortam <yassinkortam@Yassins-MacBook-Pro.local>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: user <70670632+stuxf@users.noreply.github.com>
Co-authored-by: Krrish Dholakia <krrish+github@berri.ai>
Co-authored-by: ryan-crabbe-berri <ryan@berri.ai>
Co-authored-by: ryan-crabbe-berri <ryan-crabbe-berri@users.noreply.github.com>
Co-authored-by: Dibyo Mukherjee <dibyo@adobe.com>
Co-authored-by: ishaan-berri <155045088+ishaan-berri@users.noreply.github.com>
Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>

* fix: preserve OTEL response payload and remove duplicate constant

- _emit_management_endpoint_otel_span now passes result as response on success
- remove duplicate _CREDENTIAL_LITELLM_PARAM_FIELDS assignment in model_checks

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix: address bug detection findings

- pass_through_endpoints: use request.method instead of hardcoded POST
  in streaming SigV4-signed request path for consistency with the
  non-streaming branch
- llm_cost_calc/utils: hoist DataResidency value set to a module-level
  frozenset to avoid rebuilding it on every cost calculation
- example_config_yaml/oai_misc_config: replace real-looking AWS account
  ID with placeholder 123456789012 in example bucket and role ARN

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* chore(github_copilot): refresh model catalog from upstream /models API (#28055)

Aligns the github_copilot catalog with values returned by Copilot's
public /models endpoint (capabilities.limits + capabilities.supports +
model.supported_endpoints).

- Adds 10 new model entries: claude-opus-4.7, claude-sonnet-4.6,
  gemini-3-flash-preview, gemini-3.1-pro-preview, gpt-4-0125-preview,
  gpt-5.2-codex, gpt-5.4, gpt-5.4-mini, gpt-5.5, oswe-vscode-prime.
- Updates max_input_tokens for existing entries to reflect each
  model's true context window (e.g. gpt-4o-mini 64000 -> 128000,
  gpt-5-mini 128000 -> 264000, gpt-5.3-codex 128000 -> 400000,
  claude-haiku-4.5 128000 -> 200000).
- Adds supports_reasoning, supports_response_schema,
  supports_function_calling, supports_parallel_function_calling,
  supports_vision based on capabilities.supports.
- Declares supported_endpoints for entries missing it
  (e.g. gpt-3.5-turbo, gpt-4o, embeddings).
- For responses-only models (gpt-5.2-codex, gpt-5.4, gpt-5.4-mini,
  gpt-5.5), sets mode to 'responses'.
- gpt-41-copilot.mode changes from 'completion' to 'chat' because
  Copilot reports capabilities.type = 'chat'. Revertible on request.

Pricing fields and other manually-curated values are preserved.

* feat(datadog): emit litellm.overhead.latency as a standalone Datadog metric (#28831)

Adds a new `litellm.overhead.latency` gauge metric to `DatadogMetricsLogger`
(the `/api/v2/series` path). The value is sourced from
`hidden_params["litellm_overhead_time_ms"]` already computed in
`ResponseMetadata` and exposed in `StandardLoggingPayload`.

Matches the Prometheus integration which exposes the same value via
`litellm_overhead_latency_metric`. Emitted in seconds (ms ÷ 1000) for
consistency with the other latency series.
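
A sketch of what one such series entry might look like. The function name and tag value are hypothetical, and the payload layout reflects my understanding of Datadog's v2 series format rather than the logger's actual code:

```python
import time
from typing import List


def overhead_latency_series(overhead_ms: float, tags: List[str]) -> dict:
    """Build one gauge series entry, converting milliseconds to seconds."""
    return {
        "metric": "litellm.overhead.latency",
        "type": 3,  # assumed to denote "gauge" in the v2 series intake
        "points": [{"timestamp": int(time.time()), "value": overhead_ms / 1000.0}],
        "tags": tags,
    }


series = overhead_latency_series(250.0, ["env:ci"])
print(series["points"][0]["value"])
```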

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: Shin <shin@litellm.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: ishaan-berri <155045088+ishaan-berri@users.noreply.github.com>

* feat(arize): route Phoenix traces via per-project TracerProviders (#28876)

Use LRU-cached TracerProviders with project-scoped OTEL Resources so team/key
metadata routes traces correctly. On the proxy, project selection is limited to
server-controlled user_api_key_auth_metadata; client metadata fields stay banned.
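
The caching idea can be sketched with `functools.lru_cache`. `ProjectProvider` is a stand-in class rather than a real OTEL TracerProvider, and the cache size is an arbitrary assumption:

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class ProjectProvider:
    """Stand-in for a TracerProvider bound to one project's Resource."""
    project_name: str


@lru_cache(maxsize=64)
def provider_for_project(project_name: str) -> ProjectProvider:
    # Built once per project; later calls reuse the cached instance.
    return ProjectProvider(project_name=project_name)


first = provider_for_project("team-a")
second = provider_for_project("team-a")
print(first is second)
```

A real implementation would presumably also need to flush or shut down providers that the LRU evicts; that is left out here.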

* fix(arize_phoenix): skip _emit_semantic_logs on failure path

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(arize_phoenix): skip raw request logging and metrics on failure path

Restores pre-refactor behavior: _handle_failure no longer emits raw-request
sub-spans or records OTEL metrics, matching the original _handle_failure
that did not call these helpers.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(security): close two medium telemetry trust-boundary issues

Issue 1 (arize_phoenix.py — caller-controlled telemetry routing):
- _is_proxy_request no longer detects proxy mode by checking
  user_api_key_auth_metadata in request metadata.  That field is
  user-supplied, so an authenticated caller could fake proxy-mode
  detection and have _project_from_metadata_dict read their own dict
  for project selection, routing telemetry to arbitrary Arize/Phoenix
  projects.  Proxy mode is now determined solely by the server-set
  proxy_server_request field in litellm_params.
- auth_utils.py adds user_api_key_auth_metadata to the banned request
  body params list so the proxy rejects any attempt to supply the field
  at the HTTP layer.  The field is server-reserved: it is written
  exclusively by add_user_api_key_auth_to_request_metadata from the
  authenticated key's database record after the ban check runs.

Issue 2 (management_helpers/utils.py — API key in OTEL span):
- _emit_management_endpoint_otel_span now strips plaintext credential
  fields (key, token, api_key, secret, …) from the response dict before
  passing it to the OTEL success hook.  dict(result) on a Pydantic
  GenerateKeyResponse includes the freshly-generated key field, which
  would previously be written as a span attribute to every configured
  OTEL collector/backend.
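
A minimal sketch of the redaction step, assuming a flat response dict. The field set covers only the four names spelled out above (the real list is longer), and `strip_sensitive_fields` is a hypothetical helper name:

```python
# Only the fields named in the description; the actual list is elided there.
_SENSITIVE_RESPONSE_FIELDS = frozenset({"key", "token", "api_key", "secret"})


def strip_sensitive_fields(response: dict) -> dict:
    """Return a copy of the response without plaintext credential fields."""
    return {k: v for k, v in response.items() if k not in _SENSITIVE_RESPONSE_FIELDS}


result = {"key": "sk-example", "key_alias": "ci-key", "max_budget": 10.0}
print(strip_sensitive_fields(result))
```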

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: oss-agent-shin <ext-agent-shin@berri.ai>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
Co-authored-by: Terrajlz <info@jouleselectrictech.com>
Co-authored-by: Bruno Devaux <devaux.br@gmail.com>
Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: Mateo Wang <277851410+mateo-berri@users.noreply.github.com>
Co-authored-by: Mateo <mateo@Mateos-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Yassin Kortam <yassinkortam@Yassins-MacBook-Pro.local>
Co-authored-by: user <70670632+stuxf@users.noreply.github.com>
Co-authored-by: Krrish Dholakia <krrish+github@berri.ai>
Co-authored-by: ryan-crabbe-berri <ryan@berri.ai>
Co-authored-by: ryan-crabbe-berri <ryan-crabbe-berri@users.noreply.github.com>
Co-authored-by: Dibyo Mukherjee <dibyo@adobe.com>
Co-authored-by: ishaan-berri <155045088+ishaan-berri@users.noreply.github.com>
Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>
Co-authored-by: rinto <54238243+ririnto@users.noreply.github.com>
Co-authored-by: Shin <shin@litellm.ai>
Co-authored-by: mubashir1osmani <mubashir.osmani777@gmail.com>
* fix(bedrock): align toolUse/toolSpec names and allow hyphens in tool names

Sanitize tool names consistently in tool history and tool_choice, and
preserve hyphens per current Bedrock [a-zA-Z0-9_-]+ constraint.

Co-authored-by: Cursor <cursoragent@cursor.com>

* docs(bedrock): docstring matches alpha-first tool name normalization

Greptile: pattern is [a-zA-Z][a-zA-Z0-9_-]* to reflect prepend-'a' behavior.

Co-authored-by: Cursor <cursoragent@cursor.com>
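
A rough sketch of the alpha-first normalization described in the two commits above. The underscore replacement character and the function name `normalize_tool_name` are assumptions made for illustration, not taken from the Bedrock transformation code:

```python
import re


def normalize_tool_name(name: str) -> str:
    """Map a tool name onto [a-zA-Z][a-zA-Z0-9_-]*."""
    # Assumed replacement: anything outside the allowed set becomes "_".
    cleaned = re.sub(r"[^a-zA-Z0-9_-]", "_", name)
    # Prepend 'a' when the name is empty or does not start with a letter.
    if not cleaned or not cleaned[0].isalpha():
        cleaned = "a" + cleaned
    return cleaned
```

Applying the same function to toolSpec, toolUse history, and tool_choice is what keeps the three names aligned.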

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
…28848)

* fix(realtime): send TEXT frames and valid guardrail session.update

Decode backend recv bytes before send_text so clients receive OP_TEXT JSON
events. Include turn_detection.type server_vad in injected session.update.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(realtime): skip non-UTF-8 backend binary frames

Avoid terminating the forwarding loop on UnicodeDecodeError when the
backend sends unexpected binary payloads.
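
The decode-or-skip behaviour from the two realtime commits above can be sketched as follows. `frame_to_text` is a hypothetical helper, and the actual forwarding loop and websocket calls are omitted:

```python
from typing import Optional, Union


def frame_to_text(message: Union[str, bytes]) -> Optional[str]:
    """Return text to forward as a TEXT frame, or None to skip the frame."""
    if isinstance(message, str):
        return message
    try:
        return message.decode("utf-8")
    except UnicodeDecodeError:
        # Unexpected binary payload: skip it instead of ending the loop.
        return None
```

A caller would send the result with `send_text` only when it is not None and then continue with the next backend frame.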

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
A key's unified access_group_ids now extend the team's MCP scope instead
of being capped by it — mirrors the model-side union from LIT-2404. The
group's assigned_team_ids / assigned_key_ids still gate the override, so
team members can't pull in MCPs via a foreign team's group.
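
One way to picture the gated union, as a sketch only: the group dict shape, the `mcp_servers` field, and the function name are assumptions, while `assigned_team_ids` and `assigned_key_ids` come from the description above.

```python
def effective_mcp_servers(team_servers, key_groups, key_id, team_id):
    """Union the team's MCP scope with servers from groups assigned to this key or team."""
    allowed = set(team_servers)
    for group in key_groups:
        assigned = (
            key_id in group.get("assigned_key_ids", [])
            or team_id in group.get("assigned_team_ids", [])
        )
        if assigned:
            # "mcp_servers" is a hypothetical field name for the group's servers.
            allowed |= set(group.get("mcp_servers", []))
    return allowed
```

A group assigned only to another team contributes nothing, which is the gate the description refers to.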

Resolves LIT-3189
#28771)

* fix(galileo): support hosted v2 spans API and string output extraction

Use GALILEO_API_KEY with /v2/projects/{id}/spans for Galileo Cloud,
keep legacy observe/ingest for username/password deployments, and
extract assistant content as a string instead of a message dict.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(galileo): address review — async enterprise auth and message input

Use async httpx for enterprise login to avoid blocking the event loop,
preserve multi-turn messages in v2 span input, and clean up tests.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(galileo): handle negative TZ offsets, 2xx success, and Pydantic ImageObject serialization

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(galileo): treat any 2xx ingest response as success

Use response.is_success so 201 Created clears in_memory_records and
avoids duplicate span submissions on subsequent flushes.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(galileo): cast message dict for mypy in convert_content_list_to_str

Co-authored-by: Cursor <cursoragent@cursor.com>

* merge main (#28835)

* fix(proxy): Bedrock Knowledge Base pass-through: preserve SigV4 headers and signed request body (#27526)

* Fix Bedrock KB pass-through SigV4 headers and signed body

Coerce botocore HeadersDict to a dict for pass-through routes. When
forward_headers is true, drop request headers that collide case-insensitively
with signed headers so client Bearer auth does not shadow AWS SigV4.
Send prepped.body as raw content so the outbound payload matches the
signature after logging hooks mutate the parsed dict.
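
The case-insensitive collision rule can be sketched like this. `merge_headers` is a hypothetical helper; the real code operates on the pass-through route's forwarded and signed headers:

```python
def merge_headers(client_headers: dict, signed_headers: dict) -> dict:
    """Forward client headers, but let SigV4-signed headers win on any name clash."""
    signed_lower = {name.lower() for name in signed_headers}
    merged = {
        name: value
        for name, value in client_headers.items()
        if name.lower() not in signed_lower
    }
    merged.update(signed_headers)
    return merged
```

Without the lowercase comparison, a client `authorization` header and a signed `Authorization` header would both survive as separate dict keys.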

Co-authored-by: Cursor <cursoragent@cursor.com>

* Simplify pass-through raw body handling

Read the SigV4-signed bytes directly from request.state inside
pass_through_request instead of threading a custom_raw_body argument
through three functions. Helper methods are restored to their original
signatures, and the new branch lives in one place at each httpx call site.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Harden pass-through raw body read from request.state

Guard missing request.state (test fixtures) and ignore non-bytes/str
values so MagicMock does not trigger the SigV4 raw-body path.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Test pass_through_request state_raw_body uses httpx content=

Cover non-streaming (async_client.request) and streaming (build_request)
paths so SigV4 bytes on request.state are not replaced by json= of a
hook-mutated dict.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>

* chore(tests): migrate Bedrock CI to AWS account 941277531214 (#28728)

* chore(tests): migrate Bedrock CI from AWS account 888602223428 to 941277531214

The original account (888602223428) was put under a security restriction by
AWS after a root access key leaked in a PR comment. While that account works
its way through the AWS Support unlock process, Bedrock-touching CI tests have
been migrated to a fresh account (941277531214).

Changes:
  - Replace 26 hardcoded references to 888602223428 with 941277531214 across
    8 files (provisioned-model ARNs, imported-model ARNs, AgentCore runtime
    ARNs, batch execution role ARN, and example proxy config).
  - The provisioned-model and imported-model ARNs are referenced only from
    mocked unit tests — no AWS resources to recreate.
  - The batch execution IAM role has been recreated in the new account with
    the same name and equivalent permissions.
  - The two AgentCore runtimes (hosted_agent_r9jvp-3ySZuRHjLC,
    hosted_agent_13sf6-cALnp38iZD) are being recreated in the new account
    under the same names — see tools/agentcore-deploy/ in a follow-up.

CircleCI env vars AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_REGION_NAME
were updated separately via the CircleCI API to point at the new account.

Smoke-tested locally against the new account:
  aws bedrock-runtime converse --region us-west-2 \
    --model-id us.anthropic.claude-sonnet-4-5-20250929-v1:0 \
    --messages '[{"role":"user","content":[{"text":"ping"}]}]'
  → 200, model returned 'pong'

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(tests): refresh AgentCore ARN suffixes to match newly-deployed runtimes

The first migration commit replaced just the account ID, but AgentCore
auto-assigns a random 10-char suffix to every runtime on creation — we
can't reuse the original suffixes (`3ySZuRHjLC`, `cALnp38iZD`) in the
new account. Updated the AgentCore-runtime ARNs in the three files that
reference real runtime IDs (not the mock-based unit-test ARNs).

Deployed runtimes:
  arn:aws:bedrock-agentcore:us-west-2:941277531214:runtime/hosted_agent_r9jvp-Rq79QFC2fp
  arn:aws:bedrock-agentcore:us-west-2:941277531214:runtime/hosted_agent_13sf6-4046UzHSwy

Both runtimes are status=READY and pass a smoke invoke:
  $ aws bedrock-agentcore invoke-agent-runtime --agent-runtime-arn ... --payload '{"prompt":"ping"}'
  → 200, {"result": "echo: ping"}

The agent is a minimal echo (see /tmp/agentcore_deploy/agent.py for the
deploy artifacts). Tests that only verify the SDK wiring will pass; if any
test asserts on agent output content, swap the echo for the real agent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(tests): point Bedrock batch tests at new-account S3 bucket

The account migration (888602223428 -> 941277531214) was a flat
account-ID swap, which only rewrites ARNs that embed the account
number. S3 bucket names carry no account ID, so the live Bedrock
batch tests still uploaded to `litellm-proxy` — a bucket that lives
in the old account. S3 names are globally unique, and the old account
still holds that name, so it can't be recreated in the new account.

Rename to `litellm-proxy-941277531214` (account-ID suffix guarantees
global uniqueness). The bucket must be created in 941277531214 and the
batch execution role granted s3:GetObject/PutObject/ListBucket on it
before this job is run in CI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(tests): point live S3 logging test at new-account bucket

Same account-ID-free blind spot as the batch bucket: `load-testing-oct`
lives in the old account and its name can't be reused globally. The
`logging_testing` CI job is wired into the workflow and runs
test_basic_s3_logging, which uploads to this bucket with the CI env
creds, then lists and deletes objects — a live dependency.

Rename to `load-testing-oct-941277531214`. The bucket must exist in the
new account with the CI IAM principal granted
s3:PutObject/GetObject/ListBucket/DeleteObject before this job runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(tests): repoint Bedrock guardrail IDs to new-account guardrails

The migration left guardrail IDs untouched (no account ID in them), so
all live guardrail tests failed with "guardrail identifier or version
does not exist" against 941277531214. Recreated both guardrails in the
new account and updated the hardcoded IDs:
  - wf0hkdb5x07f -> zgkmukebruil (PII mask: PHONE + CREDIT_DEBIT_CARD,
    with explicit inputAction=ANONYMIZE so masking applies to INPUT,
    which is the source litellm's moderation hook sends)
  - ff6ujrregl1q -> 4w3d1di3snt5 (blocks "coffee"; blocked message set
    to the exact string the tests assert on)

Updated test_bedrock_guardrails.py, otel_test_config.yaml, and the
guardrailConfig in test_bedrock_completion.py. Verified locally: the 5
previously-failing guardrail tests now pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(bedrock): migrate legacy models to current inference profiles

The new CI account (941277531214) cannot invoke legacy Bedrock models
(AWS gates them: "marked by provider as Legacy... not actively using in
the last 30 days"). Migrated the live-call tests:
  - anthropic.claude-3-sonnet-20240229    -> us.anthropic.claude-sonnet-4-5-20250929-v1:0
  - anthropic.claude-3-haiku-20240307     -> us.anthropic.claude-haiku-4-5-20251001-v1:0
Current Claude models on Bedrock require the us. inference-profile prefix
(bare on-demand ids are rejected).

cohere.command-r-plus has no working replacement (all Cohere is legacy-
gated in the new account): swapped to claude-haiku-4-5 in provider-
agnostic param lists. amazon.titan-image-generator skipped (no working
replacement). Mocked/transformation/cost tests that reference the legacy
strings are intentionally left unchanged. Verified live against the new
account.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(bedrock): repoint SageMaker + Knowledge Base to new-account resources

These referenced account-scoped resources by hardcoded id that only
existed in the old account, so the migration's account-ID swap missed
them. Recreated in 941277531214 and repointed:
  - SageMaker endpoint jumpstart-dft-hf-textgeneration1-mp-20240815-185614
    -> litellm-ci-textgen (gpt2 on a TGI container, ml.g5.xlarge)
  - Bedrock Knowledge Base T37J8R4WTM -> LCYXFBR2TU (OpenSearch Serverless
    vector store + titan-embed-text-v2, seeded with a LiteLLM doc)
Verified live: test_sagemaker.py (12 passed) and
test_bedrock_knowledgebase_hook.py (12 passed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(reasoning_effort_grid): skip bedrock claude-opus-4-7 cells (not entitled on 941277531214)

claude-opus-4-7 is listed in the new Bedrock CI account's foundation
models but invoke is denied (AccessDeniedException: "not available for
this account"). Bedrock access to the flagship Opus requires an AWS
Sales request, not the self-serve model-access toggle, so it can't be
enabled inline with the rest of the account migration.

Add an optional `skip_reason` to ModelEntry and set it on the
bedrock-claude-opus-4-7 entry; the grid test honors it via pytest.skip.
Cell count (231) and route coverage are unchanged, so the structural
asserts still pass. Restore coverage by deleting the one skip_reason
line once access is granted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(bedrock): swap/skip legacy-gated models unavailable on new CI account

The migrated AWS account (941277531214) cannot access several models that
the old account could, so the remaining red CI jobs were hitting real
Bedrock "Access denied / Legacy" and "account not authorized" errors:

- image_gen: skip both Nova Canvas test classes (amazon.nova-canvas-v1:0 is
  legacy-gated), matching the existing titan skip.
- batches: skip test_async_file_and_batch (Bedrock batch inference is not
  authorized on the new account; requires an AWS support case).
- litellm_overhead: swap legacy claude-3-5-haiku for the active
  us.anthropic.claude-haiku-4-5 inference profile.
- test_completion_claude_3_function_call: swap legacy claude-3-sonnet for the
  active us.anthropic.claude-sonnet-4-5 inference profile.

https://claude.ai/code/session_01Y7zgHYu9GX29YRwV4yiWAa

* test(bedrock): fix remaining e2e legacy-model + batch failures on new CI account

- e2e_openai_endpoints: skip test_bedrock_batches_api (Bedrock batch inference
  is not authorized on account 941277531214) and migrate the missed
  s3_bucket_name in oai_misc_config.yaml to litellm-proxy-941277531214.
- build_and_test: swap legacy bedrock claude-3-sonnet for the active
  us.anthropic.claude-sonnet-4-5 inference profile in the proxy structured
  output e2e test.

https://claude.ai/code/session_01Y7zgHYu9GX29YRwV4yiWAa

* test(bedrock): make opus-4-7 + batch cells fail loudly and mock image-gen (#28791)

Replace the silent skips added for the new CI account with noisier behavior:
- reasoning-effort grid: opus-4-7 cells now fail (when AWS creds are present)
  instead of skipping, so the missing entitlement stays visible in CI; they
  still skip when AWS creds are absent (local dev)
- Bedrock batch inference tests: drop the skip so they run and fail until
  batch access is granted
- Titan + Nova Canvas image-gen tests: mock the Bedrock HTTP call so the
  transform + cost-tracking path stays under test without live model access

https://claude.ai/code/session_01MT7SWDnXUjv6e6EPG7BDjT

Co-authored-by: Claude <noreply@anthropic.com>

* test(bedrock): use pytest.xfail for known-failing opus-4-7 cells

Replace pytest.fail with pytest.xfail when a model has a fail_reason,
so known-broken cells stay visible as XFAIL without keeping CI red.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

---------

Co-authored-by: Mateo <mateo@Mateos-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(otel): export SERVER span on management-endpoint success without http_request (#28794)

Co-authored-by: Yassin Kortam <yassinkortam@Yassins-MacBook-Pro.local>

* chore(ci): merge dev branch (#28801)

* chore(proxy): route path-dependent call sites through get_request_route

Replace direct ``request.url.path`` reads in auth, ACL, routing, and
audit-log decisions with ``get_request_route(request)`` — the helper
already added in ``auth/auth_utils.py`` that returns the ASGI
``scope["path"]`` with ``root_path`` stripped. Starlette reconstructs
``url.path`` from the Host header; ``scope["path"]`` is uvicorn's
parse of the request line and matches what FastAPI dispatches on, so
it's the authoritative route for any decision that should agree with
the actual handler.
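
As an illustration of what "scope path with root_path stripped" means, here is a sketch over a plain ASGI scope dict. It is not the body of `get_request_route`; `route_from_scope` is a hypothetical stand-in:

```python
def route_from_scope(scope: dict) -> str:
    """Return scope["path"] with a leading root_path removed, if present."""
    path = scope.get("path", "")
    root_path = scope.get("root_path", "").rstrip("/")
    # Strip only on a segment boundary so "/api" does not match "/apiv2/...".
    if root_path and (path == root_path or path.startswith(root_path + "/")):
        path = path[len(root_path):] or "/"
    return path
```

Nothing here reads the Host header, so a crafted Host value cannot change the route that auth decisions see.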

Sites:
- _experimental/mcp_server/auth/user_api_key_auth_mcp.py
- management_endpoints/mcp_management_endpoints.py
- vector_store_endpoints/utils.py
- pass_through_endpoints/pass_through_endpoints.py
- auth/route_checks.py
- litellm_pre_call_utils.py
- spend_tracking/spend_management_endpoints.py
- common_utils/http_parsing_utils.py
- management_helpers/utils.py
- health_endpoints/_health_endpoints.py

Adds regression tests in tests/proxy_unit_tests/test_proxy_routes.py
that construct a Request with scope["path"] set to a benign route and
the Host header crafted so url.path would resolve differently; each
site's decision is asserted against scope["path"].

* chore(proxy): make get_request_route imports lazy at call sites

Move the ``from litellm.proxy.auth.auth_utils import get_request_route``
imports added in the prior commit back to the function bodies that use
them. The module-level form participates in a long-standing import
cycle through ``auth_utils -> _types -> ...`` and was flagged by CodeQL
on the PR; the lazy form matches the pattern the proxy already uses
for ``user_api_key_auth`` and related helpers elsewhere in these files.

Also drop the ``RouteChecks._is_assistants_api_request`` delegation in
``_get_metadata_variable_name`` introduced in the prior commit — the
delegation pulled ``RouteChecks`` into the same cycle, and the call
site reuses the resolved route for its other branches, so inlining
the substring check is both cycle-free and avoids a redundant second
``get_request_route`` call.

Comment in test_proxy_routes.py acknowledges that the two MCP table
entries exercise ``get_request_route`` directly rather than the full
production handler (which needs ASGI scope + MCP state to invoke).

---------

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: user <70670632+stuxf@users.noreply.github.com>

* chore(ci): merge dev branch (#28657)

* feat(dashboard): navbar hierarchy + Agent Platform notifications (#27543)

* feat(dashboard): refine navbar zones and Agent Platform notice

Restructure the admin navbar for production users: clear product vs community
vs personal columns with vertical dividers, icon-only Slack/GitHub in a
shared chip, and Docs/Blog typography aligned on an 8px rhythm.

Add a notifications bell with popover linking to the LiteLLM Agent Platform
repo and optional mark-as-read persistence.

Promote the account control with initials avatar, single-line display name,
and navDisplayName mapping for placeholder user ids (e.g. default_user_id).

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(dashboard): address PR review — AntD buttons, public page guard, dedupe regex

- Replace raw <button> with AntD Button in BlogDropdown, NotificationsBell, UserDropdown, and test mock
- Guard NotificationsBell + container behind !isPublicPage to avoid rendering on public pages
- Remove redundant equality checks in navDisplayName (regex already covers them)
- Remove unused `lower` variable after simplification

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>

* fix(dashboard): drop dead useHealthReadiness import in navbar

The module was removed in #27896 (replaced by useHealthReadinessDetails),
but the import survived the rebase. The symbol is unused — only
useHealthReadinessDetails is consumed in the file. Removing the dead
import unblocks the UI TypeScript build.

* fix(dashboard): align CommunityEngagementButtons test with icon-only aria-labels

The component was refactored to an icon-only chip with aria-label='LiteLLM
on GitHub' (squash #27543), but the test still asserted /star us on
github/i. Update the query to match the rendered accessible name.

* refactor(dashboard): drop unused props from NavbarProps

The navbar refactor moved user identity + dark-mode state to internal
hooks (useAuthorized, useWorker), but the NavbarProps interface still
declared userID, userEmail, userRole, premiumUser, isDarkMode, and
toggleDarkMode as required, forcing every caller to thread them through.

Drop them from the interface and all four call sites (page.tsx,
(dashboard)/layout.tsx, public_model_hub.tsx, navbar.test.tsx). Also
shrinks the destructure in layout.tsx so the now-unused locals stop
being pulled out of useAuthorized().

* refactor(dashboard): use useSyncExternalStore for NotificationsBell dismiss flag

Reads/writes of the litellmHideAgentPlatformBanner key were done
directly inside NotificationsBell via a useEffect + useState pair.
Every other localStorage-backed flag in the dashboard
(DisableShowPrompts, DisableBouncingIcon, DisableShowNewBadge,
DisableUsageIndicator, DisableBlogPosts) is wrapped in a
useSyncExternalStore hook over localStorageUtils so all mounted
components stay in sync.

Extract useHideAgentPlatformBanner to follow the same shape, swap
NotificationsBell to consume it, and add a regression test that
two sibling bells stay in sync without a remount when one is
dismissed.

* refactor: mask credential fields in proxy settings GET responses (#28682)

* refactor: mask credential fields in proxy settings GET responses

Brings SSO settings, cache settings, and the email/Slack alerting view in
/get/config/callbacks in line with the HashiCorp Vault config-override
pattern, so persisted credentials are not transported back to the UI in
plaintext.

* refactor: harden short-value masking and hoist alerting var constant

Closes two review observations:

- mask_sensitive_keys now replaces short values (below the visible
  prefix+suffix length) with an all-mask string instead of returning them
  unchanged, so a 1-7 character credential is no longer round-tripped
  verbatim.
- _ALERTING_SENSITIVE_VARS is moved out of get_config() to a module-level
  constant, matching the analogous _SSO_SENSITIVE_FIELDS and
  _CACHE_SENSITIVE_FIELDS in the SSO and cache endpoint files.
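
The short-value hardening can be sketched as below. The 4+4 visible prefix/suffix is an assumption inferred from the "1-7 character" remark, and `mask_value` is a hypothetical helper rather than the signature of mask_sensitive_keys. The sketch also fully masks a value of exactly prefix+suffix length, which is a choice made here and not something the commit states:

```python
def mask_value(value: str, prefix: int = 4, suffix: int = 4, mask_char: str = "*") -> str:
    """Mask the middle of a credential; fully mask values too short to mask partially."""
    if len(value) <= prefix + suffix:
        # Too short to show any part safely: return an all-mask string.
        return mask_char * len(value)
    hidden = len(value) - prefix - suffix
    return value[:prefix] + mask_char * hidden + value[len(value) - suffix:]
```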

---------

Co-authored-by: Krrish Dholakia <krrish+github@berri.ai>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ui): show 2-decimal precision for max_budget on key overview (#28809)

The Key Info Overview tab's Spend card truncated sub-dollar budgets to
"$0" because formatNumberWithCommas defaults to 0 decimals. The Settings
tab passes 2; align the overview so a $0.10 budget renders as "$0.10".

Resolves LIT-2845

* feat(proxy): allow `llm_api_routes` virtual keys to list MCP servers (#28442)

* feat(proxy): allow llm_api_routes virtual keys to list MCP servers

Add a new `mcp_discovery_routes` group (GET /v1/mcp/server and GET
/v1/mcp/server/{server_id}) and include it in `llm_api_routes` so that
virtual keys configured with `allowed_routes=["llm_api_routes"]` can
discover the MCP servers they have access to. Previously these calls
failed with 'Virtual key is not allowed to call this route. Only allowed
to call routes: [llm_api_routes]'.

The GET handlers already sanitize the response for restricted virtual
keys via `_sanitize_mcp_server_list_for_virtual_key`, stripping
credential-bearing fields (url, headers, env). Write methods
(POST/PUT/DELETE) on the same paths remain gated by the existing
handler-level admin role checks.

The new discovery list is intentionally kept OUT of
`mcp_inference_routes`, so `is_llm_api_route()` still returns False
for these paths — this preserves the existing contract that
DISABLE_LLM_API_ENDPOINTS must not block the Admin UI from listing MCP
servers.

Co-authored-by: ryan-crabbe-berri <ryan-crabbe-berri@users.noreply.github.com>

* refactor(proxy): make MCP discovery carve-out method-aware

Replace the `mcp_discovery_routes` group in `llm_api_routes` with a
method-aware special case inside `is_virtual_key_allowed_to_call_route`.
Virtual keys with allowed_routes=["llm_api_routes"] are now permitted
to call only GET /v1/mcp/server and GET /v1/mcp/server/{server_id} —
non-GET methods and multi-segment admin sub-paths fall through to the
existing 403. This keeps the general llm_api_routes list free of
management paths and avoids accidentally exposing POST/PUT/DELETE
writes through the route-check layer.
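
A sketch of the method-aware carve-out, assuming the route has already been resolved to a path string. `is_mcp_discovery_request` is a hypothetical name; the real check sits inside `is_virtual_key_allowed_to_call_route`:

```python
import re

# Matches /v1/mcp/server and /v1/mcp/server/{server_id}, but no deeper sub-paths.
_MCP_DISCOVERY_PATTERN = re.compile(r"/v1/mcp/server(?:/[^/]+)?")


def is_mcp_discovery_request(method: str, route: str) -> bool:
    """True only for GET on the two MCP discovery paths."""
    return method.upper() == "GET" and _MCP_DISCOVERY_PATTERN.fullmatch(route) is not None
```

Any request for which this returns False falls through to the existing route check, so writes and multi-segment admin paths keep returning 403.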

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: ryan-crabbe-berri <ryan-crabbe-berri@users.noreply.github.com>

* chore(ci): merge dev branch (#28807)

* chore(proxy): route path-dependent call sites through get_request_route

Replace direct ``request.url.path`` reads in auth, ACL, routing, and
audit-log decisions with ``get_request_route(request)`` — the helper
already added in ``auth/auth_utils.py`` that returns the ASGI
``scope["path"]`` with ``root_path`` stripped. Starlette reconstructs
``url.path`` from the Host header; ``scope["path"]`` is uvicorn's
parse of the request line and matches what FastAPI dispatches on, so
it's the authoritative route for any decision that should agree with
the actual handler.

Sites:
- _experimental/mcp_server/auth/user_api_key_auth_mcp.py
- management_endpoints/mcp_management_endpoints.py
- vector_store_endpoints/utils.py
- pass_through_endpoints/pass_through_endpoints.py
- auth/route_checks.py
- litellm_pre_call_utils.py
- spend_tracking/spend_management_endpoints.py
- common_utils/http_parsing_utils.py
- management_helpers/utils.py
- health_endpoints/_health_endpoints.py

Adds regression tests in tests/proxy_unit_tests/test_proxy_routes.py
that construct a Request with scope["path"] set to a benign route and
the Host header crafted so url.path would resolve differently; each
site's decision is asserted against scope["path"].

* chore(proxy): make get_request_route imports lazy at call sites

Move the ``from litellm.proxy.auth.auth_utils import get_request_route``
imports added in the prior commit back to the function bodies that use
them. The module-level form participates in a long-standing import
cycle through ``auth_utils -> _types -> ...`` and was flagged by CodeQL
on the PR; the lazy form matches the pattern the proxy already uses
for ``user_api_key_auth`` and related helpers elsewhere in these files.

Also drop the ``RouteChecks._is_assistants_api_request`` delegation in
``_get_metadata_variable_name`` introduced in the prior commit — the
delegation pulled ``RouteChecks`` into the same cycle, and the call
site reuses the resolved route for its other branches, so inlining
the substring check is both cycle-free and avoids a redundant second
``get_request_route`` call.

Comment in test_proxy_routes.py acknowledges that the two MCP table
entries exercise ``get_request_route`` directly rather than the full
production handler (which needs ASGI scope + MCP state to invoke).

---------

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: user <70670632+stuxf@users.noreply.github.com>

* fix(team): keep team_alias cache in sync on _cache_team_object writes (#28737)

* fix(team): keep team_alias cache in sync on _cache_team_object writes

_cache_team_object wrote only to the team_id:<id> cache key, but the
JWT auth path that uses team_alias_jwt_field reads from a separate
team_alias:<alias> key (get_team_object_by_alias caches under both
keys on miss, but reads only the alias-keyed one). After any
team-mutation endpoint (team_model_add, team_model_delete,
update_team, the two access-group writes) the team_id cache was
refreshed but the team_alias cache stayed stale until TTL — JWT
callers using team_alias_jwt_field kept seeing the pre-mutation
team for the full cache window.

Mirror the write under the alias key inside _cache_team_object so
every existing caller stays in sync without further changes. Skip
the alias write when team_alias is None/empty so we don't collide
across alias-less teams.

Surfaced while testing the LIT-3244 cherry-pick on patch/1.86.0: the
LIT-3244 fix correctly invalidated the team_id cache but the
customer's JWT used team_alias_jwt_field, so they kept hitting the
stale alias-keyed entry.

* fix(team): delete (not overwrite) team_alias cache on _cache_team_object

The prior shape of this PR wrote both team_id:<id> AND team_alias:<alias>
from _cache_team_object. team_alias is NOT unique in the schema
(no @unique on LiteLLM_TeamTable.team_alias), and get_team_object_by_alias
enforces uniqueness on its own DB-fetch path (len(teams) > 1 raises).
Writing the alias-keyed cache from the generic refresh path bypassed
that check: a team admin renaming their team to collide with another
team's alias could silently overwrite the cached team for JWT-by-alias
auth, swapping the resolved team under that alias for the cache window.

Switch the alias-keyed operation from a write to a delete (mirroring
the dual-cache delete pattern in _delete_cache_key_object). After every
team write, the next JWT-by-alias reader cache-misses and falls through
to get_team_object_by_alias, which (a) re-fetches the fresh team from
DB, closing the LIT-3244 staleness gap that motivated this PR, and
(b) enforces alias uniqueness before populating either cache key.

team_id:<id> writes are unchanged — team_id is the table PK and is
guaranteed unique.
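
The write-by-id, delete-by-alias shape can be sketched with a plain dict standing in for the proxy cache. The function name and dict-based cache are illustrative assumptions; the key formats follow the message above:

```python
def cache_team_object(cache: dict, team: dict) -> None:
    """Refresh the id-keyed entry and invalidate the alias-keyed one."""
    # team_id is the primary key, so overwriting this entry is safe.
    cache[f"team_id:{team['team_id']}"] = team
    alias = team.get("team_alias")
    if alias:
        # Delete rather than write: the next alias lookup misses and
        # re-runs the DB fetch that enforces alias uniqueness.
        cache.pop(f"team_alias:{alias}", None)
```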

Surfaced in veria-ai review on #28739.

* fix(managed-files): anchor model_id regex so it doesn't match llm_output_file_model_id

extract_model_id_from_unified_id used `re.search(r"model_id,([^;]+)", ...)`
which substring-matches the `model_id,` inside the file-ID encoding's
`llm_output_file_model_id,<deployment_uuid>` field. parse_unified_id
then fed that deployment UUID back into the auth path as a model
candidate via _extract_models_from_managed_resource_id, and every
team-BYOK file attach 403'd with:

    team not allowed to access model. This team can only access
    models=['openai/*']. Tried to access <deployment-uuid>

The team's models list correctly contains the public name (`openai/*`)
that target_model_names matches, but the bogus UUID candidate fails
the wildcard check first.

Anchor the regex to a field boundary (`(?:^|;)model_id,`) so it
matches the legitimate top-level `model_id,<value>` field on
vector_store unified IDs and skips substring matches inside other
fields. File-IDs (which have no top-level `model_id` field) now
return None and contribute no spurious UUID candidate.
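
The difference between the two patterns can be seen on made-up decoded IDs. The sample strings below are illustrative and do not reproduce the exact unified-ID encoding:

```python
import re
from typing import Optional

UNANCHORED = r"model_id,([^;]+)"
ANCHORED = r"(?:^|;)model_id,([^;]+)"


def extract_model_id(decoded_id: str, pattern: str = ANCHORED) -> Optional[str]:
    """Return the top-level model_id field value, or None if absent."""
    match = re.search(pattern, decoded_id)
    return match.group(1) if match else None


# Hypothetical decoded IDs, for illustration only.
file_id = "unified_id,abc;llm_output_file_model_id,deployment-uuid"
vector_store_id = "vector_store_id,vs_1;model_id,gpt-4o"
```

With these inputs the unanchored pattern pulls `deployment-uuid` out of the file ID, while the anchored one returns None for the file ID and `gpt-4o` for the vector store ID.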

Surfaced while reproducing LIT-3244 on patch/1.86.0 with the customer's
exact flow: team with openai/* BYOK deployment, JWT-scoped user,
POST /v1/vector_stores/{id}/files attaching a file uploaded with
target_model_names=openai/gpt-4o.

* fix(proxy): hydrate wildcard discovery credentials (#28284) (#28822)

* fix(proxy): hydrate wildcard discovery credentials

* fix(proxy): constrain wildcard credential hydration

Co-authored-by: Dibyo Mukherjee <dibyo@adobe.com>

* ci: add daily oss-agent-shin branch creation workflow (#28829)

Creates litellm_oss_agent_shin_MM_DD_YYYY from main every day at 00:00 UTC.
Lets us retarget oss-agent-shin fork PRs onto a canonical branch so CircleCI runs with secrets, without granting the agent write access.

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>

* test(proxy): add harness for proxy_server.py behavior-pinning (#28827)

* test(proxy): add harness for proxy_server.py behavior-pinning

Creates tests/test_litellm/proxy/proxy_server/ with:
- conftest.py: 11 shared fixtures (app, client, mock_prisma, auth_as,
  mock_router with parametrized response builders, normalize, etc.)
- _coverage_check.py: per-PR coverage gate (line + branch) against a
  baseline, self-selects target by inspecting which placeholder files
  have been filled
- _pin_check.py: AST-based gate that verifies every pin-list item has
  >=1 happy + >=1 error test with a real assertion (no status-only)
- test_harness_smoke.py: 19 smoke tests covering every fixture +
  both scripts end-to-end
- 26 placeholder test files (one docstring each) reserved for
  follow-up PRs per the directory ownership in the Notion plan
- .coverage_baseline pinned at 0% so future PRs measure deltas
  against new-tests-only and aren't entangled with the broader
  scattered test suite

Adds a dedicated proxy-server job to test-unit-proxy-endpoints.yml
so this directory's runtime + coverage are tracked independently.

Plan: https://www.notion.so/36c43b8acdab81ee845fd5365128a2fc

* ci(proxy-endpoints): allow workflow_dispatch

Lets the workflow be triggered manually on a branch via
`gh workflow run`, which is needed for the verify-first
flow on workflow changes before opening a PR.

* test(proxy): address review feedback on proxy_server harness

- conftest.py: anchor sys.path insert to __file__ (Path(__file__).resolve().parents[4])
  instead of CWD-relative os.path.abspath("../../../../") which resolved
  to the wrong directory when pytest is launched from the repo root.
- _coverage_check.py: actually read .coverage_baseline and use it as
  the floor (line_min = max(target, baseline)). Closes the gap between
  the PR description's "delta semantics" and what the script was doing.
  With baseline=0.0 today this is a no-op; future PRs that update the
  baseline cause regressions (test deletions etc.) to trip the gate
  even if the static PR target is still met.
- _pin_check.py: drop unreachable startswith("_") guard
  (test_*.py glob never yields underscore-prefixed names) and read
  each test file once instead of twice.
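
A minimal sketch of the baseline-as-floor rule described in the `_coverage_check.py` bullet above. The function names and the assumption that `.coverage_baseline` holds a single number are illustrative, not taken from the actual script:

```python
def effective_line_minimum(target: float, baseline: float) -> float:
    """Return the line-coverage floor: the higher of the static target and the baseline."""
    return max(target, baseline)


def parse_baseline(text: str) -> float:
    """Parse the baseline file contents; blank content is treated as 0.0 (assumption)."""
    stripped = text.strip()
    return float(stripped) if stripped else 0.0


def gate_passes(measured: float, target: float, baseline_text: str) -> bool:
    """The gate passes only when measured coverage reaches the effective floor."""
    floor = effective_line_minimum(target, parse_baseline(baseline_text))
    return measured >= floor
```

With a baseline of 0.0 the floor is simply the static target, which matches the "no-op today" note; once the baseline is raised above the target, a drop below it fails the gate even though the target is still met.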

* feat(openai): apply regional-processing cost uplift for EU/US data residency (#28626)

* feat(openai): apply regional-processing cost uplift for EU/US data residency

OpenAI charges a 10% uplift on the latest GPT models when requests are
served from a regionalized hostname (eu./us.api.openai.com). Infer the
region from `api_base`, expose it on `kwargs["litellm_params"]["data_residency"]`,
and multiply the computed cost by a per-model
`regional_processing_uplift_multiplier_<region>` field.
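
The flow above could look roughly like the sketch below. The helper names, the host table, and the 1.1 multiplier value are assumptions for illustration; only the `regional_processing_uplift_multiplier_<region>` field name and the eu./us. hostnames come from this commit message:

```python
from typing import Any, Dict, Optional
from urllib.parse import urlparse

# Hypothetical lookup table for the regional hostnames named in the commit.
REGIONAL_HOSTS = {"eu.api.openai.com": "eu", "us.api.openai.com": "us"}


def infer_region_from_api_base(api_base: Optional[str]) -> Optional[str]:
    """Return "eu" or "us" when api_base points at a regional host, else None."""
    if not api_base:
        return None
    host = urlparse(api_base).hostname
    return REGIONAL_HOSTS.get(host or "")


def apply_regional_uplift(
    cost: float, model_info: Dict[str, Any], region: Optional[str]
) -> float:
    """Multiply cost by the per-model regional multiplier, defaulting to 1.0."""
    if region is None:
        return cost
    key = f"regional_processing_uplift_multiplier_{region}"
    return cost * model_info.get(key, 1.0)
```

Models without the multiplier field are left at their base cost, so the uplift only applies where the price map opts in.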

https://claude.ai/code/session_012ebH44s7ohYxjoix5CXzTW

* test: allow regional_processing_uplift_multiplier_{eu,us} in model_prices schema

* fix(cost): tighten data_residency inference and restore model_cost in tests

- Only infer OpenAI data_residency when custom_llm_provider == "openai";
  drop the implicit None fallback so non-OpenAI callers can't accidentally
  pick up a regional tag from a stray OpenAI hostname.
- _local_model_cost_map fixture now snapshots and restores
  litellm.model_cost and LITELLM_LOCAL_MODEL_COST_MAP so tests don't leak
  state across the session.

* refactor(openai): move data_residency helper under llms/openai

* fix: thread data_residency through realtime stream cost calculation

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(cost): thread data_residency through batch_cost_calculator

Apply the OpenAI regional-processing uplift multiplier to retrieve_batch
cost paths so Batch API requests served via eu./us.api.openai.com are
priced at the same uplifted token rates as completions/transcriptions.

* refactor(openai): encapsulate provider check inside infer_openai_data_residency

Move the custom_llm_provider == "openai" guard from get_litellm_params
into the helper itself so the core utility no longer carries
provider-specific dispatch logic. Callers pass through the provider
unconditionally; the helper returns None for any non-OpenAI provider.

* fix(responses): thread data_residency through Responses logging params

The Responses API paths build their logging litellm_params dict after
provider resolution but did not include data_residency, so cost calc
saw None even when the effective api_base was a regional OpenAI host.

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>

---------

Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Mateo Wang <277851410+mateo-berri@users.noreply.github.com>
Co-authored-by: Mateo <mateo@Mateos-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
Co-authored-by: Yassin Kortam <yassinkortam@Yassins-MacBook-Pro.local>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: user <70670632+stuxf@users.noreply.github.com>
Co-authored-by: Krrish Dholakia <krrish+github@berri.ai>
Co-authored-by: ryan-crabbe-berri <ryan@berri.ai>
Co-authored-by: ryan-crabbe-berri <ryan-crabbe-berri@users.noreply.github.com>
Co-authored-by: Dibyo Mukherjee <dibyo@adobe.com>
Co-authored-by: ishaan-berri <155045088+ishaan-berri@users.noreply.github.com>
Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>

* fix: preserve OTEL response payload and remove duplicate constant

- Remove duplicate _CREDENTIAL_LITELLM_PARAM_FIELDS assignment in model_checks
- Restore response=dict(result) in _emit_management_endpoint_otel_span so
  OTEL spans for successful management endpoint calls include response data

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix: harden OTEL failure path and cap Galileo in-memory buffer

- Wrap _emit_management_endpoint_otel_span in try/except on the failure
  path of management_endpoint_wrapper so OTEL errors cannot swallow the
  original management-endpoint exception.
- Bound GalileoObserve.in_memory_records at GALILEO_MAX_IN_MEMORY_RECORDS
  to prevent unbounded memory growth when flushes persistently fail.
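
One way to cap an in-memory buffer, sketched with `collections.deque`. The class name and the drop-oldest policy are assumptions; the commit does not say which records are discarded once the cap is reached:

```python
from collections import deque
from typing import Any, Deque, Dict, List


class BoundedRecordBuffer:
    """Keep at most max_records entries, discarding the oldest first (assumed policy)."""

    def __init__(self, max_records: int) -> None:
        self._records: Deque[Dict[str, Any]] = deque(maxlen=max_records)

    def append(self, record: Dict[str, Any]) -> None:
        # deque(maxlen=...) silently evicts from the left once full.
        self._records.append(record)

    def snapshot(self) -> List[Dict[str, Any]]:
        return list(self._records)
```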

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(galileo): reset stale bearer token on auth error; preserve records under concurrency

- Snapshot record count before await so concurrent appends during the
  network round-trip aren't silently dropped when clearing the buffer.
- Build payload from a snapshot list so the legacy path no longer shares
  a live reference with self.in_memory_records.
- On legacy enterprise auth (username/password), drop cached bearer-token
  headers when the upstream rejects the request (401/403) so the next
  flush re-authenticates instead of failing forever on a stale token.
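
The snapshot-before-await idea from the first two bullets can be sketched as follows. `RecordFlusher` and `_post` are hypothetical stand-ins, and `asyncio.sleep(0)` simulates the network round-trip:

```python
import asyncio
from typing import Any, Dict, List


class RecordFlusher:
    def __init__(self) -> None:
        self.in_memory_records: List[Dict[str, Any]] = []
        self.sent: List[List[Dict[str, Any]]] = []

    async def _post(self, payload: List[Dict[str, Any]]) -> None:
        await asyncio.sleep(0)  # stand-in for the network round-trip
        self.sent.append(payload)

    async def flush(self) -> None:
        # Copy the records before awaiting so the payload does not share a
        # live reference with the buffer.
        snapshot = list(self.in_memory_records)
        if not snapshot:
            return
        await self._post(snapshot)
        # Remove only what was sent; records appended during the await stay.
        del self.in_memory_records[: len(snapshot)]


async def demo() -> RecordFlusher:
    flusher = RecordFlusher()
    flusher.in_memory_records.extend([{"id": 1}, {"id": 2}])

    async def late_append() -> None:
        flusher.in_memory_records.append({"id": 3})

    await asyncio.gather(flusher.flush(), late_append())
    return flusher
```

Clearing the whole list after the await would have discarded the record appended mid-flight; deleting only the snapshot-sized prefix keeps it for the next flush.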

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* test(galileo): expand v2 coverage for config, ingest, headers, and flush paths

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: Mateo Wang <277851410+mateo-berri@users.noreply.github.com>
Co-authored-by: Mateo <mateo@Mateos-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Yassin Kortam <yassinkortam@Yassins-MacBook-Pro.local>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: user <70670632+stuxf@users.noreply.github.com>
Co-authored-by: Krrish Dholakia <krrish+github@berri.ai>
Co-authored-by: ryan-crabbe-berri <ryan@berri.ai>
Co-authored-by: ryan-crabbe-berri <ryan-crabbe-berri@users.noreply.github.com>
Co-authored-by: Dibyo Mukherjee <dibyo@adobe.com>
Co-authored-by: ishaan-berri <155045088+ishaan-berri@users.noreply.github.com>
Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>
…6590)

* Add tool calling support for Gemini and Vertex AI Live API

* Fix greptile reviews

* Add new functionality behind flag

* fix greptile issues

* Fix greptile review

* Fix greptile review

* Fix greptile review

* Fix greptile review

* Fix greptile review

* fix lint

* fix(realtime): address P1 issues - guardrail timing and inputAudioTranscription default

- Remove early guardrail turn-detection update that consumed first setup slot
- Add inputAudioTranscription default in Gemini deferred-mode setup
- Add tests for both fixes

Made-with: Cursor

* fix(realtime): inject turn_detection into first session.update for deferred mode

- Instead of sending turn_detection as a separate message (which gets dropped), inject it into the first client session.update
- This ensures guardrails work correctly in deferred mode
- Add test for turn_detection injection in deferred mode

Made-with: Cursor

* fix(realtime): emit response.created preamble before tool-call events

- Emit response.created, output_item.added, and conversation.item.created for function calls
- Ensures OpenAI Realtime API spec compliance
- Add test for preamble emission

Made-with: Cursor

* fix(realtime): add response.output_item.done to complete tool-call sequence

- Emit response.output_item.done between function_call_arguments.done and conversation.item.created
- Required by OpenAI Realtime spec to finalize function-call items
- Update test to verify complete event sequence

Made-with: Cursor

* fix(realtime): emit response.done after tool-call sequence (P0 CRITICAL)

- Add response.done event after tool-call loop to signal response completion
- Required by OpenAI SDK clients to submit tool results
- Without this, clients stall indefinitely waiting for response completion
- Update test to verify complete 6-event sequence including response.done

Made-with: Cursor

* fix(realtime): include function name in toolResponse (P1)

- Store call_id → name mapping when receiving toolCall from Gemini
- Look up and include name in functionResponses when sending tool results
- Required by Gemini Live API spec for proper tool call routing
- Add test to verify name field is included in round-trip

Made-with: Cursor

* fix: resolve merge conflict markers in UI build chunk

Take litellm_internal_staging version of e1a670efcb966aaa.js after
incomplete merge left conflict markers in the committed artifact.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(vertex_ai/realtime): call super().__init__() to initialize tool call state

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(realtime): correct guardrail flag and event-mapping fallback

- realtime_streaming: only mark _guardrail_turn_detection_update_sent
  when the message was actually delivered to the backend. The provider
  transformation (e.g. Gemini after initial setup) may silently drop
  session.update; previously we set the flag anyway, falsely claiming
  the disable was sent and preventing any retry on subsequent
  session.created events. _send_to_backend now returns whether at
  least one transformed message was sent.

- gemini realtime transformation: avoid shadowing the outer
  openai_event variable in map_openai_event's fallback loop. With
  the new toolCall entry now last in MAP_GEMINI_FIELD_TO_OPENAI_EVENT,
  an unmatched key would otherwise leak FUNCTION_CALL_ARGUMENTS_DONE
  and skip the ValueError raise. Use a distinct loop variable so the
  is-None check correctly raises for unknown Gemini messages.
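
A simplified illustration of the shadowing fix in the second bullet. The mapping contents and function name are assumptions; the point is that the loop uses its own variable, so an unmatched key leaves the result as None and the error is raised:

```python
from typing import Dict, Optional

# Illustrative mapping; the last entry mirrors the toolCall case described above.
FIELD_TO_EVENT: Dict[str, str] = {
    "setupComplete": "session.created",
    "toolCall": "response.function_call_arguments.done",
}


def map_event(key: str) -> str:
    matched: Optional[str] = None
    # "candidate" is a distinct loop variable, so iterating never overwrites
    # "matched" unless a field really matches.
    for field, candidate in FIELD_TO_EVENT.items():
        if field == key:
            matched = candidate
            break
    if matched is None:
        raise ValueError(f"Unknown message type: {key}")
    return matched
```

Had the loop reused `matched` as its iteration variable, an unknown key would have left it holding the last map value instead of None.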

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(gemini/realtime): reset response IDs after tool-call response.done

After closing a tool-call response, clear current_output_item_id and
current_response_id so post-tool model turns emit a fresh response.created
preamble. Add regression tests and align guardrail turn_detection test with
GA session shape; apply Black formatting.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix lint

* fix(realtime): log injected message and forward guardrail VAD-disable on Gemini

- Move store_input() after the guardrail turn_detection injection in
  client_ack_messages so audit logs reflect what is actually forwarded
  to the backend (previously the unmodified pre-injection message was
  logged).
- In Gemini's _handle_session_update, allow a session.update that only
  carries a turn_detection change to be forwarded as a follow-up Gemini
  setup with realtimeInputConfig.automaticActivityDetection set, even
  after the initial setup. This restores the guardrail layer's ability
  to disable VAD auto-response in non-deferred mode (the default Gemini
  flow), which was a regression after _handle_session_update started
  silently dropping subsequent session.update messages. Both flat
  beta-style and nested GA-style turn_detection payloads are accepted.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(gemini/realtime): resolve mypy TypedDict errors in transformation

Align realtime event payloads and setup types with OpenAI/Gemini TypedDicts so mypy passes and tool-call events type-check correctly.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(realtime): forward turn_detection updates for Vertex; respect partial VAD config; cache setup after send

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(realtime): consolidate send-and-cache, guard session.update lookup, preserve client turn_detection in GA remap

- Replace duplicated transform/send/cache logic in client_ack_messages with a call to _send_to_backend so future changes stay in one place.
- VertexAIRealtimeConfig.transform_realtime_request now uses .get('session') or {} for the first session.update so a malformed client payload no longer crashes the connection.
- Move the audio-transcription guardrail turn_detection injection to run BEFORE the beta->GA session remap. This lets the injected create_response ride along with any client-provided turn_detection fields (e.g. silence_duration_ms) into the nested audio.input.turn_detection path produced by the remap instead of being stranded as a separate root-level dict.
- Update the deferred-mode injection test to assert the GA-shaped location.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(gemini realtime): pop tool_call_id mapping after use to bound memory

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(realtime): correct deferred-setup session.created modalities and reset IDs after response.done

- Convert provider's real session.created to session.updated when a synthetic
  one was already forwarded so clients receive the authoritative modalities
  derived from their session.update instead of the synthetic placeholder.
- Reset current_response_id / current_output_item_id after Gemini RESPONSE_DONE
  so a toolCall arriving in a later frame starts a fresh response instead of
  reusing the completed response's ID and emitting a duplicate response.done.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(gemini-realtime): preserve nested turn_detection through map_openai_params

After the GA remap moves session.turn_detection into session.audio.input.turn_detection,
Gemini's map_openai_params only looks at top-level keys and silently drops it. Normalize
the extracted turn_detection back to the top level on first session.update so the guardrail
create_response:False (and any client-provided VAD settings) reach the Gemini setup.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(realtime): normalize Vertex AI nested turn_detection and unify session.created guardrail ordering

- Vertex AI _build_vertex_ai_setup_config now lifts nested
  audio.input.turn_detection to the top level before calling
  map_openai_params, mirroring the parent GeminiRealtimeConfig
  behavior. Without this, guardrail-injected create_response: False
  was silently dropped for GA-protocol Vertex AI clients.
- realtime_streaming session.created handling now sends the
  (possibly re-typed) event first and then triggers the guardrail
  turn-detection update for both first and duplicate cases, removing
  the inconsistent guardrail-then-event ordering for duplicates.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(realtime): tolerate non-dict turn_detection in guardrail injection

When a client sends a session.update whose turn_detection field is None or
a non-dict value (e.g. "auto"), the guardrail injection used setdefault
followed by item assignment on the returned value, raising TypeError. The
inner except only caught JSONDecodeError/AttributeError, so the TypeError
escaped to the outer Exception handler that wraps the entire client_ack
loop, killing the connection. Replace non-dict turn_detection with a
fresh dict carrying create_response=False so the guardrail still applies
without crashing the loop.
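
A minimal sketch of the normalization described above, using a hypothetical helper name. Dict values keep their other keys; anything else is replaced by a fresh dict:

```python
from typing import Any, Dict


def force_no_auto_response(session: Dict[str, Any]) -> Dict[str, Any]:
    """Ensure session["turn_detection"] is a dict with create_response=False."""
    turn_detection = session.get("turn_detection")
    if not isinstance(turn_detection, dict):
        # None, "auto", or any other non-dict value is replaced outright,
        # which avoids the TypeError from item assignment.
        turn_detection = {}
    turn_detection["create_response"] = False
    session["turn_detection"] = turn_detection
    return session
```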

* fix(gemini realtime): default synthetic session.created modalities to AUDIO

The synthetic session.created event emitted in deferred setup mode used
TEXT as the default for responseModalities, while _handle_session_update
defaults to AUDIO. Align the default so clients reading modalities from
the initial session.created see the correct value for live sessions.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(vertex_ai/realtime): drop follow-up session.update to avoid 1007 close

Vertex AI Live treats setup as a first-and-only client message; emitting a
second setup with realtimeInputConfig only closes the websocket with a 1007
policy error. Reverting the follow-up-setup branch restores the pre-existing
no-op behavior for subsequent session.update messages.

* fix(gemini realtime): default responseModalities to AUDIO in delta events

Align return_new_content_delta_events with the AUDIO defaults used in
_handle_session_update and transform_session_created_event so deferred
session config does not produce TEXT-typed delta events for audio data.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(gemini realtime): default response.done modalities to AUDIO and correct audio-done test

* fix(realtime): set guardrail turn_detection flag only after successful send

Previously the _guardrail_turn_detection_update_sent flag was set inline
during message rewriting in client_ack_messages, before the modified
session.update was forwarded to the backend. If _send_to_backend raised
(e.g. backend WebSocket disconnect), the exception was caught and the
loop continued, but the flag remained True — permanently disabling the
guardrail create_response=False injection for the rest of the session.
Neither the client_ack_messages path nor the
_maybe_send_guardrail_turn_detection_update backup path would retry.

Track the injection locally and only set the flag after _send_to_backend
returns a truthy sent result, matching the pattern used by
_maybe_send_guardrail_turn_detection_update.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(vertex_ai realtime): keep VAD enabled when guardrails inject create_response: False

map_automatic_turn_detection sets disabled=True whenever create_response is
absent OR False. Transcription guardrails inject create_response: False to
suppress auto-responses while expecting VAD to stay active, but the previous
override in _build_vertex_ai_setup_config only fired when create_response was
absent, leaving disabled=True and silently breaking speech detection and
transcription events. Vertex Live has no 'VAD on, no auto-response' mode, so
always keep VAD active in the setup config.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(gemini realtime): normalize GA-remapped session fields before mapping

map_openai_params only recognises the flat OpenAI-beta keys (modalities,
input_audio_transcription, turn_detection). For GA clients the upstream
shim renames these into the nested GA schema (output_modalities,
audio.input.transcription, audio.input.turn_detection), causing them to
be silently dropped in _handle_session_update. Add a normalization helper
that surfaces the GA-remapped values back at the top level so the
existing mapping logic picks them up. Without this, a GA client
explicitly requesting modalities=['text'] would still default to audio
output.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(vertex_ai/realtime): normalize all GA-remapped session fields before mapping

Previously _build_vertex_ai_setup_config only lifted nested turn_detection
back to the top level. GA clients' output_modalities and
audio.input.transcription were silently dropped because map_openai_params
only recognises the flat OpenAI-beta keys. Use the parent's
_normalize_session_payload_for_mapping so modalities, transcription, and
turn_detection are all surfaced before mapping.

* fix(realtime): force create_response=False in all client session.update turn_detection when audio guardrails active

Prevents a client from re-enabling Gemini/GA VAD auto-response (and thereby
bypassing the audio transcription guardrail) by sending a later
session.update with turn_detection.create_response: true.

* fix(lint): silence PLR0915 on client_ack_messages

The function exceeded the 50-statement limit (64 > 50) after recent
realtime guardrail additions. Matches the existing project pattern for
inherently complex event/message-mapping methods (see _process_event,
translate_messages_to_responses_input, transform_realtime_response,
_arealtime, etc.).

* fix(gemini realtime): preserve original setup config on follow-up session.update

Gemini Live treats a second BidiGenerateContentSetup as a full session
replacement, not a partial merge. The guardrail-driven turn_detection-only
session.update was emitting a setup containing only model + realtimeInputConfig,
which would silently drop tools, generationConfig, inputAudioTranscription, and
systemInstruction from the original setup. Carry forward the cached original
setup and only override realtimeInputConfig.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(realtime): avoid double-serialization and normalize non-dict turn_detection in guardrail override

- Skip the force-override block when the injection block already ran for
  the same session.update to avoid redundant JSON re-serialization.
- Normalize non-dict client-provided turn_detection values (flat and
  nested audio.input.turn_detection) to a dict before enforcing
  create_response=False, matching the injection block's behavior and
  preventing potential bypass on backends that accept non-dict values.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* test(gemini realtime): exercise toolCall → function_call_output name round-trip

Update test_gemini_realtime_function_call_output_transformation to pre-load
the call_id → name mapping by transforming a Gemini toolCall first, then
assert that the resulting Gemini toolResponse functionResponses entry
carries the function name. This pins the production round-trip rather
than the degenerate 'name missing' branch.

* fix(realtime): correct conversation_id, VAD disable, modality state, empty toolCall

- Gemini tool-call response.done now includes conversation_id so clients
  can match it against the preceding response.created.
- Vertex AI setup no longer overrides an explicit guardrail-injected
  create_response: False back to disabled: False; the guardrail's intent
  to disable VAD auto-response is now respected.
- Modality handler is now passed the locally-updated response/item IDs
  rather than the original input snapshot, preventing stale IDs after a
  prior tool-call/response.done in the same JSON message resets them.
- Skip emitting orphaned response.created/response.done events when
  Gemini sends an empty functionCalls array.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(realtime): preserve client session.update fields on follow-up Gemini setup

In non-deferred mode the auto-setup pre-populates session_configuration_request,
so a later client session.update carrying tools or instructions used to fall
into the subsequent path and only forward turn_detection. Rebuild a merged
follow-up setup that overlays the new client fields on top of the original
setup so tools/instructions/etc. are no longer silently dropped.

* fix(gemini realtime): include usage on tool-call response.done; coerce non-dict tool output to struct

- Tool-call response.done now includes an empty usage object, matching the
  non-tool-call path so OpenAI-compatible clients always see usage.
- _handle_function_call_output wraps non-dict JSON parses under a 'result'
  key so Gemini's functionResponses[].response (a Struct) always receives a
  mapping.
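
The coercion in the second bullet might look like this sketch. The function name is hypothetical, and treating unparseable output as a plain string is an assumption not stated in the commit:

```python
import json
from typing import Any, Dict


def coerce_tool_output(output: str) -> Dict[str, Any]:
    """Return a mapping suitable for a Struct-typed response field."""
    try:
        parsed: Any = json.loads(output)
    except json.JSONDecodeError:
        parsed = output  # assumption: keep the raw string when it is not JSON
    if isinstance(parsed, dict):
        return parsed
    # Lists, numbers, strings, booleans, and null are wrapped under "result".
    return {"result": parsed}
```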

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(gemini realtime): deep-merge nested config in follow-up session update

Previously, the follow-up setup performed a shallow merge between the
original setup and new overrides. If a session.update touched any field
inside generationConfig (e.g. modalities), the entire generationConfig
would be replaced, silently dropping unrelated sub-keys like temperature
or maxOutputTokens. Apply the same deep-merge to realtimeInputConfig so
partial automatic-activity-detection updates don't drop other realtime
input config fields either.
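
A generic recursive merge illustrating the difference from a shallow merge. This is a sketch under the assumption that nested dicts are merged key by key and all other values are replaced; it is not the transformation code itself:

```python
import copy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where nested dicts are merged instead of replaced."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


original_setup = {
    "generationConfig": {"temperature": 0.7, "maxOutputTokens": 256},
    "tools": [{"name": "lookup"}],
}
update = {"generationConfig": {"responseModalities": ["TEXT"]}}
merged_setup = deep_merge(original_setup, update)
```

A shallow `{**original_setup, **update}` would have replaced `generationConfig` wholesale and lost `temperature` and `maxOutputTokens`.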

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(gemini realtime): default conversation_id before tool-call response.done

mypy flagged that response.done's conversation_id (str on the TypedDict)
could be None when current_response_id was already set on entry. Ensure
the fallback runs unconditionally before the response is constructed.

* fix(realtime): deep-merge generationConfig and refresh cache on follow-up setup

A subsequent Gemini session.update that touches any generationConfig sub-field
(e.g. just temperature) was clobbering the original generationConfig — silently
dropping responseModalities and switching the session to text-only. Deep-merge
generationConfig so existing keys (responseModalities, maxOutputTokens, ...) are
preserved when the client updates only a subset.

Also drop the early-return in _cache_session_configuration_request so the
cached payload tracks the latest setup sent to the backend. Without this,
downstream readers (transform_session_created_event, modality lookup in
return_new_content_delta_events) keep reading stale modalities/system
instruction after a follow-up setup.

* fix(gemini realtime): mirror modalities/temperature/max_output_tokens on tool-call response.created

The audio/text response.created preamble includes modalities, temperature,
and max_output_tokens on the response object so spec-compliant clients can
initialise per-response state. The tool-call response.created was missing
these fields, leaving clients without consistent response metadata when a
response starts with a tool call instead of content. Read them from the
cached session_configuration_request the same way the audio/text path
does.

* fix(gemini realtime): keep call_id→name mapping across function_call_output retries

A client SDK that retries function_call_output (or sends the same result
twice) would previously hit a missing-name lookup on the second send
because _handle_function_call_output popped the call_id → name entry.
Without name, Gemini may silently reject the response. Use dict.get so
the mapping persists for the lifetime of the session.

* fix(gemini realtime): empty toolCall must not terminate the WebSocket

If Gemini sends a toolCall whose functionCalls list is empty (or absent),
the previous `continue` left returned_message empty and the
"Unknown message type" guard fired, killing the WebSocket session.
Return a normal (empty) result instead so the session keeps going.

* fix(vertex realtime): warn when dropping guardrail turn-detection update

In non-deferred mode the auto-setup is sent on connect, so the audio-transcription
guardrail's subsequent session.update carrying turn_detection.create_response=False
cannot be forwarded as a second setup (Vertex Live closes the WebSocket with 1007).
Surface a warning when this specific drop happens so operators know the model
will auto-respond before the guardrail can gate it, instead of failing silently
at debug level.

* fix(gemini realtime): deep-merge automaticActivityDetection on follow-up session.update

The follow-up setup merge already deep-merged generationConfig and
realtimeInputConfig, but realtimeInputConfig.automaticActivityDetection
itself is a nested dict. A partial VAD update (e.g. the
guardrail-injected disabled=True from create_response=False) silently
dropped unrelated knobs such as silenceDurationMs and prefixPaddingMs
from the original setup. Deep-merge that block too so partial overrides
only touch the fields they specify.

* fix(realtime): record synthetic session.created in deferred-setup mode

The deferred-setup path emits a synthetic session.created directly to
the client websocket but did not run it through RealTimeStreaming's
store_message, so the event was missing from the session log used by
success_handler / async_success_handler. Call store_message before
forwarding so the synthetic event lands in the same log stream as
provider-driven events.

* fix(gemini realtime): bound _tool_call_id_to_name with an LRU; exercise modality forwarding test

Two minor follow-ups from review:

* Switch _tool_call_id_to_name to a 256-entry LRU OrderedDict so a long
  session with many tool calls doesn't grow the dict without bound,
  while retried function_call_output lookups still hit for recently-seen
  call_ids.
* Fix test_gemini_realtime_transformation_session_created to wrap the
  cached session config in {"setup": ...} so the modality lookup in
  transform_session_created_event actually exercises responseModalities
  forwarding (the prior payload was silently treated as empty).
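
A bounded LRU mapping along the lines of the first bullet can be sketched with `OrderedDict`. The class and method names are hypothetical; 256 is the capacity quoted above:

```python
from collections import OrderedDict
from typing import Optional


class ToolCallNameCache:
    """Map call_id to function name, evicting the least recently used entry."""

    def __init__(self, max_entries: int = 256) -> None:
        self._max_entries = max_entries
        self._names: "OrderedDict[str, str]" = OrderedDict()

    def put(self, call_id: str, name: str) -> None:
        self._names[call_id] = name
        self._names.move_to_end(call_id)
        while len(self._names) > self._max_entries:
            self._names.popitem(last=False)  # drop the oldest entry

    def get(self, call_id: str) -> Optional[str]:
        name = self._names.get(call_id)
        if name is not None:
            # A lookup refreshes recency, so retried outputs keep hitting.
            self._names.move_to_end(call_id)
        return name
```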

* test(gemini realtime): wrap remaining cached session configs in setup envelope

The session_configuration_request the proxy caches is always serialized
as {"setup": ...}; three modality-related tests dumped a bare config
dict instead, so transform_session_created_event's
`.get('setup', {})` quietly returned an empty dict and the
responseModalities lookup ran against the default rather than the
fixture. Wrap the remaining tests in the same shape the production
cache uses so any regression in modality forwarding actually trips.

* fix(gemini realtime): cast merged realtimeInputConfig for typeddict assignment

mypy flagged the assignment of the merged dict into
BidiGenerateContentSetup.realtimeInputConfig with [typeddict-item]: the
intermediate variable widens to dict[Any, Any], losing the TypedDict
narrowing the previous dict-literal form had.

* test(gemini realtime): wrap test_gemini_tool_call_resets_ids fixture in setup envelope

The cached session_configuration_request the proxy stores is always
serialized as {"setup": ...}; this test passed a bare config dict, so
transform_session_created_event's .get('setup', {}) returned an empty
dict and the responseModalities lookup ran against the default rather
than the fixture. Wrap the fixture in the same shape the production
cache uses.

* fix(gemini realtime): skip unknown sibling keys in transform loop

Gemini realtime messages can include sibling metadata keys like
usageMetadata alongside primary payload keys (toolCall, serverContent).
Previously, the transform loop called map_openai_event for every
top-level key, raising ValueError for unknown ones and terminating
the WebSocket session.

Skip top-level keys not present in MAP_GEMINI_FIELD_TO_OPENAI_EVENT
to keep the session alive when Gemini emits usage metadata with a
toolCall response.
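
The skip can be sketched as a filter over top-level keys. The key set below is an illustrative subset and the function is hypothetical; the real check is against MAP_GEMINI_FIELD_TO_OPENAI_EVENT:

```python
from typing import Any, Dict, List

# Illustrative subset of handled top-level keys (assumption).
KNOWN_TOP_LEVEL_KEYS = {"setupComplete", "serverContent", "toolCall"}


def transform_frame(frame: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect work items for known keys and ignore sibling metadata."""
    handled: List[Dict[str, Any]] = []
    for key, value in frame.items():
        if key not in KNOWN_TOP_LEVEL_KEYS:
            continue  # e.g. usageMetadata: skip instead of raising
        handled.append({"source_key": key, "payload": value})
    return handled
```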

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(gemini realtime): scope dotted-key event lookup and propagate session metadata to tool-call response.done

- map_openai_event: only check the current key/value pair when resolving
  dotted map entries (e.g. serverContent.turnComplete) so a sibling key in
  the same frame can't misclassify the event being processed
  (e.g. toolCall returning RESPONSE_DONE).
- tool-call path: extract generationConfig once and include modalities,
  temperature, and max_output_tokens on response.done so its shape matches
  response.created and the non-tool-call response.done.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(gemini realtime): cast maxOutputTokens to int for typeddict assignment

* fix(gemini realtime): use camelCase maxOutputTokens in response.done

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(gemini realtime): cast maxOutputTokens to int for typeddict assignment

* fix(realtime): inject guardrail turn_detection on subsequent session.update without one

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(gemini realtime): tolerate sibling-only frames (e.g. standalone usageMetadata)

A Gemini Live frame that contains only metadata keys outside
_KNOWN_GEMINI_TOP_LEVEL_KEYS (e.g. a bare {"usageMetadata": {...}}
emitted between turns) leaves returned_message empty after the
transform loop and was tripping the 'Unknown message type' guard,
which raised ValueError and terminated the WebSocket session.

Treat such frames as no-ops and return the unchanged state instead.

* fix(gemini realtime): preserve sibling toolCall when serverContent has only transcription

Previously, when a Gemini frame contained both a transcription-only
serverContent and a sibling toolCall, the transcription handler would
early-return and silently drop the toolCall. Instead, mark serverContent
as handled and fall through so the main loop still processes siblings
like toolCall, while preserving the prior no-op behavior for empty/
transcription-only frames.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* refactor(gemini realtime): drop unused json_message arg from map_openai_event

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(gemini realtime): promote nested turn_detection when flat value is not a dict

When the session payload had `turn_detection: None` (or any non-dict value), the
normalizer skipped promoting the GA nested `audio.input.turn_detection` because
it only checked key presence. The stale None then flowed into
`map_automatic_turn_detection` and raised TypeError on `'create_response' in value`.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(realtime): run guardrails on function_call_output content

Tool result outputs are client-controlled and fed to the model, so
they must pass the same content checks as user text messages.
Otherwise an attacker can smuggle blocked content into a
function_call_output and have the model process it.

* fix(gemini realtime): emit function_call_arguments.delta before .done

Gemini delivers the full function-call arguments in a single toolCall
frame. The OpenAI Realtime spec orders the streaming events as
output_item.added -> function_call_arguments.delta(+) ->
function_call_arguments.done -> output_item.done. Emit a single delta
carrying the complete arguments string before the matching .done so
spec-compliant SDK clients that accumulate deltas and gate finalisation
on at least one delta arriving do not stall on Gemini tool calls.
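
The ordering above can be sketched as follows. This is an illustration only: `tool_call_events` is a hypothetical helper, and the event dicts carry a simplified subset of the fields a real OpenAI Realtime event has.

```python
import json
from typing import Any, Dict, List


def tool_call_events(call_id: str, name: str, args: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Build the four-event sequence for one tool call whose arguments arrive whole."""
    arguments = json.dumps(args)
    item = {"type": "function_call", "call_id": call_id, "name": name, "arguments": arguments}
    return [
        {"type": "response.output_item.added", "item": dict(item)},
        # Single delta carrying the complete arguments string.
        {"type": "response.function_call_arguments.delta", "call_id": call_id, "delta": arguments},
        {"type": "response.function_call_arguments.done", "call_id": call_id, "arguments": arguments},
        {"type": "response.output_item.done", "item": dict(item)},
    ]
```

A client that concatenates deltas ends up with the same string that `.done` reports, so delta-accumulating and done-only consumers agree.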

* fix(realtime): avoid stale session.created flag triggering guardrail re-injection

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(ci): restore guardrail injection on duplicate session.created and cast realtime delta event

- Re-enable the one-time guardrail turn_detection update on duplicate
  session.created. `_maybe_send_guardrail_turn_detection_update` is
  already idempotent via `_guardrail_turn_detection_update_sent`, so
  the previous guard was unnecessary and broke the deferred-setup path
  where the synthetic session.created is emitted by llm_http_handler
  outside this loop (no prior chance to inject).

- Cast the response.function_call_arguments.delta dict appended to
  `returned_message: List[OpenAIRealtimeEvents]` so mypy is satisfied.

* fix(realtime): forward sanitized function_call_output on guardrail block

Providers that pair every toolCall with a toolResponse (e.g. Gemini and
Vertex Live) stay in the awaiting-tool-call state until a toolResponse
arrives. Dropping a blocked function_call_output outright left those
providers stalled — the subsequent guardrail clientContent and
response.create were ignored because the prior toolCall had no matching
toolResponse.

When the client-supplied tool output fails the realtime guardrail check,
forward a sanitized placeholder function_call_output (same call_id,
generic policy marker as output) instead of dropping the message
entirely. The placeholder carries no blocked content, so the model never
sees it, while still completing the provider's tool-call cycle so the
session can recover and the violation message reaches the user.
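
A rough sketch of the placeholder idea, under stated assumptions: `POLICY_MARKER` and `sanitize_function_call_output` are hypothetical names, and the event shape follows the OpenAI Realtime `conversation.item.create` form in simplified fashion.

```python
from typing import Any, Dict

# Hypothetical generic marker; the actual wording used by LiteLLM may differ.
POLICY_MARKER = "[tool output removed by content policy]"


def sanitize_function_call_output(event: Dict[str, Any]) -> Dict[str, Any]:
    """Replace a blocked tool output with a marker while keeping the same call_id."""
    item = event.get("item") or {}
    return {
        "type": "conversation.item.create",
        "item": {
            "type": "function_call_output",
            "call_id": item.get("call_id"),  # Keeps the provider's toolCall paired.
            "output": POLICY_MARKER,  # None of the blocked content is forwarded.
        },
    }
```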

* fix(gemini realtime): preserve sibling keys on empty toolCall no-op

Replace the early return on `functionCalls` empty/absent with a
`continue` plus a `tool_call_handled` flag that mirrors the existing
`server_content_handled` pattern. The post-loop guard already
distinguishes intentionally-consumed known keys from genuinely-unknown
messages, so adding `toolCall` to that exclusion list lets the loop
continue iterating over any sibling top-level keys in the same Gemini
frame instead of short-circuiting on the first empty toolCall.

In practice Gemini's protobuf places `toolCall`/`serverContent`/
`setupComplete` in a `oneof` so the only realistic sibling is
`usageMetadata` (already filtered as unknown-top-level), but the
uniform handling avoids silently discarding any future sibling key
should the wire format grow.

* fix(gemini realtime): redact realtime payloads from debug logs

The transform_realtime_response debug logs were dumping the raw inbound
Gemini frame and each outbound OpenAI event payload (up to 500 chars).
Realtime frames carry transcripts, model output, and tool-call arguments,
so those strings ended up in application logs whenever DEBUG was enabled.
Replace the inbound dump with just the top-level frame keys and the
outbound dump with just the event type.
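
For illustration, the redacted summaries could look like the sketch below. The function names and the logger name are hypothetical; only the idea (frame keys in, event types out) comes from the commit message.

```python
import logging
from typing import Any, Dict, List

logger = logging.getLogger("gemini_realtime_sketch")  # Hypothetical logger name.


def inbound_log_summary(frame: Dict[str, Any]) -> str:
    """Describe an inbound frame by its top-level keys only, never its values."""
    return "inbound frame keys=" + ",".join(sorted(frame))


def outbound_log_summary(events: List[Dict[str, Any]]) -> str:
    """Describe outbound events by their type only."""
    return "outbound event types=" + ",".join(str(e.get("type", "unknown")) for e in events)


def log_frame(frame: Dict[str, Any], events: List[Dict[str, Any]]) -> None:
    # Transcripts, model output, and tool-call arguments stay out of the log line.
    logger.debug(inbound_log_summary(frame))
    logger.debug(outbound_log_summary(events))
```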

* fix(realtime): check function_call_output before user role to prevent guardrail bypass

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(gemini realtime): propagate usageMetadata on tool-call response.done

Gemini Live emits usageMetadata as a sibling top-level key alongside the
toolCall frame; the tool-call branch was unconditionally building
response.done from get_empty_usage(), so tokens consumed by tool-call
turns were recorded as zero spend and bypassed LiteLLM budget
accounting. Mirror the non-tool-call RESPONSE_DONE path: when the same
frame carries usageMetadata, run VertexGeminiConfig._calculate_usage and
forward the real token counts.

* fix(realtime): send sanitized toolResponse before guardrail clientContent

Two related fixes for the function_call_output blocked-by-guardrail path:

1. Ordering: Gemini Live requires a matching toolResponse immediately
   after a toolCall before any other client message. Previously we ran
   the guardrail first (which sends clientContent/cancel) and only then
   forwarded the sanitized function_call_output. Add an optional
   pre_block_backend_message arg to run_realtime_guardrails so the
   sanitized toolResponse is emitted before the guardrail's own backend
   messages.

2. Stale pending flag: stop setting _pending_guardrail_message in the
   tool-output block. That flag exists to swallow the reflexive
   response.create an OpenAI client sends right after a user text
   message. In tool-calling flows the client may never send a
   response.create (e.g. Gemini SDKs auto-respond), so leaving the flag
   set would consume an unrelated response.create from a later turn.
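
The ordering fix in item 1 can be sketched as below. `emit_block_sequence` is a hypothetical stand-in, not the real `run_realtime_guardrails` signature; only the optional `pre_block_backend_message` argument name is taken from the commit message.

```python
from typing import Any, Callable, Dict, List, Optional

Message = Dict[str, Any]


def emit_block_sequence(
    send: Callable[[Message], None],
    guardrail_messages: List[Message],
    pre_block_backend_message: Optional[Message] = None,
) -> None:
    """Send the sanitized toolResponse (if any) before the guardrail's own messages."""
    if pre_block_backend_message is not None:
        send(pre_block_backend_message)
    for message in guardrail_messages:
        send(message)
```

Passing the sanitized tool output as `pre_block_backend_message` means the backend sees the matching toolResponse first and the guardrail's clientContent/cancel afterwards.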

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* test(model_prices): allow audio_transcription_config in schema

* fix(gemini realtime): event_id, item copy, and dict guard for tool-call events

- Emit event_id on response.output_item.added for tool calls so spec-compliant
  OpenAI Realtime SDK clients can index/deduplicate the event like every other
  server-sent event in the sequence.
- Pass a shallow copy of function_call_item to response.output_item.done and
  conversation.item.created so downstream handlers (e.g. the beta-protocol
  translator) that mutate the item dict don't corrupt sibling events sharing
  the same reference.
- Guard map_openai_event against non-dict values (e.g. Gemini's
  'setupComplete: true' boolean payload) so the WebSocket session doesn't die
  with an AttributeError on the unguarded .get() call.

Add NotRequired event_id field on OpenAIRealtimeStreamResponseOutputItemAdded
to keep existing call-sites that don't set event_id compatible.
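
Two of the points above, the shallow copy and the non-dict guard, can be sketched like this. Both helper names are hypothetical and the event dicts are simplified.

```python
from typing import Any, Dict, List, Optional


def item_events(function_call_item: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Give each event its own shallow copy so one handler's mutation stays local."""
    return [
        {"type": "response.output_item.done", "item": dict(function_call_item)},
        {"type": "conversation.item.created", "item": dict(function_call_item)},
    ]


def safe_get(value: Any, key: str) -> Optional[Any]:
    """Read `key` only when `value` is a dict (e.g. not the boolean in setupComplete: true)."""
    return value.get(key) if isinstance(value, dict) else None
```

A shallow copy protects top-level keys only; nested objects inside the item would still be shared.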

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(gemini realtime): buffer standalone usageMetadata for next response.done

Gemini Live can emit usageMetadata as a standalone WebSocket frame between
turns. The previous transformer treated those frames as no-ops, so token
counts arriving outside the closing turnComplete/toolCall frame were
dropped from spend and budget accounting. An authenticated client could
drive turns whose usage was recorded as zero, bypassing budgets.

Buffer any standalone usageMetadata on the config instance and attribute
the deferred counts to the next emitted response.done (tool-call or
normal). In-frame usageMetadata remains authoritative and clears the
buffer.
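
A minimal sketch of the buffering rule, assuming that a later standalone frame replaces an earlier buffered one (the commit message does not say whether counts are summed). `UsageBuffer` is a hypothetical class, not the actual config attribute.

```python
from typing import Any, Dict, Optional

Usage = Dict[str, Any]


class UsageBuffer:
    """Hold standalone usageMetadata until the next response.done is emitted."""

    def __init__(self) -> None:
        self._pending: Optional[Usage] = None

    def on_standalone_usage(self, usage: Usage) -> None:
        # Assumption: the most recent standalone frame wins.
        self._pending = usage

    def usage_for_response_done(self, in_frame_usage: Optional[Usage]) -> Optional[Usage]:
        if in_frame_usage is not None:
            self._pending = None  # In-frame usage is authoritative and clears the buffer.
            return in_frame_usage
        pending, self._pending = self._pending, None
        return pending  # None means the caller falls back to empty usage.
```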

* merge main (#28839)

* fix(helm): drop main- prefix from default image tag (#28710)

* fix(helm): drop main- prefix from default image tag

The default image tag in the deployment + migrations-job templates was
`main-{{ .Chart.AppVersion }}`. The current release pipeline publishes
content tags without the `main-` prefix (e.g. `v1.85.1` / `1.85.1`,
`v1.86.0-rc.1` / `1.86.0-rc.1`), so the rendered ref points at a tag
that does not exist on GHCR or DockerHub and installs fail with
ImagePullBackOff.

- templates/deployment.yaml, templates/migrations-job.yaml: render
  `.Chart.AppVersion` directly instead of `main-<AppVersion>`.
- Chart.yaml: bump stale `appVersion: v1.80.12` (not on either
  registry) to `v1.85.1` so local-checkout installs also resolve.
- values.yaml: update the commented tag-override hint to match.

* fix(helm): use :latest in tag override example, not pinned version

Per review: ghcr.io/berriai/litellm-database:latest is a floating
alias for the most recent stable (same digest as :main-stable),
maintained by the release pipeline's UPDATE_LATEST advance step.
Better example than a pinned version that goes stale.

* test(model_prices): allow audio_transcription_config in schema (#28708)

The schema in test_aaamodel_prices_and_context_window_json_is_valid uses
additionalProperties: false. The azure/speech/azure-stt entry added in
#27482 introduced an audio_transcription_config field that the schema
did not whitelist, so the test fails on every branch built on top of
staging.

Add the field as a string property.

* fix(team): refresh team cache on team_model_add/delete (LIT-3244) (#28683)

* fix(team): refresh team cache on team_model_add/delete (LIT-3244)

team_model_add and team_model_delete wrote to the DB but did not
invalidate the in-memory LiteLLM_TeamTableCachedObj used by
common_checks. After the v1.83.14 common_checks centralization made
team.models authoritative on /v1/files and /v1/vector_stores/*,
adding a Team-BYOK model silently failed to grant the new public
model name to team members until the cache TTL expired (and a
removed model kept working until then on the symmetric path).

Extract the cache-refresh snippet from update_team into a small
helper and apply it consistently at all three team-write sites.

* test: also assert updated models in team-cache-refresh pin

Strengthens the LIT-3244 regression test to also assert
`call_kwargs["team_table"].models` matches the updated row,
not just `team_id`. Both `existing_team` and `updated_team`
share `team_id` in the test setup, so the previous assertion
would have passed even if the implementation accidentally cached
the pre-mutation row.

Greptile review feedback.

* fix(team): hydrate object_permission on cache-refreshing team updates

The Prisma update calls in update_team, team_model_add, and
team_model_delete returned a team row with object_permission_id set
but object_permission=None (the relation was not requested via
include=). _refresh_cached_team then wrote that to the in-memory
LiteLLM_TeamTableCachedObj, and the cache-hit path in get_team_object
returns the cached object without re-hydrating. Downstream consumers
(validate_key_search_tools_against_team, the MCP/agent authz paths)
treat a missing object_permission as no team-level restriction, so
a team-write op silently dropped object-permission enforcement until
the cache TTL expired or a DB-fetch path re-hydrated it.

Add include={"object_permission": True} to all three updates so the
refresh writes a complete cached team. Extend the LIT-3244 regression
test to pin both the cached object_permission and the include shape
on the Prisma call.

Surfaced in PR review of LIT-3244.

* fix(ui/add-model): stop vertex_ai-anthropic_models from leaking under Anthropic (#28723)

`getProviderModels()` matched a model into a provider's dropdown when the
model's `litellm_provider` string *contained* the provider key as a
substring. The intent was to admit suffix variants (e.g. `anthropic_text`,
`bedrock_converse`), but the substring check is too loose: it also pulls in
unrelated providers whose name happens to contain the key, most visibly
`vertex_ai-anthropic_models` matching `anthropic` and `vertex_ai-openai_models`
matching `openai`.

Replace `.includes()` with separator-anchored prefix matching
(`startsWith(provider + "_")` / `startsWith(provider + "-")`). All legitimate
variants in `model_prices_and_context_window.json` still match
(`anthropic_text`, `azure_text`, `azure_ai`, `bedrock_converse`,
`bedrock_mantle`, `cohere_chat`, `fireworks_ai-embedding-models`,
`vertex_ai-*`, `vertex_ai_beta`), and the cross-provider leak is closed.
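
The matching rule can be shown in a few lines. This is a Python rendering of the logic for illustration (the dashboard code itself is TypeScript); `provider_matches` is a hypothetical name, and treating an exact match as a hit is an assumption of the sketch.

```python
def provider_matches(litellm_provider: str, provider: str) -> bool:
    """True for the provider itself or a separator-anchored variant of it."""
    return (
        litellm_provider == provider  # Assumption: exact matches are also admitted.
        or litellm_provider.startswith(provider + "_")
        or litellm_provider.startswith(provider + "-")
    )
```

`vertex_ai-anthropic_models` contains `anthropic` but does not start with `anthropic_` or `anthropic-`, so it no longer appears under Anthropic while still matching `vertex_ai`.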

Tests: update one assertion that pinned the buggy substring behavior
(`custom_openai_endpoint` matching `openai` — not a real provider value);
add 6 new tests covering the leak regressions and the variant-preservation
contract for vertex_ai/bedrock/fireworks.

* Fix spend logs v2 route permissions (#28705)

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: ryan-crabbe-berri <ryan-crabbe-berri@users.noreply.github.com>

* fix(proxy): Bedrock Knowledge Base pass-through: preserve SigV4 headers and signed request body (#27526)

* Fix Bedrock KB pass-through SigV4 headers and signed body

Coerce botocore HeadersDict to a dict for pass-through routes. When
forward_headers is true, drop request headers that collide case-insensitively
with signed headers so client Bearer auth does not shadow AWS SigV4.
Send prepped.body as raw content so the outbound payload matches the
signature after logging hooks mutate the parsed dict.
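
The header handling can be sketched as below, assuming the signed headers arrive as any mapping-like object. `merge_signed_headers` is a hypothetical helper, not the function in the pass-through module.

```python
from typing import Dict, Mapping


def merge_signed_headers(
    client_headers: Mapping[str, str], signed_headers: Mapping[str, str]
) -> Dict[str, str]:
    """Forward client headers, except those that collide with SigV4-signed ones."""
    signed = dict(signed_headers)  # Coerce a mapping-like object to a plain dict.
    signed_lower = {name.lower() for name in signed}
    merged = {
        name: value
        for name, value in client_headers.items()
        if name.lower() not in signed_lower  # Case-insensitive collision check.
    }
    merged.update(signed)
    return merged
```

A client `authorization: Bearer ...` header is dropped when the signed set contains `Authorization`, so only the SigV4 value reaches AWS.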

Co-authored-by: Cursor <cursoragent@cursor.com>

* Simplify pass-through raw body handling

Read the SigV4-signed bytes directly from request.state inside
pass_through_request instead of threading a custom_raw_body argument
through three functions. Helper methods are restored to their original
signatures, and the new branch lives in one place at each httpx call site.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Harden pass-through raw body read from request.state

Guard missing request.state (test fixtures) and ignore non-bytes/str
values so MagicMock does not trigger the SigV4 raw-body path.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Test pass_through_request state_raw_body uses httpx content=

Cover non-streaming (async_client.request) and streaming (build_request)
paths so SigV4 bytes on request.state are not replaced by json= of a
hook-mutated dict.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>

* chore(tests): migrate Bedrock CI to AWS account 941277531214 (#28728)

* chore(tests): migrate Bedrock CI from AWS account 888602223428 to 941277531214

The original account (888602223428) was put under a security restriction by
AWS after a root access key leaked in a PR comment. While that account works
its way through the AWS Support unlock process, Bedrock-touching CI tests have
been migrated to a fresh account (941277531214).

Changes:
  - Replace 26 hardcoded references to 888602223428 with 941277531214 across
    8 files (provisioned-model ARNs, imported-model ARNs, AgentCore runtime
    ARNs, batch execution role ARN, and example proxy config).
  - The provisioned-model and imported-model ARNs are referenced only from
    mocked unit tests — no AWS resources to recreate.
  - The batch execution IAM role has been recreated in the new account with
    the same name and equivalent permissions.
  - The two AgentCore runtimes (hosted_agent_r9jvp-3ySZuRHjLC,
    hosted_agent_13sf6-cALnp38iZD) are being recreated in the new account
    under the same names — see tools/agentcore-deploy/ in a follow-up.

CircleCI env vars AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_REGION_NAME
were updated separately via the CircleCI API to point at the new account.

Smoke-tested locally against the new account:
  aws bedrock-runtime converse --region us-west-2 \
    --model-id us.anthropic.claude-sonnet-4-5-20250929-v1:0 \
    --messages '[{"role":"user","content":[{"text":"ping"}]}]'
  → 200, model returned 'pong'

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(tests): refresh AgentCore ARN suffixes to match newly-deployed runtimes

The first migration commit replaced just the account ID, but AgentCore
auto-assigns a random 10-char suffix to every runtime on creation — we
can't reuse the original suffixes (`3ySZuRHjLC`, `cALnp38iZD`) in the
new account. Updated the AgentCore-runtime ARNs in the three files that
reference real runtime IDs (not the mock-based unit-test ARNs).

Deployed runtimes:
  arn:aws:bedrock-agentcore:us-west-2:941277531214:runtime/hosted_agent_r9jvp-Rq79QFC2fp
  arn:aws:bedrock-agentcore:us-west-2:941277531214:runtime/hosted_agent_13sf6-4046UzHSwy

Both runtimes are status=READY and pass a smoke invoke:
  $ aws bedrock-agentcore invoke-agent-runtime --agent-runtime-arn ... --payload '{"prompt":"ping"}'
  → 200, {"result": "echo: ping"}

The agent is a minimal echo (see /tmp/agentcore_deploy/agent.py for the
deploy artifacts). Tests that only verify the SDK wiring will pass; if any
test asserts on agent output content, swap the echo for the real agent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(tests): point Bedrock batch tests at new-account S3 bucket

The account migration (888602223428 -> 941277531214) was a flat
account-ID swap, which only rewrites ARNs that embed the account
number. S3 bucket names carry no account ID, so the live Bedrock
batch tests still uploaded to `litellm-proxy` — a bucket that lives
in the old account. S3 names are globally unique, and the old account
still holds that name, so it can't be recreated in the new account.

Rename to `litellm-proxy-941277531214` (account-ID suffix guarantees
global uniqueness). The bucket must be created in 941277531214 and the
batch execution role granted s3:GetObject/PutObject/ListBucket on it
before this job is run in CI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(tests): point live S3 logging test at new-account bucket

Same account-ID-free blind spot as the batch bucket: `load-testing-oct`
lives in the old account and its name can't be reused globally. The
`logging_testing` CI job is wired into the workflow and runs
test_basic_s3_logging, which uploads to this bucket with the CI env
creds, then lists and deletes objects — a live dependency.

Rename to `load-testing-oct-941277531214`. The bucket must exist in the
new account with the CI IAM principal granted
s3:PutObject/GetObject/ListBucket/DeleteObject before this job runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(tests): repoint Bedrock guardrail IDs to new-account guardrails

The migration left guardrail IDs untouched (no account ID in them), so
all live guardrail tests failed with "guardrail identifier or version
does not exist" against 941277531214. Recreated both guardrails in the
new account and updated the hardcoded IDs:
  - wf0hkdb5x07f -> zgkmukebruil (PII mask: PHONE + CREDIT_DEBIT_CARD,
    with explicit inputAction=ANONYMIZE so masking applies to INPUT,
    which is the source litellm's moderation hook sends)
  - ff6ujrregl1q -> 4w3d1di3snt5 (blocks "coffee"; blocked message set
    to the exact string the tests assert on)

Updated test_bedrock_guardrails.py, otel_test_config.yaml, and the
guardrailConfig in test_bedrock_completion.py. Verified locally: the 5
previously-failing guardrail tests now pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(bedrock): migrate legacy models to current inference profiles

The new CI account (941277531214) cannot invoke legacy Bedrock models
(AWS gates them: "marked by provider as Legacy... not actively using in
the last 30 days"). Migrated the live-call tests:
  - anthropic.claude-3-sonnet-20240229    -> us.anthropic.claude-sonnet-4-5-20250929-v1:0
  - anthropic.claude-3-haiku-20240307     -> us.anthropic.claude-haiku-4-5-20251001-v1:0
Current Claude models on Bedrock require the us. inference-profile prefix
(bare on-demand ids are rejected).

cohere.command-r-plus has no working replacement (all Cohere is legacy-
gated in the new account): swapped to claude-haiku-4-5 in provider-
agnostic param lists. amazon.titan-image-generator skipped (no working
replacement). Mocked/transformation/cost tests that reference the legacy
strings are intentionally left unchanged. Verified live against the new
account.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(bedrock): repoint SageMaker + Knowledge Base to new-account resources

These referenced account-scoped resources by hardcoded id that only
existed in the old account, so the migration's account-ID swap missed
them. Recreated in 941277531214 and repointed:
  - SageMaker endpoint jumpstart-dft-hf-textgeneration1-mp-20240815-185614
    -> litellm-ci-textgen (gpt2 on a TGI container, ml.g5.xlarge)
  - Bedrock Knowledge Base T37J8R4WTM -> LCYXFBR2TU (OpenSearch Serverless
    vector store + titan-embed-text-v2, seeded with a LiteLLM doc)
Verified live: test_sagemaker.py (12 passed) and
test_bedrock_knowledgebase_hook.py (12 passed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(reasoning_effort_grid): skip bedrock claude-opus-4-7 cells (not entitled on 941277531214)

claude-opus-4-7 is listed in the new Bedrock CI account's foundation
models but invoke is denied (AccessDeniedException: "not available for
this account"). Bedrock access to the flagship Opus requires an AWS
Sales request, not the self-serve model-access toggle, so it can't be
enabled inline with the rest of the account migration.

Add an optional `skip_reason` to ModelEntry and set it on the
bedrock-claude-opus-4-7 entry; the grid test honors it via pytest.skip.
Cell count (231) and route coverage are unchanged, so the structural
asserts still pass. Restore coverage by deleting the one skip_reason
line once access is granted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(bedrock): swap/skip legacy-gated models unavailable on new CI account

The migrated AWS account (941277531214) cannot access several models that
the old account could, so the remaining red CI jobs were hitting real
Bedrock "Access denied / Legacy" and "account not authorized" errors:

- image_gen: skip both Nova Canvas test classes (amazon.nova-canvas-v1:0 is
  legacy-gated), matching the existing titan skip.
- batches: skip test_async_file_and_batch (Bedrock batch inference is not
  authorized on the new account; requires an AWS support case).
- litellm_overhead: swap legacy claude-3-5-haiku for the active
  us.anthropic.claude-haiku-4-5 inference profile.
- test_completion_claude_3_function_call: swap legacy claude-3-sonnet for the
  active us.anthropic.claude-sonnet-4-5 inference profile.

https://claude.ai/code/session_01Y7zgHYu9GX29YRwV4yiWAa

* test(bedrock): fix remaining e2e legacy-model + batch failures on new CI account

- e2e_openai_endpoints: skip test_bedrock_batches_api (Bedrock batch inference
  is not authorized on account 941277531214) and migrate the missed
  s3_bucket_name in oai_misc_config.yaml to litellm-proxy-941277531214.
- build_and_test: swap legacy bedrock claude-3-sonnet for the active
  us.anthropic.claude-sonnet-4-5 inference profile in the proxy structured
  output e2e test.

https://claude.ai/code/session_01Y7zgHYu9GX29YRwV4yiWAa

* test(bedrock): make opus-4-7 + batch cells fail loudly and mock image-gen (#28791)

Replace the silent skips added for the new CI account with noisier behavior:
- reasoning-effort grid: opus-4-7 cells now fail (when AWS creds are present)
  instead of skipping, so the missing entitlement stays visible in CI; they
  still skip when AWS creds are absent (local dev)
- Bedrock batch inference tests: drop the skip so they run and fail until
  batch access is granted
- Titan + Nova Canvas image-gen tests: mock the Bedrock HTTP call so the
  transform + cost-tracking path stays under test without live model access

https://claude.ai/code/session_01MT7SWDnXUjv6e6EPG7BDjT

Co-authored-by: Claude <noreply@anthropic.com>

* test(bedrock): use pytest.xfail for known-failing opus-4-7 cells

Replace pytest.fail with pytest.xfail when a model has a fail_reason,
so known-broken cells stay visible as XFAIL without keeping CI red.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

---------

Co-authored-by: Mateo <mateo@Mateos-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(otel): export SERVER span on management-endpoint success without http_request (#28794)

Co-authored-by: Yassin Kortam <yassinkortam@Yassins-MacBook-Pro.local>

* chore(ci): merge dev branch (#28801)

* chore(proxy): route path-dependent call sites through get_request_route

Replace direct ``request.url.path`` reads in auth, ACL, routing, and
audit-log decisions with ``get_request_route(request)`` — the helper
already added in ``auth/auth_utils.py`` that returns the ASGI
``scope["path"]`` with ``root_path`` stripped. Starlette reconstructs
``url.path`` from the Host header; ``scope["path"]`` is uvicorn's
parse of the request line and matches what FastAPI dispatches on, so
it's the authoritative route for any decision that should agree with
the actual handler.

Sites:
- _experimental/mcp_server/auth/user_api_key_auth_mcp.py
- management_endpoints/mcp_management_endpoints.py
- vector_store_endpoints/utils.py
- pass_through_endpoints/pass_through_endpoints.py
- auth/route_checks.py
- litellm_pre_call_utils.py
- spend_tracking/spend_management_endpoints.py
- common_utils/http_parsing_utils.py
- management_helpers/utils.py
- health_endpoints/_health_endpoints.py

Adds regression tests in tests/proxy_unit_tests/test_proxy_routes.py
that construct a Request with scope["path"] set to a benign route and
the Host header crafted so url.path would resolve differently; each
site's decision is asserted against scope["path"].
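
As a rough illustration of reading the route from the ASGI scope, the sketch below works on a plain scope dict. `route_from_scope` is hypothetical and is not the body of `get_request_route`; the segment-boundary check on `root_path` is an assumption of the sketch.

```python
from typing import Any, Dict


def route_from_scope(scope: Dict[str, Any]) -> str:
    """Return scope["path"] with a leading root_path removed."""
    path = str(scope.get("path", ""))
    root_path = str(scope.get("root_path", "")).rstrip("/")
    # Strip only on a segment boundary so "/proxyx" is not mistaken for "/proxy".
    if root_path and (path == root_path or path.startswith(root_path + "/")):
        path = path[len(root_path):] or "/"
    return path
```

Because the function never consults request headers, a crafted Host header cannot change the route it returns.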

* chore(proxy): make get_request_route imports lazy at call sites

Move the ``from litellm.proxy.auth.auth_utils import get_request_route``
imports added in the prior commit back to the function bodies that use
them. The module-level form participates in a long-standing import
cycle through ``auth_utils -> _types -> ...`` and was flagged by CodeQL
on the PR; the lazy form matches the pattern the proxy already uses
for ``user_api_key_auth`` and related helpers elsewhere in these files.

Also drop the ``RouteChecks._is_assistants_api_request`` delegation in
``_get_metadata_variable_name`` introduced in the prior commit — the
delegation pulled ``RouteChecks`` into the same cycle, and the call
site reuses the resolved route for its other branches, so inlining
the substring check is both cycle-free and avoids a redundant second
``get_request_route`` call.

Comment in test_proxy_routes.py acknowledges that the two MCP table
entries exercise ``get_request_route`` directly rather than the full
production handler (which needs ASGI scope + MCP state to invoke).

---------

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: user <70670632+stuxf@users.noreply.github.com>

* chore(ci): merge dev branch (#28657)

* feat(dashboard): navbar hierarchy + Agent Platform notifications (#27543)

* feat(dashboard): refine navbar zones and Agent Platform notice

Restructure the admin navbar for production users: clear product vs community
vs personal columns with vertical dividers, icon-only Slack/GitHub in a
shared chip, and Docs/Blog typography aligned on an 8px rhythm.

Add a notifications bell with popover linking to the LiteLLM Agent Platform
repo and optional mark-as-read persistence.

Promote the account control with initials avatar, single-line display name,
and navDisplayName mapping for placeholder user ids (e.g. default_user_id).

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(dashboard): address PR review — AntD buttons, public page guard, dedupe regex

- Replace raw <button> with AntD Button in BlogDropdown, NotificationsBell, UserDropdown, and test mock
- Guard NotificationsBell + container behind !isPublicPage to avoid rendering on public pages
- Remove redundant equality checks in navDisplayName (regex already covers them)
- Remove unused `lower` variable after simplification

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>

* fix(dashboard): drop dead useHealthReadiness import in navbar

The module was removed in #27896 (replaced by useHealthReadinessDetails),
but the import survived the rebase. The symbol is unused — only
useHealthReadinessDetails is consumed in the file. Removing the dead
import unblocks the UI TypeScript build.

* fix(dashboard): align CommunityEngagementButtons test with icon-only aria-labels

The component was refactored to an icon-only chip with aria-label='LiteLLM
on GitHub' (squash #27543), but the test still asserted /star us on
github/i. Update the query to match the rendered accessible name.

* refactor(dashboard): drop unused props from NavbarProps

The navbar refactor moved user identity + dark-mode state to internal
hooks (useAuthorized, useWorker), but the NavbarProps interface still
declared userID, userEmail, userRole, premiumUser, isDarkMode, and
toggleDarkMode as required, forcing every caller to thread them through.

Drop them from the interface and all four call sites (page.tsx,
(dashboard)/layout.tsx, public_model_hub.tsx, navbar.test.tsx). Also
shrinks the destructure in layout.tsx so the now-unused locals stop
being pulled out of useAuthorized().

* refactor(dashboard): use useSyncExternalStore for NotificationsBell dismiss flag

Reads/writes of the litellmHideAgentPlatformBanner key were done
directly inside NotificationsBell via a useEffect + useState pair.
Every other localStorage-backed flag in the dashboard
(DisableShowPrompts, DisableBouncingIcon, DisableShowNewBadge,
DisableUsageIndicator, DisableBlogPosts) is wrapped in a
useSyncExternalStore hook over localStorageUtils so all mounted
components stay in sync.

Extract useHideAgentPlatformBanner to follow the same shape, swap
NotificationsBell to consume it, and add a regression test that
two sibling bells stay in sync without a remount when one is
dismissed.

* refactor: mask credential fields in proxy settings GET responses (#28682)

* refactor: mask credential fields in proxy settings GET responses

Brings SSO settings, cache settings, and the email/Slack alerting view in
/get/config/callbacks in line with the HashiCorp Vault config-override
pattern, so persisted credentials are not transported back to the UI in
plaintext.

* refactor: harden short-value masking and hoist alerting var constant

Closes two review observations:

- mask_sensitive_keys now replaces short values (below the visible
  prefix+suffix length) with an all-mask string instead of returning them
  unchanged, so a 1-7 character credential is no longer round-tripped
  verbatim.
- _ALERTING_SENSITIVE_VARS is moved out of get_config() to a module-level
  constant, matching the analogous _SSO_SENSITIVE_FIELDS and
  _CACHE_SENSITIVE_FIELDS in the SSO and cache endpoint files.
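
The short-value rule can be sketched as follows. `mask_value` is a hypothetical helper, and the 4-character prefix and suffix are assumptions inferred from the "1-7 character" remark. The sketch also masks a value whose length equals prefix+suffix, which is slightly stricter than "below" and avoids echoing such a value verbatim.

```python
def mask_value(value: str, visible_prefix: int = 4, visible_suffix: int = 4, mask_char: str = "*") -> str:
    """Mask the middle of a credential; fully mask values too short to show any part safely."""
    if len(value) <= visible_prefix + visible_suffix:
        return mask_char * len(value)  # Short values become all-mask.
    hidden = len(value) - visible_prefix - visible_suffix
    suffix_start = len(value) - visible_suffix
    return value[:visible_prefix] + mask_char * hidden + value[suffix_start:]
```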

---------

Co-authored-by: Krrish Dholakia <krrish+github@berri.ai>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ui): show 2-decimal precision for max_budget on key overview (#28809)

The Key Info Overview tab's Spend card truncated sub-dollar budgets to
"$0" because formatNumberWithCommas defaults to 0 decimals. The Settings
tab passes 2; align the overview so a $0.10 budget renders as "$0.10".

Resolves LIT-2845

* feat(proxy): allow `llm_api_routes` virtual keys to list MCP servers (#28442)

* feat(proxy): allow llm_api_routes virtual keys to list MCP servers

Add a new `mcp_discovery_routes` group (GET /v1/mcp/server and GET
/v1/mcp/server/{server_id}) and include it in `llm_api_routes` so that
virtual keys configured with `allowed_routes=["llm_api_routes"]` can
discover the MCP servers they have access to. Previously these calls
failed with 'Virtual key is not allowed to call this route. Only allowed
to call routes: [llm_api_routes]'.

The GET handlers already sanitize the response for restricted virtual
keys via `_sanitize_mcp_server_list_for_virtual_key`, stripping
credential-bearing fields (url, headers, env). Write methods
(POST/PUT/DELETE) on the same paths remain gated by the existing
handler-level admin role checks.

The new discovery list is intentionally kept OUT of
`mcp_inference_routes`, so `is_llm_api_route()` still returns False
for these paths — this preserves the existing contract that
DISABLE_LLM_API_ENDPOINTS must not block the Admin UI from listing MCP
servers.

Co-authored-by: ryan-crabbe-berri <ryan-crabbe-berri@users.noreply.github.com>

* refactor(proxy): make MCP discovery carve-out method-aware

Replace the `mcp_discovery_routes` group in `llm_api_routes` with a
method-aware special case inside `is_virtual_key_allowed_to_call_route`.
Virtual keys with allowed_routes=["llm_api_routes"] are now permitted
to call only GET /v1/mcp/server and GET /v1/mcp/server/{server_id} —
non-GET methods and multi-segment admin sub-paths fall through to the
existing 403. This keeps the general llm_api_routes list free of
management paths and avoids accidentally exposing POST/PUT/DELETE
writes through the route-check layer.
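
A method-aware check of this kind might look like the sketch below. The function name `is_mcp_discovery_request` and the regular expression are hypothetical; the real logic lives inside `is_virtual_key_allowed_to_call_route` and may be structured differently.

```python
import re

# Hypothetical pattern: the list route itself, or exactly one extra path
# segment (the server id). Deeper sub-paths do not match.
_MCP_DISCOVERY_PATH = re.compile(r"/v1/mcp/server(?:/[^/]+)?")


def is_mcp_discovery_request(method: str, route: str) -> bool:
    """Return True only for GET requests on the two discovery paths."""
    if method.upper() != "GET":
        return False  # POST/PUT/DELETE fall through to the existing 403
    return _MCP_DISCOVERY_PATH.fullmatch(route) is not None


print(is_mcp_discovery_request("GET", "/v1/mcp/server"))
print(is_mcp_discovery_request("POST", "/v1/mcp/server"))
```

Using `fullmatch` rather than a prefix test is what keeps multi-segment admin sub-paths out of the carve-out.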

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: ryan-crabbe-berri <ryan-crabbe-berri@users.noreply.github.com>

* chore(ci): merge dev branch (#28807)

* chore(proxy): route path-dependent call sites through get_request_route

Replace direct ``request.url.path`` reads in auth, ACL, routing, and
audit-log decisions with ``get_request_route(request)`` — the helper
already added in ``auth/auth_utils.py`` that returns the ASGI
``scope["path"]`` with ``root_path`` stripped. Starlette reconstructs
``url.path`` from the Host header; ``scope["path"]`` is uvicorn's
parse of the request line and matches what FastAPI dispatches on, so
it's the authoritative route for any decision that should agree with
the actual handler.
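
As a rough sketch of the idea, the function below derives a route from an ASGI-style scope by stripping `root_path` from `scope["path"]`. It works on a plain dict and is named `route_from_scope` for illustration; the real `get_request_route` takes a `Request` and its edge-case handling is not shown in this commit message.

```python
from typing import Any, Mapping


def route_from_scope(scope: Mapping[str, Any]) -> str:
    """Return scope["path"] with a leading root_path removed, if present."""
    path = scope.get("path", "")
    root = (scope.get("root_path") or "").rstrip("/")
    # Only strip when root_path is a whole-segment prefix of the path, so
    # "/litellm" does not eat the start of "/litellmx/...".
    if root and (path == root or path.startswith(root + "/")):
        return path[len(root):] or "/"
    return path


print(route_from_scope({"path": "/litellm/v1/models", "root_path": "/litellm"}))
```

Because the value never passes through the Host header, a crafted Host cannot change which route the auth and ACL checks see.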

Sites:
- _experimental/mcp_server/auth/user_api_key_auth_mcp.py
- management_endpoints/mcp_management_endpoints.py
- vector_store_endpoints/utils.py
- pass_through_endpoints/pass_through_endpoints.py
- auth/route_checks.py
- litellm_pre_call_utils.py
- spend_tracking/spend_management_endpoints.py
- common_utils/http_parsing_utils.py
- management_helpers/utils.py
- health_endpoints/_health_endpoints.py

Adds regression tests in tests/proxy_unit_tests/test_proxy_routes.py
that construct a Request with scope["path"] set to a benign route and
the Host header crafted so url.path would resolve differently; each
site's decision is asserted against scope["path"].

* chore(proxy): make get_request_route imports lazy at call sites

Move the ``from litellm.proxy.auth.auth_utils import get_request_route``
imports added in the prior commit back to the function bodies that use
them. The module-level form participates in a long-standing import
cycle through ``auth_utils -> _types -> ...`` and was flagged by CodeQL
on the PR; the lazy form matches the pattern the proxy already uses
for ``user_api_key_auth`` and related helpers elsewhere in these files.

Also drop the ``RouteChecks._is_assistants_api_request`` delegation in
``_get_metadata_variable_name`` introduced in the prior commit — the
delegation pulled ``RouteChecks`` into the same cycle, and the call
site reuses the resolved route for its other branches, so inlining
the substring check is both cycle-free and avoids a redundant second
``get_request_route`` call.

Comment in test_proxy_routes.py acknowledges that the two MCP table
entries exercise ``get_request_route`` directly rather than the full
production handler (which needs ASGI scope + MCP state to invoke).

---------

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: user <70670632+stuxf@users.noreply.github.com>

* fix(team): keep team_alias cache in sync on _cache_team_object writes (#28737)

* fix(team): keep team_alias cache in sync on _cache_team_object writes

_cache_team_object wrote only to the team_id:<id> cache key, but the
JWT auth path that uses team_alias_jwt_field reads from a separate
team_alias:<alias> key (get_team_object_by_alias caches under both
keys on miss, but reads only the alias-keyed one). After any
team-mutation endpoint (team_model_add, team_model_delete,
update_team, the two access-group writes) the team_id cache was
refreshed but the team_alias cache stayed stale until TTL — JWT
callers using team_alias_jwt_field kept seeing the pre-mutation
team for the full cache window.

Mirror the write under the alias key inside _cache_team_object so
every existing caller stays in sync without further changes. Skip
the alias write when team_alias is None/empty so we don't collide
across alias-less teams.

Surfaced while testing the LIT-3244 cherry-pick on patch/1.86.0: the
LIT-3244 fix correctly invalidated the team_id cache but the
customer's JWT used team_alias_jwt_field, so they kept hitting the
stale alias-keyed entry.

* fix(team): delete (not overwrite) team_alias cache on _cache_team_object

The prior shape of this PR wrote both team_id:<id> AND team_alias:<alias>
from _cache_team_object. team_alias is NOT unique in the schema
(no @unique on LiteLLM_TeamTable.team_alias), and get_team_object_by_alias
enforces uniqueness on its own DB-fetch path (len(teams) > 1 raises).
Writing the alias-keyed cache from the generic refresh path bypassed
that check: a team admin renaming their team to collide with another
team's alias could silently overwrite the cached team for JWT-by-alias
auth, swapping the resolved team under that alias for the cache window.

Switch the alias-keyed operation from a write to a delete (mirroring
the dual-cache delete pattern in _delete_cache_key_object). After every
team write, the next JWT-by-alias reader cache-misses and falls through
to get_team_object_by_alias, which (a) re-fetches the fresh team from
DB, closing the LIT-3244 staleness gap that motivated this PR, and
(b) enforces alias uniqueness before populating either cache key.

team_id:<id> writes are unchanged — team_id is the table PK and is
guaranteed unique.
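
The write-by-id, delete-by-alias behaviour described above can be illustrated with a dict standing in for the cache. This is a simplified sketch: the real `_cache_team_object` is async and talks to the proxy's cache layers, and the signature here is an assumption.

```python
from typing import Any, Dict


def cache_team_object(cache: Dict[str, Any], team: Dict[str, Any]) -> None:
    """Refresh the id-keyed entry and invalidate the alias-keyed one."""
    # team_id is the primary key, so overwriting this entry is always safe.
    cache[f"team_id:{team['team_id']}"] = team

    # team_alias is not unique, so never write it from here. Deleting forces
    # the next alias lookup to go back to the DB path, which is where the
    # uniqueness check is enforced.
    alias = team.get("team_alias")
    if alias:
        cache.pop(f"team_alias:{alias}", None)


cache: Dict[str, Any] = {"team_alias:eng": {"team_id": "t1", "models": []}}
cache_team_object(cache, {"team_id": "t1", "team_alias": "eng", "models": ["openai/*"]})
print(sorted(cache))
```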

Surfaced in veria-ai review on #28739.

* fix(managed-files): anchor model_id regex so it doesn't match llm_output_file_model_id

extract_model_id_from_unified_id used `re.search(r"model_id,([^;]+)", ...)`
which substring-matches the `model_id,` inside the file-ID encoding's
`llm_output_file_model_id,<deployment_uuid>` field. parse_unified_id
then fed that deployment UUID back into the auth path as a model
candidate via _extract_models_from_managed_resource_id, and every
team-BYOK file attach 403'd with:

    team not allowed to access model. This team can only access
    models=['openai/*']. Tried to access <deployment-uuid>

The team's models list correctly contains the public name (`openai/*`)
that target_model_names matches, but the bogus UUID candidate fails
the wildcard check first.

Anchor the regex to a field boundary (`(?:^|;)model_id,`) so it
matches the legitimate top-level `model_id,<value>` field on
vector_store unified IDs and skips substring matches inside other
fields. File-IDs (which have no top-level `model_id` field) now
return None and contribute no spurious UUID candidate.
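
The difference between the two patterns is easy to reproduce in isolation. The two ID strings below are made-up examples that only mimic the `key,value;key,value` layout; the real decoded unified IDs may carry other fields.

```python
import re
from typing import Optional

UNANCHORED = re.compile(r"model_id,([^;]+)")
ANCHORED = re.compile(r"(?:^|;)model_id,([^;]+)")


def extract_model_id(decoded_unified_id: str) -> Optional[str]:
    """Return the top-level model_id field, ignoring look-alike suffixes."""
    match = ANCHORED.search(decoded_unified_id)
    return match.group(1) if match else None


# Hypothetical decoded IDs, for illustration only.
file_id = "litellm_proxy:application/json;unified_id,abc;llm_output_file_model_id,deploy-uuid-123"
vector_store_id = "litellm_proxy:vector_store;unified_id,abc;model_id,gpt-4o"

print(UNANCHORED.search(file_id).group(1))  # the old pattern picks up the deployment UUID
print(extract_model_id(file_id))
print(extract_model_id(vector_store_id))
```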

Surfaced while reproducing LIT-3244 on patch/1.86.0 with the customer's
exact flow: team with openai/* BYOK deployment, JWT-scoped user,
POST /v1/vector_stores/{id}/files attaching a file uploaded with
target_model_names=openai/gpt-4o.

* fix(proxy): hydrate wildcard discovery credentials (#28284) (#28822)

* fix(proxy): hydrate wildcard discovery credentials

* fix(proxy): constrain wildcard credential hydration

Co-authored-by: Dibyo Mukherjee <dibyo@adobe.com>

* ci: add daily oss-agent-shin bra…
ryan-crabbe-berri and others added 17 commits May 27, 2026 14:45
* test(ui): e2e cover team model edit + admin identity in navbar

Adds two Playwright tests as part of the manual-QA → e2e migration:
"Edit team model selection" exercises the Settings tab Models multi-select
+ Save Changes flow on a seeded team, and the existing login test now
opens the User dropdown and asserts the role and User ID render — guarding
against regressions where login succeeds but the auth context is empty.

Resolves LIT-3093

* test(ui): restore seeded models in team-edit test so retries don't fail

The 'Edit team model selection' test removed fake-anthropic-claude from
E2E_TEAM_CRUD_ID without restoring it. CI runs with retries: 2 and the seed
script runs once before the suite, so a flake on this test would fail the
retry at the "tag is visible" assertion. Wrap the test in try/finally and
restore the seeded models via /team/update before and after.

* test(e2e): fail loudly if team/update restore call fails

Surfaces the real cause when the master key is wrong or the proxy is
unreachable, instead of silently leaving the team in a stale state and
failing later on the visibility assertion.

* fix(e2e): match navbar account button by aria-label, not non-existent "User" text

The previous trigger filter (hasText: /^User$/) didn't match the rendered
UserDropdown button — its text is the displayName ("Account" for the
master-key admin, an email for SSO users), never "User". The evaluate
call then timed out after 15s in CI. Use the stable aria-label prefix
the component always emits, and click directly since the dropdown is
configured trigger=["click"] (the synthetic hover was unnecessary).
* test(e2e): cover add-fallback flow in Router Settings as proxy admin

The Router Settings → Fallbacks → Add Fallbacks flow was an uncovered
manual-QA path. This adds a test that opens the modal, picks a primary
+ fallback from the seeded mock models, saves, and verifies both render
in the fallback table.

* fix(e2e): make router-fallback test idempotent and pick antd options by text

- Match `.ant-select-item-option` by text instead of `getByTitle(...)` —
  FallbackGroupConfig uses `options=` (not <Select.Option> children), so
  no `title` attribute is emitted and the title-based selector hangs.
- Add before/after hooks that wipe any fallback for fake-openai-gpt-4 via
  /config/update so retries and local reruns don't trip on leftover state.
- Tighten the success assertion to a single tbody row containing BOTH the
  primary and the fallback names — pre-existing rows can no longer
  vacuously satisfy the check.
- Fix the stale "Three tabs" comment to "Four tabs".

Addresses Greptile P2s on PR #29069.

* fix(e2e): keyboard-select fallback models + correct cleanup endpoint

- Replace mouse-based option clicks with click-to-focus + type + Enter.
  FallbackGroupConfig's Selects use `options=` and a custom
  getPopupContainer, so locating options via `.ant-select-dropdown`
  hit several races: DOM-clicks left antd's popup state stale (the
  primary popup then intercepted the fallback click), `getByRole`
  matched always-mounted hidden options, and pointer stability fought
  the open animation. Typing into the showSearch input narrows the
  listbox to one option and Enter selects it cleanly.
- Assert on dialog-side state changes (the active tab adopts the
  primary model name; the chain helper shows "1/10 used") instead of
  popup contents — these reflect the actual selection landing.
- Cleanup helper now hits /get/config/callbacks (the real endpoint;
  /get/callbacks returns 404), so the before/after reset actually
  clears prior router_settings.fallbacks state.
* test(e2e): cover Team-BYOK add-model flow as proxy admin

The team-only model + team assignment was an uncovered manual-QA path.
This adds a premium-gated test that toggles Team-BYOK, picks the seeded
E2E Team CRUD, submits, and verifies the model lands in All Models with
the team alias attached.

* test(e2e): apply greptile fixes to Team-BYOK test

- Add the 2s networkidle settle that the sibling addModel tests use —
  networkidle fires before the All Models table finishes re-rendering,
  so the search input was racing with the render.
- Assert on `models-results-count` before inspecting the table body so
  an empty search result fails with a clear "expected results count"
  message instead of timing out on a missing row.

Addresses Greptile P2s on PR #29068.

* test(e2e): harden Team-BYOK test against flake and stale state

- Add before/after cleanup that deletes any Cohere model already scoped
  to e2e-team-crud via /v2/model/info + /model/delete, so Playwright
  retries and local reruns don't accumulate rows.
- Pick the team from the dropdown by role/option name instead of a
  global getByText match — avoids matching a previously-rendered tag
  elsewhere in the form.
- Scope the "created successfully" assertion to .ant-notification so a
  stale toast from an earlier test in the same browser context can't
  vacuously satisfy it.
- Tighten the All Models assertion: require a single row that contains
  BOTH the cohere model name AND the e2e-team-crud alias, so the
  team-less wildcard from the sibling "Add wildcard route" test can't
  satisfy the check.
…ma Json serialization (#28990)

* fix(containers): record ownership for service-account keys + fix Prisma Json field serialization

- Track containers created implicitly via /v1/responses by extracting container IDs
  from the response output and calling record_container_owner for each one, so
  subsequent file-API calls from the same service account pass ownership checks.
- Fix DataError: Prisma Python requires Json fields to be JSON strings; serialize
  file_object with json.dumps() before insert/update in LiteLLM_ManagedObjectTable.
- Add collect_container_ids_from_responses_response utility to responses/utils.py
  that walks all output item shapes (code_interpreter_call, message annotations).
- Tests: two new cases covering the responses-tracking path and the end-to-end
  record-then-assert flow for service accounts with team scope.
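
A walk over the response output along these lines could look like the sketch below. It is not the actual `collect_container_ids_from_responses_response`; it assumes the response is already a plain dict and that both `code_interpreter_call` items and message annotations expose a `container_id` key.

```python
from typing import Any, Dict, List


def collect_container_ids(response: Dict[str, Any]) -> List[str]:
    """Collect unique container IDs from a Responses-style output list."""
    found: List[str] = []

    def add(container_id: Any) -> None:
        if isinstance(container_id, str) and container_id and container_id not in found:
            found.append(container_id)

    for item in response.get("output") or []:
        if not isinstance(item, dict):
            continue
        if item.get("type") == "code_interpreter_call":
            add(item.get("container_id"))
        elif item.get("type") == "message":
            # Annotations on message content can also reference a container.
            for part in item.get("content") or []:
                if not isinstance(part, dict):
                    continue
                for annotation in part.get("annotations") or []:
                    if isinstance(annotation, dict):
                        add(annotation.get("container_id"))
    return found
```

Each collected ID would then be passed to the ownership-recording call, so that later file-API requests from the same key pass the ownership check.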

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(containers): swallow all exceptions in ownership hook; tighten file_object_json type to str

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(containers): parse file_object JSON string in existing ownership test

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix: container ownership recording bugs

- Remove unreachable _aresponses_websocket from route_type set in
  base_process_llm_request; the WebSocket endpoint never flows through
  base_process_llm_request, so this branch was dead code that gave a
  false impression of coverage.
- Drop the HTTPException re-raise in record_container_owners_from_responses_response
  so per-container failures (including HTTP 403/500 from conflicting
  ownership rows) no longer abort the batch and skip recording for the
  remaining container IDs in the same response.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(containers): record ownership for streaming /v1/responses too

Streaming /v1/responses returns through the select_data_generator
branch in base_process_llm_request and bypasses the non-streaming
ownership tail, so code-interpreter containers created mid-stream
were never written to LiteLLM_ManagedObjectTable. Follow-up file API
calls would then 403.

Wrap the SSE generator so container ownership is recorded once the
upstream iterator finishes assembling completed_response. Also covers
the background-polling path, which loops body_iterator end-to-end.
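
The wrapping pattern can be sketched with a generic async generator. The names `wrap_with_completion_hook` and `on_complete` are illustrative; the real wrapper works on the proxy's SSE generator and reads the assembled `completed_response`.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List


async def wrap_with_completion_hook(
    stream: AsyncIterator[str],
    on_complete: Callable[[], Awaitable[None]],
) -> AsyncIterator[str]:
    """Re-yield every chunk, then run a hook once the upstream is exhausted."""
    async for chunk in stream:
        yield chunk
    try:
        # Runs only after the upstream iterator has finished, which is when
        # the completed response (and its container IDs) is available.
        await on_complete()
    except Exception:
        # Ownership recording is best-effort; it must not break the stream.
        pass


async def _demo() -> List[str]:
    async def upstream() -> AsyncIterator[str]:
        for chunk in ("data: a", "data: b"):
            yield chunk

    async def record() -> None:
        print("ownership recorded")

    return [c async for c in wrap_with_completion_hook(upstream(), record)]


print(asyncio.run(_demo()))
```

Note that the hook does not run if the client disconnects before the stream is fully consumed; whether the real implementation handles that case is not stated here.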

Co-authored-by: Yassin Kortam <yassin@berri.ai>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
)

* test(e2e): cover add-MCP-server flow via discovery → custom form

The "Add MCP server" manual-QA step was uncovered. This adds a test
that opens the discovery modal, jumps into the custom-server form,
fills name + Streamable HTTP transport + a placeholder URL + None
auth, submits, and verifies both the success toast and the new row.

* test(e2e): apply greptile fixes to MCP add-server test

- Anchor the auth-type Select via its enclosing Collapse panel
  ("Authentication") instead of the placeholder text. The Form.Item has
  no label prop, so the previous `hasText: /auth type/i` filter was
  matching via "Select auth type" placeholder copy — fragile.
- Document the intentional lack of teardown, matching the pattern used
  in addModel.spec.ts: the e2e runner discards the DB per invocation.

Addresses Greptile P2s on PR #29070.

* test(e2e): scope MCP row assertion to the servers table

Scope the post-create row lookup to `table tbody` so the form modal's
`server_name` input — which still holds the timestamped value during
its close animation — can't satisfy the assertion before the server
actually lands in the list.

* docs(e2e): note MCP coverage scope and link to tracker

This spec only smoke-tests the happy-path Streamable HTTP + None auth
flow. Add a top-of-file comment pointing at E2E_COVERAGE.md so future
contributors can see what's still uncovered (other transports, all
auth types, edit/delete, BYOK, tool list/call, access groups).
…29071)

* test(e2e): cover AI Hub make-public flow and public model_hub_table

Three previously-uncovered manual-QA paths land in one spec:

- Admin opens "Select Models to Make Public", advances through the
  multi-step modal, and verifies the success toast.
- AI Hub tab strip exposes Model Hub / Agent Hub / MCP Hub / Skill Hub
  — note the manual-QA "Claude Code Plugin Marketplace" label was
  renamed to Skill Hub; the test pins the current name.
- Anonymous /ui/model_hub_table loads with the master key as `?key=`
  and renders the Model Hub tab. Agent Hub / MCP Hub tabs are
  conditional on public data and are not asserted here.

* test(e2e): harden AI Hub make-public + public hub assertions

Address Greptile review:

- Make-public test now asserts "Select All (N)" with N>=1 before clicking,
  so a missing-seed-data run surfaces immediately instead of timing out
  on the disabled Next button or the success toast.
- Public model_hub_table test dismisses the feedback popup before the
  tab visibility assertion, matching the ordering used by navigateToPage
  so a popup race can't mask the tab mid-evaluation.

* docs(e2e): explain admin vs public AI Hub tab asymmetry

Greptile flagged the all-4-tabs assertion as a potential CI flake,
inferring from the public-page comment that Agent Hub / MCP Hub might
be data-conditional in the admin view too. They aren't — ModelHubTable
renders all four tabs unconditionally for admins. Document the asymmetry
inline so future readers (and future review passes) don't re-derive it.
…onfig (#28898)

* feat: support goal mode for claude on bedrock

* fix failing lint test

* addressing greptile comments

* fixing failed test

* address greptile: copy output_config and warn on dropped converse format

* fix(bedrock): skip redundant output_config normalization on Converse reasoning_effort path

When reasoning_effort is mapped via _handle_reasoning_effort_parameter, the
resulting output_config is already normalized via
normalize_bedrock_opus_output_config_effort. Mark it as normalized so
_prepare_request_params can skip the redundant call (and the associated
get_model_info lookup) on every request.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* test(reasoning-effort-grid): reflect Bedrock opus-4-6 xhigh→max clamping

* fix(bedrock): stop leaking output_config marker and message-content mutation

* fix(bedrock): guard effort key access in normalize_bedrock_opus_output_config_effort

Defensively check that 'effort' is a valid key in _BEDROCK_OUTPUT_CONFIG_EFFORT_ORDER
before indexing, to prevent a KeyError if the hardcoded guard tuple ever drifts from
the order dict's keys.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(bedrock): drop dead second clause in effort normalization guard

The 'effort not in _BEDROCK_OUTPUT_CONFIG_EFFORT_ORDER' check is
unreachable once 'effort not in ("xhigh", "max")' has been ruled out,
since both literals are present in the order dict. Keep the literal
membership check and let the dict lookups below speak for themselves.

* fix(bedrock): clamp output_config.effort against ceiling for any known value

The early return when effort was not 'xhigh'/'max' meant a ceiling of
'low' or 'medium' would silently forward an out-of-range value. Gate on
the known effort ordering instead so the ceiling comparison runs for
every recognized effort.
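
A ceiling clamp gated on a known ordering might be sketched as below. The rank table is an assumption limited to `low`/`medium`/`high`; the relative rank of `xhigh` and `max` in `_BEDROCK_OUTPUT_CONFIG_EFFORT_ORDER` is not spelled out here, so it is deliberately left out.

```python
from typing import Optional

# Assumed partial ordering, for illustration only.
_EFFORT_RANK = {"low": 0, "medium": 1, "high": 2}


def clamp_effort(effort: str, ceiling: Optional[str]) -> str:
    """Lower effort to the ceiling when both are known and effort exceeds it."""
    if ceiling is None or effort not in _EFFORT_RANK or ceiling not in _EFFORT_RANK:
        # Unknown values are passed through untouched in this sketch.
        return effort
    if _EFFORT_RANK[effort] > _EFFORT_RANK[ceiling]:
        return ceiling
    return effort


print(clamp_effort("high", "medium"))
print(clamp_effort("low", "medium"))
```

The point of gating on the ordering rather than on a hardcoded pair of values is that the comparison runs for every recognized effort, including when the ceiling itself is low.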

* test(grid_spec): use _CAPS_OPUS_4_7 for non-Bedrock opus-4-6 entries

claude-opus-4-6 now declares supports_xhigh_reasoning_effort in the model
map, so production accepts xhigh on Azure AI and Vertex AI routes. Update
those grid_spec entries to match production capabilities so expected()
predicts 200 for xhigh instead of 400.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* test(grid_spec): revert xhigh caps for non-Bedrock opus-4-6

azure_ai/claude-opus-4-6 and vertex_ai/claude-opus-4-6 do not declare
supports_xhigh_reasoning_effort in model_prices_and_context_window.json.
Azure AI upstream rejects xhigh with HTTP 400 ("Supported levels: high,
low, max, medium"). Restore _CAPS_4_6 so the grid predicts 400 for
xhigh, matching production capabilities.

* fix: stop advertising xhigh effort on Opus 4.5/4.6

Only Opus 4.7 supports the xhigh reasoning effort level. Remove the
supports_xhigh_reasoning_effort flag from every Opus 4.5 and Opus 4.6
entry (direct Anthropic, Bedrock, and regional variants) in both model
catalog files.

On the direct Anthropic path there is no effort clamp, so flagging 4.5/4.6
as xhigh-capable caused litellm to forward xhigh to a model that rejects it
(and made get_model_info misreport the capability). xhigh now correctly
degrades to high / raises on those models.

Bedrock graceful degradation for Claude Code goal mode is unaffected: it
relies solely on the bedrock_output_config_effort_ceiling clamp (4.5->high,
4.6->max, 4.7->xhigh), which runs before validation, so xhigh requests to
older Bedrock Opus models are still silently lowered rather than rejected.

Update effort-gating tests to reflect that 4.5/4.6 no longer accept xhigh.

* fix: clamp xhigh effort on Bedrock Invoke /v1/messages instead of rejecting

Claude Code "goal mode" sends output_config.effort=xhigh over the Anthropic
/v1/messages API, which routes Bedrock models through
AmazonAnthropicClaudeMessagesConfig. That path validated effort against the
model's native capability and raised 400 for xhigh on Opus 4.6, while the
chat-completions paths (Converse + Invoke) already clamp xhigh to the model's
bedrock_output_config_effort_ceiling. That asymmetry broke goal mode on the
exact API surface Claude Code uses.

Apply the same ceiling clamp on the messages path before the shared effort
gate runs, so xhigh degrades to max on Opus 4.6 (and stays xhigh on 4.7).
Scoped to adaptive-thinking models and to models that declare a ceiling, so
Sonnet 4.6 (no ceiling) and Opus 4.5 (budget mode) are unaffected and still
reject xhigh.

* fix(bedrock): preserve user output_config when applying reasoning_effort

- Converse path: merge mapped effort into existing output_config via
  setdefault instead of overwriting it, matching the Anthropic Messages
  path. Prevents user-supplied output_config.format from being silently
  dropped when reasoning_effort is also provided.
- tests: clear _get_local_model_cost_map lru_cache in the autouse
  fixture alongside get_bedrock_response_stream_shape to avoid stale
  cache leakage between tests.
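
One possible reading of the Converse-path change is sketched below: the mapped effort is merged into a copy of any user-supplied `output_config` with `setdefault`, so sibling keys such as `format` survive. The helper name and the choice to let an explicit user `effort` win are assumptions, not confirmed behaviour.

```python
from typing import Any, Dict


def apply_reasoning_effort(optional_params: Dict[str, Any], mapped_effort: str) -> Dict[str, Any]:
    """Merge a mapped effort into output_config without dropping other keys."""
    merged = dict(optional_params)  # do not mutate the caller's dict
    output_config = dict(merged.get("output_config") or {})
    # setdefault fills in effort only when the user has not already set one.
    output_config.setdefault("effort", mapped_effort)
    merged["output_config"] = output_config
    return merged


params = {"output_config": {"format": {"type": "json_schema"}}}
print(apply_reasoning_effort(params, "high"))
```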

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(bedrock): pre-clamp reasoning_effort for chat invoke; correct test caps

- Add _clamp_adaptive_reasoning_effort_for_bedrock to AmazonAnthropicClaudeConfig
  so raw reasoning_effort=xhigh degrades to the model's bedrock effort ceiling
  before AnthropicConfig.map_openai_params converts it to output_config.
  Mirrors converse path (_handle_reasoning_effort_parameter) and messages path
  (_clamp_adaptive_reasoning_effort_for_bedrock) so the three Bedrock paths
  are consistent.

- grid_spec: restore caps=_CAPS_4_6 for Bedrock converse/invoke Opus 4.6 entries
  so the test reflects the model's actual JSON capabilities. Teach expected()
  to bypass the xhigh/max cap check when bedrock_effort_ceiling will clamp
  the wire effort, so the test still passes for Bedrock's graceful degradation
  contract without lying about native model caps.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

---------

Co-authored-by: Dennis Henry <dennis.henry@okta.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
…28970)

* feat(guardrails): wire apply_guardrail into proxy logging callbacks

Route /apply_guardrail through pre/post proxy hooks and LiteLLM success/failure handlers so Langfuse and OTEL integrations receive input/output on guardrail-only requests.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(guardrails): fix Greptile review comments on apply_guardrail logging

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(apply_guardrail): preserve original exception and capture modified response

- Capture return value from post_call_success_hook so callback-modified
  responses propagate to the caller.
- Wrap success/failure logging calls in defensive try/except so logging
  infrastructure failures don't replace the user-visible response or mask
  the original guardrail exception.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* Fix mypy

* fix(apply_guardrail): isolate failure logging and use post-hook response for logging

- Split async_failure_handler and post_call_failure_hook into independent
  try/except blocks so a callback bug in one does not silently skip the
  other.
- Build response_for_logging inside _emit_guardrail_success_logs after
  post_call_success_hook runs, so logged data matches the response the
  caller actually receives when the hook modifies the response.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(apply_guardrail): fix black formatting and update tests for fastapi_request param

- Run black on guardrail_endpoints.py to fix CI formatting check
- Add _mock_proxy_logging() helper to enterprise guardrail tests to patch
  proxy-server globals imported at call time
- Pass fastapi_request=Mock() in all direct apply_guardrail test calls
  to match updated function signature

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(guardrails): use transformed exception from post_call_failure_hook in apply_guardrail

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(guardrails): isolate sync/async logging handlers in apply_guardrail

Separate each logging handler call into its own try/except so a failure
in the async handler does not silently skip the sync handler submission
(and vice versa). Matches the docstring's defensive intent.
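
The isolation pattern can be shown with a small helper that gives each handler its own try/except. `run_isolated` and the handler names in the demo are illustrative; the actual code calls the sync and async logging handlers inline.

```python
import asyncio
import logging
from typing import Awaitable, Callable, List, Tuple

logger = logging.getLogger("apply_guardrail_sketch")

Handler = Tuple[str, Callable[[], Awaitable[None]]]


async def run_isolated(handlers: List[Handler]) -> List[str]:
    """Run every handler; return the names of those that raised."""
    failed: List[str] = []
    for name, handler in handlers:
        try:
            await handler()
        except Exception:
            # A failure here is logged and recorded, but the loop continues
            # so the remaining handlers still run.
            logger.exception("logging handler %s failed", name)
            failed.append(name)
    return failed


async def _demo() -> List[str]:
    async def broken() -> None:
        raise RuntimeError("callback bug")

    async def healthy() -> None:
        print("second handler still ran")

    return await run_isolated([("async_handler", broken), ("sync_submit", healthy)])


print(asyncio.run(_demo()))
```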

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(apply_guardrail): guard transformed_exception with isinstance check

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(guardrails): mock proxy globals in not_found test and share apply_guardrail logging fixture

- Add proxy-server global mocks to test_apply_guardrail_not_found so the
  failure-path post_call_failure_hook call doesn't touch the real proxy
  logging singleton.
- Extract the duplicated _mock_proxy_logging context manager out of the
  two enterprise apply_guardrail test files into a shared conftest fixture
  so the helper stays in one place.

* fix(guardrails): use update_messages to keep logging obj in sync

Co-authored-by: Yassin Kortam <yassin@berri.ai>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
* build(deps): bump next from 16.2.4 to 16.2.6 in /ui/litellm-dashboard (#27665)

Bumps [next](https://github.com/vercel/next.js) from 16.2.4 to 16.2.6.
- [Release notes](https://github.com/vercel/next.js/releases)
- [Changelog](https://github.com/vercel/next.js/blob/canary/release.js)
- [Commits](vercel/next.js@v16.2.4...v16.2.6)

---
updated-dependencies:
- dependency-name: next
  dependency-version: 16.2.6
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps): bump protobufjs in /tests/pass_through_tests (#28296)

Bumps [protobufjs](https://github.com/protobufjs/protobuf.js) from 7.5.6 to 7.6.0.
- [Release notes](https://github.com/protobufjs/protobuf.js/releases)
- [Changelog](https://github.com/protobufjs/protobuf.js/blob/protobufjs-v7.6.0/CHANGELOG.md)
- [Commits](protobufjs/protobuf.js@protobufjs-v7.5.6...protobufjs-v7.6.0)

---
updated-dependencies:
- dependency-name: protobufjs
  dependency-version: 7.6.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps): bump ws from 8.20.0 to 8.20.1 in /tests/pass_through_tests (#28303)

Bumps [ws](https://github.com/websockets/ws) from 8.20.0 to 8.20.1.
- [Release notes](https://github.com/websockets/ws/releases)
- [Commits](websockets/ws@8.20.0...8.20.1)

---
updated-dependencies:
- dependency-name: ws
  dependency-version: 8.20.1
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* fix(proxy): enforce tag budgets for key-level tags

Merge API key metadata.tags into request_data before _tag_max_budget_check
so per-tag budgets apply when tags are set on the key at creation time.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(auth): avoid false reject for key-inherited tags

Run reject_clientside_metadata_tags before key-tag injection, then inject key metadata tags immediately before tag budget checks so key tags still enforce budgets without being treated as client-supplied tags.
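
The ordering matters more than the individual steps, so here is a minimal sketch of it. All three function names and the shape of `request_data` are simplified stand-ins for the proxy's real helpers.

```python
from typing import Any, Dict


def reject_clientside_tags(request_data: Dict[str, Any]) -> None:
    """Raise if the client itself supplied metadata.tags."""
    if (request_data.get("metadata") or {}).get("tags"):
        raise ValueError("client-supplied tags are not allowed")


def inject_key_tags(request_data: Dict[str, Any], key_metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of request_data with the key's tags merged in."""
    key_tags = key_metadata.get("tags") or []
    if not key_tags:
        return request_data
    merged = dict(request_data)
    metadata = dict(merged.get("metadata") or {})
    existing = list(metadata.get("tags") or [])
    metadata["tags"] = existing + [t for t in key_tags if t not in existing]
    merged["metadata"] = metadata
    return merged


def prepare_for_tag_budget_check(
    request_data: Dict[str, Any],
    key_metadata: Dict[str, Any],
    reject_client_tags: bool,
) -> Dict[str, Any]:
    # Step 1: validate what the client sent, before any key tags are present.
    if reject_client_tags:
        reject_clientside_tags(request_data)
    # Step 2: only now add key-level tags, right before the budget check.
    return inject_key_tags(request_data, key_metadata)


print(prepare_for_tag_budget_check({"model": "gpt-4o"}, {"tags": ["team-a"]}, True))
```

If the two steps were swapped, a key-inherited tag would look like a client-supplied one and trigger the false reject described above.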

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
…video edit (#29098)

* fix(vertex-ai): pass litellm_params to validate_environment in video handlers and implement video edit for Veo

- Pass litellm_params to validate_environment in 11 video handler call sites
  (remix, create_character, get_character, edit, extension, delete) so
  DB-stored Vertex AI credentials are used instead of falling back to ADC
- Implement transform_video_edit_request/response for VertexAI: fetches
  source video via fetchPredictOperation then submits a new
  predictLongRunning request with the video bytes/gcsUri + edit prompt

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(vertex-ai): hoist fetchPredictOperation into handlers to avoid blocking event loop

- Add get_video_edit_prefetch_params() to BaseVideoConfig (returns None)
- VertexAI overrides it to return the fetchPredictOperation URL/body
- Both sync and async video_edit handlers call this and use their shared
  httpx client for the fetch, passing the result as prefetched_source_data
- transform_video_edit_request is now a pure transform with no HTTP calls
- Fix extra_body.pop() mutation by working on a shallow copy
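
The split between an optional prefetch hook and a pure transform might be sketched like this. The class names, the `http_post` callable, the `example.invalid` URL, and the request body fields are all placeholders; only the two method names and the `prefetched_source_data` parameter come from the commit text.

```python
from typing import Any, Callable, Dict, Optional


class BaseVideoConfigSketch:
    def get_video_edit_prefetch_params(self, video_id: str) -> Optional[Dict[str, Any]]:
        # Default: this provider needs no extra fetch before the edit call.
        return None

    def transform_video_edit_request(
        self,
        prompt: str,
        extra_body: Dict[str, Any],
        prefetched_source_data: Optional[Dict[str, Any]] = None,
    ) -> Dict[str, Any]:
        body = dict(extra_body)      # shallow copy: pop() must not mutate the caller's dict
        body.pop("video_id", None)
        body["prompt"] = prompt
        if prefetched_source_data is not None:
            body["source"] = prefetched_source_data
        return body                  # pure transform, no HTTP calls in here


class PrefetchingConfigSketch(BaseVideoConfigSketch):
    def get_video_edit_prefetch_params(self, video_id: str) -> Optional[Dict[str, Any]]:
        # Placeholder URL and body, not a real provider endpoint.
        return {"url": "https://example.invalid/fetch-operation", "body": {"operationName": video_id}}


def video_edit_handler(
    config: BaseVideoConfigSketch,
    video_id: str,
    prompt: str,
    extra_body: Dict[str, Any],
    http_post: Callable[[str, Dict[str, Any]], Dict[str, Any]],
) -> Dict[str, Any]:
    # The handler owns the HTTP client, so the fetch happens here.
    prefetch = config.get_video_edit_prefetch_params(video_id)
    source = http_post(prefetch["url"], prefetch["body"]) if prefetch else None
    return config.transform_video_edit_request(prompt, extra_body, source)
```

Keeping the fetch in the handler means the async handler can await it on its own client instead of blocking the event loop inside a synchronous transform.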

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(vertex-ai): include prefetch call inside _handle_error try/except block

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(videos): add prefetched_source_data param to all transform_video_edit_request overrides

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(video_edit): keep transform/pre_call outside try so validation errors propagate

Move transform_video_edit_request and logging_obj.pre_call outside the
try/except that wraps HTTP calls in (async_)video_edit_handler so that
ValueError validation errors (e.g. 'source video not complete yet') are
not silently wrapped as 500s by _handle_error. The prefetch HTTP call
keeps its own try/except so its errors are still mapped through the
provider's error handler. Matches the pattern used by
video_extension_handler and video_remix_handler.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* refactor(vertex_ai): delegate get_video_edit_prefetch_params to status retrieve

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* Fix veria review

* fix(video_edit): route transform errors through _handle_error

Wrap transform_video_edit_request and pre_call in the same try/except
as the HTTP call in sync and async handlers so validation failures
(e.g. source video not complete) return typed LiteLLM exceptions.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
…st (#28487)

* fix(datadog): drain cost-management queue + opt-in FinOps tag allowlist

* fix(datadog): guard non-dict callback_specific_params + log empty aggregation

* fix(datadog): block user-controlled tags from overwriting reserved cost-attribution dimensions

* fix(datadog): cast metadata to dict[str, Any] to satisfy mypy
… and UI (#28712)

* feat(helm): split per-component ServiceAccounts for gateway, backend, and UI

Replace the single shared serviceAccount with three separate serviceAccounts
(gateway, backend, ui) so operators can attach different IRSA / Workload
Identity annotations per component without granting data-plane credentials
to the UI pod.

Key changes:
- values.yaml: rename serviceAccount → serviceAccounts with gateway/backend/ui
  sub-keys; UI defaults to automount: false
- _helpers.tpl: replace litellm.serviceAccountName with three component-scoped
  helpers (litellm.gateway/backend/ui.serviceAccountName)
- serviceaccount.yaml: create up to three separate ServiceAccount objects with
  component labels and per-SA automountServiceAccountToken
- gateway/backend deployments: use their respective SA helpers
- ui deployment: use litellm.ui.serviceAccountName + explicit
  automountServiceAccountToken: false on the pod spec so the projected token
  is absent even when the SA itself allows it
- migrations-job: share the backend SA (both need DB write access)

Resolves LIT-3171

https://claude.ai/code/session_01QPy362WnjmEpeNuJaPUqmF

* fix(helm): enforce automountServiceAccountToken on all pod specs; fix leading --- in serviceaccount.yaml

- gateway/backend deployments: add explicit automountServiceAccountToken on
  the pod spec so serviceAccounts.*.automount is honoured regardless of
  whether the SA is chart-created or operator-supplied (previously the flag
  only took effect on the SA object when create: true, creating an asymmetry
  with the UI which already enforced it at pod-spec level)
- serviceaccount.yaml: use a $prev sentinel to emit --- only between
  documents, preventing a leading --- when gateway SA is skipped but
  backend or ui SA is created (avoids lint/GitOps warnings from strict
  YAML parsers and tools like ArgoCD)

https://claude.ai/code/session_01QPy362WnjmEpeNuJaPUqmF

---------

Co-authored-by: Claude <noreply@anthropic.com>
* fix(deps): bump vulnerable proxy dependencies (starlette/fastapi, granian, pyarrow, semantic-router)

Resolve known CVEs flagged by osv-scanner/grype against uv.lock. All bumped
versions verified to resolve, install, and pass the proxy auth/route/middleware
unit suites (717 tests) plus an import smoke on the new stack.

- starlette 0.50.0 -> 1.1.0 (CVE-2026-48710 "BadHost", GHSA-86qp-5c8j-p5mr):
  versions <1.0.1 reconstruct request.url from the unvalidated Host header,
  poisoning request.url.path. Required raising fastapi 0.124.4 -> 0.136.3,
  which dropped fastapi's starlette<0.51.0 cap; an explicit starlette>=1.0.1
  floor blocks regression to a vulnerable transitive resolution. The proxy's
  own auth already reads scope["path"] via get_request_route, but the locked
  starlette still flagged in container scanners and left other request.url
  consumers exposed.
- granian 2.5.7 -> 2.7.4 (CVE-2026-42544, unauthenticated DoS via WebSocket
  subprotocol header panic; CVE-2026-42545, WSGI response-header-panic DoS).
  granian is a selectable proxy server (proxy_cli).
- pyarrow 22.0.0 -> 23.0.1 (CVE-2026-25087 / PYSEC-2026-113).
- semantic-router 0.1.12 -> 0.1.15: 0.1.12 was yanked (CVE-2026-42208 — its
  unbounded litellm pin could resolve a credential-exfiltrating litellm==1.82.8
  wheel).

Not fixable by bump: diskcache 5.6.3 (CVE-2025-69872, unsafe pickle
deserialization) has no upstream fix and is left pinned; exploiting it requires
write access to the local cache directory.

Relock side effect: sse-starlette 3.4.2 -> 3.4.4.

* deps: relax exact pins in optional extras to compatible ranges

The proxy/optional extras exact-pinned every dependency, which (1) forces
downstream `pip install litellm[proxy]` consumers into version lockstep and
(2) blocks them from pulling transitive security patches without forking — the
structural cause behind needing a litellm release to clear the starlette CVE in
the previous commit.

Convert the ordinary extras deps to `>=current,<next_major` ranges, mirroring
the core [project].dependencies style. Reproducibility for litellm's own
Docker/CI is unaffected: images install via `uv sync --frozen`, and the lock
re-resolves to the identical versions (no locked version changed).
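
As a rough illustration of the `>=current,<next_major` conversion, the hypothetical helper below builds such a range from a pinned version. It assumes plain dotted versions with a numeric major component and is not the tooling used in this commit.

```python
def compatible_range(pinned_version: str) -> str:
    """Turn an exact pin such as '0.12.1' into '>=0.12.1,<1.0'.

    Assumes a plain dotted version whose first component is an integer
    (no epochs, no local version labels).
    """
    major = int(pinned_version.split(".")[0])
    return f">={pinned_version},<{major + 1}.0"
```

For example, `compatible_range("0.12.1")` yields `>=0.12.1,<1.0`, the same shape as the range quoted in the license-check fix.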

Kept exact-pinned:
- litellm-proxy-extras, litellm-enterprise — litellm's own sub-packages,
  versioned in lockstep with the release.
- opentelemetry-api/sdk/exporter-otlp — must resolve to matching versions.
- grpcio — supply-chain-pinned to a vetted, aged release.

Also corrects the stale comment claiming the extras are exact-pinned for Docker
reproducibility (the images use the lock, not these pins).

* fix(ci): resolve license-check lookup version from the floor for ranged deps

check_licenses.py derived the PyPI lookup version with
`next(iter(req.specifier))`, which returns an arbitrary specifier clause. For
a range like `>=0.12.1,<1.0` it picked the upper bound (`1.0`) — a version
that doesn't exist on PyPI — so the license lookup 404'd and the package was
flagged as having an unknown license.

The previous commit's switch from exact pins to ranges exposed this for
soundfile, pyroscope-io, redisvl, diskcache, and mlflow (the ranged deps not
already in liccheck.ini's allowlist). Prefer a lower-bound/exact version (a
real released version) for the lookup.
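
A dependency-free sketch of the "prefer a lower-bound or exact version" idea. The actual script works on `req.specifier`; this version parses the specifier string directly, and the function name and operator preference order are assumptions made for illustration.

```python
import re
from typing import List, Optional, Tuple

# Longer operators come first so '===' is not read as '==' and '>=' is not read as '>'.
_CLAUSE = re.compile(r"^\s*(===|==|~=|>=|<=|!=|>|<)\s*([^\s,]+)\s*$")


def lookup_version(specifier: str) -> Optional[str]:
    """Pick a version that should exist on the index for a specifier string.

    Exact pins are preferred, then inclusive lower bounds. Upper bounds such
    as '<1.0' are never returned because that version may not be released.
    """
    clauses: List[Tuple[str, str]] = []
    for raw in specifier.split(","):
        match = _CLAUSE.match(raw)
        if match:
            clauses.append((match.group(1), match.group(2)))
    for preferred in ("==", "===", ">=", "~="):
        for op, version in clauses:
            # Skip wildcard pins like '==1.*', which do not name a concrete release.
            if op == preferred and "*" not in version:
                return version
    return None
```

With this ordering, `>=0.12.1,<1.0` and `<1.0,>=0.12.1` both resolve to `0.12.1`, so the result no longer depends on clause iteration order.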

* fix(proxy): set strict_content_type=False on the FastAPI app

Starlette 1.0 / FastAPI 0.13x flipped the default to strict_content_type=True,
which refuses to parse a JSON request body when the client omits the
Content-Type header. The proxy previously accepted those requests, so the
fastapi/starlette bump in this PR would silently break clients that don't send
a Content-Type. Restore the prior lenient behavior explicitly.

Co-authored-by: stuxf <70670632+stuxf@users.noreply.github.com>
…replay (#29229)

The Redis-backed VCR layer was recording and replaying the Google
OAuth2/STS token-mint call. The replayed ya29.* access token is
long-expired, but its recorded expires_in keeps credentials.expired
False, so litellm never refreshes it and sends the stale token to a live
Vertex/Gemini endpoint, which returns 401 ACCESS_TOKEN_EXPIRED. This
broke live partner-model tests whose completion call is not itself
cassette-backed (e.g. test_vertex_ai_llama_tool_calling).

Force credential-exchange hosts to pass through live (never recorded,
never replayed) by returning None from before_record_request, mirroring
the existing telemetry passthrough, so a fresh token is minted each run.
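
A minimal sketch of such a passthrough hook, assuming the request object exposes a `uri` attribute. The host list and the `FakeRequest` stand-in are illustrative and are not copied from the test suite.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit

# Assumed set of credential-exchange hosts; the real list may differ.
CREDENTIAL_EXCHANGE_HOSTS = frozenset({"oauth2.googleapis.com", "sts.googleapis.com"})


@dataclass
class FakeRequest:
    """Stand-in for a recorder request object; only `uri` is used here."""

    uri: str


def before_record_request(request: FakeRequest) -> Optional[FakeRequest]:
    """Return None for token-mint hosts so they bypass the cassette layer."""
    host = (urlsplit(request.uri).hostname or "").lower()
    if host in CREDENTIAL_EXCHANGE_HOSTS:
        return None  # pass through live: neither recorded nor replayed
    return request
```

Matching on the parsed hostname rather than a substring of the URL avoids accidentally skipping unrelated requests that merely mention one of these hosts in a path or query string.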

Regression from #28826, which added OAuth-token matcher tolerance plus
TTL-refresh-on-read so a stale token episode matched and never expired.
Updates the gollem_go_agent_framework example to the current Go release.
Clears stale Go stdlib advisories reported by osv-scanner against the
older 1.25.1 directive. No source changes; the single pinned dependency
(gollem v0.1.0) is backward compatible.
@greptile-apps

greptile-apps Bot commented May 29, 2026

Contributor

Too many files changed for review. (288 files found, 100 file limit)

exit 1

- name: Setup Node for Playwright
uses: actions/setup-node@49933ea5288caeca8642d1e84afbd3f7d6820020 # v4.4.0
@CLAassistant

CLAassistant commented May 29, 2026


CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
7 out of 9 committers have signed the CLA.

✅ yuneng-berri
✅ ryan-crabbe-berri
✅ milan-berri
✅ mateo-berri
✅ Sameerlite
✅ michelligabriele
✅ shivamrawat1
❌ yassin-berriai
❌ ishaan-berri

* bump: version 1.87.0 → 1.88.0

* uv lock
@codspeed-hq

codspeed-hq Bot commented May 29, 2026

Contributor

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing litellm_internal_staging (bae0459) with main (06f6cfc)


…#29238)

* feat(anthropic): add Claude Opus 4.8 and prune reasoning-effort flags

Register claude-opus-4-8 across the anthropic/bedrock/vertex/azure cost-map
entries, BEDROCK_CONVERSE_MODELS, and the setup-wizard provider list.

Prune two reasoning-effort fields from the cost map:
- Drop supports_minimal_reasoning_effort from the Claude fleet (58 entries).
  "minimal" is not a real Anthropic effort level (the API accepts only
  low/medium/high/xhigh/max), so LiteLLM degrades it to "low" regardless;
  the flag was inert and misleading on Anthropic.
- Remove tool_use_system_prompt_tokens everywhere (103 entries). It is not in
  the ModelInfo type and is read by no production code.

Update the affected config/schema tests; the reasoning-effort registry tests
now assert the Claude fleet omits supports_minimal.

* fix(anthropic): recognize output_config effort after minimal-flag prune

Pruning supports_minimal_reasoning_effort from the Claude fleet removed the
only "supports effort param" marker from 11 Opus 4.5 / mythos-preview map
entries that lack supports_output_config. _model_supports_effort_param then
returned False for them, so output_config was wrongly dropped under
drop_params=True -- regressing
test_anthropic_model_supports_effort_param_recognizes_supporting_models for
claude-opus-4-5-20251101 and the mythos preview.

- _model_supports_effort_param now treats supports_output_config as a
  sufficient signal, matching the bedrock-invoke call sites that already
  check supports_output_config OR a reasoning-effort flag. Shared map lookup
  extracted into _supports_model_capability.
- Add supports_output_config: true to the 11 Opus 4.5 / mythos entries that
  lost their only marker, restoring prior effort-forwarding behavior without
  re-adding the inert minimal flag.
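
The capability check described in this commit can be sketched as follows. The function signatures, the map layout, and the flag name in `REASONING_EFFORT_FLAGS` are placeholders chosen for illustration; only `supports_output_config` and the helper name `_supports_model_capability` come from the commit message.

```python
from typing import Any, Dict

# Placeholder flag name; the real cost map defines its own reasoning-effort keys.
REASONING_EFFORT_FLAGS = ("supports_example_reasoning_effort",)


def _supports_model_capability(
    model_map: Dict[str, Dict[str, Any]], model: str, key: str
) -> bool:
    """Shared map lookup: True only when the entry sets the key to True."""
    return model_map.get(model, {}).get(key) is True


def model_supports_effort_param(model_map: Dict[str, Dict[str, Any]], model: str) -> bool:
    """supports_output_config alone is sufficient; otherwise any effort flag will do."""
    if _supports_model_capability(model_map, model, "supports_output_config"):
        return True
    return any(
        _supports_model_capability(model_map, model, flag)
        for flag in REASONING_EFFORT_FLAGS
    )
```

Under this rule an entry that carries only `supports_output_config: true` still forwards `output_config` when `drop_params=True`.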

@mateo-berri mateo-berri left a comment

Collaborator


🌙ly incoming...

LGTM

@yuneng-berri yuneng-berri enabled auto-merge May 29, 2026 02:03
@yuneng-berri yuneng-berri merged commit a021a5b into main May 29, 2026
124 of 129 checks passed
if not route.startswith(prefix):
    return False
remainder = route[len(prefix) :]
return bool(remainder) and "/" not in remainder

Contributor


Low: MCP health checks bypass route restrictions

A virtual key restricted to allowed_routes=["llm_api_routes"] can now call GET /v1/mcp/server/{server_id}; that handler runs health_check_server() before returning the sanitized discovery response, so the key can make the proxy repeatedly connect to MCP servers even though health/control-plane routes were not granted. This predicate also matches the static GET /v1/mcp/server/health route. Keep the carve-out to GET /v1/mcp/server only, or make the single-server discovery path skip health checks for restricted virtual keys and explicitly exclude static management subpaths like health.
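
One way the suggested exclusion of static subpaths might look, as a sketch only. The function name, the default prefix argument, and the contents of `STATIC_SUBPATHS` are assumptions, and this does not address the separate suggestion of skipping health checks inside the handler.

```python
# Assumed set of static management subpaths under the MCP server prefix.
STATIC_SUBPATHS = frozenset({"health"})


def is_single_server_discovery_route(route: str, prefix: str = "/v1/mcp/server/") -> bool:
    """True for '<prefix>{server_id}' but not for static subpaths or nested paths."""
    if not route.startswith(prefix):
        return False
    remainder = route[len(prefix):]
    if not remainder or "/" in remainder:
        return False
    # Reject static routes that would otherwise look like a server_id.
    return remainder not in STATIC_SUBPATHS
```

With this predicate, `/v1/mcp/server/abc` is accepted while `/v1/mcp/server/health` and `/v1/mcp/server/abc/tools` are rejected.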

@veria-ai

veria-ai Bot commented May 29, 2026

Contributor

PR overview

One low-risk authorization gap remains: a virtual key limited to LLM API routes can still reach certain MCP server endpoints and trigger proxy health-check connections to configured MCP servers. This does not appear to expose broad data access or full control-plane access, but it does let a restricted key perform an unintended action that should be blocked. No issues have been addressed yet, so the PR still needs this route restriction tightened before merge.

Open issues (1)

Fixed/addressed: 0 · PR risk: 5/10

