
chore(release): backport 1.84.8 patches + #30220 deps to stable/1.86.x and cut 1.86.6#30403

Merged
yuneng-berri merged 12 commits into stable/1.86.x from litellm_backport_1_86_x_0613
Jun 14, 2026

Conversation

@yuneng-berri

Collaborator

Relevant issues

Backports the patch set that shipped in 1.84.8 onto the 1.86.x stable line, plus two
extras (#30220 dependency bumps and #29493 budget-reservation), and cuts 1.86.6. Every
fix was confirmed absent on 1.86.x before picking; 1.86.x branched from staging on
2026-05-16, so it predated almost the entire set. Nothing here is already on the line.

What is included

Cherry-picks in staging merge order (each carries a "cherry picked from" footer pointing
to its staging commit):

Adaptation notes

The source diff of every code pick is byte-identical to its staging commit; the adaptations
below are confined to tests, build artifacts, and the dependency manifests.

Known noise on this line

  • tests/.../test_azure_container_transformation.py::...uses_azure_env_var fails locally
    only: litellm's find_dotenv walks up to the repo .env, which carries a real
    AZURE_OPENAI_API_KEY that the Azure key-resolution chain prefers over the test's
    monkeypatched value. The test is byte-identical to staging #27921 (fix(router): use
    forwarded model_id for native Azure container IDs), which is already green on
    stable/1.84.x and in CI. It passes in an environment without that key.

Type

🐛 Bug Fix

Changes

See the cherry-pick list above. Source content for every pick is byte-identical to staging;
adaptations are test/artifact/dependency-manifest only.

Sameerlite and others added 12 commits June 13, 2026 16:07
…27921)

* fix(router): use forwarded model_id for native Azure container IDs in _init_containers_api_endpoints

Azure code-interpreter containers return provider-native IDs (cntr_ + hex)
that carry no LiteLLM routing payload, so _decode_container_id returns
model_id=None. The router was falling through to call the handler directly,
bypassing _ageneric_api_call_with_fallbacks and leaving api_base=None for
Azure deployments. Fall back to the model_id forwarded from the proxy
ownership check so deployment credentials are always applied.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(azure-containers): strip /openai/responses path from api_base in AzureContainerConfig.get_complete_url

When a deployment's api_base is the responses endpoint URL
(e.g. .../openai/responses?api-version=...), AzureContainerConfig was
appending /openai/containers on top of it, producing the broken path
.../openai/responses/openai/containers. Azure returns 404 for that URL
while the correct path is .../openai/containers.

Strip any /openai/responses suffix from api_base before constructing
the containers URL so the resource root is always used as the starting point.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(azure-containers): prefer api-version from api_base URL over deployment's api_version

The deployment's api_version (e.g. 2024-08-01-preview) targets the chat/responses
API and is too old for the containers API, which requires 2025-04-01-preview.
The responses endpoint api_base already carries the correct api-version in its
query string. Extract it and use it for the containers URL, overriding the
stale deployment-level version.

Fixes DELETE and file-upload operations returning 404 due to wrong api-version.

Co-authored-by: Cursor <cursoragent@cursor.com>
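
The two URL fixes above (strip the endpoint path, then prefer the api-version carried in api_base) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in AzureContainerConfig: the helper names, the `ENDPOINT_SUFFIXES` tuple, and the example host are all made up for the example.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse, urlunparse

# Assumed suffix list for illustration; the real constant may hold other paths.
ENDPOINT_SUFFIXES = ("/openai/responses",)


def resource_root(api_base: str) -> str:
    """Drop a trailing endpoint path (and the query string) from api_base."""
    parsed = urlparse(api_base)
    path = parsed.path.rstrip("/")  # tolerate a trailing slash before matching
    for suffix in ENDPOINT_SUFFIXES:
        if path.endswith(suffix):  # suffix match only, never a mid-path match
            path = path[: -len(suffix)]
            break
    return urlunparse((parsed.scheme, parsed.netloc, path, "", "", ""))


def pick_api_version(api_base: str, deployment_version: Optional[str]) -> Optional[str]:
    """Prefer the api-version in api_base over the deployment-level value."""
    from_url = parse_qs(urlparse(api_base).query).get("api-version")
    return from_url[0] if from_url else deployment_version


def containers_url(api_base: str, deployment_version: Optional[str]) -> str:
    """Build the containers URL from the resource root and the chosen version."""
    version = pick_api_version(api_base, deployment_version)
    url = f"{resource_root(api_base)}/openai/containers"
    return f"{url}?api-version={version}" if version else url
```

With an api_base that points at the responses endpoint, the sketch yields `.../openai/containers` with the version taken from the URL rather than from the deployment.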

* fix(containers): pass params=None instead of params={} to httpx to preserve api-version

httpx erases a URL's query-string when params={} (empty dict) is passed,
silently stripping ?api-version=2025-04-01-preview from every container
POST/DELETE request. Azure's GET endpoints tolerate a missing api-version;
POST (upload) and DELETE are strict, so those returned 404.

Fix: use `params or None` in container_handler._async_handle and
llm_http_handler.async_container_delete_handler (and all sibling container
handlers) so that an empty params dict falls back to None, leaving httpx to
preserve the URL's existing query string intact.

Adds a regression test that directly documents the httpx behaviour.

Co-authored-by: Cursor <cursoragent@cursor.com>
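
The `params or None` idiom from this commit, shown in isolation. The helper name is hypothetical, and the httpx call appears only as a comment so that the snippet runs without httpx or network access.

```python
from typing import Dict, Optional


def httpx_params(query_params: Optional[Dict[str, str]]) -> Optional[Dict[str, str]]:
    """Collapse an empty dict to None so the URL's own query string is left alone."""
    return query_params or None


# Intended call shape (not executed here; `client` and `url` are placeholders):
#   await client.post(url, params=httpx_params(query_params), ...)
```

A non-empty dict passes through unchanged, so callers that do supply query parameters see no difference.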

* fix(router): remove elif model_id branch from _init_containers_api_endpoints

Two reviewer findings addressed:

1. Truncated comment on the model_id fallback line — now complete.

2. Security: the elif branch that fired when container_id was absent allowed
   any authenticated caller to supply model_id in a POST /v1/containers body
   and route the request through an arbitrary deployment UUID, bypassing the
   model-level access checks that only validate `model`. Removed the elif
   branch; operations without container_id (create, list) route by the
   caller-supplied `model` field as before. model_id forwarding is kept only
   inside the container_id block, where the proxy ownership check has already
   validated the container before forwarding the deployment ID.

Adds a regression test pinning the security boundary: no-container-id path
calls original_function directly even when model_id is in kwargs.

Co-authored-by: Cursor <cursoragent@cursor.com>
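
The security boundary described in finding 2 can be pictured as a small decision function. This is only a sketch: `resolve_model_id` and the shape of `kwargs` are assumptions for illustration, not the router's actual code.

```python
from typing import Any, Dict, Optional


def resolve_model_id(kwargs: Dict[str, Any], decoded_model_id: Optional[str]) -> Optional[str]:
    """Return the deployment id to route by, or None to route by `model`.

    A forwarded model_id is honoured only when a container_id is present,
    because only that path has been through the proxy ownership check.
    """
    if not kwargs.get("container_id"):
        return None  # create/list: ignore any caller-supplied model_id
    if decoded_model_id:
        return decoded_model_id  # LiteLLM-managed id carried its own routing payload
    forwarded = kwargs.get("model_id")
    if isinstance(forwarded, str) and forwarded.strip():
        return forwarded.strip()  # native upstream id: use the forwarded value
    return None
```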

* test(containers): validate proxy-to-router model_id forwarding for managed IDs

Add test_regression_get_container_forwarding_params_sets_model_id_for_managed_id
to verify that get_container_forwarding_params (the proxy-side half of the Azure
routing fix) correctly extracts and forwards model_id from a LiteLLM-managed
encoded container ID.

This closes the gap identified by Greptile P1: the previous regression test
only injected model_id as a direct kwarg, validating the router in isolation.
The new test exercises the actual proxy-to-router data flow through
ownership.get_container_forwarding_params, confirming that kwargs["model_id"]
is populated before _init_containers_api_endpoints is reached.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(azure-containers): tighten endpoint-path strip to endswith match

Use path.endswith() instead of path.find() for _AZURE_ENDPOINT_PATHS so
the suffix strip only fires when api_base actually ends with one of the
endpoint-specific path suffixes. This is the more precise check greptile
flagged on the original find()-based implementation.

* Fix sync container handler to preserve URL query string

Mirror the async path fix: pass None instead of an empty params dict so
httpx does not strip the URL's existing query string (e.g.
?api-version=...), which is required for Azure container routing.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(azure-containers): strip trailing slash before endpoint suffix match

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(containers): recover model_id from stored encoded id for native Azure container IDs

get_container_forwarding_params previously only set model_id when the
user-supplied container_id was a LiteLLM-managed encoded id. For native
upstream IDs (e.g. Azure 'cntr_<hex>') the decode fails and model_id was
never forwarded — making the router-side fallback in
_init_containers_api_endpoints unreachable in production.

Fall back to the stored 'unified_object_id' on the ownership row, which
is the encoded form captured at create time when the router selected a
specific deployment. Decoding that yields the deployment model_id and
restores router-based credential application (api_base, api_key) for
retrieve/delete and container-file operations on native IDs.

Co-authored-by: Cursor <cursoragent@cursor.com>
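
A rough sketch of the fallback order described above: decode the caller's id first, then the encoded id stored at create time. The real encoding is not shown in this PR, so `toy_decode` and its `managed:` prefix are purely hypothetical stand-ins, as is the function name.

```python
from typing import Callable, Optional


def toy_decode(value: str) -> Optional[str]:
    """Hypothetical decoder: 'managed:<model_id>' decodes, anything else fails."""
    prefix = "managed:"
    return value[len(prefix):] if value.startswith(prefix) else None


def forwarded_model_id(
    container_id: str,
    stored_encoded_id: Optional[str],
    decode: Callable[[str], Optional[str]] = toy_decode,
) -> Optional[str]:
    """Try the caller-supplied id, then the encoded id saved on the ownership row."""
    model_id = decode(container_id)
    if model_id is None and stored_encoded_id:
        model_id = decode(stored_encoded_id)
    return model_id
```

For a native id such as `cntr_<hex>` the first decode yields nothing, and the stored encoded id supplies the deployment.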

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
(cherry picked from commit 7f563b2)
…28395)

* fix(proxy): expose Prisma idle/connect timeout + extra DB URL params

Operators have reported large numbers of idle Prisma connections that
never get closed. The proxy already forwards `connection_limit` and
`pool_timeout` to the DATABASE_URL, but had no knob for capping idle
or slow connections. Add three new `general_settings` keys that thread
through to the DATABASE_URL / DIRECT_URL query string:

- `database_connect_timeout`  -> Prisma `connect_timeout`
- `database_socket_timeout`   -> Prisma `socket_timeout` (the main
  knob for closing idle connections from the LiteLLM side)
- `database_extra_connection_params` -> untyped passthrough dict for
  any other Prisma URL param (`pgbouncer`, `statement_cache_size`,
  `sslmode`, ...); keys here override LiteLLM defaults.

Refactors the duplicated DATABASE_URL/DIRECT_URL param dicts into a
single `_build_db_connection_url_params` helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
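
As an illustration of how such a helper might assemble the query string, here is a minimal sketch. The function names and signatures are assumptions and do not reproduce `_build_db_connection_url_params`; only the Prisma parameter names come from the commit message.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlencode


def build_db_url_params(
    connection_limit: int,
    pool_timeout: int,
    connect_timeout: Optional[int] = None,
    socket_timeout: Optional[int] = None,
    extra: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Collect Prisma URL params; unset timeouts are omitted entirely."""
    params: Dict[str, Any] = {
        "connection_limit": connection_limit,
        "pool_timeout": pool_timeout,
    }
    if connect_timeout is not None:
        params["connect_timeout"] = connect_timeout
    if socket_timeout is not None:
        params["socket_timeout"] = socket_timeout
    params.update(extra or {})  # passthrough keys override the defaults above
    return params


def append_params(database_url: str, params: Dict[str, Any]) -> str:
    """Append params to a URL that may or may not already have a query string."""
    separator = "&" if "?" in database_url else "?"
    return f"{database_url}{separator}{urlencode(params)}"
```

Applying the extras last is what gives them precedence over the LiteLLM defaults, matching the override behaviour described above.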

* Update litellm/proxy/proxy_cli.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

---------

Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
(cherry picked from commit 2f9ac77)
…29493)

* feat(proxy): add disable_budget_reservation general setting (#27639)

* feat(proxy): register disable_budget_reservation in ConfigGeneralSettings (#27639)

* docs(proxy): document disable_budget_reservation concurrency tradeoff (#27639)

* ci: re-trigger flaky docker build (prisma generate ECONNRESET)

* fix(proxy): warn and document budget enforcement tradeoff when disable_budget_reservation is set (#27639)

Provenance: #29493 landed on litellm_internal_staging inside aggregator 32c88ca
(Litellm oss staging 080626, #29932). The granular squash 1032dd7 is no longer on
any ref; its diff is content-identical to the disable_budget_reservation hunks in the
aggregator, verified before this pick.

(cherry picked from commit 1032dd7)
(cherry picked from commit 32c88ca within litellm_internal_staging)
…r genuine auth failures (#29986)

(cherry picked from commit da9d64b)
…thropic streaming logging

Targeted subset of staging commit cfcdf87 (#30202): only the
anthropic_passthrough_logging_handler.py hardening hunks and their four
tests are taken; the rest of that staging batch is intentionally excluded.

(cherry picked from commit cfcdf87)
(cherry picked from commit 973c7eb)
…combined view (#30327)

The grace-period branch assigned the recursive get_data result (a
finished LiteLLM_VerificationTokenView) back into the variable that the
combined-view dict normalization then subscripts, raising TypeError on
every request made with a rotated key inside its grace window; auth
surfaced that as a 401. Return the recursive result directly instead.

Regression test drives the full get_data flow: old hash misses the view,
deprecated table resolves to the active token, and the call must return
the view object.

(cherry picked from commit 5047eaf)
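
A stripped-down model of the bug and its fix, assuming a toy in-memory store. `TokenView`, `ACTIVE`, and `DEPRECATED` are stand-ins invented for the example; only the control flow mirrors the description above.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TokenView:
    """Stand-in for the finished view object returned to auth."""
    token: str


ACTIVE: Dict[str, dict] = {"new-hash": {"token": "new-hash"}}
DEPRECATED: Dict[str, str] = {"old-hash": "new-hash"}  # rotated hash -> active hash


def get_data(token_hash: str) -> Optional[TokenView]:
    response = ACTIVE.get(token_hash)
    if response is None:
        active_hash = DEPRECATED.get(token_hash)
        if active_hash is None:
            return None
        # Return the finished object directly. Assigning it to `response`
        # would send it through the dict subscript below and raise TypeError.
        return get_data(active_hash)
    return TokenView(token=response["token"])
```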
Backports the dependency hardening from #30220 (staging commit d96ab46).
Adapted to this line: the proxy-runtime extras are exact-pinned here, so
pypdf is moved from ==6.10.2 to ==6.13.1 (staging used >=6.12.0,<7.0); the
[tool.uv] constraint-dependencies block (tornado>=6.5.6, aiohttp>=3.13.5,<3.14)
matches staging. The dashboard devDependencies vitest, @vitest/coverage-v8, and
@vitest/ui move 3.2.4 -> 3.2.6, and the transitive brace-expansion moves 5.0.5 -> 5.0.6.

uv.lock and the dashboard package-lock.json were regenerated on this line rather
than cherry-picked, so the resolved set reflects 1.86.x: pypdf 6.13.1, tornado
6.5.7, aiohttp held at 3.13.5; vitest 3.2.6, brace-expansion 5.0.6.

Adapted from #30220 (cherry picked from commit d96ab46)
@yuneng-berri yuneng-berri requested a review from a team June 14, 2026 00:01
@CLAassistant

CLAassistant commented Jun 14, 2026


CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
3 out of 4 committers have signed the CLA.

✅ Ar-maan05
✅ Sameerlite
✅ yuneng-berri
❌ yassin-berriai

@yuneng-berri

Collaborator Author

@greptileai

@codecov

codecov Bot commented Jun 14, 2026


@greptile-apps

greptile-apps Bot commented Jun 14, 2026

Contributor

Greptile Summary

Structured backport cutting 1.86.6 from the stable/1.86.x line. Source hunks are byte-identical to their staging counterparts; adaptations are confined to tests, build artifacts, and dependency manifests.

  • Auth/DB hardening: auth errors during DB outages now surface 503 instead of 401 (auth_exception_handler.py, exception_handler.py); the deprecated-key recursive lookup in get_data now returns the result directly instead of overwriting response and crashing on dict subscript of a LiteLLM_VerificationTokenView.
  • Azure containers routing: native upstream container IDs (e.g. cntr_<hex>) now recover model_id from the ownership row captured at create time; _normalize_api_base/_extract_api_version strip endpoint-path suffixes and prefer the api-version embedded in api_base; all httpx container calls use params or None so an empty dict no longer strips ?api-version from the URL.
  • New general_settings knobs: disable_budget_reservation, database_connect_timeout, database_socket_timeout, database_disable_prepared_statements, database_extra_connection_params exposed in _types.py and wired through proxy_cli.py; Prisma cached-plan recovery now reconnects the client instead of mutating the SQL.

Confidence Score: 4/5

The auth path changes are conservative additions (503 vs 401 on DB outage, direct return of the deprecated-key lookup result) and all have corresponding regression tests. The container routing refactor and httpx param fixes are well-isolated. The disable_budget_reservation flag is gated, clearly documented, and emits a per-request warning.

Each cherry-pick is byte-identical to its staging version, tests cover the targeted regressions comprehensively, and no backwards-incompatible changes were introduced without a flag. Two minor edge cases (broad OSError classification in the DB exception classifier, and silent query-param loss in _normalize_api_base) are unlikely to bite in practice but worth tracking.

litellm/proxy/db/exception_handler.py (broad OSError catch) and litellm/llms/azure/containers/transformation.py (_normalize_api_base dropping non-api-version query params).

Important Files Changed

| Filename | Overview |
| --- | --- |
| litellm/proxy/utils.py | Fixes deprecated-key lookup returning a LiteLLM_VerificationTokenView (not a dict) then crashing on dict subscript; fix stores the recursive result in deprecated_response and returns it directly, skipping the dict-normalisation block. |
| litellm/proxy/db/exception_handler.py | Adds is_database_service_unavailable_error and is_prisma_engine_internal_error to distinguish infrastructure outages (→503) from genuine auth failures (→401); includes traceback-walk heuristic for the prisma engine AttributeError masquerade. |
| litellm/proxy/auth/auth_exception_handler.py | Raises 503 instead of 401 when the DB is unreachable during auth; PrismaDBExceptionHandler is already imported, control-flow ordering is correct. |
| litellm/proxy/auth/user_api_key_auth.py | Threads general_settings into _reserve_budget_after_common_checks; disable_budget_reservation flag skips optimistic reservation with a warning log on every affected request. |
| litellm/proxy/proxy_cli.py | Extracts _build_db_connection_url_params to centralise Prisma URL-param construction; adds connect_timeout, socket_timeout, disable_prepared_statements, and extra_params passthrough. |
| litellm/proxy/container_endpoints/ownership.py | Makes get_container_forwarding_params async; adds _CONTAINER_STORED_ID_CACHE and _get_stored_container_id to recover model_id for native upstream Azure container IDs from the ownership row captured at create time. |
| litellm/router.py | In _init_containers_api_endpoints, reads a forwarded model_id from kwargs as a fallback when the container_id carries no LiteLLM routing payload; security boundary test confirms the fallback is gated on container_id presence. |
| litellm/llms/azure/containers/transformation.py | Adds _normalize_api_base to strip endpoint-path suffixes and _extract_api_version to prefer the api-version embedded in api_base over the deployment's older chat api_version when building the containers URL. |
| litellm/llms/custom_httpx/container_handler.py | Replaces params=query_params with params=query_params or None throughout all httpx calls; fixes 404s caused by httpx stripping the ?api-version query-string when an empty dict is passed. |
| litellm/proxy/pass_through_endpoints/llm_provider_handlers/anthropic_passthrough_logging_handler.py | Hardens streaming logging: skips [DONE] sentinels and non-JSON SSE frames; adds _resolve_costing_model and _extract_model_from_anthropic_chunks to recover the model name when body model is "unknown". |
| pyproject.toml | Bumps version 1.86.5→1.86.6, pypdf 6.10.2→6.13.1, adds tornado>=6.5.6 and aiohttp>=3.13.5,<3.14 constraint-dependencies. |

Reviews (1): Last reviewed commit: "chore: refresh uv.lock for 1.86.6" | Re-trigger Greptile
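
The streaming-logging hardening summarised in the anthropic_passthrough_logging_handler.py entry can be approximated as below. This is a sketch under assumptions: the function names are hypothetical and differ from the handler's own helpers, and the model lookup relies on the `message.model` field that Anthropic's `message_start` stream event carries.

```python
import json
from typing import Any, Dict, Iterable, List


def parse_sse_chunks(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Keep only JSON object payloads from `data:` lines."""
    events: List[Dict[str, Any]] = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # event names, comments, blank keep-alives
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            continue  # end-of-stream sentinel, not JSON
        try:
            parsed = json.loads(payload)
        except json.JSONDecodeError:
            continue  # tolerate non-JSON frames instead of failing logging
        if isinstance(parsed, dict):
            events.append(parsed)
    return events


def model_from_chunks(events: List[Dict[str, Any]], default: str = "unknown") -> str:
    """Recover the model name from the first event that carries one."""
    for event in events:
        message = event.get("message")
        if isinstance(message, dict) and message.get("model"):
            return str(message["model"])
    return default
```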

Comment on lines +155 to +161
so a type-only check misses real outages. ``is_database_transport_error``
keyword-matches the connection message and catches that masquerade,
while genuine data errors (no connection keyword) correctly stay 401.
The Postgres "cached plan must not change result type" error is matched
here, not in ``is_database_transport_error``: it is a transient stale-DB-
state condition (not an invalid key), but the connection is healthy so it


P2 Broad OSError catch may misclassify non-DB errors as infrastructure outages

OSError is the base class for FileNotFoundError, PermissionError, IsADirectoryError, and many others. If a certificate file is missing, a TLS library raises ssl.SSLError (an OSError subclass), or a config file can't be opened during the auth flow, this check would return True and the caller would surface a 503 ("database temporarily unreachable") rather than letting the error propagate naturally. In practice the auth DB path is unlikely to trigger those, but the comment "OSError covers ConnectionError…" could give future contributors confidence to add more code near this block. Consider tightening to the concrete connection subclasses: ConnectionError, TimeoutError, socket.timeout, ssl.SSLError.
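
A minimal sketch of the tightened check suggested here, using the four exception types named in the comment. The function name is hypothetical, and this is not the classifier that shipped in exception_handler.py.

```python
import socket
import ssl

# Concrete connection-level failures only. On Python 3.10+ socket.timeout is
# an alias of TimeoutError, so listing both is harmless.
CONNECTION_ERRORS = (ConnectionError, TimeoutError, socket.timeout, ssl.SSLError)


def looks_like_connection_failure(exc: BaseException) -> bool:
    """True for connection-level errors, False for other OSError subclasses."""
    return isinstance(exc, CONNECTION_ERRORS)
```

Under this check a FileNotFoundError or PermissionError would no longer be classified as a database outage, while ConnectionRefusedError and similar still would.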

Comment on lines 34 to +45
        litellm_params=GenericLiteLLMParams(api_key=api_key),
    )

    @staticmethod
    def _normalize_api_base(api_base: Optional[str]) -> Optional[str]:
        """Strip endpoint-specific path suffixes from api_base to get the resource root."""
        if not api_base:
            return api_base
        parsed = urlparse(api_base)
        path = parsed.path.rstrip("/")
        for ep in _AZURE_ENDPOINT_PATHS:
            if path.endswith(ep):


P2 _normalize_api_base silently drops non-api-version query params

When the path suffix is stripped, the reconstructed URL uses empty strings for the query and fragment components of urlunparse, discarding the entire original query string. Any param other than api-version in api_base — for example a subscription-key or custom auth query param some Azure deployments add — would be silently lost. _extract_api_version saves the version before normalisation, but nothing preserves the rest. If those extra params are ever needed downstream, they'll need to be forwarded explicitly.
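
If those extra params ever did need to survive, one possible approach is to rebuild the query without api-version instead of discarding it. The sketch below is an assumption-laden illustration, not a proposed patch: the function name, the default suffix, and the `subscription-key` param in the usage are examples only.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def strip_suffix_keep_query(api_base: str, suffix: str = "/openai/responses") -> str:
    """Strip the endpoint suffix but keep every query param except api-version."""
    parsed = urlparse(api_base)
    path = parsed.path.rstrip("/")
    if path.endswith(suffix):
        path = path[: -len(suffix)]
    # api-version is handled separately, so it is the only param dropped here.
    kept = [(k, v) for k, v in parse_qsl(parsed.query) if k != "api-version"]
    return urlunparse((parsed.scheme, parsed.netloc, path, "", urlencode(kept), ""))
```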

@greptile-apps

greptile-apps Bot commented Jun 14, 2026

Contributor

Greptile Summary

Backports 10 cherry-picked patches from staging onto stable/1.86.x and cuts 1.86.6. Every source hunk is byte-identical to its staging commit; adaptations are confined to test drops (where staging-only symbols are absent on this line), build artifacts, and dependency manifests.

Confidence Score: 4/5

Safe to merge; all source changes are bug fixes with targeted test coverage and no backwards-incompatible defaults.

The backport is cleanly structured: every changed hunk has a corresponding test (or a documented reason why the staging test was dropped), new settings default to off/None so existing deployments are unaffected, and the two most impactful fixes (deprecated-key crash and 503-vs-401 during DB outages) are well-covered by the new mock test suites. The one minor style nit in router.py keeps the score off the top.

litellm/proxy/container_endpoints/ownership.py — the new _get_stored_container_id path makes a direct litellm_managedobjecttable.find_first call on cache-miss; worth confirming the table has an index on (model_object_id, file_purpose) in production.

Important Files Changed

| Filename | Overview |
| --- | --- |
| litellm/proxy/utils.py | Fixes two bugs: (1) deprecated-key lookup now assigns to deprecated_response and returns immediately, avoiding a crash when the Pydantic LiteLLM_VerificationTokenView was subscripted as a dict; (2) the cached-plan fallback no longer mutates the SQL and instead triggers attempt_db_reconnect before retrying. |
| litellm/proxy/db/exception_handler.py | Adds is_prisma_engine_internal_error and is_database_service_unavailable_error to classify DB infrastructure failures so auth returns 503 instead of 401 during genuine DB outages. |
| litellm/proxy/auth/auth_exception_handler.py | Adds a 503 branch before the default 401 raise, using is_database_service_unavailable_error; control flow ordering is correct (placed after ProxyException re-raise). |
| litellm/proxy/auth/user_api_key_auth.py | Threads general_settings into _reserve_budget_after_common_checks; adds early-return guard when disable_budget_reservation is True with a per-request WARNING log. |
| litellm/proxy/proxy_cli.py | Extracts DB URL param construction into _build_db_connection_url_params; wires four new general_settings fields into Prisma DATABASE_URL/DIRECT_URL; handles string-to-bool coercion for YAML-sourced values. |
| litellm/proxy/container_endpoints/ownership.py | Makes get_container_forwarding_params async; adds _CONTAINER_STORED_ID_CACHE and _get_stored_container_id to recover the deployment model_id for native upstream Azure container IDs; includes negative sentinel and 60 s TTL cache to avoid per-request DB hits. |
| litellm/llms/azure/containers/transformation.py | Adds _normalize_api_base and _extract_api_version; fixes container URL construction for models whose api_base was stored as the responses endpoint URL. |
| litellm/proxy/pass_through_endpoints/llm_provider_handlers/anthropic_passthrough_logging_handler.py | Adds [DONE] sentinel skipping, json.JSONDecodeError tolerance, model extraction from streaming chunks, and _resolve_costing_model fallback for Anthropic-compatible providers. |
| litellm/router.py | Falls back to the proxy-forwarded model_id kwarg when container ID decoding yields no model_id; the fallback condition calls .strip() twice. |
| pyproject.toml | Version bump 1.86.5 → 1.86.6; pypdf pinned at 6.13.1; tornado and aiohttp constraint-dependencies added. |

Reviews (2): Last reviewed commit: "chore: refresh uv.lock for 1.86.6" | Re-trigger Greptile
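
The negative sentinel plus 60 s TTL cache mentioned for ownership.py follows a common pattern, sketched below. This is a synchronous toy with hypothetical names (the real path is async and queries the database); the injectable `now` clock is an addition for testability only.

```python
import time
from typing import Callable, Dict, Optional, Tuple

_MISSING = object()  # negative sentinel: "looked up, nothing stored"
TTL_SECONDS = 60.0
_cache: Dict[str, Tuple[float, object]] = {}


def cached_stored_id(
    container_id: str,
    lookup: Callable[[str], Optional[str]],
    now: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    """Return the stored id, hitting `lookup` at most once per TTL window."""
    entry = _cache.get(container_id)
    if entry is not None and now() - entry[0] < TTL_SECONDS:
        value = entry[1]
        return None if value is _MISSING else str(value)
    found = lookup(container_id)
    # Cache misses too, so unknown ids do not trigger a lookup on every request.
    _cache[container_id] = (now(), _MISSING if found is None else found)
    return found
```

Caching the miss is the point of the sentinel: without it, a container with no ownership row would cost one database query per request.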

Comment thread litellm/router.py
Comment on lines +5602 to +5606
model_id = decoded.get("model_id") or (
    _forwarded_model_id.strip()
    if isinstance(_forwarded_model_id, str) and _forwarded_model_id.strip()
    else None
)


P2 .strip() is called twice on _forwarded_model_id — once in the truthiness guard and again to produce the value. Since .strip() is idempotent the double-call is harmless, but computing it once and reusing the result makes the intent clearer.

Suggested change
- model_id = decoded.get("model_id") or (
-     _forwarded_model_id.strip()
-     if isinstance(_forwarded_model_id, str) and _forwarded_model_id.strip()
-     else None
- )
+ _stripped_model_id = (
+     _forwarded_model_id.strip()
+     if isinstance(_forwarded_model_id, str)
+     else ""
+ )
+ model_id = decoded.get("model_id") or (_stripped_model_id or None)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@yuneng-berri yuneng-berri merged commit 7820496 into stable/1.86.x Jun 14, 2026
63 of 75 checks passed
@yuneng-berri yuneng-berri deleted the litellm_backport_1_86_x_0613 branch June 14, 2026 00:37