chore(release): backport the 1.84.8 patch set + deps bump to stable/1.85.x and cut 1.85.6 #30404
…27921)

* fix(router): use forwarded model_id for native Azure container IDs in _init_containers_api_endpoints

  Azure code-interpreter containers return provider-native IDs (cntr_ + hex) that carry no LiteLLM routing payload, so _decode_container_id returns model_id=None. The router was falling through to call the handler directly, bypassing _ageneric_api_call_with_fallbacks and leaving api_base=None for Azure deployments. Fall back to the model_id forwarded from the proxy ownership check so deployment credentials are always applied.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(azure-containers): strip /openai/responses path from api_base in AzureContainerConfig.get_complete_url

  When a deployment's api_base is the responses endpoint URL (e.g. .../openai/responses?api-version=...), AzureContainerConfig was appending /openai/containers on top of it, producing the broken path .../openai/responses/openai/containers. Azure returns 404 for that URL while the correct path is .../openai/containers. Strip any /openai/responses suffix from api_base before constructing the containers URL so the resource root is always used as the starting point.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(azure-containers): prefer api-version from api_base URL over deployment's api_version

  The deployment's api_version (e.g. 2024-08-01-preview) targets the chat/responses API and is too old for the containers API, which requires 2025-04-01-preview. The responses endpoint api_base already carries the correct api-version in its query string. Extract it and use it for the containers URL, overriding the stale deployment-level version. Fixes DELETE and file-upload operations returning 404 due to wrong api-version.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(containers): pass params=None instead of params={} to httpx to preserve api-version

  httpx erases a URL's query-string when params={} (empty dict) is passed, silently stripping ?api-version=2025-04-01-preview from every container POST/DELETE request. Azure's GET endpoints tolerate a missing api-version; POST (upload) and DELETE are strict, so those returned 404.

  Fix: use `params or None` in container_handler._async_handle and llm_http_handler.async_container_delete_handler (and all sibling container handlers) so that an empty params dict falls back to None, leaving httpx to preserve the URL's existing query string intact. Adds a regression test that directly documents the httpx behaviour.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(router): remove elif model_id branch from _init_containers_api_endpoints

  Two reviewer findings addressed:

  1. Truncated comment on the model_id fallback line — now complete.
  2. Security: the elif branch that fired when container_id was absent allowed any authenticated caller to supply model_id in a POST /v1/containers body and route the request through an arbitrary deployment UUID, bypassing the model-level access checks that only validate `model`. Removed the elif branch; operations without container_id (create, list) route by the caller-supplied `model` field as before. model_id forwarding is kept only inside the container_id block, where the proxy ownership check has already validated the container before forwarding the deployment ID.

  Adds a regression test pinning the security boundary: no-container-id path calls original_function directly even when model_id is in kwargs.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* test(containers): validate proxy-to-router model_id forwarding for managed IDs

  Add test_regression_get_container_forwarding_params_sets_model_id_for_managed_id to verify that get_container_forwarding_params (the proxy-side half of the Azure routing fix) correctly extracts and forwards model_id from a LiteLLM-managed encoded container ID.

  This closes the gap identified by Greptile P1: the previous regression test only injected model_id as a direct kwarg, validating the router in isolation. The new test exercises the actual proxy-to-router data flow through ownership.get_container_forwarding_params, confirming that kwargs["model_id"] is populated before _init_containers_api_endpoints is reached.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(azure-containers): tighten endpoint-path strip to endswith match

  Use path.endswith() instead of path.find() for _AZURE_ENDPOINT_PATHS so the suffix strip only fires when api_base actually ends with one of the endpoint-specific path suffixes. This is the more precise check greptile flagged on the original find()-based implementation.

* Fix sync container handler to preserve URL query string

  Mirror the async path fix: pass None instead of an empty params dict so httpx does not strip the URL's existing query string (e.g. ?api-version=...), which is required for Azure container routing.

  Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(azure-containers): strip trailing slash before endpoint suffix match

  Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(containers): recover model_id from stored encoded id for native Azure container IDs

  get_container_forwarding_params previously only set model_id when the user-supplied container_id was a LiteLLM-managed encoded id. For native upstream IDs (e.g. Azure 'cntr_<hex>') the decode fails and model_id was never forwarded — making the router-side fallback in _init_containers_api_endpoints unreachable in production.

  Fall back to the stored 'unified_object_id' on the ownership row, which is the encoded form captured at create time when the router selected a specific deployment. Decoding that yields the deployment model_id and restores router-based credential application (api_base, api_key) for retrieve/delete and container-file operations on native IDs.

  Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>

(cherry picked from commit 7f563b2)
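
The URL handling these container commits describe (strip a trailing slash, remove an endpoint suffix only when the path actually ends with it, and prefer the `api-version` carried in the `api_base` query string) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `transformation.py`: the function names, the single-entry suffix tuple, the choice to drop the query string from the root, and the example hostname are all hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit, urlunsplit

# Assumed suffix list; the commit message names only /openai/responses.
ENDPOINT_SUFFIXES: Tuple[str, ...] = ("/openai/responses",)


def normalize_api_base(api_base: str) -> str:
    """Return the resource root without query string or endpoint suffix."""
    parts = urlsplit(api_base)
    path = parts.path.rstrip("/")  # strip the trailing slash before matching
    for suffix in ENDPOINT_SUFFIXES:
        if path.endswith(suffix):  # endswith, so a mid-path match is left alone
            path = path[: -len(suffix)]
            break
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def extract_api_version(api_base: str) -> Optional[str]:
    """Return the api-version query value from api_base, if present."""
    values = parse_qs(urlsplit(api_base).query).get("api-version")
    return values[0] if values else None


def containers_url(api_base: str, deployment_api_version: Optional[str]) -> str:
    """Build the containers URL, preferring the version found in api_base."""
    root = normalize_api_base(api_base)
    version = extract_api_version(api_base) or deployment_api_version
    url = f"{root}/openai/containers"
    return f"{url}?api-version={version}" if version else url
```

With a responses-endpoint `api_base`, the deployment-level version is ignored in favour of the one in the URL; when `api_base` carries no version, the deployment-level value is used as the fallback.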
…28395)

* fix(proxy): expose Prisma idle/connect timeout + extra DB URL params

  Operators have reported large numbers of idle Prisma connections that never get closed. The proxy already forwards `connection_limit` and `pool_timeout` to the DATABASE_URL, but had no knob for capping idle or slow connections.

  Add three new `general_settings` keys that thread through to the DATABASE_URL / DIRECT_URL query string:

  - `database_connect_timeout` -> Prisma `connect_timeout`
  - `database_socket_timeout` -> Prisma `socket_timeout` (the main knob for closing idle connections from the LiteLLM side)
  - `database_extra_connection_params` -> untyped passthrough dict for any other Prisma URL param (`pgbouncer`, `statement_cache_size`, `sslmode`, ...); keys here override LiteLLM defaults.

  Refactors the duplicated DATABASE_URL/DIRECT_URL param dicts into a single `_build_db_connection_url_params` helper.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Update litellm/proxy/proxy_cli.py

  Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

---------

Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

(cherry picked from commit 2f9ac77)
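
To make the setting-to-URL mapping above concrete, here is a small sketch of how such keys could be folded into a database URL's query string. It assumes a plain dict of settings and uses hypothetical helper names (`build_connection_params`, `append_params`); it is not the `_build_db_connection_url_params` helper itself, and the example URL and values are made up.

```python
from typing import Any, Dict, Mapping
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Mapping taken from the commit message: settings key -> Prisma URL param.
SETTING_TO_URL_PARAM = {
    "database_connect_timeout": "connect_timeout",
    "database_socket_timeout": "socket_timeout",
}


def build_connection_params(settings: Mapping[str, Any]) -> Dict[str, str]:
    """Collect URL params from the typed settings, then the passthrough dict."""
    params: Dict[str, str] = {}
    for setting_key, url_key in SETTING_TO_URL_PARAM.items():
        value = settings.get(setting_key)
        if value is not None:
            params[url_key] = str(value)
    # Passthrough keys are applied last so they win over the typed settings.
    extra = settings.get("database_extra_connection_params") or {}
    for key, value in extra.items():
        params[key] = str(value)
    return params


def append_params(db_url: str, params: Mapping[str, str]) -> str:
    """Merge params into the URL's existing query string."""
    parts = urlsplit(db_url)
    merged = dict(parse_qsl(parts.query))
    merged.update(params)
    return urlunsplit(parts._replace(query=urlencode(merged)))
```

In this sketch a key supplied through the passthrough dict replaces the value derived from a typed setting, mirroring the "keys here override LiteLLM defaults" note.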
…thropic streaming logging Targeted subset of staging commit cfcdf87 (#30202): only the anthropic_passthrough_logging_handler.py hardening hunks and their four tests are taken; the rest of that staging batch is intentionally excluded. (cherry picked from commit cfcdf87) (cherry picked from commit 973c7eb)
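
The hardening carried by this pick is summarized in the file overview as skipping `[DONE]` sentinels and non-JSON SSE frames, and recovering the model from the `message_start` event. A rough sketch of that kind of tolerant accumulator is below. The function names and the sample model string in the usage note are hypothetical, and this is not the handler's actual code.

```python
import json
from typing import Iterable, List, Optional


def parse_sse_payloads(lines: Iterable[str]) -> List[dict]:
    """Return the JSON objects carried by `data:` lines, skipping anything else."""
    payloads: List[dict] = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # event names, comments, and blank lines carry no payload
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            continue  # end-of-stream sentinel, not JSON
        try:
            parsed = json.loads(data)
        except json.JSONDecodeError:
            continue  # tolerate malformed or partial frames
        if isinstance(parsed, dict):
            payloads.append(parsed)
    return payloads


def model_from_message_start(payloads: Iterable[dict]) -> Optional[str]:
    """Return the model named by the first message_start event, if any."""
    for payload in payloads:
        if payload.get("type") != "message_start":
            continue
        message = payload.get("message")
        if isinstance(message, dict):
            model = message.get("model")
            if isinstance(model, str) and model:
                return model
    return None
```

Feeding it a stream that mixes a `message_start` frame naming `claude-example`, a garbage frame, and a trailing `data: [DONE]` yields the single parsed payload and the model name, without raising.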
…combined view (#30327) The grace-period branch assigned the recursive get_data result (a finished LiteLLM_VerificationTokenView) back into the variable that the combined-view dict normalization then subscripts, raising TypeError on every request made with a rotated key inside its grace window; auth surfaced that as a 401. Return the recursive result directly instead. Regression test drives the full get_data flow: old hash misses the view, deprecated table resolves to the active token, and the call must return the view object (cherry picked from commit 5047eaf)
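
The failure mode described here is easy to reproduce in miniature. The sketch below uses hypothetical names (`TokenView`, `get_data_buggy`, `get_data_fixed`) and plain dicts in place of the database; it only illustrates why assigning a finished object back into a variable that is later subscripted raises `TypeError`, and why returning the recursive result directly avoids it.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TokenView:
    token: str
    user_id: str


def _normalize(row: dict) -> TokenView:
    # Stand-in for the dict normalization step: it subscripts its input.
    return TokenView(token=row["token"], user_id=row["user_id"])


def get_data_buggy(token: str, view_rows: Dict[str, dict],
                   deprecated: Dict[str, str]) -> Optional[TokenView]:
    response = view_rows.get(token)
    if response is None and token in deprecated:
        # Bug: the recursive call already returns a finished TokenView ...
        response = get_data_buggy(deprecated[token], view_rows, deprecated)
    if response is None:
        return None
    return _normalize(response)  # ... which is then subscripted like a dict


def get_data_fixed(token: str, view_rows: Dict[str, dict],
                   deprecated: Dict[str, str]) -> Optional[TokenView]:
    response = view_rows.get(token)
    if response is None and token in deprecated:
        # Fix: hand the finished object straight back to the caller.
        return get_data_fixed(deprecated[token], view_rows, deprecated)
    if response is None:
        return None
    return _normalize(response)
```

Looking up a rotated key through `get_data_buggy` raises `TypeError` because a dataclass instance is not subscriptable, while `get_data_fixed` returns the view for the active token.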
* chore(deps): bump aiohttp to 3.14.1 and vitest to 3.2.6

  Lockfile-only bump for aiohttp (3.13.5 -> 3.14.1, within the existing pyproject constraint) and dashboard devDependency bumps for vitest, @vitest/coverage-v8, @vitest/ui (3.2.4 -> 3.2.6) plus transitive brace-expansion (5.0.5 -> 5.0.6). Clears the currently published advisories flagged by osv.dev against uv.lock and the dashboard lockfile.

  Verified: 154 custom_httpx unit tests and all 3943 dashboard vitest tests pass; live proxy completion and streaming calls succeed on the bumped venv

* chore(deps): raise aiohttp floor to 3.14.0

  The lockfile bump alone only protects environments built from uv.lock. Raising the pyproject floor extends the same minimum to package consumers installing litellm from PyPI, and prevents a future lockfile regeneration from resolving below 3.14.0

* Revert "chore(deps): raise aiohttp floor to 3.14.0"

  This reverts commit d6c1c9d.

* revert(deps): roll back aiohttp to 3.13.5

  vcrpy is incompatible with aiohttp >= 3.14 (the aiohttp_stubs module imports a symbol removed in 3.14) and the upstream fix is merged but unreleased, so every cassette-based test suite fails on 3.14. Hold aiohttp at 3.13.5 until a vcrpy release ships; the vitest and brace-expansion bumps stay

* chore(deps): bump pypdf to 6.13.1 and tornado to 6.5.7

  Lockfile-only bumps clearing the advisories published for both since this branch was opened

* chore(deps): add regression guards for the bumped versions

  Raise the pypdf floor to 6.12.0 (direct dependency, applies to package consumers too) and add uv constraint-dependencies for the transitive pins: tornado >= 6.5.6, and aiohttp held in [3.13.5, 3.14) so a lockfile regeneration can neither fall back below the current version nor move onto 3.14 while vcrpy is incompatible. Constraints live in [tool.uv] and only affect this repo's resolution, not published metadata.

  Verified: uv lock -P with each out-of-range version fails to resolve; in-range resolutions unchanged (pypdf 6.13.1, tornado 6.5.7, aiohttp 3.13.5)

(cherry picked from commit d96ab46) # Conflicts: # pyproject.toml # ui/litellm-dashboard/package-lock.json # ui/litellm-dashboard/package.json # uv.lock
Greptile Summary

Backport of nine cherry-picks from the 1.84.8 patch set onto the …

Confidence Score: 4/5

This is a carefully assembled backport; all source fixes apply cleanly and the documented adaptations are sound. The two noted items are non-blocking. The core fixes for Prisma cached-plan recovery, 503-on-DB-outage, deprecated-key lookup crash, Anthropic SSE logging, and container ID routing are all correct.

Files needing attention: litellm/proxy/db/exception_handler.py (OSError breadth in is_database_service_unavailable_error) and litellm/proxy/container_endpoints/ownership.py (direct DB call in _get_stored_container_id).
|
| Filename | Overview |
|---|---|
| litellm/proxy/utils.py | Two fixes: _query_first_with_cached_plan_fallback now reconnects the Prisma client (singleflight) instead of injecting a cache-busting comment; deprecated-key lookup now returns immediately via deprecated_response to avoid crashing dict-normalization code on a VerificationTokenView. |
| litellm/proxy/auth/auth_exception_handler.py | Adds a 503 branch for DB-infrastructure errors so they are no longer silently folded into the 401 fallthrough; the new check correctly runs after HTTPException/ProxyException guards are exhausted. |
| litellm/proxy/db/exception_handler.py | Adds is_prisma_engine_internal_error (traceback-walk heuristic) and is_database_service_unavailable_error (composite classifier); the OSError catch inside the latter is broad and could flag non-DB OS errors during auth as 503. |
| litellm/proxy/container_endpoints/ownership.py | Adds _CONTAINER_STORED_ID_CACHE and _get_stored_container_id to recover the deployment model_id for native upstream container IDs; makes a direct Prisma query on cache miss, following the same pattern as the pre-existing _get_container_owner. |
| litellm/proxy/pass_through_endpoints/llm_provider_handlers/anthropic_passthrough_logging_handler.py | Adds _resolve_costing_model (fallback chain: body model → deployment model → model_group) and _extract_model_from_anthropic_chunks (parses message_start SSE); skips [DONE] sentinels and non-JSON SSE frames to prevent logging pipeline errors. |
| litellm/proxy/proxy_cli.py | Extracts _build_db_connection_url_params helper; surfaces connect_timeout, socket_timeout, disable_prepared_statements, and extra_connection_params from general_settings into the Prisma DATABASE_URL / DIRECT_URL query string. |
| litellm/proxy/_types.py | Adds four new ConfigGeneralSettings fields: database_connect_timeout, database_socket_timeout, database_extra_connection_params, database_disable_prepared_statements. |
| litellm/llms/azure/containers/transformation.py | Adds _normalize_api_base (strips endpoint-specific path suffixes) and _extract_api_version (pulls api-version from query string); get_complete_url now uses the version from api_base instead of the deployment's api_version to prefer the newer containers API version. |
| litellm/router.py | Forwards model_id from kwargs as a fallback when the container_id decodes without a model_id, enabling deployment-credential lookup for native Azure container IDs. |
| litellm/llms/custom_httpx/container_handler.py | Converts empty-dict query_params to None before passing to httpx to prevent the empty dict from stripping the URL's existing query string (e.g., api-version). |
Reviews (1): Last reviewed commit: "chore: refresh uv.lock for 1.85.6"
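
The `container_handler.py` row above comes down to one guard: turn an empty params dict into `None` before the request is built. The sketch below shows the guard next to a deliberately simplified stand-in for the client. `build_url` only mimics the behaviour the commit message reports for an empty dict and is not httpx itself; how a real client treats non-empty params is outside this sketch, and all names and the example hostname are hypothetical.

```python
from typing import Mapping, Optional
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_url(url: str, params: Optional[Mapping[str, str]]) -> str:
    """Simplified stand-in: any non-None params replaces the URL's query string."""
    if params is None:
        return url  # the URL's own query string is left untouched
    parts = urlsplit(url)
    return urlunsplit(parts._replace(query=urlencode(params)))


def effective_params(
    query_params: Optional[Mapping[str, str]],
) -> Optional[Mapping[str, str]]:
    """Collapse an empty mapping to None so the URL's query string survives."""
    return query_params or None
```

Under this stand-in, passing `{}` drops `?api-version=...` from the URL, while passing `effective_params({})` leaves the URL unchanged.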
    so a type-only check misses real outages. ``is_database_transport_error``
    keyword-matches the connection message and catches that masquerade,
    while genuine data errors (no connection keyword) correctly stay 401.

    The Postgres "cached plan must not change result type" error is matched
    here, not in ``is_database_transport_error``: it is a transient stale-DB-
Broad OSError catch may misclassify non-DB errors as 503
OSError is the base class for FileNotFoundError, PermissionError, BrokenPipeError, and others that have nothing to do with database connectivity. If any of these are raised during the auth flow for reasons unrelated to the DB (e.g. a missing config file loaded lazily, or a broken pipe on the client side), the auth handler will return 503 ("database temporarily unreachable") instead of the generic 401 fallthrough. Narrowing to ConnectionError and socket.timeout (both OSError subclasses) would keep the intent while avoiding false positives.
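
The narrowing suggested in this comment can be sketched as a small classifier. The function name is hypothetical and this is not the code in `exception_handler.py`. One detail to keep in mind: in Python's built-in exception hierarchy `BrokenPipeError` is itself a subclass of `ConnectionError`, so narrowing to `ConnectionError` excludes `FileNotFoundError` and `PermissionError` but still matches a broken pipe.

```python
import socket


def is_transport_level_oserror(exc: BaseException) -> bool:
    """Match only connection-style OSError subclasses, not every OSError."""
    # socket.timeout is an alias of TimeoutError on Python 3.10+ and a
    # separate OSError subclass on earlier versions.
    return isinstance(exc, (ConnectionError, socket.timeout))
```

If a client-side broken pipe must also stay out of the 503 path, it would need an explicit `isinstance(exc, BrokenPipeError)` exclusion ahead of this check.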
    cached = _CONTAINER_STORED_ID_CACHE.get_cache(model_object_id)
    if cached == _NEGATIVE_STORED_ID_SENTINEL:
        return None
    if isinstance(cached, str) and cached:
        return cached

    prisma_client = await _get_prisma_client()
    if prisma_client is None:
        return None

    row = await prisma_client.db.litellm_managedobjecttable.find_first(
        where={
            "model_object_id": model_object_id,
            "file_purpose": CONTAINER_OBJECT_PURPOSE,
        }
    )
    stored_id = getattr(row, "unified_object_id", None) if row is not None else None
    _CONTAINER_STORED_ID_CACHE.set_cache(
        model_object_id,
        (
            stored_id
Direct DB query outside the approved helper layer
_get_stored_container_id calls prisma_client.db.litellm_managedobjecttable.find_first(...) directly, bypassing the get_team/get_user/get_key helper pattern. This is consistent with the pre-existing _get_container_owner in the same file, which already does the same thing — so this doesn't regress anything — but it means both functions carry DB queries directly in the container request path. The 60-second TTL cache mitigates frequency, but the pattern is worth noting if the ownership module is audited for the no-direct-DB rule later.
Rule Used: What: In critical path of request, there should be... (source)
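
The mitigation mentioned above, a short-TTL cache that also remembers misses through a sentinel, can be illustrated with a synchronous sketch. Everything here is hypothetical: the `TTLCache` class, the sentinel value, and the `lookup` callable stand in for the real cache, `_NEGATIVE_STORED_ID_SENTINEL`, and the Prisma query, and the real function is async.

```python
import time
from typing import Callable, Dict, Optional, Tuple

_NEGATIVE = "__no_stored_id__"  # hypothetical sentinel meaning "looked up, not found"


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired entries behave like misses
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)


def get_stored_id(key: str, cache: TTLCache,
                  lookup: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return the stored id, hitting the backing lookup at most once per TTL."""
    cached = cache.get(key)
    if cached == _NEGATIVE:
        return None  # a recent lookup already found nothing
    if cached:
        return cached
    stored = lookup(key)
    cache.set(key, stored if stored else _NEGATIVE)
    return stored or None
```

Caching the miss matters as much as caching the hit: without the sentinel, every request for an ID with no stored row would reach the database again.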
Greptile Summary

This is a carefully documented backport of nine cherry-picks from …

Confidence Score: 5/5

Clean backport; all source changes are well-scoped bug fixes with comprehensive regression tests, no previously-passing assertions were weakened, and all adaptations are documented and symbol-verified. Every changed symbol was verified to exist on this branch, all five … No files require special attention.
|
| Filename | Overview |
|---|---|
| litellm/proxy/utils.py | Two fixes: _query_first_with_cached_plan_fallback now reconnects the Prisma client on a cached-plan error (instead of injecting a unique comment that defeated plan caching on every call), and the deprecated-key lookup correctly returns deprecated_response directly instead of falling through to dict normalization that would crash subscripting a LiteLLM_VerificationTokenView. |
| litellm/proxy/auth/auth_exception_handler.py | Adds a PrismaDBExceptionHandler.is_database_service_unavailable_error guard after the ProxyException re-raise; DB infrastructure failures now surface as 503 instead of falling through to the generic 401 path. The ordering is correct: ProxyExceptions (including legitimate 401s) are re-raised first. |
| litellm/proxy/db/exception_handler.py | Adds is_prisma_engine_internal_error (walks the traceback to detect non-PrismaError exceptions from prisma.engine, e.g. AttributeError from malformed error payloads during teardown) and is_database_service_unavailable_error (composites connection/transport/engine checks plus the cached-plan case). Data-layer errors are correctly excluded via the explicit exclusion list in is_database_connection_error. |
| litellm/proxy/proxy_cli.py | Extracts _build_db_connection_url_params helper and adds database_connect_timeout, database_socket_timeout, database_disable_prepared_statements, database_extra_connection_params config fields. The new params are propagated correctly to both DATABASE_URL and DIRECT_URL. |
| litellm/proxy/container_endpoints/ownership.py | Makes get_container_forwarding_params async and adds a _CONTAINER_STORED_ID_CACHE (in-memory, TTL=60) plus _get_stored_container_id to recover the deployment model_id from the stored unified_object_id for native upstream IDs that carry no LiteLLM routing payload. |
| litellm/llms/azure/containers/transformation.py | Adds _normalize_api_base (strips /openai/responses suffix) and _extract_api_version (pulls api-version from the URL query string), then uses both in get_complete_url so the containers URL is built from the resource root with the correct API version. |
| litellm/router.py | Falls back to the model_id forwarded by the proxy (kwargs["model_id"]) when the container_id is a native upstream ID that carries no LiteLLM routing payload; .strip() guards against whitespace-only strings. |
| litellm/proxy/pass_through_endpoints/llm_provider_handlers/anthropic_passthrough_logging_handler.py | Three additions: _resolve_costing_model resolves "unknown" model sentinel via litellm_params then model_group; _extract_model_from_anthropic_chunks recovers the real model from the message_start SSE event; [DONE] sentinel and json.JSONDecodeError are now skipped in the streaming accumulator loop. |
| litellm/proxy/container_endpoints/endpoints.py | Adds await to the two get_container_forwarding_params call sites (retrieve and delete container) that were missing it after ownership.py made the function async. |
| litellm/proxy/container_endpoints/handler_factory.py | Adds await to three more get_container_forwarding_params call sites in binary, multipart-upload, and generic request handlers. |
| litellm/llms/custom_httpx/container_handler.py | Changes all httpx calls to use params=effective_params where effective_params = query_params or None, preventing an empty dict from stripping an existing query string from the URL. |
| litellm/llms/custom_httpx/llm_http_handler.py | Applies the same params or None guard to all ten container-related httpx GET/DELETE call sites in the LLM HTTP handler. |
| litellm/proxy/_types.py | Adds four new optional fields to ConfigGeneralSettings for the DB URL param expansion: database_connect_timeout, database_socket_timeout, database_extra_connection_params, database_disable_prepared_statements. |
| pyproject.toml | Version bumped to 1.85.6; pypdf floor raised to 6.13.1; tornado and aiohttp constraint-dependency pins added. |
Reviews (2): Last reviewed commit: "chore: refresh uv.lock for 1.85.6"
Relevant issues
Backports the patch set that was just cut into 1.84.8 onto the 1.85.x line, plus a dependency bump that is scoped to 1.85 and newer, and cuts 1.85.6. The 1.85.x line branched from staging on 2026-05-13, before any of these merged, so it carried none of them; 1.84.x already has the fix set as of 1.84.8, so this keeps an upgrade from 1.84.8 to 1.85.x monotonic. Every code commit here is a `git cherry-pick -x` of a commit reachable from litellm_internal_staging; the version bump and uv.lock refresh are the only non-pick commits.

What is included
In staging-merge order:
Adaptation notes
Seven of the nine picks are adapted; the divergence from the staging commit in each case is below. #28395 and 973c7eb are byte-for-byte (patch-id) identical to staging.
- … `litellm/proxy/_experimental/out/` UI build artifact from the pick; that file is a build output the proxy regenerates, and the 1.85.x UI output uses a different layout. The router/transformation/container/ownership source and the test are taken as-is
- … `tests/test_litellm/proxy/utils/prisma_and_spend/test_prisma_client_get_data.py`, which lives in a staging-only pin-harness directory (a 15-file module with a shared conftest) that does not exist on 1.85.x. Importing it would drag in behavior pins for methods that have drifted on this line. The source fixes in `litellm/proxy/utils.py` apply cleanly and are symbol-verified (`_query_first_with_cached_plan_fallback` and `attempt_db_reconnect` both already exist on the line with the matching signature)
- … `ui/litellm-dashboard/src/lib/http/schema.d.ts` hunk; that generated UI types file is not present on 1.85.x. The CLI option in `proxy_cli.py` + `_types.py` and its test apply cleanly
- … `auth_exception_handler.seed_request_identity`. That symbol comes from an unrelated identity-seeding change (fix: 400 on Anthropic context overflow; seed identity on failed auth #29848) that is not on 1.85.x; the line's auth failure path never calls it, so patching it both failed and was unnecessary. The fix itself does not reference the symbol
- … `test_passthrough_logging_sets_response_cost_with_server_tool_use_dict`) that rode in on the conflict block but is not part of fix(anthropic_passthrough): resolve costing model from message_start chunk, litellm_params and model_group instead of 'unknown' #30160's diff
- … `[tool.uv]` constraint-dependencies for `tornado>=6.5.6` and `aiohttp>=3.13.5,<3.14`, and the dashboard bumps vitest 3.2.6 + brace-expansion 5.0.6. uv.lock and package-lock.json were regenerated fresh rather than taken verbatim; the resolved versions match staging (pypdf 6.13.1, tornado 6.5.7, aiohttp 3.13.5, vitest 3.2.6, brace-expansion 5.0.6)

Known noise on this line
One pre-existing test failure unrelated to the picks: `tests/test_litellm/containers/test_azure_container_transformation.py::TestAzureContainerConfig::test_validate_environment_uses_azure_env_var`. It reads a real `AZURE_API_KEY` from a local `.env` that litellm loads by walking up the directory tree, which beats the test's `monkeypatch.setenv`. It is environmental and passes in clean CI. It is present identically before and after the picks.

Pre-Submission checklist
Type
🐛 Bug Fix
🚄 Infrastructure
✅ Test
Changes
See the pick list above. Net: nine cherry-picks (eight source fixes and one dependency bump), the version bump to 1.85.6, and a uv.lock refresh.
Screenshots / Proof of Fix
Live proxy on a real Postgres DB with real provider keys, running the worktree's code with all nine picks applied.
Boot and health:
Real completion (router path):
Key generation, then a scoped call with that key (the auth lookup path #29986 touches):
Anthropic passthrough streaming (the [DONE]/non-JSON-SSE logging path 973c7eb hardens and the costing path #30160 fixes):
Targeted test delta vs a baseline captured on the line tip before any pick: baseline 1 failed (the known-noise above), 206 passed; after the picks 1 failed (same known-noise), 267 passed. Zero new failures, and the 61 new passing tests are the picks' own regression tests.
Gauntlet (deep, universal hypothesis over all nine picks): SURVIVED. Ten adversarial agents independently confirmed each pick's source diff matches its staging counterpart except for the documented adaptations, every referenced symbol resolves on this tree (including that the removed `seed_request_identity` is absent from all production code), the kept tests pass, and no drifted caller is broken by a pick. The one failing test in the run is the known-noise Azure env-key case above; the gauntlet traced it to the host `.env` resolution order and confirmed it is pre-existing and out of scope.