chore(release): backport six staged fixes into stable/1.85.x and cut 1.85.4 by mateo-berri · Pull Request #29629 · BerriAI/litellm

mateo-berri · 2026-06-03T22:57:08Z

Relevant issues

Backports six fixes that already merged on litellm_internal_staging but never reached the 1.85.x line. Each one is cherry-picked from the squashed commit that landed for its PR, so the set matches what the newer lines already carry. This closes that gap and cuts 1.85.4

Linear ticket

N/A

What is included

Cherry-picked in merge order:

#28627 fix(azure): preserve AD token refresh in the v1 OpenAI client path
#29264 fix(proxy): map a stripped batch body.model back to the proxy alias so key access checks pass
#29545 fix(proxy): resolve managed video model ids through the router before auth, budget, and key checks
#29310 fix(key_generate): let team members create keys on org-scoped teams (regression since v1.84.0-rc.1)
#29585 fix(vertex): strip output_config.effort for Vertex Claude models that reject it, such as Haiku 4.5
#29598 (internal copy of #29550) fix(passthrough): stop duplicate cost callbacks for Anthropic streaming pass-through

The last two commits are the version bump (1.85.3 to 1.85.4) and the matching uv.lock refresh

#29598 needed a manual conflict resolution. pass_through_endpoints.py is stored with CRLF endings on the stable branches, so the verbatim cherry-pick will not apply, and the new tests sit next to an unrelated test class that this line does not carry. The resolved source change and the added tests are byte-identical to the upstream commit; only the surrounding file context differs, and the dedup guard the fix relies on (has_dispatched_final_stream_success) is present on this line

Pre-Submission checklist

The cherry-picked PRs each carry their own tests
My PR passes all unit tests on make test-unit
Scope is limited to backporting already-merged fixes plus the release bump
Greptile review requested

CI (LiteLLM team)

Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:

Screenshots / Proof of Fix

These are backports of changes already merged, released, and validated on newer lines; each linked PR carries its own proof and review

Type

Bug Fix
Infrastructure

Changes

See the commit list above. No new code beyond the cherry-picks, the version bump, and the lockfile refresh

* fix(azure): preserve AD token refresh in v1 OpenAI client path

The /openai/v1/ code path (api_version in {"v1", "latest", "preview"}) constructs a plain OpenAI/AsyncOpenAI client, but only forwarded `api_key` from `azure_client_params`. When `enable_azure_ad_token_refresh` is set (or any AD-only auth), `api_key` is None and the client constructor raised "The api_key client option must be set...", breaking every Azure call with a v1 api_version.

The OpenAI SDK (>=2.20.0) accepts a callable for `api_key` and re-invokes it on every request via `_refresh_api_key`, so we now forward `azure_ad_token_provider` directly, preserving the per-request token refresh behavior of the regular AzureOpenAI client and avoiding the expiry hole that resolving the token once at client-creation time would introduce. Static `azure_ad_token` strings fall through to `api_key`.

For the async path we wrap the sync provider returned by azure-identity in an async function since AsyncOpenAI expects `Callable[[], Awaitable[str]]`.

Fixes #27945

https://claude.ai/code/session_01UnzrDSFUUgp5T2wRoPMxq5

* fix(azure): offload sync token provider to thread in v1 async wrapper

* fix(azure): include AD credential identity in v1 client cache key

---------

Co-authored-by: Claude <noreply@anthropic.com>

(cherry picked from commit 96a2e8b)
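The async wrapping described in this commit can be illustrated with a minimal sketch. It is not the code from the PR: `make_async_token_provider` and `fake_sync_provider` are hypothetical names, and the fake provider stands in for the sync callable that azure-identity would return. The sketch only shows the shape of the idea, a blocking provider offloaded to a worker thread and exposed as `Callable[[], Awaitable[str]]`.

```python
import asyncio
from typing import Awaitable, Callable


def make_async_token_provider(
    sync_provider: Callable[[], str],
) -> Callable[[], Awaitable[str]]:
    """Wrap a blocking token provider so it can be awaited on every request."""

    async def _async_provider() -> str:
        # Run the blocking call in a worker thread so the event loop stays free.
        return await asyncio.to_thread(sync_provider)

    return _async_provider


# Hypothetical stand-in for a sync provider; it returns a new token per call
# so the per-request refresh is visible.
_calls = {"count": 0}


def fake_sync_provider() -> str:
    _calls["count"] += 1
    return f"token-{_calls['count']}"


async_provider = make_async_token_provider(fake_sync_provider)
first_token = asyncio.run(async_provider())
second_token = asyncio.run(async_provider())
```

Because the wrapper calls the provider each time it is awaited, a client that re-invokes it per request never holds on to a token resolved once at client-creation time.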

…9264)

* fix(proxy): map stripped batch body.model to proxy alias for auth

replace_model_in_jsonl rewrites JSONL body.model to the provider id before upload; batch file access checks must resolve that id back to model_name so keys granted the proxy alias are not rejected with 403.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(proxy): surface resolved proxy alias in batch file 403 detail

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>

(cherry picked from commit 70d2748)
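A rough sketch of the alias lookup this commit describes follows. It assumes a deployment list shaped like `model_name` plus `litellm_params.model`, and assumes that "stripped" means the provider prefix was removed; `resolve_proxy_alias`, `key_allows_batch_model`, and the sample deployment are hypothetical and do not mirror the router's actual API.

```python
from typing import Any, Dict, List, Optional, Set

# Hypothetical deployment list: proxy alias -> provider model id.
MODEL_LIST: List[Dict[str, Any]] = [
    {
        "model_name": "batch-gpt",
        "litellm_params": {"model": "azure/gpt-4o-batch-deployment"},
    },
]


def resolve_proxy_alias(
    body_model: str, model_list: List[Dict[str, Any]]
) -> Optional[str]:
    """Map a provider id found in a batch file back to its proxy alias."""
    for deployment in model_list:
        provider_model = deployment["litellm_params"]["model"]
        # Assumption: the id may appear with or without its provider prefix.
        stripped = provider_model.split("/", 1)[-1]
        if body_model in (provider_model, stripped):
            return deployment["model_name"]
    return None


def key_allows_batch_model(
    body_model: str, allowed_models: Set[str], model_list: List[Dict[str, Any]]
) -> bool:
    """Check access against the alias when the file carries the provider id."""
    alias = resolve_proxy_alias(body_model, model_list)
    return body_model in allowed_models or (
        alias is not None and alias in allowed_models
    )
```

With this lookup, a key granted only `batch-gpt` passes the check for a file whose `body.model` was rewritten to the provider id, while an unknown id is still rejected.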

* fix(proxy): resolve managed video model ids for auth

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(proxy): cover character_id router model resolution

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>

(cherry picked from commit d45e9e4)

…ams (#29310)

* fix(key_generate): allow team members to create keys on org-scoped teams

When a virtual key is created for a team, enterprise logic inherits the team's organization_id onto the key (add_team_organization_id). Since the VERIA-55 org-IDOR fix, /key/generate then required the caller to be an explicit LiteLLM_OrganizationMembership member of that org, returning 403 "Caller is not a member of organization_id=<uuid>". Admins normally only add users to teams (not orgs), so self-serve key creation regressed for any user on an org-scoped team (regression since v1.84.0-rc.1).

Skip the org-membership check when organization_id was inherited from the key's team (organization_id == team_table.organization_id). Team-level authorization already gates this path, so team membership is sufficient. The membership check still runs when a caller assigns an organization_id that did not come from the key's team, preserving the IDOR protection.

Adds regression tests covering both the team-inherited (allowed) and foreign-org (still blocked) cases.

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(key_generate): cover mismatched team org IDOR path on generate

Add test_generate_key_foreign_org_with_mismatched_team_still_enforces_membership for the case where a team is present but request organization_id differs from team_table.organization_id. Enterprise inheritance is no-op'd in the test so the guard is exercised directly; membership validation must still run.

Addresses Greptile review on #29310.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>

(cherry picked from commit b11833c)
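The gating rule in this commit reduces to a small predicate. The sketch below is illustrative only: `requires_org_membership_check` is a hypothetical helper, and the real endpoint reads the team's organization_id from the database rather than taking it as an argument.

```python
from typing import Optional


def requires_org_membership_check(
    request_org_id: Optional[str], team_org_id: Optional[str]
) -> bool:
    """Decide whether the caller must be an explicit member of the org."""
    if request_org_id is None:
        # No organization on the key, so there is nothing to verify.
        return False
    if team_org_id is not None and request_org_id == team_org_id:
        # Inherited from the key's team: team-level authorization already
        # gates this path, so team membership is sufficient.
        return False
    # Foreign or mismatched org: keep the membership check (IDOR protection).
    return True
```

The three regression cases named above map onto it directly: a team-inherited org skips the check, while a foreign org with no team, or a team whose org differs from the request, still requires membership.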

… reject it (Haiku 4.5) (#29585)

* fix(vertex): strip output_config.effort for models that reject it

Haiku 4.5 on Vertex AI does not support output_config.effort and 400s with "output_config.effort: Extra inputs are not permitted". PR #27074 emptied VERTEX_UNSUPPORTED_OUTPUT_CONFIG_KEYS so effort would forward for Opus/Sonnet 4.6+, but that made the strip unconditional across every Vertex Anthropic model, including ones that don't support it. Claude Code injects effort into its default Messages payload, so `claude --model claude-haiku-4.5` started failing.

Make the sanitizer model-aware: drop output_config.effort for models that don't advertise output_config support (or any reasoning effort level) while forwarding it for those that do. The fix covers both the chat-completion and Messages pass-through transformation paths since they share the helper.

* chore(vertex): log at debug when dropping unsupported output_config.effort

Operators pointing an unregistered Vertex Claude alias that does support effort would otherwise see it stripped with no signal. Debug level keeps it out of normal logs since Claude Code sends effort on every request.

(cherry picked from commit cc55662)
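A model-aware sanitizer of the kind described here might look like the following sketch. The capability set and the model names in it are hypothetical placeholders; the actual fix consults the model map instead of a hardcoded set, and `sanitize_output_config` is not the helper's real name.

```python
import copy
import logging
from typing import Any, Dict, Set

logger = logging.getLogger(__name__)

# Hypothetical capability table; the real lookup reads from the model map.
MODELS_SUPPORTING_EFFORT: Set[str] = {"claude-opus-4-6", "claude-sonnet-4-6"}


def sanitize_output_config(model: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the payload with effort removed for unsupported models."""
    cleaned = copy.deepcopy(payload)
    output_config = cleaned.get("output_config")
    if isinstance(output_config, dict) and model not in MODELS_SUPPORTING_EFFORT:
        if "effort" in output_config:
            # Debug level keeps this out of normal logs.
            logger.debug("Dropping unsupported output_config.effort for %s", model)
            output_config.pop("effort")
    return cleaned


request = {"output_config": {"effort": "high"}, "max_tokens": 64}
haiku_payload = sanitize_output_config("claude-haiku-4.5", request)
opus_payload = sanitize_output_config("claude-opus-4-6", request)
```

Working on a deep copy leaves the caller's payload untouched, which matters when the same request dict is reused across both transformation paths.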

* fix duplicate cost callbacks for anthropic streaming pass-through

Two bugs caused _PROXY_track_cost_callback to see stream=True + complete_streaming_response=None on every streaming pass-through request, making the dedup guard in dispatch_success_handlers permanently inactive:

1. pass_through_endpoints.py created the Logging object with stream=False for all requests. _is_assembled_stream_success short-circuits on self.stream is not True, so has_dispatched_final_stream_success was never set and any second dispatch went through unchecked. Fix: set logging_obj.stream = True after stream detection.

2. _create_anthropic_response_logging_payload set complete_streaming_response inside the try block after litellm.completion_cost(), so a pricing error caused an early return without setting it on model_call_details. Fix: set complete_streaming_response before the try block.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix stream

* add stream to logging obj

* test(pass_through): give mock logging object a real model_call_details dict

The anthropic passthrough logging payload now records the assembled response on model_call_details before cost calculation, which requires model_call_details to support item assignment. In production it is always a dict; the existing unit test stubbed the logging object with a bare Mock whose attribute is not subscriptable, so the new assignment raised TypeError. Use a real dict to match the production logging object.

* test(pass_through): cover streaming logging-obj stream flag

The streaming branch of pass_through_request that marks the logging object as streaming (logging_obj.stream and model_call_details["stream"]) had no unit coverage, so the patch coverage gate flagged it. Add a regression test that drives a streaming pass-through request through pass_through_request and asserts the logging object is flagged as a stream before dispatch.

* test(pass_through): cover SSE-response stream flag fallback branch

The auto-detected streaming branch of pass_through_request (when a request that was not flagged as streaming returns a text/event-stream response) sets logging_obj.stream and model_call_details["stream"] but had no unit coverage, so the codecov patch gate failed at 60%. Drive a non-streaming pass-through request whose upstream response is SSE through pass_through_request and assert the logging object is flagged as a stream before dispatch.

* fix(pass_through): gate complete_streaming_response on stream flag

perform_redaction only scrubs complete_streaming_response when model_call_details["stream"] is True. Setting it unconditionally for non-streaming Anthropic pass-through responses left the assembled response unredacted in model_call_details, which is handed to logging callbacks as kwargs when message logging is disabled. Only record it for actual streaming responses so redaction always applies.

---------

Co-authored-by: mubashir1osmani <mubashir.osmani777@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

(cherry picked from commit 2bbdbfa)
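To see why the stream flag matters for the dedup guard, consider the toy model below. It is a deliberately simplified sketch, not the litellm Logging class: `LoggingSketch` and `dispatch_success` are hypothetical, and only the short-circuit on `self.stream is not True` and the `has_dispatched_final_stream_success` flag are taken from the commit message.

```python
from typing import Optional


class LoggingSketch:
    """Toy logging object that counts how often the cost callback fires."""

    def __init__(self, stream: bool = False) -> None:
        self.stream = stream
        self.has_dispatched_final_stream_success = False
        self.callback_count = 0

    def _is_assembled_stream_success(self, complete_response: Optional[dict]) -> bool:
        # Short-circuits when the object was never flagged as a stream.
        if self.stream is not True:
            return False
        return complete_response is not None

    def dispatch_success(self, complete_response: Optional[dict]) -> None:
        if self._is_assembled_stream_success(complete_response):
            if self.has_dispatched_final_stream_success:
                return  # Guard active: drop the duplicate dispatch.
            self.has_dispatched_final_stream_success = True
        self.callback_count += 1


assembled = {"usage": {"input_tokens": 3, "output_tokens": 5}}

# Before the fix: stream stays False, so the guard never engages.
unflagged = LoggingSketch(stream=False)
unflagged.dispatch_success(assembled)
unflagged.dispatch_success(assembled)

# After the fix: stream is set to True once streaming is detected.
flagged = LoggingSketch(stream=True)
flagged.dispatch_success(assembled)
flagged.dispatch_success(assembled)
```

In this model the unflagged object fires the callback twice for the same assembled response, while the flagged object fires it once, which is the behavior the two fixes restore.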

codecov · 2026-06-03T23:01:10Z

Codecov Report

❌ Patch coverage is 26.00000% with 37 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
litellm/proxy/auth/auth_utils.py	10.00%	9 Missing ⚠️
...ai_partner_models/anthropic/output_params_utils.py	20.00%	8 Missing ⚠️
litellm/llms/azure/common_utils.py	56.25%	7 Missing ⚠️
litellm/proxy/hooks/batch_rate_limiter.py	0.00%	5 Missing ⚠️
...y/pass_through_endpoints/pass_through_endpoints.py	0.00%	4 Missing ⚠️
..._handlers/anthropic_passthrough_logging_handler.py	0.00%	2 Missing ⚠️
...rtex_ai_partner_models/anthropic/transformation.py	0.00%	1 Missing ⚠️
litellm/proxy/spend_tracking/budget_reservation.py	50.00%	1 Missing ⚠️

greptile-apps · 2026-06-03T23:26:49Z

Greptile Summary

This release backports six targeted bug fixes from the staging line into the stable/1.85.x branch and cuts version 1.85.4. All source changes are cherry-picks of already-merged and validated commits; no new logic is introduced beyond the fixes themselves.

Azure AD token refresh (v1 path): The OpenAI v1 client now receives azure_ad_token_provider as its api_key callable so tokens are refreshed per-request; a thread-offloaded async wrapper handles the AsyncOpenAI case. Cache discrimination for the provider uses id() of the callable plus tenant/client/scope hashes, which is slightly fragile under GC memory reuse but adequate for long-lived providers.
Vertex Anthropic output_config.effort filtering: The sanitizer now accepts a model argument and consults the model map (_model_supports_effort_param) rather than hardcoding names, so Haiku 4.5 (which rejects the field) is filtered correctly while Opus/Sonnet 4.6+ pass it through.
Team-member key generation regression: Non-admin team members can now create keys for their team's org without an explicit org-membership row; the bypass is correctly gated on the team's database org matching the request's org, preventing privilege escalation to unrelated orgs.
Managed video/character model ID resolution: Auth, budget, and key-access checks now resolve managed resource IDs through the router before validation, so proxy model aliases are matched correctly.
Batch rate-limiter model mapping: The batch file's provider model ID is resolved to its proxy alias before can_key_call_model, fixing auth failures for batch jobs.
Anthropic pass-through duplicate cost callbacks: logging_obj.stream is set to True early in both streaming paths so the has_dispatched_final_stream_success dedup guard activates; complete_streaming_response is also guarded behind a stream check to protect message redaction for non-streaming responses.

Confidence Score: 4/5

The backport is safe to merge; all six fixes address clear regressions and are each covered by new mock-only unit tests.

The changes are well-scoped backports of already-validated fixes. The only notable observation is in the Azure AD cache key, which uses Python's id() of the token provider callable as part of the cache discriminator. This works reliably for long-lived module-level providers (the common case) but could produce a stale-client cache hit in rare scenarios where the provider function is garbage-collected and a new one lands at the same memory address with the same tenant/scope parameters. The rest of the changes — the Vertex effort-param filter, the org-bypass logic, the router model-ID resolution, and the streaming dedup guard — all look correct and are backed by targeted regression tests.

litellm/llms/azure/common_utils.py — the id()-based cache discriminator for azure_ad_token_provider is worth a second look if short-lived provider closures are a realistic usage pattern in the deployment.

Important Files Changed

Filename	Overview
litellm/llms/azure/common_utils.py	Adds Azure AD token provider support for v1 API path; correctly wraps sync token providers in async, but uses ephemeral id() as cache discriminator which could cause rare stale-client hits.
litellm/llms/vertex_ai/vertex_ai_partner_models/anthropic/output_params_utils.py	Adds model-aware filtering of output_config.effort; delegates to AnthropicConfig._model_supports_effort_param which reads from the model map — no hardcoded model names.
litellm/proxy/management_endpoints/key_management_endpoints.py	Allows team members to generate keys for their team's inherited org without an explicit org-membership row; bypass is properly gated on the team's org matching the request org (from DB, not user-controlled).
litellm/proxy/auth/auth_utils.py	Plumbs llm_router through model-extraction helpers so managed video/character IDs are resolved to proxy aliases before auth checks.
litellm/proxy/auth/user_api_key_auth.py	Passes llm_router to all _get_model_from_request_context call sites so model resolution is consistent across auth, budget, and key-access checks.
litellm/proxy/hooks/batch_rate_limiter.py	Maps batch body's provider model ID to its proxy alias via the router before the can_key_call_model check; fixes batch authorization when the file contains the provider ID rather than the alias.
litellm/proxy/pass_through_endpoints/pass_through_endpoints.py	Sets logging_obj.stream = True at the start of each streaming path so the dedup guard in dispatch_success_handlers activates correctly, preventing duplicate cost callbacks.
litellm/proxy/pass_through_endpoints/llm_provider_handlers/anthropic_passthrough_logging_handler.py	Guards complete_streaming_response assignment behind a stream==True check to prevent bypassing message redaction on non-streaming pass-through responses.
litellm/proxy/spend_tracking/budget_reservation.py	Passes llm_router to get_model_from_request calls so budget reservation also resolves managed resource model IDs through the router.
litellm/proxy/auth/auth_checks.py	Passes llm_router into common_checks so the router is available during shared auth validation.

Reviews (1): Last reviewed commit: "chore: update uv.lock for 1.85.4"

greptile-apps · 2026-06-03T23:26:52Z

+        client_initialization_params["azure_ad_token_provider"] = (
+            f"provider_id={id(_ad_provider) if callable(_ad_provider) else None}"
+            f"|tenant_id={_lp.get('tenant_id')}"
+            f"|client_id={_lp.get('client_id')}"
+            f"|client_secret={hashlib.sha256(_client_secret.encode()).hexdigest() if isinstance(_client_secret, str) else None}"
+            f"|azure_username={_lp.get('azure_username')}"
+            f"|azure_password={hashlib.sha256(_azure_password.encode()).hexdigest() if isinstance(_azure_password, str) else None}"
+            f"|azure_scope={_lp.get('azure_scope')}"


Ephemeral object ID used as a stable cache discriminator

Python's id() returns the memory address of a live object. Once a token provider is garbage-collected, a subsequently created (or already-existing) provider may receive the same address. Two different azure_ad_token_provider callables with the same tenant/client/scope but colliding id() values would share a cached OpenAI client built for the first provider. In high-throughput servers where many short-lived provider closures are created, this can silently return a stale client whose cached token belongs to a different identity. A more stable discriminant (e.g. the id of the underlying credential object from azure-identity, or a provider-specific string label) would eliminate the race.
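One way to act on this suggestion is to build the discriminator only from values that outlive any single callable. The sketch below is an assumption-laden illustration, not a proposed patch: `build_provider_cache_key` and the `provider_label` parameter are hypothetical, and the field names simply follow the ones visible in the diff excerpt above.

```python
import hashlib
from typing import Mapping, Optional


def _digest(value: Optional[str]) -> Optional[str]:
    """Hash secrets so the raw value never appears in the cache key."""
    return hashlib.sha256(value.encode()).hexdigest() if isinstance(value, str) else None


def build_provider_cache_key(
    params: Mapping[str, Optional[str]],
    provider_label: Optional[str] = None,
) -> str:
    """Build a cache discriminator that does not depend on id()."""
    return (
        # Hypothetical caller-supplied label naming the credential identity.
        f"provider_label={provider_label}"
        f"|tenant_id={params.get('tenant_id')}"
        f"|client_id={params.get('client_id')}"
        f"|client_secret={_digest(params.get('client_secret'))}"
        f"|azure_username={params.get('azure_username')}"
        f"|azure_password={_digest(params.get('azure_password'))}"
        f"|azure_scope={params.get('azure_scope')}"
    )


params = {
    "tenant_id": "tenant-1",
    "client_id": "client-1",
    "client_secret": "s3cret",
    "azure_scope": "https://cognitiveservices.azure.com/.default",
}
key_a = build_provider_cache_key(params, provider_label="identity-a")
key_b = build_provider_cache_key(dict(params), provider_label="identity-a")
key_c = build_provider_cache_key(params, provider_label="identity-b")
```

Two providers that describe the same identity yield the same key no matter where they live in memory, and two that differ only by label stay separate. The trade-off is that callers must supply a meaningful label for custom provider callables.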

mateo-berri and others added 8 commits June 3, 2026 22:53

bump: version 1.85.3 → 1.85.4

c38c2bf

chore: update uv.lock for 1.85.4

aaa3016

mateo-berri marked this pull request as ready for review June 3, 2026 23:20

mateo-berri requested review from a team and ryan-crabbe-berri June 3, 2026 23:20

mateo-berri enabled auto-merge June 3, 2026 23:25

ryan-crabbe-berri approved these changes Jun 3, 2026

mateo-berri merged commit 57d3cad into stable/1.85.x Jun 3, 2026
66 of 75 checks passed

mateo-berri deleted the litellm_cherrypick_1_85_4 branch June 3, 2026 23:25

5 participants