chore(release): backport six staged fixes into stable/1.86.x and cut 1.86.4 #29630

Merged
mateo-berri merged 8 commits into stable/1.86.x from litellm_cherrypick_1_86_4
Jun 3, 2026

@mateo-berri

@mateo-berri mateo-berri commented Jun 3, 2026

Collaborator

Relevant issues

Backports six fixes that already merged on litellm_internal_staging but never reached the 1.86.x line. Each one is cherry-picked from the squashed commit that landed for its PR, so the set matches what the newer lines already carry. This closes that gap and cuts 1.86.4

Linear ticket

N/A

What is included

Cherry-picked in merge order:

The last two commits are the version bump (1.86.3 to 1.86.4) and the matching uv.lock refresh

#29598 needed a manual conflict resolution. pass_through_endpoints.py is stored with CRLF endings on the stable branches, so the verbatim cherry-pick will not apply, and the new tests sit next to an unrelated test class that this line does not carry. The resolved source change and the added tests are byte-identical to the upstream commit; only the surrounding file context differs, and the dedup guard the fix relies on (has_dispatched_final_stream_success) is present on this line

Pre-Submission checklist

  • The cherry-picked PRs each carry their own tests
  • My PR passes all unit tests on make test-unit
  • Scope is limited to backporting already-merged fixes plus the release bump
  • Greptile review requested

CI (LiteLLM team)

  • Branch creation CI run
    Link:
  • CI run for the last commit
    Link:
  • Merge / cherry-pick CI run
    Links:

Screenshots / Proof of Fix

These are backports of changes already merged, released, and validated on newer lines; each linked PR carries its own proof and review

Type

Bug Fix
Infrastructure

Changes

See the commit list above. No new code beyond the cherry-picks, the version bump, and the lockfile refresh

mateo-berri and others added 8 commits June 3, 2026 22:53
* fix(azure): preserve AD token refresh in v1 OpenAI client path

The /openai/v1/ code path (api_version in {"v1", "latest", "preview"})
constructs a plain OpenAI/AsyncOpenAI client, but only forwarded
`api_key` from `azure_client_params`. When `enable_azure_ad_token_refresh`
is set (or any AD-only auth), `api_key` is None and the client
constructor raised "The api_key client option must be set...", breaking
every Azure call with a v1 api_version.

The OpenAI SDK (>=2.20.0) accepts a callable for `api_key` and re-invokes
it on every request via `_refresh_api_key`, so we now forward
`azure_ad_token_provider` directly — preserving the per-request token
refresh behavior of the regular AzureOpenAI client and avoiding the
expiry hole that resolving the token once at client-creation time would
introduce. Static `azure_ad_token` strings fall through to `api_key`.

For the async path we wrap the sync provider returned by azure-identity
in an async function since AsyncOpenAI expects `Callable[[], Awaitable[str]]`.

Fixes #27945

https://claude.ai/code/session_01UnzrDSFUUgp5T2wRoPMxq5

* fix(azure): offload sync token provider to thread in v1 async wrapper

* fix(azure): include AD credential identity in v1 client cache key

---------

Co-authored-by: Claude <noreply@anthropic.com>
(cherry picked from commit 96a2e8b)
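
The async wrapper and thread offload described in the commit above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in common_utils.py: `make_async_token_provider` and `fake_sync_provider` are hypothetical names, and the fake provider stands in for the sync callable that azure-identity returns.

```python
import asyncio
from typing import Awaitable, Callable, List


def make_async_token_provider(
    sync_provider: Callable[[], str],
) -> Callable[[], Awaitable[str]]:
    """Wrap a blocking token provider so it can be awaited (hypothetical helper)."""

    async def _provider() -> str:
        # Run the blocking call in a worker thread so the event loop is not stalled.
        return await asyncio.to_thread(sync_provider)

    return _provider


_calls: List[int] = []


def fake_sync_provider() -> str:
    # Stand-in for an azure-identity token provider; yields a new token per call.
    _calls.append(1)
    return f"token-{len(_calls)}"


async_provider = make_async_token_provider(fake_sync_provider)

# Each invocation calls the provider again, which is what keeps tokens fresh
# when the client invokes the callable on every request.
first_token = asyncio.run(async_provider())
second_token = asyncio.run(async_provider())
```

Because the wrapper calls the provider on every invocation instead of resolving a token once, an expired token is never pinned to the client.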
…9264)

* fix(proxy): map stripped batch body.model to proxy alias for auth

replace_model_in_jsonl rewrites JSONL body.model to the provider id before
upload; batch file access checks must resolve that id back to model_name
so keys granted the proxy alias are not rejected with 403.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(proxy): surface resolved proxy alias in batch file 403 detail

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
(cherry picked from commit 70d2748)
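
The reverse lookup this fix relies on can be sketched roughly as below. It assumes a router-style model list where each entry pairs a `model_name` (the proxy alias) with `litellm_params["model"]` (the provider id); `resolve_proxy_alias` and the sample values are hypothetical, not the function added in the PR.

```python
from typing import Any, Dict, List, Optional


def resolve_proxy_alias(
    model_list: List[Dict[str, Any]], body_model: str
) -> Optional[str]:
    """Map a provider id found in a batch JSONL body back to its proxy alias."""
    for deployment in model_list:
        provider_model = deployment["litellm_params"]["model"]
        # Accept the full provider id or the id with its provider prefix removed.
        stripped = provider_model.split("/", 1)[-1]
        if body_model in (provider_model, stripped):
            return deployment["model_name"]
    return None


# Hypothetical configuration: the key is granted the alias, not the provider id.
model_list = [
    {"model_name": "my-gpt", "litellm_params": {"model": "azure/gpt-4o-batch"}},
]
key_allowed_models = {"my-gpt"}

alias = resolve_proxy_alias(model_list, "gpt-4o-batch")
is_allowed = alias in key_allowed_models
```

Checking access against the resolved alias rather than the rewritten `body.model` is what keeps a key granted `my-gpt` from being rejected with 403.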
* fix(proxy): resolve managed video model ids for auth

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(proxy): cover character_id router model resolution

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
(cherry picked from commit d45e9e4)
…ams (#29310)

* fix(key_generate): allow team members to create keys on org-scoped teams

When a virtual key is created for a team, enterprise logic inherits the
team's organization_id onto the key (add_team_organization_id). Since the
VERIA-55 org-IDOR fix, /key/generate then required the caller to be an
explicit LiteLLM_OrganizationMembership member of that org, returning
403 "Caller is not a member of organization_id=<uuid>". Admins normally
only add users to teams (not orgs), so self-serve key creation regressed
for any user on an org-scoped team (regression since v1.84.0-rc.1).

Skip the org-membership check when organization_id was inherited from the
key's team (organization_id == team_table.organization_id). Team-level
authorization already gates this path, so team membership is sufficient.
The membership check still runs when a caller assigns an organization_id
that did not come from the key's team, preserving the IDOR protection.

Adds regression tests covering both the team-inherited (allowed) and
foreign-org (still blocked) cases.

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(key_generate): cover mismatched team org IDOR path on generate

Add test_generate_key_foreign_org_with_mismatched_team_still_enforces_membership
for the case where a team is present but request organization_id differs from
team_table.organization_id. Enterprise inheritance is no-op'd in the test so
the guard is exercised directly; membership validation must still run.

Addresses Greptile review on #29310.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
(cherry picked from commit b11833c)
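
The guard described in this commit reduces to a small predicate. The sketch below is an illustration only: `requires_org_membership_check` is a hypothetical name, and the real check in key_management_endpoints.py works on request and team-table objects rather than bare strings.

```python
from typing import Optional


def requires_org_membership_check(
    request_org_id: Optional[str], team_org_id: Optional[str]
) -> bool:
    """Decide whether the caller must be an explicit member of the organization."""
    if request_org_id is None:
        # No organization on the key, so there is nothing to validate.
        return False
    if team_org_id is not None and request_org_id == team_org_id:
        # The organization was inherited from the key's team; team-level
        # authorization already gates this path.
        return False
    # Any other organization_id was chosen by the caller and must be validated.
    return True


inherited = requires_org_membership_check("org-a", "org-a")
foreign_with_team = requires_org_membership_check("org-b", "org-a")
foreign_without_team = requires_org_membership_check("org-b", None)
```

The second and third cases correspond to the foreign-org paths the regression tests keep blocked.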
… reject it (Haiku 4.5) (#29585)

* fix(vertex): strip output_config.effort for models that reject it

Haiku 4.5 on Vertex AI does not support output_config.effort and 400s with
"output_config.effort: Extra inputs are not permitted". PR #27074 emptied
VERTEX_UNSUPPORTED_OUTPUT_CONFIG_KEYS so effort would forward for Opus/Sonnet
4.6+, but that made the forward unconditional across every Vertex Anthropic
model, including ones that don't support it. Claude Code injects effort into
its default Messages payload, so `claude --model claude-haiku-4.5` started
failing.

Make the sanitizer model-aware: drop output_config.effort for models that
don't advertise output_config support (or any reasoning effort level) while
forwarding it for those that do. The fix covers both the chat-completion and
Messages pass-through transformation paths since they share the helper.

* chore(vertex): log at debug when dropping unsupported output_config.effort

Operators pointing at an unregistered Vertex Claude alias that does support
effort would otherwise see it stripped with no signal. Debug level keeps it
out of normal logs since Claude Code sends effort on every request.

(cherry picked from commit cc55662)
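
A model-aware sanitizer of the kind described above might look like the following sketch. It is not the helper in output_params_utils.py: the function name, the hard-coded support set, and the model ids in it are assumptions for illustration (the bot summary below says the real check is driven by the model-cost JSON), and dropping an emptied `output_config` is also an assumption.

```python
import copy
import logging
from typing import Any, Dict, Set

logger = logging.getLogger("vertex_effort_sketch")

# Hypothetical allow-list; the real support check is data-driven, not hardcoded.
MODELS_SUPPORTING_EFFORT: Set[str] = {"claude-opus-4-6", "claude-sonnet-4-6"}


def sanitize_output_config(model: str, request: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the request with output_config.effort removed if unsupported."""
    sanitized = copy.deepcopy(request)
    output_config = sanitized.get("output_config")
    if isinstance(output_config, dict) and "effort" in output_config:
        if model not in MODELS_SUPPORTING_EFFORT:
            # Debug level keeps this out of normal logs while still leaving a signal.
            logger.debug("Dropping output_config.effort for model=%s", model)
            output_config.pop("effort")
            if not output_config:
                # Assumption: an empty output_config is dropped entirely.
                sanitized.pop("output_config")
    return sanitized


payload = {"max_tokens": 64, "output_config": {"effort": "high"}}
haiku_request = sanitize_output_config("claude-haiku-4.5", payload)
opus_request = sanitize_output_config("claude-opus-4-6", payload)
```

Working on a deep copy leaves the caller's payload untouched, so the same request can be routed to a model that does accept `effort`.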
* fix duplicate cost callbacks for anthropic streaming pass-through

Two bugs caused _PROXY_track_cost_callback to see stream=True +
complete_streaming_response=None on every streaming pass-through request,
making the dedup guard in dispatch_success_handlers permanently inactive:

1. pass_through_endpoints.py created the Logging object with stream=False
   for all requests. _is_assembled_stream_success short-circuits on
   self.stream is not True, so has_dispatched_final_stream_success was
   never set and any second dispatch went through unchecked.
   Fix: set logging_obj.stream = True after stream detection.

2. _create_anthropic_response_logging_payload set complete_streaming_response
   inside the try block after litellm.completion_cost(), so a pricing error
   caused an early return without setting it on model_call_details.
   Fix: set complete_streaming_response before the try block.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix stream

* add stream to logging obj

* test(pass_through): give mock logging object a real model_call_details dict

The anthropic passthrough logging payload now records the assembled
response on model_call_details before cost calculation, which requires
model_call_details to support item assignment. In production it is always
a dict; the existing unit test stubbed the logging object with a bare Mock
whose attribute is not subscriptable, so the new assignment raised
TypeError. Use a real dict to match the production logging object.

* test(pass_through): cover streaming logging-obj stream flag

The streaming branch of pass_through_request that marks the logging object
as streaming (logging_obj.stream and model_call_details["stream"]) had no
unit coverage, so the patch coverage gate flagged it. Add a regression test
that drives a streaming pass-through request through pass_through_request and
asserts the logging object is flagged as a stream before dispatch.

* test(pass_through): cover SSE-response stream flag fallback branch

The auto-detected streaming branch of pass_through_request (when a request
that was not flagged as streaming returns a text/event-stream response) sets
logging_obj.stream and model_call_details["stream"] but had no unit coverage,
so the codecov patch gate failed at 60%. Drive a non-streaming pass-through
request whose upstream response is SSE through pass_through_request and assert
the logging object is flagged as a stream before dispatch.

* fix(pass_through): gate complete_streaming_response on stream flag

perform_redaction only scrubs complete_streaming_response when
model_call_details["stream"] is True. Setting it unconditionally for
non-streaming Anthropic pass-through responses left the assembled
response unredacted in model_call_details, which is handed to logging
callbacks as kwargs when message logging is disabled. Only record it for
actual streaming responses so redaction always applies.

---------

Co-authored-by: mubashir1osmani <mubashir.osmani777@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
(cherry picked from commit 2bbdbfa)
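
The interaction between the stream flag and the dedup guard can be sketched with a toy logging object. `SketchLogging` and `record_assembled_response` are hypothetical simplifications; only the attribute names `stream`, `has_dispatched_final_stream_success`, and `complete_streaming_response` come from the commit message above.

```python
from typing import Any, Dict, Optional


class SketchLogging:
    """Toy stand-in for the logging object, showing why stream must be True."""

    def __init__(self, stream: bool) -> None:
        self.stream = stream
        self.has_dispatched_final_stream_success = False
        self.dispatch_count = 0

    def dispatch_success(self, complete_streaming_response: Optional[Any]) -> bool:
        # The guard only engages for an assembled streaming response.
        is_assembled = self.stream is True and complete_streaming_response is not None
        if is_assembled:
            if self.has_dispatched_final_stream_success:
                return False  # duplicate final dispatch suppressed
            self.has_dispatched_final_stream_success = True
        self.dispatch_count += 1
        return True


def record_assembled_response(details: Dict[str, Any], response: Any) -> None:
    # Only record for real streams, so redaction (which keys on stream) applies.
    if details.get("stream") is True:
        details["complete_streaming_response"] = response


flagged = SketchLogging(stream=True)
flagged.dispatch_success({"text": "hi"})
flagged.dispatch_success({"text": "hi"})

unflagged = SketchLogging(stream=False)
unflagged.dispatch_success({"text": "hi"})
unflagged.dispatch_success({"text": "hi"})

non_stream_details: Dict[str, Any] = {"stream": False}
record_assembled_response(non_stream_details, {"text": "hi"})
```

With the flag left at `False`, as in the pre-fix pass-through path, both dispatches go through; with it set, the second is dropped.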
@mateo-berri mateo-berri marked this pull request as ready for review June 3, 2026 23:18
@mateo-berri mateo-berri requested review from a team and yassin-berriai June 3, 2026 23:18
@greptile-apps

greptile-apps Bot commented Jun 3, 2026

Contributor

Greptile Summary

This PR cherry-picks six bug fixes from litellm_internal_staging into the stable/1.86.x line and cuts version 1.86.4. Each fix targets a specific regression or missing feature: Azure AD token refresh in the v1 OpenAI client path, batch model alias resolution for key-access checks, managed video model ID resolution through the router, team-member key creation on org-scoped teams, output_config.effort stripping for Vertex Claude models that reject it, and deduplication of cost callbacks on Anthropic streaming pass-through.

  • Azure (common_utils.py): The v1 client path now receives the AD token provider as api_key (callable), preserving token rotation; a stable composite string is used as the cache key instead of serializing the callable object directly.
  • Key management (key_management_endpoints.py): Org-membership validation is skipped when organization_id is inherited from the caller's team, restoring pre-v1.84.0-rc.1 behavior for team members on org-scoped teams.
  • Anthropic pass-through (pass_through_endpoints.py, anthropic_passthrough_logging_handler.py): logging_obj.stream is set eagerly on both streaming-detection branches; complete_streaming_response is gated behind stream is True, preventing duplicate cost callbacks and inadvertent redaction bypass on non-streaming responses.

Confidence Score: 4/5

Safe to merge; all six backported fixes target specific regressions with matching tests, and no existing test coverage is weakened.

Each cherry-pick is narrow and well-scoped. The Azure client cache-key change uses id() of the token provider, which is stable during an in-process lifetime but can produce cache misses for callers that recreate the provider on each call — a minor reliability nuance rather than a correctness bug. The org-bypass in key management is secure because team membership is enforced by key_generation_check before the bypass is reached. The streaming dedup and model-resolution changes are additive and guarded by new unit tests.

litellm/llms/azure/common_utils.py — the token-provider cache-key logic uses id() and is worth a second look if callers tend to construct fresh provider objects per request.

Important Files Changed

| Filename | Overview |
| --- | --- |
| litellm/llms/azure/common_utils.py | Preserves Azure AD token refresh in v1 OpenAI client path; uses id() of the callable for cache-key stability (fragile for short-lived callables but not a regression). |
| litellm/llms/vertex_ai/vertex_ai_partner_models/anthropic/output_params_utils.py | Drops output_config.effort for Vertex Claude models that reject it (e.g. Haiku 4.5); delegates support check to AnthropicConfig._model_supports_effort_param which reads from the model-cost JSON (data-driven, not hardcoded). |
| litellm/proxy/management_endpoints/key_management_endpoints.py | Restores pre-regression behavior: team members creating keys for an org-scoped team bypass the explicit org-membership check because team membership (already validated by key_generation_check) implies org access. |
| litellm/proxy/auth/auth_utils.py | Threads llm_router into managed-resource-ID model extraction so video/character model IDs are resolved to their proxy alias before auth and budget checks. |
| litellm/proxy/pass_through_endpoints/pass_through_endpoints.py | Sets logging_obj.stream = True and model_call_details["stream"] = True at both streaming-detection sites, enabling the deduplication guard in the cost callback. |
| litellm/proxy/pass_through_endpoints/llm_provider_handlers/anthropic_passthrough_logging_handler.py | Guards complete_streaming_response assignment behind stream is True, preventing non-streaming responses from bypassing message redaction and stopping duplicate cost callbacks. |

Reviews (1): Last reviewed commit: "chore: update uv.lock for 1.86.4"

@mateo-berri mateo-berri enabled auto-merge June 3, 2026 23:25
Comment on lines +464 to +472
client_initialization_params["azure_ad_token_provider"] = (
    f"provider_id={id(_ad_provider) if callable(_ad_provider) else None}"
    f"|tenant_id={_lp.get('tenant_id')}"
    f"|client_id={_lp.get('client_id')}"
    f"|client_secret={hashlib.sha256(_client_secret.encode()).hexdigest() if isinstance(_client_secret, str) else None}"
    f"|azure_username={_lp.get('azure_username')}"
    f"|azure_password={hashlib.sha256(_azure_password.encode()).hexdigest() if isinstance(_azure_password, str) else None}"
    f"|azure_scope={_lp.get('azure_scope')}"
)


P2 id() of the token provider as a cache-key component

id(_ad_provider) returns the CPython memory address, which is stable only while the object lives. If the caller constructs a fresh DefaultAzureCredential() (or any other provider) on every invocation, the ID changes each time, making the composite key unique on every request and rendering the client cache permanently cold for those callers. The remaining fields (tenant_id, client_id, client_secret, azure_username, azure_password, azure_scope) are stable and would be sufficient to identify a distinct credential configuration without the fragile id() component.

This is not a regression from the pre-patch behavior (a callable serialized via locals() also produced an address-based string), but the composite key could be tightened by removing provider_id=… and relying solely on the credential-config fields.
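
The tightening suggested in this comment could look like the sketch below, which builds the key from the credential-config fields alone. `credential_cache_key` and `_sha256_or_none` are hypothetical names and the sample values are made up; the field names mirror the snippet under review.

```python
import hashlib
from typing import Any, Mapping, Optional


def _sha256_or_none(value: Any) -> Optional[str]:
    # Hash secrets so the raw value never appears in the cache key.
    return hashlib.sha256(value.encode()).hexdigest() if isinstance(value, str) else None


def credential_cache_key(params: Mapping[str, Any]) -> str:
    """Build a cache key from credential configuration only (no id() component)."""
    return (
        f"tenant_id={params.get('tenant_id')}"
        f"|client_id={params.get('client_id')}"
        f"|client_secret={_sha256_or_none(params.get('client_secret'))}"
        f"|azure_username={params.get('azure_username')}"
        f"|azure_password={_sha256_or_none(params.get('azure_password'))}"
        f"|azure_scope={params.get('azure_scope')}"
    )


config = {"tenant_id": "t-1", "client_id": "c-1", "client_secret": "s3cret"}
key_a = credential_cache_key(config)
key_b = credential_cache_key(dict(config))  # equal config, different object
key_c = credential_cache_key({**config, "client_secret": "other"})
```

One trade-off to keep in mind: two distinct provider objects built from identical configuration would share a cached client under this scheme, which may or may not be acceptable for a given caller.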

@mateo-berri mateo-berri merged commit 173cbc9 into stable/1.86.x Jun 3, 2026
62 of 75 checks passed
@mateo-berri mateo-berri deleted the litellm_cherrypick_1_86_4 branch June 3, 2026 23:34
5 participants