Fix/azure image edit auth header #27863
[Infra] Promote Internal Staging to main
…arer
Delegate `AzureImageEditConfig.validate_environment` to `BaseAzureLLM._base_validate_azure_environment` so the image-edit route follows the same auth resolution as every other Azure provider:
- prefer the Azure-native `api-key` header when an API key is available
- fall back to `Authorization: Bearer <azure_ad_token>` only for AAD auth
The previous implementation unconditionally set `Authorization: Bearer <api_key>`, which is the OpenAI-direct convention and is rejected by Azure OpenAI / APIM-fronted deployments with `401 Access denied due to missing subscription key`.
Adds regression tests covering api_key kwarg, litellm_params.api_key, and the AAD-token fallback path.
Co-authored-by: Cursor <cursoragent@cursor.com>
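The auth resolution described in this commit can be sketched as follows. This is a minimal illustration, not the body of `BaseAzureLLM._base_validate_azure_environment`: the helper name `resolve_azure_auth_headers` is hypothetical, and raising when no credential is present is an assumption.

```python
from typing import Dict, Optional


def resolve_azure_auth_headers(
    api_key: Optional[str] = None,
    azure_ad_token: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper: choose the auth header for an Azure OpenAI request."""
    if api_key:
        # Azure-native convention: the key travels in the `api-key` header.
        return {"api-key": api_key}
    if azure_ad_token:
        # Only AAD auth uses the Authorization: Bearer form.
        return {"Authorization": f"Bearer {azure_ad_token}"}
    # Assumption: with no credential at all, fail loudly.
    raise ValueError("either api_key or azure_ad_token is required")
```

With both credentials supplied, the sketch emits only `api-key`, which mirrors the "prefer the Azure-native header" rule stated above.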
Greptile Summary
This PR fixes a bug in
Confidence Score: 4/5
The fix correctly aligns the image-edit auth path with every other Azure provider and is safe to merge; the precedence inversion for callers that supply both the positional key argument and a key inside litellm_params is worth documenting.
The core change is a targeted, well-justified one-line delegation to a shared helper already proven across other Azure providers. Three new mock-only tests cover the primary auth paths. The only noteworthy subtlety is that the new merge logic silently gives litellm_params priority over the positional api_key argument when both are non-None, which is an inversion of the old behavior and could surprise callers who pass both, but no existing call site appears to do so.
No files require special attention; both changed files are small and self-contained.
| Filename | Overview |
|---|---|
| litellm/llms/azure/image_edit/transformation.py | Replaces incorrect Bearer-token header with proper Azure api-key auth by delegating to the shared BaseAzureLLM helper; logic is consistent with other Azure providers. |
| tests/test_litellm/llms/azure/image_edit/test_azure_image_edit_transformation.py | Adds three well-isolated mock-only unit tests covering the fixed auth-header paths; no real network calls. |
Reviews (1): Last reviewed commit: "fix(azure/image_edit): use api-key heade..."
Codecov Report: ✅ All modified and coverable lines are covered by tests.
🤖 litellm-agent: This PR is currently BLOCKED from merge. Score: 4/5 ❌ Why blocked:
Details: Score docked for: 2 unresolved reviewer concerns (greptile, cla-assistant[bot]). Fix the issues above and push an update — the bot will re-review automatically.
…sion test
Address review feedback that the move to ``BaseAzureLLM._base_validate_azure_environment`` changed the relative priority of the positional ``api_key`` kwarg vs. ``litellm_params["api_key"]``.
The new behavior (``litellm_params["api_key"]`` wins, positional only fills in when ``litellm_params["api_key"]`` is empty) is intentional and matches every other Azure ``validate_environment``: ``AzureVideosConfig`` uses the exact same merge logic, while ``AzureVectorStoresConfig`` and ``AzureResponsesAPIConfig`` don't accept a positional ``api_key`` at all. The old ``or`` chain (positional wins) was the outlier and was part of the same OpenAI-vs-Azure convention drift that produced the original ``Authorization: Bearer`` bug.
The only production caller (``llm_http_handler.image_edit``) sources both values from the same ``litellm_params.api_key``, so this change is behaviorally a no-op there.
Document the precedence in the docstring and lock it in with an explicit test so future refactors can't quietly re-invert it.
Co-authored-by: Cursor <cursoragent@cursor.com>
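The precedence rule stated in this commit can be sketched as a small merge function. The name `merge_api_key` is hypothetical and the code is an illustration of the described semantics, not the shared helper itself.

```python
from typing import Any, Dict, Optional


def merge_api_key(
    positional_api_key: Optional[str],
    litellm_params: Dict[str, Any],
) -> Dict[str, Any]:
    """Hypothetical sketch: litellm_params["api_key"] wins over the positional key."""
    merged = dict(litellm_params)  # copy so the caller's dict is not mutated
    if not merged.get("api_key") and positional_api_key:
        # The positional key only fills in when litellm_params has no key.
        merged["api_key"] = positional_api_key
    return merged
```

When both values come from the same source, as the commit says they do in the only production caller, the result is identical under either precedence.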
bugbot review
✅ Bugbot reviewed your changes and found no new issues!
Reviewed by Cursor Bugbot for commit 7e32c65.
…n Bearer
PR #27863 fixed Azure image edit to use the Azure-native api-key header instead of OpenAI's Authorization: Bearer convention, but did not update test_azure_image_edit_litellm_sdk to match. The test still asserted 'Authorization' in headers, which now fails since the new code routes through BaseAzureLLM._base_validate_azure_environment and emits api-key when an api_key is provided.
Update the assertion to pin the correct Azure behavior: api-key header present with the resolved key, and no Authorization header.
* fix: strip Gemini thought-signature from tool_use.id in non-streaming path; example websearch config (#27873)
- adapters/transformation.py: mirror the streaming path and strip the `__thought__<b64>` suffix off `tool_call.id` before building the AnthropicResponseContentBlockToolUse. Base64's `+ / =` characters violate Anthropic's `^[a-zA-Z0-9_-]+$` tool_use.id pattern, so when a conversation that flowed through Gemini is later replayed to an Anthropic-native provider (Bedrock or Anthropic API) the request 400s.
- example_config_yaml/websearch_interception_config.yaml: register the interceptor under `callbacks:` not `success_callback:`. `success_callback` does not run pre-request hooks, so the tool-conversion step never fires on `/v1/messages` and the raw `web_search_20250305` tool is forwarded to Bedrock, which 400s.
- adds a unit test pinning the non-streaming strip behavior and the surviving `^[a-zA-Z0-9_-]+$` shape of the resulting id.
Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com>
* Fix/azure image edit auth header (#27863)
* fix(azure/image_edit): use api-key header instead of Authorization Bearer
Delegate `AzureImageEditConfig.validate_environment` to `BaseAzureLLM._base_validate_azure_environment` so the image-edit route follows the same auth resolution as every other Azure provider:
- prefer the Azure-native `api-key` header when an API key is available
- fall back to `Authorization: Bearer <azure_ad_token>` only for AAD auth
The previous implementation unconditionally set `Authorization: Bearer <api_key>`, which is the OpenAI-direct convention and is rejected by Azure OpenAI / APIM-fronted deployments with `401 Access denied due to missing subscription key`.
Adds regression tests covering api_key kwarg, litellm_params.api_key, and the AAD-token fallback path.
Co-authored-by: Cursor <cursoragent@cursor.com>
* docs(azure/image_edit): pin api-key precedence semantics + add regression test
Address review feedback that the move to ``BaseAzureLLM._base_validate_azure_environment`` changed the relative priority of the positional ``api_key`` kwarg vs. ``litellm_params["api_key"]``.
The new behavior (``litellm_params["api_key"]`` wins, positional only fills in when ``litellm_params["api_key"]`` is empty) is intentional and matches every other Azure ``validate_environment``: ``AzureVideosConfig`` uses the exact same merge logic, while ``AzureVectorStoresConfig`` and ``AzureResponsesAPIConfig`` don't accept a positional ``api_key`` at all. The old ``or`` chain (positional wins) was the outlier and was part of the same OpenAI-vs-Azure convention drift that produced the original ``Authorization: Bearer`` bug.
The only production caller (``llm_http_handler.image_edit``) sources both values from the same ``litellm_params.api_key``, so this change is behaviorally a no-op there.
Document the precedence in the docstring and lock it in with an explicit test so future refactors can't quietly re-invert it.
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: ryan-crabbe-berri <ryan@berri.ai>
Co-authored-by: Adam Kirstein <adam.kirstein@disney.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
* test(azure/image_edit): expect api-key header instead of Authorization Bearer
PR #27863 fixed Azure image edit to use the Azure-native api-key header instead of OpenAI's Authorization: Bearer convention, but did not update test_azure_image_edit_litellm_sdk to match. The test still asserted 'Authorization' in headers, which now fails since the new code routes through BaseAzureLLM._base_validate_azure_environment and emits api-key when an api_key is provided.
Update the assertion to pin the correct Azure behavior: api-key header present with the resolved key, and no Authorization header.
---------
Co-authored-by: oss-agent-shin <ext-agent-shin@berri.ai>
Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com>
Co-authored-by: Adam Kirstein <107421694+justalittleadam@users.noreply.github.com>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: ryan-crabbe-berri <ryan@berri.ai>
Co-authored-by: Adam Kirstein <adam.kirstein@disney.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>
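The thought-signature stripping described in the #27873 commit above can be illustrated with a short sketch. The function name and the sample id are hypothetical; only the `__thought__` marker and the `^[a-zA-Z0-9_-]+$` pattern come from the commit message.

```python
import re

THOUGHT_MARKER = "__thought__"
# Pattern quoted in the commit message for Anthropic tool_use ids.
TOOL_USE_ID_PATTERN = re.compile(r"^[a-zA-Z0-9_-]+$")


def strip_thought_signature(tool_call_id: str) -> str:
    """Hypothetical helper: drop everything from the `__thought__` marker onward."""
    return tool_call_id.split(THOUGHT_MARKER, 1)[0]


# Made-up id whose base64 suffix contains `+`, `/` and `=`.
raw_id = "call_abc123__thought__QUJD+/8="
clean_id = strip_thought_signature(raw_id)
```

An id without the marker passes through unchanged, so the helper is safe to apply unconditionally in this sketch.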
* fix(proxy): always merge caller-supplied tags into request metadata
Caller-supplied tags (`x-litellm-tags` header, body `tags`, `metadata.tags`)
were silently dropped unless the key/team had
`metadata.allow_client_tags: true` set. Restore the documented behavior:
tags from the request always flow into `metadata.tags` and union with any
admin-configured static tags from key/team/project metadata.
Removes the `allow_client_tags` opt-in flag from the pre-call pipeline.
The flag was only ever read here; it has no schema or endpoint footprint,
so leftover values in existing key metadata are inert.
Test cleanup mirrors the simplification: drop the three tests that
verified the strip-when-not-opted-in path, drop the `allow_client_tags`
fixture lines from the merge/union tests.
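The union behavior described in this commit can be sketched as below. The function name `union_tags` is hypothetical, and preserving first-seen order with static tags first is an assumption; the commit only states that request tags union with admin-configured static tags.

```python
from typing import Iterable, List, Optional


def union_tags(
    static_tags: Optional[Iterable[str]],
    caller_tags: Optional[Iterable[str]],
) -> List[str]:
    """Hypothetical sketch: merge admin static tags with caller-supplied tags."""
    merged: List[str] = []
    for tag in [*(static_tags or []), *(caller_tags or [])]:
        if tag not in merged:  # de-duplicate while keeping first-seen order
            merged.append(tag)
    return merged
```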
* docs(proxy): refresh stale comments referencing removed tag strip
The tag-strip block was removed in the parent commit but two surrounding
comments still referenced "tags without opt-in" and "runs AFTER the
strip". Update them to describe the remaining user_api_key_* and
_pipeline_managed_guardrails strip that the snapshot/merge ordering
actually protects against.
* fix(tests): swap dall-e to gpt-image-1 after openai deprecation
DALL-E 2 and DALL-E 3 were removed from the OpenAI API on 2026-05-12,
causing e2e image-generation tests to fail with "model does not exist".
Swap all live-API DALL-E references in proxy-backed tests to gpt-image-1
and update the dall-e-2 alias in proxy_server_config.yaml to point at
openai/gpt-image-1 (preserves any historical dall-e-2 callers).
* fix(tests): drop dall-e-only test classes; route live image tests via gpt-image-1
Second wave of failures from the 2026-05-12 DALL-E shutdown:
- tests/image_gen_tests/test_image_edits.py::TestOpenAIImageEditDallE2
and tests/image_gen_tests/test_image_generation.py::TestOpenAIDalle3
are explicitly named for the deprecated models and can't pass; remove.
gpt-image-1 coverage already exists in sibling classes.
- tests/local_testing/test_router.py image gen tests use dall-e-3 only
as a routing example; swap to gpt-image-1.
- tests/local_testing/test_custom_callback_input.py image_generation
success/failure paths swapped to gpt-image-1.
* chore: reject bare str at file-input sinks to prevent local-file read (#27762)
* chore: reject bare str at file-input sinks to prevent local-file read (#27667)
Squash-merged by litellm-agent from stuxf's PR.
* fix: use os.PathLike in ocr sink and check truthy reasoningSummary for bridge
- ocr/main.py: widen Path check to os.PathLike for consistency with other sinks
- main.py: bridge condition checks truthiness of reasoning_summary, not just None
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix: remove unused pathlib.Path import in ocr/main.py
---------
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: ryan-crabbe-berri <ryan@berri.ai>
Co-authored-by: stuxf <70670632+stuxf@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* feat(ui): add Vertex AI Search as vector store provider (#27790)
* feat(ui): add Vertex AI Search as vector store provider
Adds a "Vertex AI Search" entry to the provider dropdown
(custom_llm_provider=vertex_ai/search_api) with fields for project,
location (global/us/eu select), and optional collection ID. Extends
VectorStoreFieldConfig with `options` so select fields can be
data-driven instead of falling through to the embedding-model list.
* fix(ui): clarify vertex_collection_id placeholder copy
Placeholder previously displayed "default_collection" — the literal
fallback value — which invited users to type it instead of leaving the
field blank. Switch to an example placeholder and tighten the tooltip.
* Litellm key rotation bug (#27756)
* fix(proxy): resolve cache handling issues in _lookup_deprecated_key
- Updated the in-memory cache for deprecated key lookups to store a 3-tuple (active_token_id, cache_expires_at_ts, revoke_at_ts) instead of a 2-tuple, ensuring proper unpacking and backward compatibility.
- Removed duplicate cache reads and added logic to handle legacy cache entries gracefully.
- Enhanced unit tests to cover scenarios for cache hits, DB misses, and respect for revoke_at timestamps, ensuring robust handling of the grace-period key-rotation feature.
* refactor(proxy): streamline cache handling in _lookup_deprecated_key
- Simplified the cache retrieval logic by directly unpacking the 3-tuple cache entries, removing the need for backward compatibility checks for 2-tuple entries.
- Updated unit tests to ensure that pre-warmed 3-tuple cache entries are served correctly without unnecessary database lookups.
* chore(ci): add new unit test for deprecated key grace period
- Included `test_deprecated_key_grace_period.py` in the CI workflow to enhance coverage for deprecated key handling scenarios.
* fix(proxy): remove unnecessary check for revoke_at in _lookup_deprecated_key
- Eliminated the redundant check for None on revoke_at, streamlining the logic for handling deprecated keys in the cache. This change enhances the efficiency of the key lookup process.
* test(proxy): add end-to-end tests for deprecated key lookup behavior
- Introduced a new test class `TestDeprecatedKeyLookupDbE2E` to validate the behavior of deprecated key lookups against a real Prisma-backed database.
- The test ensures that old key hashes resolve correctly and that repeated lookups utilize the in-memory cache without errors.
- Cleaned up the `_lookup_deprecated_key` function by removing an unnecessary check for `revoke_at`, enhancing the efficiency of the key lookup process.
* chore(proxy): close /key/regenerate ownership-rebind + premium-gate bypass
A non-admin caller could rebind their own key's ``user_id`` via
``/key/regenerate``. ``_execute_virtual_key_regeneration`` had org/team
guards but no ``user_id`` guard, and ``prepare_key_update_data`` did not
strip the field — it survived ``model_dump(exclude_unset=True)`` into
the Prisma update. On the next request,
``_return_user_api_key_auth_obj`` resolved the rebound ``user_id``
against ``litellm_usertable`` and returned ``PROXY_ADMIN`` whenever
the target row's ``user_role`` was admin (e.g. the default
``user_id="default_user_id"`` created on first password-UI login).
``/key/update`` had the equivalent guard inline at
``_validate_update_key_data``; extract it to a shared helper
``_validate_caller_can_change_key_ownership`` and call from both
``/key/update`` and ``_execute_virtual_key_regeneration``. Future
regenerate-style endpoints inherit the guard for free.
Also tighten the premium gate that allowed the master-key rotation
branch to skip the enterprise check. The previous predicate was
``data.new_master_key is not None`` — a field-presence test, not an
identity check. Any non-premium caller could send any value in that
field and the premium check would no-op. Verify the caller actually
holds the master key via ``_is_master_key`` before allowing the
non-premium path.
Tests:
- ``test_regenerate_user_id_rebind_guard`` — parametrized table over
cross-user rebind (blocked), empty-string removal (blocked), and
same-user no-op rebind (allowed).
- ``test_regenerate_premium_gate_requires_actual_master_key`` /
``test_regenerate_premium_gate_allows_actual_master_key_holder`` —
ensure the premium check requires the caller actually present the
master key, and that legitimate master-key rotation still works.
* test(vcr): classify cache verdicts, detect live calls, surface cost leaks
Convert the per-test VCR verdict line from a single 'NOOP / HIT / MISS /
PARTIAL' tag into a classified outcome that distinguishes the cases that
silently bill the live API on every CI run from the ones that don't:
    HIT                  pure replay
    PARTIAL              mixed replay + new recordings
    MISS:RECORDED        new cassette saved to Redis (cached next run)
    MISS:OVERFLOW        cassette > MAX_EPISODES_PER_CASSETTE; persister refused to save; re-bills every run
    MISS:NOT_PERSISTED   test failed; save_cassette skipped; re-bills
    NOOP                 VCR-marked but no HTTP traffic (mocked elsewhere)
    UNMARKED:LIVE_CALL   test bypassed VCR AND opened a TCP connection to a known LLM provider host -> wasted spend
    UNMARKED:NO_TRAFFIC  test bypassed VCR but didn't call out
The UNMARKED:LIVE_CALL signal is what converts 'this test probably hits
live' into 'this test connected to api.openai.com'. We install a
socket.connect / socket.create_connection wrapper for the duration of
each non-VCR-marked test and record any outbound TCP to a known LLM
provider hostname. The probe sits below the httpx layer so vcrpy and
respx (which both patch above the socket) are unaffected.
Replace the file-level _RESPX_CONFLICTING_FILES blacklists in the
llm_translation and local_testing conftests with per-item respx
detection in apply_vcr_auto_marker_to_items. A test now skips VCR when
it actually carries @pytest.mark.respx or has respx_mock in its fixture
chain - not just because some other test in the same file imports
MockRouter. Items skipped by skip_files are split into respx_conflict
(real conflict, the module wires up respx) vs file_opt_out (dead skip-
list entry whose module never touches respx) so the session summary
makes pruning obvious.
Stabilize the AWS SigV4 fingerprint: the Authorization header on
Bedrock requests rotates its Credential date and Signature on every
call, which previously pushed every Bedrock test past the 50-episode
overflow threshold. Extract the access-key id only
('aws-sigv4:AKIA...') so two requests with the same identity match.
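The fingerprint stabilization described above can be sketched with a regular expression over the SigV4 `Authorization` header. The function name is hypothetical and the code is an illustration rather than the project's matcher; it relies only on the standard header shape `AWS4-HMAC-SHA256 Credential=<access-key-id>/<date>/<region>/<service>/aws4_request, ...`.

```python
import re
from typing import Optional

# Capture the access-key id, which is the first segment of the Credential scope.
_SIGV4_CREDENTIAL = re.compile(r"^AWS4-HMAC-SHA256\s+Credential=([^/\s,]+)/")


def sigv4_fingerprint(authorization_header: str) -> Optional[str]:
    """Hypothetical sketch: reduce a SigV4 header to a stable identity string."""
    match = _SIGV4_CREDENTIAL.match(authorization_header)
    if match is None:
        return None  # not a SigV4 header
    return f"aws-sigv4:{match.group(1)}"
```

Because the date and signature are discarded, two requests signed by the same identity on different days produce the same fingerprint.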
Always emit verdict logging when VCR is active (set
LITELLM_VCR_VERBOSE=0 to opt back into the legacy quiet mode). Add a
session-end classification summary that lists overflow tests, unmarked
live-call tests, and the skip-reason breakdown.
Wire the live-call probe + summary hook into every test directory that
already uses the Redis-backed VCR cache (audio_tests, guardrails_tests,
image_gen_tests, litellm_utils_tests, llm_responses_api_testing,
llm_translation, local_testing, logging_callback_tests, ocr_tests,
pass_through_unit_tests, router_unit_tests, search_tests,
unified_google_tests).
Add tests/llm_translation/test_vcr_classification.py covering the
verdict classifier, skip-reason tagging, AWS SigV4 fingerprint stability,
live-host classification, and session summary rendering.
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* test(vcr): drop dead 'from respx import MockRouter' imports
These seven test files were on _RESPX_CONFLICTING_FILES, which made the
auto-marker skip them entirely. Inspecting the source shows the only
respx artifact is a top-level 'from respx import MockRouter' that no
test ever uses - no @pytest.mark.respx, no respx_mock fixture, no
respx.mock context manager. The import is dead code left over from a
previous mocking pattern.
Now that apply_vcr_auto_marker_to_items detects respx per-item via the
marker / fixture chain (b637d9f64a), the file-level skip is no longer
needed for these files - they were the reason the OpenAI tests
(test_o3_reasoning_effort, test_streaming_response[o1/o3-mini],
TestOpenAIO1::test_streaming, TestOpenAIChatCompletion::test_web_search,
TestOpenAIO3::test_web_search, etc.) ran live every CI build despite
the cassette cache being healthy.
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* test(image_edits): regenerate fixtures per call instead of holding open module-level file handles
Module-level
    TEST_IMAGES = [
        open(os.path.join(pwd, 'ishaan_github.png'), 'rb'),
        open(os.path.join(pwd, 'litellm_site.png'), 'rb'),
    ]
    SINGLE_TEST_IMAGE = open(...)
opens the file once at import. After the first multipart upload, the
file pointer is at EOF, so every subsequent test in the same xdist
worker sends an empty multipart body. That non-determinism (a) blows
the recorded cassette past MAX_EPISODES_PER_CASSETTE (50) so
_RedisPersister.save_cassette refuses to save it, and (b) re-bills the
live image edit endpoint on every CI run.
Recent CI runs confirm the leak: tests/image_gen_tests/test_image_edits.py
shows six tests parking at 51-52 cassette entries
(TestOpenAIImageEditGPTImage1::test_openai_image_edit_litellm_sdk[False],
TestOpenAIImageEditDallE2::..., test_openai_image_edit_with_bytesio,
test_openai_image_edit_litellm_router, test_multiple_vs_single_image_edit[False],
test_multiple_image_edit_with_different_formats).
Replace the module-level file handles with _make_test_images() /
_make_single_test_image() factories that return fresh _RewindableImage
(BytesIO subclass) objects whose pointer always starts at 0. The image
bytes are read once at import into module-level constants
(_ISHAAN_GITHUB_BYTES, _LITELLM_SITE_BYTES), so disk I/O cost is
unchanged.
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* chore(proxy): clarify ownership-rebind error message (actor vs target)
Previous wording read "User=<new_owner> is not allowed to update the
key to belong to user=<current_owner>" — easy to misread as "caller
wants to keep the key on its current owner". Reframe as
"Non-admin caller is not allowed to rebind the key from
user=<existing> to user=<incoming>" so the direction of the failed
operation is unambiguous.
Same shape preserved (HTTPException 403); only the ``detail`` string
changes. Regression test substring updated.
* fix(vcr): match real Bedrock hostnames in live-call probe
The suffix '.bedrock-runtime.amazonaws.com' never matched real Bedrock
endpoints, which use the format 'bedrock-runtime[-fips].{region}.amazonaws.com'
(region between 'bedrock-runtime' and 'amazonaws.com'). Add an explicit
host check for that pattern so Bedrock live calls are visible to the
probe, and update the unit test accordingly. Also drop the unused
'_LIVE_CALL_PROBE_INSTALLED' module variable.
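The hostname mismatch can be shown with a small check. The regular expression and function name below are illustrative assumptions; only the endpoint format `bedrock-runtime[-fips].{region}.amazonaws.com` and the old suffix come from the commit message.

```python
import re

# Region sits between `bedrock-runtime[-fips]` and `amazonaws.com`.
_BEDROCK_HOST = re.compile(r"^bedrock-runtime(-fips)?\.[a-z0-9-]+\.amazonaws\.com$")

# The suffix that, per the commit, never matched a real endpoint.
OLD_SUFFIX = ".bedrock-runtime.amazonaws.com"


def is_bedrock_runtime_host(host: str) -> bool:
    """Hypothetical sketch: True for Bedrock runtime endpoints in any region."""
    return _BEDROCK_HOST.match(host.lower()) is not None
```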
* test(proxy): drop allow_client_tags opt-in gate and add credential rename cascade tests
Removes the allow_client_tags metadata check from apply_client_tag_policy_pre_auth so
x-litellm-tags headers are always merged into request metadata, matching the post-auth
behavior in add_litellm_data_to_request. Updates pre-call tests accordingly and adds a
new test suite covering cascading credential renames into model rows.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(proxy): block explicit-null user_id in ownership rebind guard
``model_dump(exclude_unset=True)`` in ``prepare_key_update_data``
includes any field the caller explicitly set, even when the value is
``None``. The previous guard short-circuited on ``getattr(data,
'user_id', None) is None``, which conflated "field omitted" (safe)
with "field explicitly set to null" (writes NULL to the token row,
detaching the key from its user and bypassing user-row role
checks).
Switch the omitted-vs-set distinction to ``data.model_fields_set``;
treat explicit-null and explicit-empty-string identically as a
removal attempt, both 403-rejected for non-admin callers.
Parametrized regression adds ``explicit_null_blocked`` alongside the
existing ``rebind_blocked`` / ``empty_blocked`` / ``same_user_id_allowed``
cases.
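The omitted-versus-explicit distinction in this commit can be sketched without pydantic. In the snippet below, `fields_set` stands in for `data.model_fields_set`, and the function name and boolean return are hypothetical simplifications of a guard that, per the commit, rejects with a 403.

```python
from typing import Optional, Set


def ownership_change_allowed(
    fields_set: Set[str],
    incoming_user_id: Optional[str],
    existing_user_id: Optional[str],
    caller_is_admin: bool,
) -> bool:
    """Hypothetical sketch of the rebind guard for non-admin callers."""
    if caller_is_admin:
        return True
    if "user_id" not in fields_set:
        return True  # field omitted entirely: nothing is being changed
    if incoming_user_id is None or incoming_user_id == "":
        return False  # explicit null or empty string is a removal attempt
    # A same-user value is a no-op; anything else is a rebind.
    return incoming_user_id == existing_user_id
```

Checking membership in the set of explicitly provided fields, rather than testing the value for `None`, is what separates "omitted" from "set to null".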
* fix(vcr): cover full RFC1918 172.16.0.0/12 range in local prefixes
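For reference, the full 172.16.0.0/12 block spans 172.16.0.0 through 172.31.255.255. The sketch below checks membership with the standard `ipaddress` module, which is a swapped-in technique for illustration; the commit title refers to a list of local prefixes, and the function name here is hypothetical.

```python
import ipaddress

_RFC1918_172 = ipaddress.ip_network("172.16.0.0/12")


def in_rfc1918_172_block(host: str) -> bool:
    """Hypothetical sketch: True when host is an IP literal inside 172.16.0.0/12."""
    try:
        return ipaddress.ip_address(host) in _RFC1918_172
    except ValueError:
        return False  # not an IP literal (for example, a DNS name)
```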
* fix(image_edits): drop _RewindableImage to prevent infinite multipart upload
The _RewindableImage(BytesIO) wrapper auto-rewound on every read after
EOF, which made the OpenAI SDK's multipart upload writer read the same
bytes forever instead of seeing EOF. Workers OOM'd / SIGKILL'd:
[gw0] node down: Not properly terminated
replacing crashed worker gw0
...
worker 'gw1' crashed while running
'tests/image_gen_tests/test_image_edits.py::TestOpenAIImageEditGPTImage1::test_openai_image_edit_litellm_sdk[False]'
The auto-rewind was added defensively for parametrized + flaky-retried
tests, but BaseLLMImageEditTest::test_openai_image_edit_litellm_sdk
already calls get_base_image_edit_call_args() once per invocation and
that helper now constructs fresh streams via _make_test_images(), so
rewinding inside the stream is unnecessary. Replace with plain BytesIO
seeded with the cached image bytes.
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* chore(proxy): refuse remote-URL instance-fn loads outside config-file path
``get_instance_fn`` previously routed any ``s3://`` / ``gcs://``
value into ``_load_instance_from_remote_storage`` regardless of how
the value got there. The function ultimately calls
``spec.loader.exec_module(module)`` — Python in the proxy process. On
admin-callable endpoints that accept a ``target`` / ``custom_handler``
field from the request body (e.g. ``/config/pass_through_endpoint``,
custom-callback registration), that is a one-step admin-to-RCE
primitive: any future privilege-escalation bug becomes immediate
code execution.
The documented operator flow for remote-module loading is
``litellm_settings.callbacks: ["s3://bucket/module.instance"]`` in
``config.yaml``. That path always carries the YAML's
``config_file_path`` through to ``get_instance_fn``. Use the presence
of ``config_file_path`` as the discriminator: refuse remote URLs
when it is absent (the request-body path) unless the operator
explicitly opts back in via
``LITELLM_ALLOW_REMOTE_INSTANCE_FN_FROM_API=true``.
The three success/failure/audit-log callback-loop call sites in
``proxy_server.py:load_config`` were already running inside the
startup config-file load but had stopped threading
``config_file_path`` through. Pass it through so the documented
``s3://`` callback flow continues to work unchanged.
Tests cover: remote URL without ``config_file_path`` raises;
remote URL with the opt-in env reaches the loader; remote URL
with ``config_file_path`` passes (documented startup flow); local
dotted-name imports unaffected.
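The discriminator described in this commit can be sketched as a small predicate. The function name is hypothetical, and the sketch deliberately leaves out the environment-variable opt-in; it only shows the rule that a remote URL is accepted when `config_file_path` is present.

```python
from typing import Optional

_REMOTE_SCHEMES = ("s3://", "gcs://")


def remote_instance_load_allowed(
    value: str, config_file_path: Optional[str] = None
) -> bool:
    """Hypothetical sketch: gate remote-module loads on the config-file path."""
    if not value.startswith(_REMOTE_SCHEMES):
        return True  # local dotted-name imports are unaffected
    # Remote URLs are only accepted on the config-file load path.
    return config_file_path is not None
```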
* fix(proxy): parse string metadata before pre-auth tag merge
`apply_client_tag_policy_pre_auth` overwrote string-typed metadata
with `{}` before merging header tags, dropping any tags inside. A
caller could send `metadata='{"tags":["over-budget"]}'` plus
`x-litellm-tags: within-budget` and bypass `_tag_max_budget_check`
on the body tag. Parse the string via `safe_json_loads` first so
existing tags survive the merge.
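The parse-before-merge fix can be sketched as follows. The function name is hypothetical, and the standard `json.loads` is used here in place of the project's `safe_json_loads`; treating unparseable or non-dict metadata as empty is an assumption.

```python
import json
from typing import Any, Dict, List


def merge_header_tags(metadata: Any, header_tags: List[str]) -> Dict[str, Any]:
    """Hypothetical sketch: keep body tags when metadata arrives as a JSON string."""
    if isinstance(metadata, str):
        try:
            parsed = json.loads(metadata)
        except json.JSONDecodeError:
            parsed = None
        metadata = parsed if isinstance(parsed, dict) else {}
    elif not isinstance(metadata, dict):
        metadata = {}
    merged = dict(metadata)
    existing = merged.get("tags")
    existing = list(existing) if isinstance(existing, list) else []
    # Body tags survive; header tags are appended without duplicates.
    merged["tags"] = existing + [t for t in header_tags if t not in existing]
    return merged
```

With the string parsed first, the `over-budget` tag from the body stays visible to any later budget check.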
Also drop the empty `tests/test_litellm/proxy/credential_endpoints/`
directory — the cascade-rename tests it held imported a function
that was never implemented (out of scope for this PR).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(tests): thread config_file_path through s3/gcs custom-logger tests
The pre-existing s3:// / gcs:// custom-logger tests called
``get_instance_fn`` without ``config_file_path``, which means the
new runtime gate (refuse remote URLs unless invoked from a
config-file load) now raises ``ValueError`` before reaching the
mocked download paths. Each test was exercising the documented
startup config-file load scenario; pass ``config_file_path="/any/path"``
to make that intent explicit and route past the gate.
Affected: test_s3_download_success, test_gcs_download_success,
test_invalid_url_format, test_download_failure_handling,
test_file_cleanup.
* test(vcr): mark Bedrock prompt-caching cross-call tests VCR-incompatible
The pass_through prompt-caching tests
(test_prompt_caching_returns_cache_read_tokens_on_second_call,
test_prompt_caching_streaming_second_call_returns_cache_read) make a
warm-up call and then assert the *second* call sees a non-zero
cache_read_input_tokens count from the upstream's prompt-cache. VCR
replay can't model cross-call provider state — both calls match the
same cassette episode, so the second call returns the first call's
pre-warmup response and the assertion fails:
AssertionError: Expected cache_read_input_tokens > 0 on second call,
but got 0. Full usage: {'input_tokens': 4986,
'cache_creation_input_tokens': 4974, 'cache_read_input_tokens': 0}
This started biting after the AWS SigV4 fingerprint stabilization
(b637d9f64a): Bedrock requests now produce a stable per-access-key
fingerprint instead of a per-request signature, so cassettes
successfully replay where they previously always missed and re-recorded
live. Opt these tests out via skip_nodeid_suffixes so they run live and
match the existing pattern in tests/llm_translation/conftest.py
(::test_prompt_caching).
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* Fix 3 OpenTelemetry tracing bugs in proxy integration (#27757)
1. Missing litellm_request child span when proxy parent in metadata:
_get_span_context now returns (ctx, None) for the metadata-injected
proxy parent so the primary span is always emitted as a child of ctx.
Proxy span lifecycle managed by new _end_proxy_span_from_kwargs.
2. open_telemetry_logger overwrite by later handlers:
_init_otel_logger_on_litellm_proxy now uses first-registered-wins —
only assigns proxy_server.open_telemetry_logger when currently None.
3. Duplicate litellm_request success spans in streaming paths:
Added _mark_success_span_once with per-handler dedupe key stored in
kwargs metadata, suppressing the second span when both sync and async
success callbacks fire for the same request.
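A minimal sketch of the dedupe idea in item 3, assuming the flag lives under `kwargs["metadata"]` and that each handler has a stable string id. The function name mirrors the commit message, but the key format and storage location here are illustrative assumptions, not the actual implementation.

```python
def mark_success_span_once(kwargs: dict, handler_id: str) -> bool:
    """Return True the first time a handler sees this request, False afterwards."""
    # Assumption: the dedupe flag is stored under kwargs["metadata"]; the real
    # location inside the logging kwargs may differ.
    metadata = kwargs.get("metadata") or {}
    kwargs["metadata"] = metadata

    # Hypothetical key format: one flag per handler, so two different
    # handlers can each emit their own span for the same request.
    dedupe_key = f"otel_success_span_emitted:{handler_id}"
    if metadata.get(dedupe_key):
        return False
    metadata[dedupe_key] = True
    return True
```

With this shape, the sync and async success callbacks can both call the helper and only the first caller per handler emits the span.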
Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: update Next.js build artifacts (2026-05-13 01:42 UTC, node v20.20.2)
* test(vcr): tighten OVERFLOW classification and switch respx detection to AST
Address two greptile P2 review concerns on PR #27795:
1. MISS:OVERFLOW was firing whenever total > MAX_EPISODES_PER_CASSETTE
regardless of cassette state. A cassette that grew past the cap
historically but this run only *replayed* (dirty=False) is
healthy — the persister never tries to save, so the cache state is
stable and the next run will replay too. Only flag OVERFLOW when
dirty=True (new episodes were recorded that the persister would
refuse to save). Add a regression test covering the
dirty=False + large-total case.
2. _module_uses_respx did substring matching on the module source,
which false-positives on comments / docstrings / string literals.
A comment like `# Previously tried respx.mock but switched to
vcrpy` would keep a file pinned on the opt-out list, defeating the
dead-import pruning goal of this PR. Replace the substring scan
with an ast.NodeVisitor (_RespxUsageVisitor) that only
counts:
- @pytest.mark.respx / @respx.mock decorators
- with respx.mock(): ... (sync + async) context managers
- respx.mock(...) calls outside a with/decorator
- function parameters / fixture names equal to respx_mock
Add tests for the comment / docstring / string-literal cases plus
each real-usage pattern.
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* fix(types_utils): drop opt-in env from remote-module runtime gate
The runtime gate on s3:// and gcs:// loading in get_instance_fn previously
allowed an opt-in via LITELLM_ALLOW_REMOTE_INSTANCE_FN_FROM_API. That
env var is admin-flippable at runtime (DB-overlay environment_variables
flow into os.environ), which defeats the gate's purpose, and it isn't
needed for the documented operator flow: config.yaml callbacks always
pass config_file_path through to the loader.
Remove the helper, raise unconditionally when config_file_path is None,
and drop the corresponding test for the opt-in branch.
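As a rough illustration of the gate after this change, assuming remote values are recognized by their URL prefix: the helper name and the exception type below are hypothetical, and only the "no config file path means no remote load" rule comes from the message above.

```python
from typing import Optional

# Prefixes treated as remote module sources in this sketch.
REMOTE_PREFIXES = ("s3://", "gcs://")


def ensure_remote_load_allowed(value: str, config_file_path: Optional[str]) -> None:
    """Raise when a remote module is requested without a config file path.

    There is deliberately no environment-variable escape hatch: the only
    way through the gate is a caller that passes config_file_path.
    """
    if value.startswith(REMOTE_PREFIXES) and config_file_path is None:
        # Exception type is an assumption for illustration.
        raise ValueError("remote module loading requires a config file path")
```

Local module references pass through untouched, so only admin-written remote values without a YAML origin are rejected.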
* fix(proxy): thread config_file_path into pass-through and MCP-tool YAML loaders
The previous commit's gate broke two legitimate startup paths for
operators using s3:// and gcs:// remote module loading from their config.yaml:
- general_settings.pass_through_endpoints[].custom_handler
- mcp_tools[].handler
Both call sites called get_instance_fn without a config_file_path, so
the new gate rejected them at startup. Thread config_file_path through:
- create_pass_through_route accepts config_file_path and forwards it to
get_instance_fn. add_exact_path_route, add_subpath_route,
_register_pass_through_endpoint, and initialize_pass_through_endpoints
accept and propagate it.
- The YAML-load call site in proxy_server.load_config now passes
config_file_path; the DB-overlay call site in _update_general_settings
leaves it as the default None so the gate still fires on admin-written
s3:// values.
- MCPToolRegistry.load_tools_from_config accepts config_file_path and
threads it into get_instance_fn; _init_non_llm_configs forwards it
from load_config.
Adds two regression tests verifying that the YAML-source callers thread
the path through to get_instance_fn.
* Strip SERVER_ROOT_PATH before lazy-feature prefix match
LazyFeatureMiddleware compared the raw scope path against registered
prefixes (e.g. /policies), so requests under a server root path like
/api/v1/policies/... never matched, the feature never loaded, and the
endpoint returned 404. Strip the configured root path before matching,
normalizing trailing slashes and enforcing a component boundary so
/api does not falsely match /apiv2.
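A minimal sketch of the stripping rule described above, assuming the root path and request path are plain strings. The function name is illustrative and not taken from the middleware.

```python
def strip_root_path(path: str, root_path: str) -> str:
    """Remove a configured server root path from a request path.

    The root only matches on a path-component boundary, so a root of
    "/api" strips "/api/policies" but leaves "/apiv2/policies" alone.
    """
    root = root_path.rstrip("/")  # normalize trailing slashes
    if not root:
        return path
    if path == root:
        return "/"
    if path.startswith(root + "/"):
        return path[len(root):]
    return path
```

For example, `strip_root_path("/api/v1/policies/list", "/api/v1/")` yields `/policies/list`, which can then be compared against a registered prefix such as `/policies`.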
* Cache normalized SERVER_ROOT_PATH at middleware init
SERVER_ROOT_PATH is a process-startup env var. Read it once in
__init__ instead of calling get_server_root_path() + rstrip on every
request that arrives before all lazy features have loaded.
* test: replace dall-e-3 with gpt-image-1 in health check and router tests (#27813)
OpenAI returns 'The model dall-e-3 does not exist' for the test account,
breaking test_openai_img_gen_health_check and test_image_generation.
Switch to gpt-image-1, matching the existing TestOpenAIGPTImage1 pattern.
* fix(gemini): normalize response_schema on native generateContent (#27775)
* fix(gemini): normalize response_schema on native generateContent
The /v1beta/models/{model}:generateContent passthrough forwarded
generationConfig.response_schema verbatim, so schemas containing $defs,
$ref, anyOf-with-null, default, or title were rejected by Gemini even
though /chat/completions already handles them.
GoogleGenAIConfig.transform_generate_content_request now calls a new
_normalize_response_schema helper that mirrors the chat/completions
path: Gemini 2.0+ models get the schema promoted to responseJsonSchema
via _build_json_schema (preserving $defs/$ref natively), older models
keep responseSchema but the schema is flattened with
_build_vertex_schema. VertexAIGoogleGenAIConfig (which overrides the
transform entirely) calls the same helper before building the request.
* fix(gemini): preserve caller-supplied responseJsonSchema when responseSchema co-present
Previously, when both responseJsonSchema and responseSchema were present
on Gemini 2.0+, _normalize_response_schema processed responseJsonSchema
first (no-op normalization) then unconditionally promoted responseSchema
to responseJsonSchema, clobbering the caller-supplied value.
Now skip the promotion (and drop the redundant responseSchema) when the
caller already supplied responseJsonSchema.
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* chore: strip restating comments from response-schema normalize
Drop the docstring on _normalize_response_schema and the two inline
comments that just restated what the surrounding code/asserts already
say. Function name + variable names carry the intent; PR description
covers the why-it-exists context.
* perf(gemini): drop redundant deepcopy on responseJsonSchema normalize
_build_json_schema is a no-op (returns its argument unchanged), so the
deepcopy + round-trip on the responseJsonSchema branch allocated a full
schema copy on every request with no observable effect. Forward the
caller's value as-is, and just move the popped responseSchema value when
promoting on Gemini 2.0+.
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* style: remove unneeded comment
* fix(gemini): drop unsupported responseJsonSchema for older models
* test(gemini): add parity test between native and chat schema normalization
Per @Sameerlite review: lock the two Gemini schema-normalization paths
together. If either GoogleGenAIConfig._normalize_response_schema (native
generateContent) or VertexGeminiConfig.apply_response_schema_transformation
(/chat/completions) drifts, the parity test fails — forcing both to be
updated together.
* fix(google_genai): preserve key naming convention in _normalize_response_schema
When the input schema key is snake_case (response_schema), the promoted
JSON schema key should also be snake_case (response_json_schema) instead
of mixing in camelCase (responseJsonSchema). This matters for the Vertex
AI google_genai path which converts all keys to snake_case before
calling _normalize_response_schema.
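The promotion rules from the commits above can be sketched as follows. This is a simplified illustration only: it skips the flattening that older models get via `_build_vertex_schema`, and the function name and the `supports_json_schema` flag are assumptions standing in for the real model-version check.

```python
def promote_response_schema(generation_config: dict, supports_json_schema: bool) -> dict:
    """Promote a response schema to its JSON-schema key on newer models.

    - The promoted key keeps the naming convention of the input key.
    - A caller-supplied JSON-schema value is never overwritten.
    - Older models keep the original key (flattening is omitted here).
    """
    config = dict(generation_config)
    key_pairs = (
        ("responseSchema", "responseJsonSchema"),      # camelCase input
        ("response_schema", "response_json_schema"),   # snake_case input
    )
    for schema_key, json_key in key_pairs:
        if schema_key not in config or not supports_json_schema:
            continue
        schema = config.pop(schema_key)
        # Drop the redundant schema when the caller already supplied the
        # JSON-schema form; otherwise move the value across.
        if json_key not in config:
            config[json_key] = schema
    return config
```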
---------
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
* fix(vcr): aggregate worker stats on the controller so the session summary actually renders under xdist
`_session_stats` is a module-level dict mutated inside `_vcr_outcome_gate`
— which runs in each xdist worker process. The controller's
`pytest_terminal_summary` then reads its own empty `_session_stats` and
bails on `if not counts: return`, so the OVERFLOW / LIVE_CALL sections
the rest of this PR adds never make it into CI logs in the dist mode CI
actually uses.
Ship a structured `vcr_outcome` payload via `user_properties` (which
xdist round-trips) and add `aggregate_report_outcome` on the controller
to fold worker outcomes into `_session_stats`. The recording process
tags `vcr_recorded_by` with `PYTEST_XDIST_WORKER` so the controller can
tell "single-process — already counted locally" apart from "produced by
a worker — needs aggregation here", and not double-count when there's
no xdist.
Covered by 9 new unit tests in test_vcr_classification.py including the
end-to-end summary render path.
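A rough sketch of the controller-side aggregation, assuming the `vcr_outcome` payload is a dict with a `classification` field (the real payload shape is not shown in this message). Only the property names `vcr_outcome` and `vcr_recorded_by` and the `PYTEST_XDIST_WORKER` variable come from the text above; the rest is illustrative.

```python
import os
from collections import Counter
from typing import Any, Iterable, List, Tuple


def outcome_properties(classification: str) -> List[Tuple[str, Any]]:
    """Properties the recording process would attach to a test report."""
    return [
        ("vcr_outcome", {"classification": classification}),
        # Empty string when not running under xdist.
        ("vcr_recorded_by", os.environ.get("PYTEST_XDIST_WORKER", "")),
    ]


def aggregate_report_outcome(session_stats: Counter, user_properties: Iterable) -> None:
    """Fold a worker-produced outcome into the controller's stats."""
    props = dict(user_properties)
    outcome = props.get("vcr_outcome")
    if not outcome:
        return
    if not props.get("vcr_recorded_by"):
        # Single-process run: the outcome was already counted locally.
        return
    session_stats[outcome["classification"]] += 1
```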
* fix(responses): register cooldowns on failure + fail fast on stale encrypted_content (#27820)
* feat(proxy): skip disable_background_health_check models on GET /health when flag set (#27716)
* feat(proxy): skip disable_background_health_check models on GET /health when flag set
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix comment
* fix greptile comments
* Fix health check fallback kwargs
* Format health endpoint
* Harden direct health check kwargs compatibility for monkeypatched perform_health_check
Replace substring-based TypeError detection with unexpected-keyword checks
and a short retry chain (full kwargs, instrumentation only, filter only,
minimal) so partial stubs work regardless of which optional kwarg fails first.
Add proxy unit tests for legacy three-arg stubs and single-kwarg variants.
Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
* fix black
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
* fix(bedrock-converse): drop blank-text fallback for empty thinking blocks (#27850)
* fix(bedrock-converse): drop blank-text fallback for empty thinking blocks
Claude Code with extended thinking replays prior assistant turns that
include an empty thinking block (`thinking=""`, `signature=""`) alongside
tool_use blocks. The unsigned-reasoning fallback in
`add_thinking_blocks_to_assistant_content` was emitting
`BedrockContentBlock(text="")`, which Bedrock Converse rejects with:
"The text field in the ContentBlock object at messages.X.content.0
is blank."
Guard the fallback with a strip() check, matching the existing
empty-text guards elsewhere in `_bedrock_converse_messages_pt`.
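The guard amounts to something like the sketch below, which uses a plain dict in place of `BedrockContentBlock`. The helper name is hypothetical.

```python
from typing import Optional


def unsigned_reasoning_fallback(thinking: Optional[str]) -> Optional[dict]:
    """Return a text block for unsigned reasoning, or None when it is blank.

    Emitting {"text": ""} is what the upstream rejects, so empty and
    whitespace-only thinking content produces no block at all.
    """
    if thinking and thinking.strip():
        return {"text": thinking}
    return None
```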
* style: remove unneeded comments
* fix(proxy): thread config_file_path through LiteLLM_JWTAuth.custom_validate
LiteLLM_JWTAuth.__init__ calls get_instance_fn(custom_validate) without
config_file_path, so an operator who configures custom_validate:
s3://bucket/module.fn in their YAML JWT auth section would hit the
runtime gate on startup and break their deployment.
Accept config_file_path as a non-field kwarg (popped before the
invalid-keys check), thread it into get_instance_fn, and pass it from
the startup-load callsite via the existing user_config_file_path
module-level path. Admin-API JWT config writes leave the kwarg at None
and still hit the gate.
* fix(mcp): surface upstream 401 for token-forwarding MCP servers (#27847)
* fix(mcp): surface upstream 401 for token-forwarding MCP servers
For MCP servers configured with extra_headers: [Authorization], the gateway
forwards the client token directly to the upstream. When that token is rejected
(expired or invalid) the upstream returns 401, but the MCP SDK starts the SSE
stream with 200 OK before calling handlers, so the 401 can't be returned
mid-stream.
Fix: add a pre-flight httpx probe in handle_streamable_http_mcp — before the
SDK opens the session — so the gateway can still return HTTP 401 with
WWW-Authenticate: Bearer authorization_uri=<gateway-discovery-url> when the
upstream rejects the token. The probe fails open (returns 200) on network
errors so a transient hiccup does not block valid requests.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(mcp): parallelize pre-flight auth probes and use HEAD to avoid side effects
- Extract forwarded_auth outside the pass-through server loop (was called N times for the same scope value)
- Gather all upstream auth probes concurrently with asyncio.gather instead of sequentially; eliminates N×5 s worst-case latency
- Switch probe from POST+initialize JSON-RPC body to HEAD request; HEAD carries the Authorization header so the upstream rejects invalid tokens with 401 but never allocates a session or writes an audit entry
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(mcp): use get_async_httpx_client in _probe_upstream_auth
Replaces bare httpx.AsyncClient with the project-standard
get_async_httpx_client(httpxSpecialProvider.MCP) to satisfy the
ensure_async_clients_test code coverage check and avoid the +500 ms
per-request overhead of creating a new client on every probe call.
Co-authored-by: Cursor <cursoragent@cursor.com>
* refactor(mcp): extract pre-flight probe into _check_passthrough_upstream_auth
Moves the parallel upstream auth probe logic out of
handle_streamable_http_mcp into a dedicated helper to satisfy
Ruff PLR0915 (Too many statements > 50).
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(mcp): gate pre-flight probes on authorized server set to prevent bypass
_check_passthrough_upstream_auth was resolving user-supplied server names
directly before authorization ran, letting any permitted LiteLLM key
trigger an upstream HEAD probe to a server it was not allowed to use.
Changes:
- Call _get_allowed_mcp_servers inside the helper so only servers the
caller's key is authorized for are probed.
- Move the call site to after toolset scoping so the auth context is
fully resolved before the probe list is built.
- Thread user_api_key_auth into the helper signature (replaces the raw
mcp_servers name list).
Co-authored-by: Cursor <cursoragent@cursor.com>
* Add async HTTP HEAD support
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(mcp): use Scope type annotation in _get_forwarded_auth_from_scope
Co-authored-by: Cursor <cursoragent@cursor.com>
* Fix MCP upstream auth probe method
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* Remove unused AsyncHTTPHandler head method
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(mcp): exclude has_client_credentials servers from pre-flight auth probe
_prepare_mcp_server_headers skips caller Authorization when the server
uses OAuth client-credentials (M2M), but the pre-flight probe was still
selecting those servers and forwarding the caller's raw token in the HEAD
request. Exclude servers with has_client_credentials from the probe list
to match the actual downstream header-preparation logic.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(mcp): propagate upstream 403 as 403, not 401 with WWW-Authenticate
Per RFC 9110, 401 means "go get new credentials." Mapping an upstream 403
to a gateway 401 causes OAuth clients to restart the authorization flow,
obtain a fresh token with identical scopes, hit 403 again, and loop
indefinitely.
401 from upstream → gateway 401 + WWW-Authenticate (re-authorize)
403 from upstream → gateway 403 (no WWW-Authenticate hint)
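The mapping can be sketched as a small pure function. The function name and return shape are illustrative; the header format follows the `WWW-Authenticate` value quoted earlier in this change set.

```python
from typing import Dict, Optional, Tuple


def gateway_response_for_probe(
    upstream_status: int, discovery_url: str
) -> Optional[Tuple[int, Dict[str, str]]]:
    """Map an upstream probe status to a gateway (status, headers) pair.

    Returns None when the probe should not short-circuit the request.
    """
    if upstream_status == 401:
        # Credentials are missing or expired: tell the client where to re-authorize.
        return 401, {"WWW-Authenticate": f"Bearer authorization_uri={discovery_url}"}
    if upstream_status == 403:
        # Credentials are valid but insufficient: no re-auth hint, which
        # avoids the authorize -> 403 -> authorize loop.
        return 403, {}
    return None
```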
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(mcp): skip auth probe when Authorization may be the LiteLLM proxy key
The pre-flight upstream probe must not forward the caller's Authorization
header when it could itself be the LiteLLM proxy API key. Restrict the
probe to requests that supply x-litellm-api-key explicitly — only then is
the Authorization header unambiguously the upstream OAuth token the
caller wants forwarded.
* Fix MCP ASGI HTTPException propagation
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(mcp): use public AsyncHTTPHandler.post() in auth probe
Use AsyncHTTPHandler.post() and catch httpx.HTTPStatusError explicitly so
the 401/403 we want to surface is not silently swallowed by the broad
fail-open except Exception block. Avoids reaching into the handler's
private client attribute, which would silently regress to fail-open if
AsyncHTTPHandler is ever refactored.
* Fix MCP auth probe tests
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* test(mcp): add coverage for httpx.HTTPStatusError path in auth probe
AsyncHTTPHandler.post() calls raise_for_status() internally, so a real
upstream 401/403 lands as httpx.HTTPStatusError. Add a test that exercises
that specific exception path so a regression that swallows the error in
the broad fail-open except Exception would be caught.
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
Co-authored-by: claude-bot <claude-bot@anthropic.com>
* fix(cost): align vertex_ai/gemini-embedding-2-preview with Vertex multimodal pricing (#27848)
* fix(cost): align vertex_ai/gemini-embedding-2-preview with Vertex multimodal pricing
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(cost): align vertex_ai/gemini-embedding-2 GA source URL with preview
Per Greptile review on #27848: GA entry referenced ai.google.dev while
the preview entry was updated to the canonical Vertex AI pricing page.
Both share identical pricing values; sync the source URL for consistency.
https://claude.ai/code/session_01W8jRwstnmduadGw8Z8egxe
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Claude <noreply@anthropic.com>
* feat(mcp): add delegate_auth_to_upstream flag for PKCE passthrough (#27834)
* feat(mcp): add delegate_auth_to_upstream flag for PKCE passthrough
Adds an opt-in per-server flag that lets clients (e.g. VS Code) complete
PKCE directly with an upstream OAuth2 MCP server, instead of LiteLLM
double-gating with its own API-key/SSO check. Only honored when
auth_type=oauth2 and the operator explicitly sets the flag; mixed-target
or non-oauth2 requests fail closed.
- Adds the field to Pydantic models, Prisma schema, and a migration
- New MCPRequestHandler._target_servers_delegate_auth_to_upstream gate
that runs only when no x-litellm-api-key is present, so authenticated
users still get user_id resolution + stored-credential lookup
- Anonymous callers now see delegate servers in get_allowed_mcp_servers
(scoped to delegate servers only; the upstream still enforces auth)
- mcp_management_endpoints: allow anonymous /authorize and /token for
delegate servers so VS Code can complete PKCE without a LiteLLM session
- UI toggle (shown only for oauth2) + payload/view wiring
- Tests covering: oauth2 on/off, non-oauth2 with flag, mixed targets,
no resolvable target, explicit key precedence, and 401 emission
Co-authored-by: Cursor <cursoragent@cursor.com>
* Enforce oauth2 for delegated MCP auth bypass
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(mcp): close secondary Authorization bypass for delegate servers
The delegate-auth bypass gated only on the primary `x-litellm-api-key`
header, so a LiteLLM key sent via `Authorization: Bearer sk-...` (the
secondary header) was silently dropped — skipping spend tracking and
rate limiting. Gate on the resolved litellm_api_key (which considers
both headers) so the bypass fires only when neither is present.
Also update the existing "Authorization header present" test to reflect
that an upstream OAuth token now flows through the existing oauth2
fallback (LiteLLM auth attempt → fail → anonymous), not via the
delegate branch.
Co-authored-by: Cursor <cursoragent@cursor.com>
* Avoid duplicate MCP OAuth credential lookup
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(mcp): block delegate bypass for M2M and internal-only servers
Two security issues flagged in code review:
1. High – client_credentials (M2M) servers must not be delegatable:
LiteLLM auto-fetches the upstream token using stored credentials, so
allowing anonymous bypass would let any external caller invoke tools
authenticated as LiteLLM's service account.
Fix: check `server.has_client_credentials` in
`_target_servers_delegate_auth_to_upstream`, the anonymous
allow-list in `get_allowed_mcp_servers`, and `_mcp_oauth_user_api_key_auth`.
2. Medium – internal-only servers exposed to public internet:
The anonymous delegate allow-list was not filtering by
`available_on_public_internet`, so external callers with an upstream
OAuth token could invoke tools on servers marked internal-only.
Fix: add `available_on_public_internet` guard to the anonymous
delegate server list in `get_allowed_mcp_servers`.
Tests added for both cases.
Co-authored-by: Cursor <cursoragent@cursor.com>
* Require public MCP delegate auth servers
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(mcp): align delegate auth path parsing with downstream routing
`_extract_target_server_names_from_path` used a naive segments-based
split while `server.py::_get_mcp_servers_in_path` uses a regex that
allows server names with one embedded slash and comma-separated lists.
With the old parser, a request to `/mcp/<delegated>/<garbage>` was
parsed as targeting `<delegated>` by the auth gate (bypassing LiteLLM
auth) while the routing layer parsed it as `<delegated>/<garbage>` —
when that name did not resolve, the request fell back to the anonymous
allow-list, which can include `allow_all_keys` servers that normally
require a LiteLLM key.
Replace the parser with the same regex logic as
`_get_mcp_servers_in_path` so auth gating sees the exact target name(s)
downstream routing sees. Add regression tests covering parser parity
and the specific extra-path-segment bypass attempt.
https://claude.ai/code/session_01SjyPmwfmrq8fveFgw9iHW9
* fix(mcp): close header/path TOCTOU in MCP delegate auth gate
`_target_servers_delegate_auth_to_upstream` and
`_target_servers_use_oauth2` trusted the `x-mcp-servers` header when
present, but `server.py::extract_mcp_auth_context` overrides that
header with the path-derived list for `/mcp/...` routes. An attacker
could set `x-mcp-servers: <delegated>` while pointing the URL path at
a non-delegate server, flipping the auth gate without changing the
target downstream routing actually uses.
Extract a shared `_resolve_target_server_names` helper that mirrors
the downstream override (path-derived names for `/mcp/...` routes,
header value otherwise). Add regression tests covering the TOCTOU
attempt and the helper's path-vs-header precedence.
https://claude.ai/code/session_01SjyPmwfmrq8fveFgw9iHW9
* Fix delegated MCP OAuth test mock
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(mcp): drop unreachable /{server}/mcp branch in auth path parser
`_extract_target_server_names_from_path` also matched the
``/{server_name}/mcp`` form, but the downstream parser
``_get_mcp_servers_in_path`` only handles ``/mcp/...`` — and
``dynamic_mcp_route`` in ``proxy_server`` rewrites ``/{name}/mcp``
to ``/mcp/{name}`` on the scope before the MCP handler runs. Parsing
the un-rewritten form on the auth side was therefore unreachable in
production, and contradicted the docstring's claim of mirroring the
downstream parser — exactly the kind of mismatch that risks a future
header/path TOCTOU if any new entry point skips the rewrite.
Drop the branch; the canonical ``/mcp/...`` path matches both
parsers. Update the regression test to assert the new behavior.
https://claude.ai/code/session_01SjyPmwfmrq8fveFgw9iHW9
* Fix MCP path auth target resolution
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(mcp): require auth for refresh_token grants on delegate-auth servers
`_mcp_oauth_user_api_key_auth` gates the unauthenticated PKCE flow for
``delegate_auth_to_upstream`` servers, but the bypass applied to BOTH
``/authorize`` and ``/token`` regardless of grant type. ``mcp_token``
accepts ``grant_type=refresh_token`` as well as ``authorization_code``,
and ``exchange_token_with_server`` attaches the server's stored
``client_secret`` to whatever is forwarded upstream. An unauthenticated
caller holding a refresh token issued to that OAuth client could mint
fresh upstream access tokens through LiteLLM.
Limit the anonymous bypass on ``/token`` to ``grant_type=authorization_code``
(the only grant PKCE actually protects via ``code_verifier``); fall
through to normal LiteLLM auth for ``refresh_token`` and any other grant.
``/authorize`` continues to allow anonymous PKCE redirects.
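A minimal sketch of the resulting rule, assuming the endpoint is identified by a short name and the delegate flag has already been resolved for the target server. The function and parameter names are hypothetical.

```python
from typing import Optional


def allows_anonymous_oauth(
    endpoint: str, grant_type: Optional[str], delegates_auth: bool
) -> bool:
    """Decide whether an unauthenticated OAuth request may proceed.

    Only delegate-auth servers get any bypass. On the token endpoint the
    bypass is limited to the authorization_code grant, the one PKCE
    protects via code_verifier; every other grant needs normal auth.
    """
    if not delegates_auth:
        return False
    if endpoint == "authorize":
        return True
    if endpoint == "token":
        return grant_type == "authorization_code"
    return False
```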
https://claude.ai/code/session_01SjyPmwfmrq8fveFgw9iHW9
* fix(ui): clear delegate_auth_to_upstream when switching off oauth2
The ``delegate_auth_to_upstream`` form field is rendered inside an
``isOAuth2 && (...)`` conditional, so the Form.Item unmounts when the
user changes ``auth_type`` away from ``oauth2``. The follow-up
``form.setFieldValue("delegate_auth_to_upstream", false)`` runs after
the field has already deregistered, so ``onFinish`` receives
``undefined`` and the fallback ``?? mcpServer.delegate_auth_to_upstream``
preserved the old ``true``. The flag then persisted in the database for
a non-oauth2 server and silently re-activated if ``auth_type`` was later
switched back to ``oauth2``.
In the edit payload, force the flag to ``false`` whenever
``auth_type !== oauth2``; only trust the form value (and the existing
DB fallback) when the server is actually oauth2. Backend defense-in-depth
already ignores the flag for non-oauth2 servers, but the DB state should
stay clean too.
https://claude.ai/code/session_01SjyPmwfmrq8fveFgw9iHW9
* Fix MCP delegate auth reset on edit
Co-authored-by: Yassin Kortam <yassin@berri.ai>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
Co-authored-by: Claude <claude@anthropic.com>
* fix(responses): preserve cache_control in Responses API -> Chat Completion transformation (#27727)
* fix(responses): preserve cache_control in Responses API -> Chat Completion transformation
cache_control injected by AnthropicCacheControlHook was silently dropped when
_transform_responses_api_content_to_chat_completion_content rebuilt content blocks
with only {type, text}. Now copies cache_control through so Anthropic prompt caching
works correctly when using client.responses.create with cache_control_injection_points.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(responses): preserve cache_control for input_image and input_file blocks
Extends the cache_control fix to image and file content blocks, which were
also silently dropping cache_control during the Responses API -> Chat Completion
transformation. Adds tests for all three content block types.
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Claude Babysitter <claude@anthropic.com>
* fix(proxy): expose db status on public /health/readiness
External readiness probes consumed the legacy detailed payload's `db`
field to drive alerting and pod-rotation decisions. Stripping the body
to `{"status": "healthy"}` broke those probes silently — the HTTP code
still flipped to 503, but probes checking `body.db == "connected"`
treated the response as healthy.
Add `db` back to the unauthenticated payload. Keep the rest of the
diagnostic fields (litellm_version, callbacks, cache, log_level) gated
behind /health/readiness/details so the recon-leak gate from #26912
holds. Values match the legacy contract: "connected", "disconnected",
"Not connected".
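The split between the public and detailed payloads might look like the sketch below. The helper name and the sample values are illustrative; only the field names come from the message above.

```python
# Fields safe to expose on the unauthenticated readiness endpoint.
PUBLIC_READINESS_FIELDS = ("status", "db")


def public_readiness_payload(details: dict) -> dict:
    """Project the detailed readiness payload down to its public fields.

    Diagnostic fields such as litellm_version, callbacks, cache and
    log_level stay behind the authenticated details endpoint.
    """
    return {key: details[key] for key in PUBLIC_READINESS_FIELDS if key in details}
```

A probe that checks `body["db"] == "connected"` keeps working against the public payload, while version and callback details are no longer visible to anonymous callers.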
* docs(budget_manager): add docstring to BudgetManager.reset_cost (#27867)
Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com>
* docs: add class docstring to _LoopWrapper (#27870)
Document the purpose of the daemon thread that backs the sync
branch of the timeout decorator.
Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com>
* fix: Fix Redis Sentinel client handling to solve authentication error… (#26302)
* fix: Fix Redis Sentinel client handling to solve authentication error with password protected sentinel (#25625)
* fix Redis Sentinel authentication handling
* test: cover Redis Sentinel auth routing
* refactor: align Redis Sentinel kwargs threading
* fix: avoid duplicate Redis Sentinel socket timeouts
* Address review comments
* refactor(_redis): return set from _get_redis_kwargs for O(1) lookup
Align _get_redis_kwargs() with the cluster helper by returning a set
instead of a list, so the sentinel connection-kwargs filter uses O(1)
membership tests. Addresses Greptile review feedback on PR #26302.
* fix(_redis): restore Azure-specific kwargs in cluster kwargs set
The set-literal refactor of _get_redis_cluster_kwargs dropped four
LiteLLM-custom Azure keys (azure_redis_ad_token, azure_client_id,
azure_tenant_id, azure_client_secret) that the prior list form had
explicitly appended. Because they are not in RedisCluster's argspec,
they were silently stripped, breaking Azure IAM auth on cluster
clients. Re-add them to the explicit include set.
---------
Co-authored-by: Kristin Cowalcijk <kristincowalcijk@gmail.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>
Co-authored-by: krrish-berri-2 <krrish-berri-2@users.noreply.github.com>
Co-authored-by: claude <claude@anthropic.com>
* Litellm agent oss staging 05 11 2026 (#27733)
* fix(ollama): Include provider in model list for ollama (#26135)
* Include provider in model names for ollama
* Fix unit tests
* fix(ollama): process both thinking and content in same streaming chunk (#26098)
* fix(health_check): skip max_tokens for image_generation mode (#26417)
* fix(health_check): skip max_tokens for image_generation mode
`_update_litellm_params_for_health_check` injected `max_tokens` for
every deployment. OpenAI `/v1/images/generations` strictly rejects
unknown fields, so health checks for dall-e-* and gpt-image-1 always
failed with `400 "Unknown parameter: 'max_tokens'"` even though the
actual image endpoint calls succeed. Skip the `max_tokens` injection
when `model_info.mode == "image_generation"`. `messages` still gets
injected (downstream `_filter_model_params` already strips it for
non-chat handlers).
* Switch to allow-list with per-deployment override
Per @krrishdholakia review: deny-listing image_generation only re-introduces
the same bug for every other non-chat mode (embedding, audio_*, rerank,
video_generation, ocr, search, moderation, ...).
Replace the single image_generation skip with `_MAX_TOKEN_SUPPORT_MODES =
{chat, completion, responses}`. Missing `mode` is treated as chat for
backward compatibility. New modes are safe by default.
Add `model_info.health_check_supports_max_tokens` as an operator escape
hatch — True forces injection on a non-listed deployment (operator wants
to bound probe tokens), False suppresses it on a chat-style deployment
behind a strict-schema provider.
Tests: parametrize over 3 chat-style + 10 non-chat modes, plus override
on/off and the no-mode legacy path.
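The allow-list plus override described above can be sketched like this. The set contents and the `health_check_supports_max_tokens` key are taken from the message; the function name is illustrative.

```python
MAX_TOKEN_SUPPORT_MODES = {"chat", "completion", "responses"}


def should_inject_max_tokens(model_info: dict) -> bool:
    """Decide whether a health check should add max_tokens to the request."""
    # An explicit per-deployment override wins in either direction.
    override = model_info.get("health_check_supports_max_tokens")
    if override is not None:
        return bool(override)

    # A missing mode is treated as chat for backward compatibility.
    mode = model_info.get("mode")
    if mode is None:
        return True

    # Any mode not on the allow-list (including future ones) is skipped.
    return mode in MAX_TOKEN_SUPPORT_MODES
```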
* fix(http_handler): handle RequestNotRead in MaskedHTTPStatusError for multipart uploads (#26718)
Squash-merged by litellm-agent from dawidkulpa's PR.
* fix(ollama): guard against double 'ollama/' prefix in live model listing
Greptile flagged that Ollama servers can return names that already start
with 'ollama/'. Check the prefix before prepending so we don't produce
'ollama/ollama/...'. Adds a regression test.
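The prefix guard is small enough to show in full as a sketch; the function name is hypothetical.

```python
def with_ollama_prefix(model_name: str) -> str:
    """Prepend the provider prefix unless the name already carries it."""
    if model_name.startswith("ollama/"):
        return model_name
    return f"ollama/{model_name}"
```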
* Fix Ollama empty reasoning stream chunks
Co-authored-by: Yassin Kortam <yassin@berri.ai>
---------
Co-authored-by: James Myatt <james@jamesmyatt.co.uk>
Co-authored-by: VHash <225398745+vhash0@users.noreply.github.com>
Co-authored-by: hayden <sewhan.kim+@a-bly.com>
Co-authored-by: dawidkulpa <84176950+dawidkulpa@users.noreply.github.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* Ishaan - May 13th Staging LiteLLM (#27877)
* fix: strip Gemini thought-signature from tool_use.id in non-streaming path; example websearch config (#27873)
- adapters/transformation.py: mirror the streaming path and strip the
`__thought__<b64>` suffix off `tool_call.id` before building the
AnthropicResponseContentBlockToolUse. Base64's `+ / =` characters
violate Anthropic's `^[a-zA-Z0-9_-]+$` tool_use.id pattern, so when a
conversation that flowed through Gemini is later replayed to an
Anthropic-native provider (Bedrock or Anthropic API) the request 400s.
- example_config_yaml/websearch_interception_config.yaml: register the
interceptor under `callbacks:` not `success_callback:`. `success_callback`
does not run pre-request hooks, so the tool-conversion step never fires
on `/v1/messages` and the raw `web_search_20250305` tool is forwarded
to Bedrock, which 400s.
- adds a unit test pinning the non-streaming strip behavior and the
surviving `^[a-zA-Z0-9_-]+$` shape of the resulting id.
Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com>
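The id cleanup described in this commit can be sketched as below. The marker string and the id pattern are taken from the commit message; `strip_thought_signature` is a hypothetical name, and the real adapter code may be structured differently.

```python
import re

# Marker that precedes the base64 thought signature, per the commit message.
THOUGHT_MARKER = "__thought__"
# Anthropic's tool_use.id pattern, as quoted in the commit message.
TOOL_USE_ID_PATTERN = re.compile(r"^[a-zA-Z0-9_-]+$")


def strip_thought_signature(tool_call_id: str) -> str:
    """Hypothetical helper: drop the '__thought__<b64>' suffix if present."""
    # split() with maxsplit=1 returns the whole id unchanged when the marker is absent.
    return tool_call_id.split(THOUGHT_MARKER, 1)[0]
```

An id that carries the suffix fails the pattern because base64 can contain `+`, `/`, and `=`; after stripping, only the original id remains.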
* Fix/azure image edit auth header (#27863)
* fix(azure/image_edit): use api-key header instead of Authorization Bearer
Delegate `AzureImageEditConfig.validate_environment` to
`BaseAzureLLM._base_validate_azure_environment` so the image-edit route
follows the same auth resolution as every other Azure provider:
- prefer the Azure-native `api-key` header when an API key is available
- fall back to `Authorization: Bearer <azure_ad_token>` only for AAD auth
The previous implementation unconditionally set
`Authorization: Bearer <api_key>`, which is the OpenAI-direct convention
and is rejected by Azure OpenAI / APIM-fronted deployments with
`401 Access denied due to missing subscription key`.
Adds regression tests covering api_key kwarg, litellm_params.api_key, and
the AAD-token fallback path.
Co-authored-by: Cursor <cursoragent@cursor.com>
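The auth branching described in this commit can be illustrated with a simplified sketch. It is not `BaseAzureLLM._base_validate_azure_environment`: `resolve_azure_auth_headers` is a hypothetical function that ignores environment variables and token providers and only shows the header choice.

```python
from typing import Dict, Optional


def resolve_azure_auth_headers(
    api_key: Optional[str], azure_ad_token: Optional[str]
) -> Dict[str, str]:
    """Hypothetical sketch of the header choice: api-key first, AAD bearer second."""
    headers: Dict[str, str] = {}
    if api_key:
        # Azure-native subscription-key header.
        headers["api-key"] = api_key
    elif azure_ad_token:
        # Bearer is reserved for AAD / Entra ID tokens.
        headers["Authorization"] = f"Bearer {azure_ad_token}"
    return headers
```

The point of the branch order is that an API key never ends up inside an `Authorization: Bearer` header, which is what Azure rejected before the fix.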
* docs(azure/image_edit): pin api-key precedence semantics + add regression test
Address review feedback that the move to
``BaseAzureLLM._base_validate_azure_environment`` changed the relative
priority of the positional ``api_key`` kwarg vs. ``litellm_params["api_key"]``.
The new behavior — ``litellm_params["api_key"]`` wins, positional only fills
in when ``litellm_params["api_key"]`` is empty — is intentional and matches
every other Azure ``validate_environment``: ``AzureVideosConfig`` uses the
exact same merge logic, while ``AzureVectorStoresConfig`` and
``AzureResponsesAPIConfig`` don't accept a positional ``api_key`` at all.
The old ``or`` chain (positional wins) was the outlier and was part of the
same OpenAI-vs-Azure convention drift that produced the original
``Authorization: Bearer`` bug.
The only production caller (``llm_http_handler.image_edit``) sources both
values from the same ``litellm_params.api_key``, so this change is
behaviorally a no-op there. Document the precedence in the docstring and
lock it in with an explicit test so future refactors can't quietly
re-invert it.
Co-authored-by: Cursor <cursoragent@cursor.com>
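The precedence rule pinned by this commit can be sketched as a small merge step. `merge_api_key` is a hypothetical name used for illustration; only the rule itself (the `litellm_params` value wins, the positional value fills in when it is empty) comes from the commit message.

```python
from typing import Any, Dict, Optional


def merge_api_key(
    positional_api_key: Optional[str], litellm_params: Dict[str, Any]
) -> Dict[str, Any]:
    """Hypothetical sketch: litellm_params['api_key'] wins over the positional key."""
    merged = dict(litellm_params)  # copy so the caller's dict is not mutated
    if not merged.get("api_key") and positional_api_key:
        # The positional key is used only when litellm_params has no key.
        merged["api_key"] = positional_api_key
    return merged
```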
---------
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: ryan-crabbe-berri <ryan@berri.ai>
Co-authored-by: Adam Kirstein <adam.kirstein@disney.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
* test(azure/image_edit): expect api-key header instead of Authorization Bearer
PR #27863 fixed Azure image edit to use the Azure-native api-key header
instead of OpenAI's Authorization: Bearer convention, but did not update
test_azure_image_edit_litellm_sdk to match. The test still asserted
'Authorization' in headers, which now fails since the new code routes
through BaseAzureLLM._base_validate_azure_environment and emits
api-key when an api_key is provided.
Update the assertion to pin the correct Azure behavior: api-key header
present with the resolved key, and no Authorization header.
---------
Co-authored-by: oss-agent-shin <ext-agent-shin@berri.ai>
Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com>
Co-authored-by: Adam Kirstein <107421694+justalittleadam@users.noreply.github.com>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: ryan-crabbe-berri <ryan@berri.ai>
Co-authored-by: Adam Kirstein <adam.kirstein@disney.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>
* fix(fireworks_ai): strip `thinking_blocks` from chat messages before Fireworks API call (#27881)
* fix(fireworks_ai): strip thinking_blocks from chat messages before API call
Fireworks OpenAI-compatible ChatMessage schema uses additionalProperties:false
and rejects Anthropic-style messages[].thinking_blocks (e.g. Claude Code replays),
returning invalid_request_error. Remove the field in _transform_messages_helper
alongside provider_specific_fields.
Adds unit test test_transform_messages_helper_strips_thinking_blocks.
Co-authored-by: Cursor <cursoragent@cursor.com>
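A minimal sketch of the sanitization step, assuming messages are plain dicts. `strip_unsupported_fields` is a hypothetical name; the two field names are the ones listed in the commit message.

```python
from typing import Any, Dict, List

# Fields the commit message says are removed before the Fireworks call.
STRIPPED_FIELDS = ("provider_specific_fields", "thinking_blocks")


def strip_unsupported_fields(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: return copies of the messages without the stripped fields."""
    return [
        {key: value for key, value in message.items() if key not in STRIPPED_FIELDS}
        for message in messages
    ]
```

Building new dicts instead of deleting keys in place keeps the caller's message history intact, which matters if the same conversation is later replayed to a provider that does accept `thinking_blocks`.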
* chore(fireworks_ai): drop inline comments from message sanitization
Co-authored-by: Cursor <cursoragent@cursor.com>
* docs(fireworks_ai): explain why provider_specific_fields and thinking_blocks are stripped
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix: block client-side pricing injection via request body
Authenticated clients could supply CustomPricingLiteLLMParams fields
(input_cost_per_token, output_cost_per_token, etc.) in the request body.
These were forwarded to register_model() in main.py, permanently mutating
the shared global litellm.model_cost dict for all users on the instance.
Adds all CustomPricingLiteLLMParams fields to _BANNED_REQUEST_BODY_PARAMS
so is_request_body_safe() rejects them before they reach completion().
New pricing fields added to CustomPricingLiteLLMParams are auto-covered.
Admin opt-in via allow_client_side_credentials or
configurable_clientside_auth_params still works as before.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
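One way to get the "new fields are auto-covered" property is to derive the banned set from the parameter class itself. The sketch below uses a dataclass as a hypothetical stand-in for `CustomPricingLiteLLMParams` (the real class may be defined differently), and `find_banned_params` is an illustrative name, not `is_request_body_safe`.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List, Optional


@dataclass
class PricingParams:
    """Hypothetical stand-in for CustomPricingLiteLLMParams."""

    input_cost_per_token: Optional[float] = None
    output_cost_per_token: Optional[float] = None


# Derived from the class, so a field added later is banned automatically.
BANNED_PRICING_PARAMS = frozenset(f.name for f in fields(PricingParams))


def find_banned_params(request_body: Dict[str, Any]) -> List[str]:
    """Return the pricing fields a client tried to set, sorted for stable output."""
    return sorted(key for key in request_body if key in BANNED_PRICING_PARAMS)
```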
* chore(proxy): scrub remote-URL module loads from DB-overlay config
When ``ProxyConfig`` merges DB-persisted ``litellm_settings`` /
``general_settings`` on top of the YAML config, the merged dict is
later iterated by ``load_config`` which threads ``config_file_path``
(the YAML path) into ``get_instance_fn``. The runtime gate that
refuses ``s3://`` / ``gcs://`` modules when ``config_file_path`` is
``None`` therefore can't distinguish a YAML-sourced value from a
DB-sourced one: both look the same to ``get_instance_fn``.
Strip ``s3://`` / ``gcs://`` entries from the DB-overlay value for
every field whose contents reach ``get_instance_fn`` during config
load:
- litellm_settings: ``callbacks``, ``success_callback``,
``failure_callback``, ``audit_log_callbacks``, ``post_call_rules``,
``custom_provider_map[].custom_handler``
- general_settings: ``custom_auth``, ``custom_key_generate``,
``custom_key_update``, ``custom_sso``,
``custom_ui_sso_sign_in_handler``,
``litellm_jwtauth.custom_validate``
The YAML config-file load path is unchanged — the documented operator
flow (``callbacks: ["s3://bucket/module.instance"]`` in ``config.yaml``)
still works. Only DB-overlay writes (e.g. via ``/config/update``) are
stripped.
Adds 16 regression tests covering the scrub matrix.
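The scrub for a single callback-style field can be sketched as follows. `scrub_callbacks` and `is_remote_module` are hypothetical names, and the sketch covers only string and list values, not the nested `custom_provider_map` or `litellm_jwtauth` cases.

```python
from typing import Any

# URL schemes that trigger a remote module load, per the commit message.
REMOTE_PREFIXES = ("s3://", "gcs://")


def is_remote_module(value: Any) -> bool:
    return isinstance(value, str) and value.startswith(REMOTE_PREFIXES)


def scrub_callbacks(value: Any) -> Any:
    """Hypothetical helper: drop remote-URL entries from a DB-overlay field."""
    if isinstance(value, list):
        return [item for item in value if not is_remote_module(item)]
    return None if is_remote_module(value) else value
```

A filter like this would apply only to the DB-sourced value before the merge, so the same `s3://` entry written in `config.yaml` is left alone.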
* chore(proxy): also scrub pass_through_endpoints[].target from DB overlay
A pass-through endpoint's ``target`` field is passed through
``create_pass_through_route`` into ``get_instance_fn`` during config
load. A PROXY_ADMIN persisting ``target: "s3://attacker/m.i"`` via
the DB-overlay ``pass_through_endpoints`` write path was not covered
by the previous scrub matrix, so the remote module load would still
reach the loader because the YAML-load chain has ``config_file_path``
set.
Walk each entry in ``general_settings.pass_through_endpoints`` and
null out any ``target`` that starts with ``s3://`` or ``gcs://``. The
entry itself is preserved so the path-registration helper can choose
how to handle a missing target (the existing code skips the route
when ``target is None``).
Adds two regression tests.
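A minimal sketch of the per-entry walk, assuming each endpoint is a dict with an optional `target` key. `scrub_pass_through_targets` is a hypothetical name.

```python
from typing import Any, Dict, List


def scrub_pass_through_targets(endpoints: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: null out remote-URL targets but keep every entry."""
    scrubbed = []
    for entry in endpoints:
        entry = dict(entry)  # copy so the input list is not mutated
        target = entry.get("target")
        if isinstance(target, str) and target.startswith(("s3://", "gcs://")):
            entry["target"] = None
        scrubbed.append(entry)
    return scrubbed
```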
* fix(prometheus): emit `litellm_remaining_tokens_metric` for Bedrock and Vertex (#27705)
* fix(prometheus): emit remaining_tokens/requests gauges for bedrock + vertex (LIT-2719)
Bedrock and Vertex AI never return x-ratelimit-remaining-* response headers,
so litellm_remaining_tokens_metric / litellm_remaining_requests_metric only
fired for OpenAI / Azure / Anthropic deployments even when tpm/rpm was
configured on the router.
Add a provider-agnostic fallback in PrometheusLogger.async_log_success_event
that asks Router.get_remaining_model_group_usage() for the same model_group
and emits the gauges with configured_limit - current_usage when the upstream
provider didn't populate the headers itself. Existing OpenAI / Azure /
Anthropic flows are unchanged because the fallback short-circuits when both
header values are already present.
Tests: 8 new tests covering bedrock + vertex emission, header short-circuit,
partial-header fill, llm_router=None, missing model_group, empty router
result, and router exception swallowing.
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
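The fallback arithmetic can be sketched per gauge as below. `fill_remaining` is a hypothetical name; the sketch only shows the "use the header if present, else configured limit minus current usage" rule and leaves out the router lookup and the Prometheus emission.

```python
from typing import Optional


def fill_remaining(
    header_value: Optional[int],
    configured_limit: Optional[int],
    current_usage: int,
) -> Optional[int]:
    """Hypothetical sketch: prefer the provider header, else derive from router state."""
    if header_value is not None:
        # Provider already reported the value; the fallback is skipped.
        return header_value
    if configured_limit is None:
        # No tpm/rpm configured on the router, so there is nothing to emit.
        return None
    return configured_limit - current_usage
```

Applying the rule to tokens and requests independently gives the partial-header fill case: one gauge can come from the header while the other comes from the router.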
* fix(prometheus): narrow except to ImportError, log router lookup failures via verbose_logger.exception
Address greptile review:
- The optional 'from litellm.proxy.proxy_server import llm_router' should
guard against ImportError specifically, not all exceptions, so that
unexpected errors (e.g. AttributeError from partially-initialized state)
stay visible.
- get_remaining_model_group_usage failures are now logged via
verbose_logger.exception (with traceback) instead of debug, matching the
PR description's intent and avoiding silent loss of router-cache errors
in production.
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
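The two review points can be illustrated with a generic sketch. `load_optional_attr` and `safe_lookup` are hypothetical helpers, and the standard `logging` module stands in for `verbose_logger`.

```python
import importlib
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger("prometheus_fallback_example")


def load_optional_attr(module_name: str, attr: str) -> Optional[Any]:
    """Return module.attr, or None only when the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    # No default here: a missing attribute raises AttributeError and stays visible.
    return getattr(module, attr)


def safe_lookup(lookup: Callable[[str], Any], model_group: str) -> Optional[Any]:
    """Run a lookup; log failures with a traceback instead of hiding them at debug level."""
    try:
        return lookup(model_group)
    except Exception:
        logger.exception("remaining-usage lookup failed for %s", model_group)
        return None
```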
* fix(prometheus): subtract in-flight delta in router-remaining fallback
The router's TPM/RPM counter is incremented by
Router.deployment_callback_on_success, which f…
Verified end-to-end through LiteLLM Proxy → Azure APIM → Azure OpenAI against
`gpt-image-1.5` for `/v1/images/edits`. Returns a real edited image (~300 KB JSON payload).
Type
🐛 Bug Fix
Changes
`litellm/llms/azure/image_edit/transformation.py`
`AzureImageEditConfig.validate_environment` now delegates to `BaseAzureLLM._base_validate_azure_environment`, the canonical helper already used by every other Azure provider in the codebase (`videos`, `vector_stores`, `responses`, `containers`, `assistants`, `batches`, `files`, `fine_tuning`, …). This:
- Sends the `api-key: <key>` header when an API key is available (env vars, `litellm.api_key`, `litellm.azure_key`, `litellm_params.api_key`, or the `api_key` kwarg).
- Falls back to `Authorization: Bearer <azure_ad_token>` only when AAD auth is configured (`azure_ad_token` / `azure_ad_token_provider` in `litellm_params`).
- Removes the `Authorization: Bearer <api_key>` header that was incorrect for Azure-style deployments.
Why delegate instead of just renaming the header
Naively swapping `Authorization: Bearer` → `api-key:` would fix subscription-key auth but silently break AAD/Entra ID auth, which legitimately needs `Authorization: Bearer <token>`. `BaseAzureLLM._base_validate_azure_environment` already encodes the correct branching and is the source of truth for Azure auth across the rest of the provider surface; image-edit was the only outlier hand-rolling its own header logic. This change brings it back in line and means future improvements to Azure auth (e.g., new token providers) automatically apply to image-edit too.
`tests/test_litellm/llms/azure/image_edit/test_azure_image_edit_transformation.py`
Three new regression tests:
- `test_validate_environment_uses_api_key_header_from_kwarg`: passing `api_key=` results in an `api-key` header and no `Authorization: Bearer` leak.
- `test_validate_environment_uses_api_key_header_from_litellm_params`: `litellm_params={"api_key": ...}` resolves to the `api-key` header.
- `test_validate_environment_falls_back_to_aad_bearer`: when only `azure_ad_token` is supplied, the call falls back to `Authorization: Bearer <token>`, so AAD users are unaffected.
Note
Medium Risk
Changes request authentication for Azure image-edit calls, which can affect connectivity for different Azure deployments (API key vs AAD) if any edge-case header precedence differs from prior behavior.
Overview
Fixes Azure `/images/edits` authentication by replacing the image-edit-specific header logic with `BaseAzureLLM._base_validate_azure_environment`, so requests use `api-key` when an API key is available and only fall back to `Authorization: Bearer <AAD token>` when configured.
Adds regression tests covering subscription-key header usage, `litellm_params["api_key"]` precedence over the positional `api_key`, and the AAD bearer fallback path.
Reviewed by Cursor Bugbot for commit 7e32c65.