chore: reject bare str at file-input sinks to prevent local-file read by stuxf · Pull Request #27667 · BerriAI/litellm

stuxf · 2026-05-11T19:22:04Z

Type

🐛 Bug Fix
✅ Test

Changes

extract_file_data, audio_utils.process_audio_file (plus its tuple-content variant), audio_utils.calculate_request_duration, and the OCR _extract_file_metadata helpers accepted inputs matching isinstance(x, (str, PathLike)) and called open(x, "rb") on them. When these helpers run inside a proxy request handler, the value comes from an attacker-controlled HTTP form field, so the open() is an arbitrary local file read on the proxy host. The file content is then forwarded to the upstream provider and base64-encoded into the response, giving the caller an exfiltration primitive (including /proc/self/environ for env-stored secrets).

Drop the str branch from the path-open union at every sink and raise ValueError with a clear migration message. pathlib.Path is preserved because it's a Python-level type HTTP form values can't fabricate. The same shape was already in place for the Black Forest Labs image-edit handler — this brings the rest of the file/audio/ocr pipeline in line.
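As a rough illustration of the guard shape described above, here is a minimal sketch. The function name `read_file_input` and the exact error wording are hypothetical and are not taken from the litellm source; only the str-versus-PathLike split reflects the PR description.

```python
import os
from typing import Any


def read_file_input(value: Any) -> bytes:
    """Hypothetical sink: return file bytes, refusing bare string paths."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        # A bare str may have arrived from an HTTP form field, so it is
        # never passed to open(). (Error text is illustrative only.)
        raise ValueError(
            "Bare str file paths are not accepted; "
            "pass pathlib.Path(...) or open(..., 'rb') instead."
        )
    if isinstance(value, os.PathLike):
        # pathlib.Path and other PathLike objects can only be built in
        # Python code, so opening them is treated as an SDK convenience.
        with open(value, "rb") as fh:
            return fh.read()
    if hasattr(value, "read"):
        # Already-open binary file objects, e.g. open("path", "rb").
        return value.read()
    raise TypeError(f"Unsupported file input type: {type(value).__name__}")
```

Note that `str` is not an `os.PathLike` instance, so the check order only affects readability; the str branch is listed first to make the rejection explicit.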

Affected sinks (all in the litellm core, not provider-specific):

litellm/litellm_core_utils/prompt_templates/common_utils.py:extract_file_data
litellm/litellm_core_utils/audio_utils/utils.py:process_audio_file (and the tuple-content sub-branch)
litellm/litellm_core_utils/audio_utils/utils.py:calculate_request_duration
litellm/litellm_core_utils/audio_utils/utils.py:get_audio_file_content_hash (bare str now hashes the string itself rather than opening it — the hash function already had this fallback shape)
litellm/ocr/main.py:_extract_file_metadata (also used by convert_file_document_to_url_document, the path the OCR proxy endpoint takes for {"type": "file", "file": ...} documents)
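The hash helper is the one sink that does not raise. A minimal sketch of that fallback, assuming a SHA-256 digest and a hypothetical function name `content_hash` (the real helper's digest algorithm and signature are not stated in this PR), might look like:

```python
import hashlib
import os
from typing import Any


def content_hash(value: Any) -> str:
    """Hypothetical hash helper: a bare str is hashed as text, never opened."""
    if isinstance(value, bytes):
        data = value
    elif isinstance(value, str):
        # Hash the string itself instead of treating it as a path.
        data = value.encode("utf-8")
    elif isinstance(value, os.PathLike):
        # Only Python-level path objects are read from disk.
        with open(value, "rb") as fh:
            data = fh.read()
    else:
        # Fallback for other shapes (assumption for this sketch).
        data = repr(value).encode("utf-8")
    return hashlib.sha256(data).hexdigest()
```

With this shape, `content_hash("/etc/passwd")` depends only on the eleven characters of the string, not on the file's contents.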

Variants checked and confirmed already safe in current code (no change needed):

BFL image-edit handler — raises for non-URL strings, uses safe_get for URLs
Bedrock Nova Canvas image-edit — str branch returns base64 as-is, only PathLike opens
Bedrock and Vertex batch files transformations — str branch treats as JSONL content, only PathLike opens

Compatibility

Operators: unaffected. The proxy form parser converts multipart UploadFile into the tuple shape these helpers accept; nothing in the proxy path supplied bare strings.
SDK callers passing file=open("path", "rb") / file=("name", content) / file=Path("path") / file=b"...": unaffected. Every documented example in BerriAI/litellm-docs already uses one of these shapes.
SDK callers passing file="literal/path.csv" as a bare string: minor breaking change. These callers now get a ValueError pointing to the fix. Migrate to file=Path("literal/path.csv") (6 extra characters) or file=open("literal/path.csv", "rb") (the OpenAI SDK convention litellm mimics).
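For SDK code bases with many call sites that still pass string paths, one possible caller-side shim is sketched below. The name `migrate_file_arg` is hypothetical and not part of litellm. It should only ever be applied to strings the calling program itself controls, never to values received from a request, since wrapping an untrusted string in Path would reintroduce the problem this PR closes.

```python
from pathlib import Path
from typing import Any


def migrate_file_arg(file: Any) -> Any:
    """Hypothetical shim: wrap a trusted, caller-owned str path in Path.

    Bytes, tuples, Path objects, and open file handles pass through unchanged.
    """
    if isinstance(file, str):
        return Path(file)
    return file


# Example shapes after migration (the string here is a trusted literal).
path_arg = migrate_file_arg("literal/path.csv")
tuple_arg = migrate_file_arg(("name.csv", b"a,b\n1,2\n"))
```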

Test Plan

New / updated tests cover bare-str rejection at every patched sink plus the positive Path() path and existing tuple/bytes paths. Existing tests that passed bare-string paths to drive open() were updated to use Path().

Commits

chore: reject bare str at file-input sinks to prevent local-file read

… Responses API
- Extend responses_api_bridge_check when reasoning_effort + summary aliases (including nested extra_body) without tools
- Merge summary into reasoning_effort for responses bridge; helpers in utils
- Strip summary aliases in GPT-5 chat mapping when not bridged
- Tests for bridge + merge behavior
Co-authored-by: Cursor <cursoragent@cursor.com>

merge main

…_bridge fix(openai): route reasoningSummary for gpt-5.4+ chat without tools to Responses API

extract_file_data, audio_utils.process_audio_file (+ tuple-content variant), audio_utils.calculate_request_duration, and ocr/main.py's _extract_file_metadata all accepted ``isinstance(x, (str, PathLike))`` inputs and called ``open(x, "rb")`` on them. When these helpers run inside a proxy request handler the value comes from an attacker-controlled HTTP form field, so the open() is an arbitrary local file read on the proxy host. The file content is then forwarded to the upstream provider (OpenAI/Mistral/Bedrock/etc.) and base64-encoded into the response, giving the caller an exfiltration primitive.

Drop the str branch from the path-open union at every sink and raise ValueError with a clear migration message. pathlib.Path is preserved because it's a Python-level type that HTTP form values can't fabricate. The same shape was already in place for the Black Forest Labs image-edit handler; this brings the rest of the file/audio/ocr pipeline in line.

For get_audio_file_content_hash: bare str now hashes the string itself rather than opening it (the hash function's fallback path is already designed for this).

Documented examples uniformly use ``file=open("path", "rb")`` already (docs/audio_transcription.md, every provider doc, OpenAI SDK convention), so the migration surface is "literally use Path() or open() if you want a path-based upload."

Variants checked and confirmed already safe in current code:
- BFL image_edit: raises for non-URL str; uses safe_get for URLs.
- Bedrock Nova Canvas: str branch returns base64 as-is, never opens.
- Bedrock/Vertex batch files transformations: str branch treats as JSONL content, only PathLike opens.

Tests cover bare-str rejection at every patched sink plus the positive Path() path and existing tuple/bytes paths. Existing tests that passed bare-string paths updated to use Path().

codecov · 2026-05-11T19:25:24Z

Codecov Report

❌ Patch coverage is 75.00000% with 5 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
litellm/ocr/main.py	0.00%	3 Missing ⚠️
litellm/litellm_core_utils/audio_utils/utils.py	84.61%	2 Missing ⚠️

greptile-apps · 2026-05-11T19:25:31Z

Greptile Summary

This PR closes an arbitrary local file read vulnerability in five file-input sinks (extract_file_data, process_audio_file, calculate_request_duration, the audio hash helper, and the OCR convert_file_document_to_url_document) by rejecting bare str inputs that could be attacker-controlled HTTP form values in proxy deployments. It also bundles a responses_api_bridge_check extension to route GPT-5 calls with AI-SDK-style reasoningSummary aliases through the Responses API and strip those aliases before non-bridged chat calls.

LFI fix: str branches that called open(x, "rb") are replaced by ValueError; pathlib.Path (a Python-level type HTTP form values can't fabricate) remains accepted as an SDK convenience.
Breaking change: SDK callers passing file="literal/path.csv" will now receive a ValueError with a migration hint; the error message directs them to Path(…) or open(…, "rb").
reasoningSummary routing: New peek_reasoning_summary_aliases / strip_reasoning_summary_aliases_from_optional_params helpers in litellm/utils.py handle camelCase and snake_case aliases in top-level params and extra_body, merging them into reasoning_effort dict before the Responses bridge.
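To make the alias handling in the last bullet concrete, here is a rough sketch. The function names, the alias tuple, and the `{"effort": ..., "summary": ...}` dict shape are assumptions made for illustration; they are not copied from the helpers in litellm/utils.py.

```python
from typing import Any, Dict, Optional

# Alias spellings mentioned in this PR; the real helpers may accept others.
_SUMMARY_ALIASES = ("reasoningSummary", "reasoning_summary")


def peek_summary_alias(params: Dict[str, Any]) -> Optional[Any]:
    """Return the first alias value found at top level or inside extra_body."""
    scopes = [params]
    extra_body = params.get("extra_body")
    if isinstance(extra_body, dict):
        scopes.append(extra_body)
    for scope in scopes:
        for key in _SUMMARY_ALIASES:
            if key in scope:  # membership test, so falsy values are still found
                return scope[key]
    return None


def merge_summary_into_reasoning_effort(params: Dict[str, Any]) -> Dict[str, Any]:
    """Strip the aliases and fold a truthy summary into a reasoning_effort dict."""
    summary = peek_summary_alias(params)
    merged = {k: v for k, v in params.items() if k not in _SUMMARY_ALIASES}
    if isinstance(merged.get("extra_body"), dict):
        merged["extra_body"] = {
            k: v for k, v in merged["extra_body"].items() if k not in _SUMMARY_ALIASES
        }
    if summary:
        effort = merged.get("reasoning_effort")
        if isinstance(effort, dict):
            merged["reasoning_effort"] = {**effort, "summary": summary}
        elif effort is not None:
            merged["reasoning_effort"] = {"effort": effort, "summary": summary}
        else:
            merged["reasoning_effort"] = {"summary": summary}
    return merged
```

The sketch returns a new dict rather than mutating its input, which keeps the non-bridged chat path free to reuse the original parameters.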

Confidence Score: 5/5

Safe to merge; the security fix is correctly applied at all five sinks and the reasoningSummary routing additions are well-tested.

The LFI patches are mechanical (split str/PathLike branches, raise on str) and the diff confirms no new open(str_input) call is introduced. The reasoningSummary bridge logic is covered by dedicated unit tests that exercise both the bridged and non-bridged paths. The only remaining gap — one stale docstring example — does not affect runtime behavior.

The aocr docstring in litellm/ocr/main.py still shows a bare-string file path that now raises at runtime; worth a quick update before the release notes go out.

Important Files Changed

Filename	Overview
litellm/litellm_core_utils/audio_utils/utils.py	Splits bare-str and PathLike branches in process_audio_file, calculate_request_duration, and get_audio_file_content_hash to close LFI; str raises ValueError except in hash helper which silently falls back
litellm/litellm_core_utils/prompt_templates/common_utils.py	extract_file_data now raises ValueError for bare str; PathLike branch is preserved; logic is correct
litellm/ocr/main.py	convert_file_document_to_url_document rejects bare str; aocr docstring still shows the now-broken bare-str example
litellm/main.py	responses_api_bridge_check extended to route on reasoning_summary alias; strip helpers strip aliases before non-bridged GPT-5 chat calls
litellm/utils.py	Adds peek_reasoning_summary_aliases and strip_reasoning_summary_aliases_from_optional_params; correctly uses membership tests to handle falsy values
litellm/llms/openai/chat/gpt_5_transformation.py	Import style reformatting only, no logic changes
tests/test_litellm/litellm_core_utils/test_audio_utils.py	Existing file-path tests updated to use pathlib.Path; new test_process_bare_str_path_rejected added
tests/test_litellm/ocr/test_ocr_file_input.py	Existing path tests migrated to pathlib.Path; test_should_reject_bare_str_path added; coverage is thorough
tests/test_litellm/test_main.py	New bridge-check tests for reasoning_summary routing and merging; reasoning_effort dict tests pass
tests/test_litellm/llms/openai/test_gpt5_transformation.py	New tests for peek/strip helpers and chat-path alias stripping; correctly verify alias removal
tests/test_litellm/llms/github_copilot/test_github_copilot_transformation.py	Mock patching hardened to handle both class-method and module-level instance paths; no regressions
tests/test_litellm/litellm_core_utils/prompt_templates/test_litellm_core_utils_prompt_templates_common_utils.py	New TestExtractFileDataBareStr suite covers rejection, Path acceptance, bytes, and tuple inputs
tests/test_litellm/llms/vertex_ai/gemini/test_vertex_ai_gemini_transformation.py	String-path tests migrated to pathlib.Path; assertions preserved; no regressions

Reviews (3): Last reviewed commit: "chore: reject bare str at file-input sin..."

stuxf · 2026-05-11T19:28:23Z

@greptileai

oss-pr-review-agent-shin · 2026-05-11T19:35:31Z

🤖 litellm-agent: Squash-merged into staging branch litellm_agent_oss_staging_05_11_2026. Staging PR: #27664

Triage Summary
Gathered PR data only — the triage LLM step did not produce a valid report, so failing-check classification and prior-signal reconciliation were skipped. 249 line(s) across 7 file(s) (+185/-64).

Merge Confidence: 5/5 ✅ READY
Ready to ship.

All checks green. Greptile 4/5, no blocking pattern findings, no CircleCI runs (OSS-typical).

stuxf · 2026-05-11T19:53:43Z

@greptileai

…#27762)

* chore: reject bare str at file-input sinks to prevent local-file read (#27667)
  Squash-merged by litellm-agent from stuxf's PR.
* fix: use os.PathLike in ocr sink and check truthy reasoningSummary for bridge
  - ocr/main.py: widen Path check to os.PathLike for consistency with other sinks
  - main.py: bridge condition checks truthiness of reasoning_summary, not just None
  Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix: remove unused pathlib.Path import in ocr/main.py

---------

Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: ryan-crabbe-berri <ryan@berri.ai>
Co-authored-by: stuxf <70670632+stuxf@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(proxy): always merge caller-supplied tags into request metadata

  Caller-supplied tags (`x-litellm-tags` header, body `tags`, `metadata.tags`) were silently dropped unless the key/team had `metadata.allow_client_tags: true` set. Restore the documented behavior: tags from the request always flow into `metadata.tags` and union with any admin-configured static tags from key/team/project metadata.

  Removes the `allow_client_tags` opt-in flag from the pre-call pipeline. The flag was only ever read here; it has no schema or endpoint footprint, so leftover values in existing key metadata are inert.

  Test cleanup mirrors the simplification: drop the three tests that verified the strip-when-not-opted-in path, drop the `allow_client_tags` fixture lines from the merge/union tests.

* docs(proxy): refresh stale comments referencing removed tag strip

  The tag-strip block was removed in the parent commit but two surrounding comments still referenced "tags without opt-in" and "runs AFTER the strip". Update them to describe the remaining user_api_key_* and _pipeline_managed_guardrails strip that the snapshot/merge ordering actually protects against.

* fix(tests): swap dall-e to gpt-image-1 after openai deprecation

  DALL-E 2 and DALL-E 3 were removed from the OpenAI API on 2026-05-12, causing e2e image-generation tests to fail with "model does not exist". Swap all live-API DALL-E references in proxy-backed tests to gpt-image-1 and update the dall-e-2 alias in proxy_server_config.yaml to point at openai/gpt-image-1 (preserves any historical dall-e-2 callers).

* fix(tests): drop dall-e-only test classes; route live image tests via gpt-image-1

  Second wave of failures from the 2026-05-12 DALL-E shutdown:
  - tests/image_gen_tests/test_image_edits.py::TestOpenAIImageEditDallE2 and tests/image_gen_tests/test_image_generation.py::TestOpenAIDalle3 are explicitly named for the deprecated models and can't pass; remove. gpt-image-1 coverage already exists in sibling classes.
  - tests/local_testing/test_router.py image gen tests use dall-e-3 only as a routing example; swap to gpt-image-1.
  - tests/local_testing/test_custom_callback_input.py image_generation success/failure paths swapped to gpt-image-1.

* chore: reject bare str at file-input sinks to prevent local-file read (#27762)

* chore: reject bare str at file-input sinks to prevent local-file read (#27667)

  Squash-merged by litellm-agent from stuxf's PR.

* fix: use os.PathLike in ocr sink and check truthy reasoningSummary for bridge

  - ocr/main.py: widen Path check to os.PathLike for consistency with other sinks
  - main.py: bridge condition checks truthiness of reasoning_summary, not just None

  Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix: remove unused pathlib.Path import in ocr/main.py

  ---------

  Co-authored-by: yuneng-jiang <yuneng@berri.ai>
  Co-authored-by: ryan-crabbe-berri <ryan@berri.ai>
  Co-authored-by: stuxf <70670632+stuxf@users.noreply.github.com>
  Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* feat(ui): add Vertex AI Search as vector store provider (#27790)

* feat(ui): add Vertex AI Search as vector store provider

  Adds a "Vertex AI Search" entry to the provider dropdown (custom_llm_provider=vertex_ai/search_api) with fields for project, location (global/us/eu select), and optional collection ID. Extends VectorStoreFieldConfig with `options` so select fields can be data-driven instead of falling through to the embedding-model list.

* fix(ui): clarify vertex_collection_id placeholder copy

  Placeholder previously displayed "default_collection" (the literal fallback value), which invited users to type it instead of leaving the field blank. Switch to an example placeholder and tighten the tooltip.

* Litellm key rotation bug (#27756)

* fix(proxy): resolve cache handling issues in _lookup_deprecated_key

  - Updated the in-memory cache for deprecated key lookups to store a 3-tuple (active_token_id, cache_expires_at_ts, revoke_at_ts) instead of a 2-tuple, ensuring proper unpacking and backward compatibility.
  - Removed duplicate cache reads and added logic to handle legacy cache entries gracefully.
  - Enhanced unit tests to cover scenarios for cache hits, DB misses, and respect for revoke_at timestamps, ensuring robust handling of the grace-period key-rotation feature.

* refactor(proxy): streamline cache handling in _lookup_deprecated_key

  - Simplified the cache retrieval logic by directly unpacking the 3-tuple cache entries, removing the need for backward compatibility checks for 2-tuple entries.
  - Updated unit tests to ensure that pre-warmed 3-tuple cache entries are served correctly without unnecessary database lookups.

* chore(ci): add new unit test for deprecated key grace period

  - Included `test_deprecated_key_grace_period.py` in the CI workflow to enhance coverage for deprecated key handling scenarios.

* fix(proxy): remove unnecessary check for revoke_at in _lookup_deprecated_key

  - Eliminated the redundant check for None on revoke_at, streamlining the logic for handling deprecated keys in the cache. This change enhances the efficiency of the key lookup process.

* test(proxy): add end-to-end tests for deprecated key lookup behavior

  - Introduced a new test class `TestDeprecatedKeyLookupDbE2E` to validate the behavior of deprecated key lookups against a real Prisma-backed database.
  - The test ensures that old key hashes resolve correctly and that repeated lookups utilize the in-memory cache without errors.
  - Cleaned up the `_lookup_deprecated_key` function by removing an unnecessary check for `revoke_at`, enhancing the efficiency of the key lookup process.

* chore(proxy): close /key/regenerate ownership-rebind + premium-gate bypass

  A non-admin caller could rebind their own key's ``user_id`` via ``/key/regenerate``. ``_execute_virtual_key_regeneration`` had org/team guards but no ``user_id`` guard, and ``prepare_key_update_data`` did not strip the field: it survived ``model_dump(exclude_unset=True)`` into the Prisma update. On the next request, ``_return_user_api_key_auth_obj`` resolved the rebound ``user_id`` against ``litellm_usertable`` and returned ``PROXY_ADMIN`` whenever the target row's ``user_role`` was admin (e.g. the default ``user_id="default_user_id"`` created on first password-UI login).

  ``/key/update`` had the equivalent guard inline at ``_validate_update_key_data``; extract it to a shared helper ``_validate_caller_can_change_key_ownership`` and call from both ``/key/update`` and ``_execute_virtual_key_regeneration``. Future regenerate-style endpoints inherit the guard for free.

  Also tighten the premium gate that allowed the master-key rotation branch to skip the enterprise check. The previous predicate was ``data.new_master_key is not None``, a field-presence test, not an identity check. Any non-premium caller could send any value in that field and the premium check would no-op. Verify the caller actually holds the master key via ``_is_master_key`` before allowing the non-premium path.

  Tests:
  - ``test_regenerate_user_id_rebind_guard``: parametrized table over cross-user rebind (blocked), empty-string removal (blocked), and same-user no-op rebind (allowed).
  - ``test_regenerate_premium_gate_requires_actual_master_key`` / ``test_regenerate_premium_gate_allows_actual_master_key_holder``: ensure the premium check requires the caller actually present the master key, and that legitimate master-key rotation still works.

* test(vcr): classify cache verdicts, detect live calls, surface cost leaks

  Convert the per-test VCR verdict line from a single 'NOOP / HIT / MISS / PARTIAL' tag into a classified outcome that distinguishes the cases that silently bill the live API on every CI run from the ones that don't:

    HIT                  pure replay
    PARTIAL              mixed replay + new recordings
    MISS:RECORDED        new cassette saved to Redis (cached next run)
    MISS:OVERFLOW        cassette > MAX_EPISODES_PER_CASSETTE; persister refused to save; re-bills every run
    MISS:NOT_PERSISTED   test failed; save_cassette skipped; re-bills
    NOOP                 VCR-marked but no HTTP traffic (mocked elsewhere)
    UNMARKED:LIVE_CALL   test bypassed VCR AND opened a TCP connection to a known LLM provider host -> wasted spend
    UNMARKED:NO_TRAFFIC  test bypassed VCR but didn't call out

  The UNMARKED:LIVE_CALL signal is what converts 'this test probably hits live' into 'this test connected to api.openai.com'. We install a socket.connect / socket.create_connection wrapper for the duration of each non-VCR-marked test and record any outbound TCP to a known LLM provider hostname. The probe sits below the httpx layer so vcrpy and respx (which both patch above the socket) are unaffected.

  Replace the file-level _RESPX_CONFLICTING_FILES blacklists in the llm_translation and local_testing conftests with per-item respx detection in apply_vcr_auto_marker_to_items. A test now skips VCR when it actually carries @pytest.mark.respx or has respx_mock in its fixture chain, not just because some other test in the same file imports MockRouter. Items skipped by skip_files are split into respx_conflict (real conflict, the module wires up respx) vs file_opt_out (dead skip-list entry whose module never touches respx) so the session summary makes pruning obvious.

  Stabilize the AWS SigV4 fingerprint: the Authorization header on Bedrock requests rotates its Credential date and Signature on every call, which previously pushed every Bedrock test past the 50-episode overflow threshold. Extract the access-key id only ('aws-sigv4:AKIA...') so two requests with the same identity match.

  Always emit verdict logging when VCR is active (set LITELLM_VCR_VERBOSE=0 to opt back into the legacy quiet mode). Add a session-end classification summary that lists overflow tests, unmarked live-call tests, and the skip-reason breakdown.

  Wire the live-call probe + summary hook into every test directory that already uses the Redis-backed VCR cache (audio_tests, guardrails_tests, image_gen_tests, litellm_utils_tests, llm_responses_api_testing, llm_translation, local_testing, logging_callback_tests, ocr_tests, pass_through_unit_tests, router_unit_tests, search_tests, unified_google_tests).

  Add tests/llm_translation/test_vcr_classification.py covering the verdict classifier, skip-reason tagging, AWS SigV4 fingerprint stability, live-host classification, and session summary rendering.

  Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* test(vcr): drop dead 'from respx import MockRouter' imports

  These seven test files were on _RESPX_CONFLICTING_FILES, which made the auto-marker skip them entirely. Inspecting the source shows the only respx artifact is a top-level 'from respx import MockRouter' that no test ever uses: no @pytest.mark.respx, no respx_mock fixture, no respx.mock context manager. The import is dead code left over from a previous mocking pattern.

  Now that apply_vcr_auto_marker_to_items detects respx per-item via the marker / fixture chain (b637d9f64a), the file-level skip is no longer needed for these files. They were the reason the OpenAI tests (test_o3_reasoning_effort, test_streaming_response[o1/o3-mini], TestOpenAIO1::test_streaming, TestOpenAIChatCompletion::test_web_search, TestOpenAIO3::test_web_search, etc.) ran live every CI build despite the cassette cache being healthy.

  Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* test(image_edits): regenerate fixtures per call instead of holding open module-level file handles

  Module-level

    TEST_IMAGES = [
        open(os.path.join(pwd, 'ishaan_github.png'), 'rb'),
        open(os.path.join(pwd, 'litellm_site.png'), 'rb'),
    ]
    SINGLE_TEST_IMAGE = open(...)

  opens the file once at import. After the first multipart upload, the file pointer is at EOF, so every subsequent test in the same xdist worker sends an empty multipart body. That non-determinism (a) blows the recorded cassette past MAX_EPISODES_PER_CASSETTE (50) so _RedisPersister.save_cassette refuses to save it, and (b) re-bills the live image edit endpoint on every CI run.

  Recent CI runs confirm the leak: tests/image_gen_tests/test_image_edits.py shows six tests parking at 51-52 cassette entries (TestOpenAIImageEditGPTImage1::test_openai_image_edit_litellm_sdk[False], TestOpenAIImageEditDallE2::..., test_openai_image_edit_with_bytesio, test_openai_image_edit_litellm_router, test_multiple_vs_single_image_edit[False], test_multiple_image_edit_with_different_formats).

  Replace the module-level file handles with _make_test_images() / _make_single_test_image() factories that return fresh _RewindableImage (BytesIO subclass) objects whose pointer always starts at 0. The image bytes are read once at import into module-level constants (_ISHAAN_GITHUB_BYTES, _LITELLM_SITE_BYTES), so disk I/O cost is unchanged.

  Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* chore(proxy): clarify ownership-rebind error message (actor vs target)

  Previous wording read "User=<new_owner> is not allowed to update the key to belong to user=<current_owner>", which is easy to misread as "caller wants to keep the key on its current owner". Reframe as "Non-admin caller is not allowed to rebind the key from user=<existing> to user=<incoming>" so the direction of the failed operation is unambiguous.

  Same shape preserved (HTTPException 403); only the ``detail`` string changes. Regression test substring updated.

* fix(vcr): match real Bedrock hostnames in live-call probe

  The suffix '.bedrock-runtime.amazonaws.com' never matched real Bedrock endpoints, which use the format 'bedrock-runtime[-fips].{region}.amazonaws.com' (region between 'bedrock-runtime' and 'amazonaws.com'). Add an explicit host check for that pattern so Bedrock live calls are visible to the probe, and update the unit test accordingly. Also drop the unused '_LIVE_CALL_PROBE_INSTALLED' module variable.

* test(proxy): drop allow_client_tags opt-in gate and add credential rename cascade tests

  Removes the allow_client_tags metadata check from apply_client_tag_policy_pre_auth so x-litellm-tags headers are always merged into request metadata, matching the post-auth behavior in add_litellm_data_to_request. Updates pre-call tests accordingly and adds a new test suite covering cascading credential renames into model rows.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(proxy): block explicit-null user_id in ownership rebind guard

  ``model_dump(exclude_unset=True)`` in ``prepare_key_update_data`` includes any field the caller explicitly set, even when the value is ``None``. The previous guard short-circuited on ``getattr(data, 'user_id', None) is None``, which conflated "field omitted" (safe) with "field explicitly set to null" (writes NULL to the token row, detaching the key from its user and bypassing user-row role checks).

  Switch the omitted-vs-set distinction to ``data.model_fields_set``; treat explicit-null and explicit-empty-string identically as a removal attempt, both 403-rejected for non-admin callers. Parametrized regression adds ``explicit_null_blocked`` alongside the existing ``rebind_blocked`` / ``empty_blocked`` / ``same_user_id_allowed`` cases.

* fix(vcr): cover full RFC1918 172.16.0.0/12 range in local prefixes

* fix(image_edits): drop _RewindableImage to prevent infinite multipart upload

  The _RewindableImage(BytesIO) wrapper auto-rewound on every read after EOF, which made the OpenAI SDK's multipart upload writer read the same bytes forever instead of seeing EOF. Workers OOM'd / SIGKILL'd:

    [gw0] node down: Not properly terminated
    replacing crashed worker gw0
    ...
    worker 'gw1' crashed while running 'tests/image_gen_tests/test_image_edits.py::TestOpenAIImageEditGPTImage1::test_openai_image_edit_litellm_sdk[False]'

  The auto-rewind was added defensively for parametrized + flaky-retried tests, but BaseLLMImageEditTest::test_openai_image_edit_litellm_sdk already calls get_base_image_edit_call_args() once per invocation and that helper now constructs fresh streams via _make_test_images(), so rewinding inside the stream is unnecessary. Replace with plain BytesIO seeded with the cached image bytes.

  Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* chore(proxy): refuse remote-URL instance-fn loads outside config-file path

  ``get_instance_fn`` previously routed any ``s3://`` / ``gcs://`` value into ``_load_instance_from_remote_storage`` regardless of how the value got there. The function ultimately calls ``spec.loader.exec_module(module)``, i.e. Python in the proxy process. On admin-callable endpoints that accept a ``target`` / ``custom_handler`` field from the request body (e.g. ``/config/pass_through_endpoint``, custom-callback registration), that is a one-step admin-to-RCE primitive: any future privilege-escalation bug becomes immediate code execution.

  The documented operator flow for remote-module loading is ``litellm_settings.callbacks: ["s3://bucket/module.instance"]`` in ``config.yaml``. That path always carries the YAML's ``config_file_path`` through to ``get_instance_fn``. Use the presence of ``config_file_path`` as the discriminator: refuse remote URLs when it is absent (the request-body path) unless the operator explicitly opts back in via ``LITELLM_ALLOW_REMOTE_INSTANCE_FN_FROM_API=true``.

  The three success/failure/audit-log callback-loop call sites in ``proxy_server.py:load_config`` were already running inside the startup config-file load but had stopped threading ``config_file_path`` through. Pass it through so the documented ``s3://`` callback flow continues to work unchanged.

  Tests cover: remote URL without ``config_file_path`` raises; remote URL with the opt-in env reaches the loader; remote URL with ``config_file_path`` passes (documented startup flow); local dotted-name imports unaffected.

* fix(proxy): parse string metadata before pre-auth tag merge

  `apply_client_tag_policy_pre_auth` overwrote string-typed metadata with `{}` before merging header tags, dropping any tags inside. A caller could send `metadata='{"tags":["over-budget"]}'` plus `x-litellm-tags: within-budget` and bypass `_tag_max_budget_check` on the body tag. Parse the string via `safe_json_loads` first so existing tags survive the merge.

  Also drop the empty `tests/test_litellm/proxy/credential_endpoints/` directory: the cascade-rename tests it held imported a function that was never implemented (out of scope for this PR).

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(tests): thread config_file_path through s3/gcs custom-logger tests

  The pre-existing s3:// / gcs:// custom-logger tests called ``get_instance_fn`` without ``config_file_path``, which means the new runtime gate (refuse remote URLs unless invoked from a config-file load) now raises ``ValueError`` before reaching the mocked download paths. Each test was exercising the documented startup config-file load scenario; pass ``config_file_path="/any/path"`` to make that intent explicit and route past the gate.

  Affected: test_s3_download_success, test_gcs_download_success, test_invalid_url_format, test_download_failure_handling, test_file_cleanup.

* test(vcr): mark Bedrock prompt-caching cross-call tests VCR-incompatible

  The pass_through prompt-caching tests (test_prompt_caching_returns_cache_read_tokens_on_second_call, test_prompt_caching_streaming_second_call_returns_cache_read) make a warm-up call and then assert the *second* call sees a non-zero cache_read_input_tokens count from the upstream's prompt-cache. VCR replay can't model cross-call provider state: both calls match the same cassette episode, so the second call returns the first call's pre-warmup response and the assertion fails:

    AssertionError: Expected cache_read_input_tokens > 0 on second call, but got 0. Full usage: {'input_tokens': 4986, 'cache_creation_input_tokens': 4974, 'cache_read_input_tokens': 0}

  This started biting after the AWS SigV4 fingerprint stabilization (b637d9f64a): Bedrock requests now produce a stable per-access-key fingerprint instead of a per-request signature, so cassettes successfully replay where they previously always missed and re-recorded live.

  Opt these tests out via skip_nodeid_suffixes so they run live and match the existing pattern in tests/llm_translation/conftest.py (::test_prompt_caching).

  Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* Fix 3 OpenTelemetry tracing bugs in proxy integration (#27757)

  1. Missing litellm_request child span when proxy parent in metadata: _get_span_context now returns (ctx, None) for the metadata-injected proxy parent so the primary span is always emitted as a child of ctx. Proxy span lifecycle managed by new _end_proxy_span_from_kwargs.
  2. open_telemetry_logger overwrite by later handlers: _init_otel_logger_on_litellm_proxy now uses first-registered-wins: it only assigns proxy_server.open_telemetry_logger when currently None.
  3. Duplicate litellm_request success spans in streaming paths: Added _mark_success_span_once with per-handler dedupe key stored in kwargs metadata, suppressing the second span when both sync and async success callbacks fire for the same request.

  Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
  Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: update Next.js build artifacts (2026-05-13 01:42 UTC, node v20.20.2)

* test(vcr): tighten OVERFLOW classification and switch respx detection to AST

  Address two greptile P2 review concerns on PR #27795:

  1. MISS:OVERFLOW was firing whenever total > MAX_EPISODES_PER_CASSETTE regardless of cassette state. A cassette that grew past the cap historically but this run only *replayed* (dirty=False) is healthy: the persister never tries to save, so the cache state is stable and the next run will replay too. Only flag OVERFLOW when dirty=True (new episodes were recorded that the persister would refuse to save). Add a regression test covering the dirty=False + large-total case.

  2. _module_uses_respx did substring matching on the module source, which false-positives on comments / docstrings / string literals. A comment like

       # Previously tried respx.mock but switched to vcrpy

     would keep a file pinned on the opt-out list, defeating the dead-import pruning goal of this PR. Replace the substring scan with an ast.NodeVisitor (_RespxUsageVisitor) that only counts:
     - @pytest.mark.respx / @respx.mock decorators
     - with respx.mock(): ... (sync + async) context managers
     - respx.mock(...) calls outside a with/decorator
     - function parameters / fixture names equal to respx_mock

     Add tests for the comment / docstring / string-literal cases plus each real-usage pattern.
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * fix(types_utils): drop opt-in env from remote-module runtime gate The runtime gate on s3://gcs:// loading in get_instance_fn previously allowed an opt-in via LITELLM_ALLOW_REMOTE_INSTANCE_FN_FROM_API. That env var is admin-flippable at runtime (DB-overlay environment_variables flow into os.environ), which defeats the gate's purpose, and it isn't needed for the documented operator flow: config.yaml callbacks always pass config_file_path through to the loader. Remove the helper, raise unconditionally when config_file_path is None, and drop the corresponding test for the opt-in branch. * fix(proxy): thread config_file_path into pass-through and MCP-tool YAML loaders The previous commit's gate broke two legitimate startup paths for operators using s3://gcs:// remote module loading from their config.yaml: - general_settings.pass_through_endpoints[].custom_handler - mcp_tools[].handler Both call sites called get_instance_fn without a config_file_path, so the new gate rejected them at startup. Thread config_file_path through: - create_pass_through_route accepts config_file_path and forwards it to get_instance_fn. add_exact_path_route, add_subpath_route, _register_pass_through_endpoint, and initialize_pass_through_endpoints accept and propagate it. - The YAML-load call site in proxy_server.load_config now passes config_file_path; the DB-overlay call site in _update_general_settings leaves it as the default None so the gate still fires on admin-written s3:// values. - MCPToolRegistry.load_tools_from_config accepts config_file_path and threads it into get_instance_fn; _init_non_llm_configs forwards it from load_config. Adds two regression tests verifying that the YAML-source callers thread the path through to get_instance_fn. * Strip SERVER_ROOT_PATH before lazy-feature prefix match LazyFeatureMiddleware compared the raw scope path against registered prefixes (e.g. 
/policies), so requests under a server root path like /api/v1/policies/... never matched, the feature never loaded, and the endpoint returned 404. Strip the configured root path before matching, normalizing trailing slashes and enforcing a component boundary so /api does not falsely match /apiv2. * Cache normalized SERVER_ROOT_PATH at middleware init SERVER_ROOT_PATH is a process-startup env var. Read it once in __init__ instead of calling get_server_root_path() + rstrip on every request that arrives before all lazy features have loaded. * test: replace dall-e-3 with gpt-image-1 in health check and router tests (#27813) OpenAI returns 'The model dall-e-3 does not exist' for the test account, breaking test_openai_img_gen_health_check and test_image_generation. Switch to gpt-image-1, matching the existing TestOpenAIGPTImage1 pattern. * fix(gemini): normalize response_schema on native generateContent (#27775) * fix(gemini): normalize response_schema on native generateContent The /v1beta/models/{model}:generateContent passthrough forwarded generationConfig.response_schema verbatim, so schemas containing $defs, $ref, anyOf-with-null, default, or title were rejected by Gemini even though /chat/completions already handles them. GoogleGenAIConfig.transform_generate_content_request now calls a new _normalize_response_schema helper that mirrors the chat/completions path: Gemini 2.0+ models get the schema promoted to responseJsonSchema via _build_json_schema (preserving $defs/$ref natively), older models keep responseSchema but the schema is flattened with _build_vertex_schema. VertexAIGoogleGenAIConfig (which overrides the transform entirely) calls the same helper before building the request. 
* fix(gemini): preserve caller-supplied responseJsonSchema when responseSchema co-present Previously, when both responseJsonSchema and responseSchema were present on Gemini 2.0+, _normalize_response_schema processed responseJsonSchema first (no-op normalization) then unconditionally promoted responseSchema to responseJsonSchema, clobbering the caller-supplied value. Now skip the promotion (and drop the redundant responseSchema) when the caller already supplied responseJsonSchema. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * chore: strip restating comments from response-schema normalize Drop the docstring on _normalize_response_schema and the two inline comments that just restated what the surrounding code/asserts already say. Function name + variable names carry the intent; PR description covers the why-it-exists context. * perf(gemini): drop redundant deepcopy on responseJsonSchema normalize _build_json_schema is a no-op (returns its argument unchanged), so the deepcopy + round-trip on the responseJsonSchema branch allocated a full schema copy on every request with no observable effect. Forward the caller's value as-is, and just move the popped responseSchema value when promoting on Gemini 2.0+. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * style: remove unneeded comment * fix(gemini): drop unsupported responseJsonSchema for older models * test(gemini): add parity test between native and chat schema normalization Per @Sameerlite review: lock the two Gemini schema-normalization paths together. If either GoogleGenAIConfig._normalize_response_schema (native generateContent) or VertexGeminiConfig.apply_response_schema_transformation (/chat/completions) drifts, the parity test fails — forcing both to be updated together. 
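
The key-handling rules from the Gemini commits above can be sketched as below. This is a rough illustration under stated assumptions, not the code of `_normalize_response_schema`: the function name and the `supports_json_schema` flag are hypothetical, only camelCase keys are handled, and the flattening that the commits attribute to `_build_vertex_schema` is left out.

```python
from typing import Any, Dict


def normalize_schema_keys_sketch(
    generation_config: Dict[str, Any],
    supports_json_schema: bool,
) -> Dict[str, Any]:
    """Hypothetical sketch of the promotion rules described in the commits.

    supports_json_schema stands in for the "Gemini 2.0+" model check.
    """
    config = dict(generation_config)  # shallow copy; the caller's dict is untouched
    if supports_json_schema:
        if "responseJsonSchema" in config:
            # A caller-supplied responseJsonSchema wins; drop the redundant key.
            config.pop("responseSchema", None)
        elif "responseSchema" in config:
            # Promote by moving the value; no deepcopy is needed.
            config["responseJsonSchema"] = config.pop("responseSchema")
    else:
        # Older models: responseJsonSchema is unsupported, so drop it.
        # Flattening of responseSchema ($defs, $ref, ...) is omitted here.
        config.pop("responseJsonSchema", None)
    return config
```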
* fix(google_genai): preserve key naming convention in _normalize_response_schema When the input schema key is snake_case (response_schema), the promoted JSON schema key should also be snake_case (response_json_schema) instead of mixing in camelCase (responseJsonSchema). This matters for the Vertex AI google_genai path which converts all keys to snake_case before calling _normalize_response_schema. --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> Co-authored-by: Claude <noreply@anthropic.com> * fix(vcr): aggregate worker stats on the controller so the session summary actually renders under xdist `_session_stats` is a module-level dict mutated inside `_vcr_outcome_gate` — which runs in each xdist worker process. The controller's `pytest_terminal_summary` then reads its own empty `_session_stats` and bails on `if not counts: return`, so the OVERFLOW / LIVE_CALL sections the rest of this PR adds never make it into CI logs in the dist mode CI actually uses. Ship a structured `vcr_outcome` payload via `user_properties` (which xdist round-trips) and add `aggregate_report_outcome` on the controller to fold worker outcomes into `_session_stats`. The recording process tags `vcr_recorded_by` with `PYTEST_XDIST_WORKER` so the controller can tell "single-process — already counted locally" apart from "produced by a worker — needs aggregation here", and not double-count when there's no xdist. Covered by 9 new unit tests in test_vcr_classification.py including the end-to-end summary render path. 
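
The controller-side fold described in the xdist commit above might look roughly like this. It is a sketch under assumptions: the real `vcr_outcome` payload is described as structured, while this version treats it as a plain string label, and it passes the stats counter in as an argument instead of mutating a module-level `_session_stats`.

```python
from collections import Counter
from typing import Iterable, Tuple


def aggregate_report_outcome_sketch(
    user_properties: Iterable[Tuple[str, object]],
    session_stats: Counter,
) -> None:
    """Hypothetical sketch: fold one test report's outcome into the controller's stats."""
    props = dict(user_properties)
    outcome = props.get("vcr_outcome")
    if outcome is None:
        return  # the test did not go through the VCR gate
    if not props.get("vcr_recorded_by"):
        # No worker id: recorded in this same process and already counted locally.
        return
    session_stats[str(outcome)] += 1
```

In a real plugin this would be called from a report hook on the controller with `report.user_properties`, which pytest-xdist serializes back from the workers.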
* fix(responses): register cooldowns on failure + fail fast on stale encrypted_content (#27820) * feat(proxy): skip disable_background_health_check models on GET /health when flag set (#27716) * feat(proxy): skip disable_background_health_check models on GET /health when flag set Co-authored-by: Cursor <cursoragent@cursor.com> * fix comment * fix greptile comments * Fix health check fallback kwargs * Format health endpoint * Harden direct health check kwargs compatibility for monkeypatched perform_health_check Replace substring-based TypeError detection with unexpected-keyword checks and a short retry chain (full kwargs, instrumentation only, filter only, minimal) so partial stubs work regardless of which optional kwarg fails first. Add proxy unit tests for legacy three-arg stubs and single-kwarg variants. Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com> * fix black --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com> * fix(bedrock-converse): drop blank-text fallback for empty thinking blocks (#27850) * fix(bedrock-converse): drop blank-text fallback for empty thinking blocks Claude Code with extended thinking replays prior assistant turns that include an empty thinking block (`thinking=""`, `signature=""`) alongside tool_use blocks. The unsigned-reasoning fallback in `add_thinking_blocks_to_assistant_content` was emitting `BedrockContentBlock(text="")`, which Bedrock Converse rejects with: "The text field in the ContentBlock object at messages.X.content.0 is blank." Guard the fallback with a strip() check, matching the existing empty-text guards elsewhere in `_bedrock_converse_messages_pt`. 
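
The guard described in the Bedrock Converse commit above can be illustrated with a minimal sketch. The helper name and the plain-dict block shape are assumptions made for illustration; the actual change lives in `add_thinking_blocks_to_assistant_content` and uses `BedrockContentBlock`.

```python
from typing import Dict, List, Optional


def unsigned_reasoning_fallback_sketch(thinking: Optional[str]) -> List[Dict[str, str]]:
    """Hypothetical sketch: emit a text block only when there is non-blank text.

    Returning an empty list avoids sending a content block whose text is
    blank, which the commit reports Bedrock Converse rejects.
    """
    if thinking is None or not thinking.strip():
        return []
    return [{"text": thinking}]
```

An empty thinking block replayed alongside `tool_use` blocks then contributes nothing to the content list, and the tool-use blocks are sent on their own.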
* style: remove unneeded comments * fix(proxy): thread config_file_path through LiteLLM_JWTAuth.custom_validate LiteLLM_JWTAuth.__init__ calls get_instance_fn(custom_validate) without config_file_path, so an operator who configures custom_validate: s3://bucket/module.fn in their YAML JWT auth section would hit the runtime gate on startup and break their deployment. Accept config_file_path as a non-field kwarg (popped before the invalid-keys check), thread it into get_instance_fn, and pass it from the startup-load callsite via the existing user_config_file_path module-level path. Admin-API JWT config writes leave the kwarg at None and still hit the gate. * fix(mcp): surface upstream 401 for token-forwarding MCP servers (#27847) * fix(mcp): surface upstream 401 for token-forwarding MCP servers For MCP servers configured with extra_headers: [Authorization], the gateway forwards the client token directly to the upstream. When that token is rejected (expired or invalid) the upstream returns 401, but the MCP SDK starts the SSE stream with 200 OK before calling handlers, so the 401 can't be returned mid-stream. Fix: add a pre-flight httpx probe in handle_streamable_http_mcp — before the SDK opens the session — so the gateway can still return HTTP 401 with WWW-Authenticate: Bearer authorization_uri=<gateway-discovery-url> when the upstream rejects the token. The probe fails-open (returns 200) on network errors so a transient hiccup does not block valid requests. 
Co-authored-by: Cursor <cursoragent@cursor.com> * fix(mcp): parallelize pre-flight auth probes and use HEAD to avoid side effects - Extract forwarded_auth outside the pass-through server loop (was called N times for the same scope value) - Gather all upstream auth probes concurrently with asyncio.gather instead of sequentially; eliminates N×5 s worst-case latency - Switch probe from POST+initialize JSON-RPC body to HEAD request; HEAD carries the Authorization header so the upstream rejects invalid tokens with 401 but never allocates a session or writes an audit entry Co-authored-by: Cursor <cursoragent@cursor.com> * fix(mcp): use get_async_httpx_client in _probe_upstream_auth Replaces bare httpx.AsyncClient with the project-standard get_async_httpx_client(httpxSpecialProvider.MCP) to satisfy the ensure_async_clients_test code coverage check and avoid the +500 ms per-request overhead of creating a new client on every probe call. Co-authored-by: Cursor <cursoragent@cursor.com> * refactor(mcp): extract pre-flight probe into _check_passthrough_upstream_auth Moves the parallel upstream auth probe logic out of handle_streamable_http_mcp into a dedicated helper to satisfy Ruff PLR0915 (Too many statements > 50). Co-authored-by: Cursor <cursoragent@cursor.com> * fix(mcp): gate pre-flight probes on authorized server set to prevent bypass _check_passthrough_upstream_auth was resolving user-supplied server names directly before authorization ran, letting any permitted LiteLLM key trigger an upstream HEAD probe to a server it was not allowed to use. Changes: - Call _get_allowed_mcp_servers inside the helper so only servers the caller's key is authorized for are probed. - Move the call site to after toolset scoping so the auth context is fully resolved before the probe list is built. - Thread user_api_key_auth into the helper signature (replaces the raw mcp_servers name list). 
Co-authored-by: Cursor <cursoragent@cursor.com> * Add async HTTP HEAD support Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(mcp): use Scope type annotation in _get_forwarded_auth_from_scope Co-authored-by: Cursor <cursoragent@cursor.com> * Fix MCP upstream auth probe method Co-authored-by: Yassin Kortam <yassin@berri.ai> * Remove unused AsyncHTTPHandler head method Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(mcp): exclude has_client_credentials servers from pre-flight auth probe _prepare_mcp_server_headers skips caller Authorization when the server uses OAuth client-credentials (M2M), but the pre-flight probe was still selecting those servers and forwarding the caller's raw token in the HEAD request. Exclude servers with has_client_credentials from the probe list to match the actual downstream header-preparation logic. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(mcp): propagate upstream 403 as 403, not 401 with WWW-Authenticate Per RFC 9110, 401 means "go get new credentials." Mapping an upstream 403 to a gateway 401 causes OAuth clients to restart the authorization flow, obtain a fresh token with identical scopes, hit 403 again, and loop indefinitely. 401 from upstream → gateway 401 + WWW-Authenticate (re-authorize) 403 from upstream → gateway 403 (no WWW-Authenticate hint) Co-authored-by: Cursor <cursoragent@cursor.com> * fix(mcp): skip auth probe when Authorization may be the LiteLLM proxy key The pre-flight upstream probe must not forward the caller's Authorization header when it could itself be the LiteLLM proxy API key. Restrict the probe to requests that supply x-litellm-api-key explicitly — only then is the Authorization header unambiguously the upstream OAuth token the caller wants forwarded. 
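
The status mapping and probe gating from the MCP commits above can be sketched as two small functions. Both names are hypothetical, header keys are assumed to be lowercased already, and the probe request itself is omitted.

```python
from typing import Dict, Mapping, Optional, Tuple


def should_probe_upstream_sketch(headers: Mapping[str, str]) -> bool:
    """Probe only when x-litellm-api-key is supplied explicitly.

    Only in that case is the Authorization header unambiguously the upstream
    OAuth token rather than the LiteLLM proxy key.
    """
    return bool(headers.get("x-litellm-api-key")) and bool(headers.get("authorization"))


def map_probe_status_sketch(
    upstream_status: int, discovery_url: str
) -> Optional[Tuple[int, Dict[str, str]]]:
    """Translate the upstream probe status into a gateway response, if any."""
    if upstream_status == 401:
        # Tell the client to re-authorize.
        return 401, {"WWW-Authenticate": f"Bearer authorization_uri={discovery_url}"}
    if upstream_status == 403:
        # New credentials would not help, so send no WWW-Authenticate hint.
        return 403, {}
    return None  # any other status: let the request proceed
```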
* Fix MCP ASGI HTTPException propagation Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(mcp): use public AsyncHTTPHandler.post() in auth probe Use AsyncHTTPHandler.post() and catch httpx.HTTPStatusError explicitly so the 401/403 we want to surface is not silently swallowed by the broad fail-open except Exception block. Avoids reaching into the handler's private client attribute, which would silently regress to fail-open if AsyncHTTPHandler is ever refactored. * Fix MCP auth probe tests Co-authored-by: Yassin Kortam <yassin@berri.ai> * test(mcp): add coverage for httpx.HTTPStatusError path in auth probe AsyncHTTPHandler.post() calls raise_for_status() internally, so a real upstream 401/403 lands as httpx.HTTPStatusError. Add a test that exercises that specific exception path so a regression that swallows the error in the broad fail-open except Exception would be caught. --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Yassin Kortam <yassin@berri.ai> Co-authored-by: claude-bot <claude-bot@anthropic.com> * fix(cost): align vertex_ai/gemini-embedding-2-preview with Vertex multimodal pricing (#27848) * fix(cost): align vertex_ai/gemini-embedding-2-preview with Vertex multimodal pricing Co-authored-by: Cursor <cursoragent@cursor.com> * fix(cost): align vertex_ai/gemini-embedding-2 GA source URL with preview Per Greptile review on #27848: GA entry referenced ai.google.dev while the preview entry was updated to the canonical Vertex AI pricing page. Both share identical pricing values; sync the source URL for consistency. https://claude.ai/code/session_01W8jRwstnmduadGw8Z8egxe --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Claude <noreply@anthropic.com> * feat(mcp): add delegate_auth_to_upstream flag for PKCE passthrough (#27834) * feat(mcp): add delegate_auth_to_upstream flag for PKCE passthrough Adds an opt-in per-server flag that lets clients (e.g. 
VS Code) complete PKCE directly with an upstream OAuth2 MCP server, instead of LiteLLM double-gating with its own API-key/SSO check. Only honored when auth_type=oauth2 and the operator explicitly sets the flag; mixed-target or non-oauth2 requests fail closed. - Adds the field to Pydantic models, Prisma schema, and a migration - New MCPRequestHandler._target_servers_delegate_auth_to_upstream gate that runs only when no x-litellm-api-key is present, so authenticated users still get user_id resolution + stored-credential lookup - Anonymous callers now see delegate servers in get_allowed_mcp_servers (scoped to delegate servers only; the upstream still enforces auth) - mcp_management_endpoints: allow anonymous /authorize and /token for delegate servers so VS Code can complete PKCE without a LiteLLM session - UI toggle (shown only for oauth2) + payload/view wiring - Tests covering: oauth2 on/off, non-oauth2 with flag, mixed targets, no resolvable target, explicit key precedence, and 401 emission Co-authored-by: Cursor <cursoragent@cursor.com> * Enforce oauth2 for delegated MCP auth bypass Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(mcp): close secondary Authorization bypass for delegate servers The delegate-auth bypass gated only on the primary `x-litellm-api-key` header, so a LiteLLM key sent via `Authorization: Bearer sk-...` (the secondary header) was silently dropped — skipping spend tracking and rate limiting. Gate on the resolved litellm_api_key (which considers both headers) so the bypass fires only when neither is present. Also update the existing "Authorization header present" test to reflect that an upstream OAuth token now flows through the existing oauth2 fallback (LiteLLM auth attempt → fail → anonymous), not via the delegate branch. 
Co-authored-by: Cursor <cursoragent@cursor.com> * Avoid duplicate MCP OAuth credential lookup Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(mcp): block delegate bypass for M2M and internal-only servers Two security issues flagged in code review: 1. High – client_credentials (M2M) servers must not be delegatable: LiteLLM auto-fetches the upstream token using stored credentials, so allowing anonymous bypass would let any external caller invoke tools authenticated as LiteLLM's service account. Fix: check `server.has_client_credentials` in `_target_servers_delegate_auth_to_upstream`, the anonymous allow-list in `get_allowed_mcp_servers`, and `_mcp_oauth_user_api_key_auth`. 2. Medium – internal-only servers exposed to public internet: The anonymous delegate allow-list was not filtering by `available_on_public_internet`, so external callers with an upstream OAuth token could invoke tools on servers marked internal-only. Fix: add `available_on_public_internet` guard to the anonymous delegate server list in `get_allowed_mcp_servers`. Tests added for both cases. Co-authored-by: Cursor <cursoragent@cursor.com> * Require public MCP delegate auth servers Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(mcp): align delegate auth path parsing with downstream routing `_extract_target_server_names_from_path` used a naive segments-based split while `server.py::_get_mcp_servers_in_path` uses a regex that allows server names with one embedded slash and comma-separated lists. With the old parser, a request to `/mcp/<delegated>/<garbage>` was parsed as targeting `<delegated>` by the auth gate (bypassing LiteLLM auth) while the routing layer parsed it as `<delegated>/<garbage>` — when that name did not resolve, the request fell back to the anonymous allow-list, which can include `allow_all_keys` servers that normally require a LiteLLM key. 
Replace the parser with the same regex logic as `_get_mcp_servers_in_path` so auth gating sees the exact target name(s) downstream routing sees. Add regression tests covering parser parity and the specific extra-path-segment bypass attempt. https://claude.ai/code/session_01SjyPmwfmrq8fveFgw9iHW9 * fix(mcp): close header/path TOCTOU in MCP delegate auth gate `_target_servers_delegate_auth_to_upstream` and `_target_servers_use_oauth2` trusted the `x-mcp-servers` header when present, but `server.py::extract_mcp_auth_context` overrides that header with the path-derived list for `/mcp/...` routes. An attacker could set `x-mcp-servers: <delegated>` while pointing the URL path at a non-delegate server, flipping the auth gate without changing the target downstream routing actually uses. Extract a shared `_resolve_target_server_names` helper that mirrors the downstream override (path-derived names for `/mcp/...` routes, header value otherwise). Add regression tests covering the TOCTOU attempt and the helper's path-vs-header precedence. https://claude.ai/code/session_01SjyPmwfmrq8fveFgw9iHW9 * Fix delegated MCP OAuth test mock Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(mcp): drop unreachable /{server}/mcp branch in auth path parser `_extract_target_server_names_from_path` also matched the ``/{server_name}/mcp`` form, but the downstream parser ``_get_mcp_servers_in_path`` only handles ``/mcp/...`` — and ``dynamic_mcp_route`` in ``proxy_server`` rewrites ``/{name}/mcp`` to ``/mcp/{name}`` on the scope before the MCP handler runs. Parsing the un-rewritten form on the auth side was therefore unreachable in production, and contradicted the docstring's claim of mirroring the downstream parser — exactly the kind of mismatch that risks a future header/path TOCTOU if any new entry point skips the rewrite. Drop the branch; the canonical ``/mcp/...`` path matches both parsers. Update the regression test to assert the new behavior. 
https://claude.ai/code/session_01SjyPmwfmrq8fveFgw9iHW9 * Fix MCP path auth target resolution Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(mcp): require auth for refresh_token grants on delegate-auth servers `_mcp_oauth_user_api_key_auth` gates the unauthenticated PKCE flow for ``delegate_auth_to_upstream`` servers, but the bypass applied to BOTH ``/authorize`` and ``/token`` regardless of grant type. ``mcp_token`` accepts ``grant_type=refresh_token`` as well as ``authorization_code``, and ``exchange_token_with_server`` attaches the server's stored ``client_secret`` to whatever is forwarded upstream. An unauthenticated caller holding a refresh token issued to that OAuth client could mint fresh upstream access tokens through LiteLLM. Limit the anonymous bypass on ``/token`` to ``grant_type=authorization_code`` (the only grant PKCE actually protects via ``code_verifier``); fall through to normal LiteLLM auth for ``refresh_token`` and any other grant. ``/authorize`` continues to allow anonymous PKCE redirects. https://claude.ai/code/session_01SjyPmwfmrq8fveFgw9iHW9 * fix(ui): clear delegate_auth_to_upstream when switching off oauth2 The ``delegate_auth_to_upstream`` form field is rendered inside an ``isOAuth2 && (...)`` conditional, so the Form.Item unmounts when the user changes ``auth_type`` away from ``oauth2``. The follow-up ``form.setFieldValue("delegate_auth_to_upstream", false)`` runs after the field has already deregistered, so ``onFinish`` receives ``undefined`` and the fallback ``?? mcpServer.delegate_auth_to_upstream`` preserved the old ``true``. The flag then persisted in the database for a non-oauth2 server and silently re-activated if ``auth_type`` was later switched back to ``oauth2``. In the edit payload, force the flag to ``false`` whenever ``auth_type !== oauth2``; only trust the form value (and the existing DB fallback) when the server is actually oauth2. 
Backend defense-in-depth already ignores the flag for non-oauth2 servers, but the DB state should stay clean too. https://claude.ai/code/session_01SjyPmwfmrq8fveFgw9iHW9 * Fix MCP delegate auth reset on edit Co-authored-by: Yassin Kortam <yassin@berri.ai> --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Yassin Kortam <yassin@berri.ai> Co-authored-by: Claude <claude@anthropic.com> * fix(responses): preserve cache_control in Responses API -> Chat Completion transformation (#27727) * fix(responses): preserve cache_control in Responses API -> Chat Completion transformation cache_control injected by AnthropicCacheControlHook was silently dropped when _transform_responses_api_content_to_chat_completion_content rebuilt content blocks with only {type, text}. Now copies cache_control through so Anthropic prompt caching works correctly when using client.responses.create with cache_control_injection_points. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(responses): preserve cache_control for input_image and input_file blocks Extends the cache_control fix to image and file content blocks, which were also silently dropping cache_control during the Responses API -> Chat Completion transformation. Adds tests for all three content block types. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Claude Babysitter <claude@anthropic.com> * fix(proxy): expose db status on public /health/readiness External readiness probes consumed the legacy detailed payload's `db` field to drive alerting and pod-rotation decisions. Stripping the body to `{"status": "healthy"}` broke those probes silently — the HTTP code still flipped to 503, but probes checking `body.db == "connected"` treated the response as healthy. Add `db` back to the unauthenticated payload. 
Keep the rest of the diagnostic fields (litellm_version, callbacks, cache, log_level) gated behind /health/readiness/details so the recon-leak gate from #26912 holds. Values match the legacy contract: "connected", "disconnected", "Not connected". * docs(budget_manager): add docstring to BudgetManager.reset_cost (#27867) Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com> * docs: add class docstring to _LoopWrapper (#27870) Document the purpose of the daemon thread that backs the sync branch of the timeout decorator. Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com> * fix: Fix Redis Sentinel client handling to solve authentication error… (#26302) * fix: Fix Redis Sentinel client handling to solve authentication error with password protected sentinel (#25625) * fix Redis Sentinel authentication handling * test: cover Redis Sentinel auth routing * refactor: align Redis Sentinel kwargs threading * fix: avoid duplicate Redis Sentinel socket timeouts * Address review comments * refactor(_redis): return set from _get_redis_kwargs for O(1) lookup Align _get_redis_kwargs() with the cluster helper by returning a set instead of a list, so the sentinel connection-kwargs filter uses O(1) membership tests. Addresses Greptile review feedback on PR #26302. * fix(_redis): restore Azure-specific kwargs in cluster kwargs set The set-literal refactor of _get_redis_cluster_kwargs dropped four LiteLLM-custom Azure keys (azure_redis_ad_token, azure_client_id, azure_tenant_id, azure_client_secret) that the prior list form had explicitly appended. Because they are not in RedisCluster's argspec, they were silently stripped, breaking Azure IAM auth on cluster clients. Re-add them to the explicit include set. 
--------- Co-authored-by: Kristin Cowalcijk <kristincowalcijk@gmail.com> Co-authored-by: Sameer Kankute <sameer@berri.ai> Co-authored-by: krrish-berri-2 <krrish-berri-2@users.noreply.github.com> Co-authored-by: claude <claude@anthropic.com> * Litellm agent oss staging 05 11 2026 (#27733) * fix(ollama): Include provider in model list for ollama (#26135) * Include provider in model names for ollama * Fix unit tests * fix(ollama): process both thinking and content in same streaming chunk (#26098) * fix(health_check): skip max_tokens for image_generation mode (#26417) * fix(health_check): skip max_tokens for image_generation mode `_update_litellm_params_for_health_check` injected `max_tokens` for every deployment. OpenAI `/v1/images/generations` strictly rejects unknown fields, so health checks for dall-e-* and gpt-image-1 always failed with `400 "Unknown parameter: 'max_tokens'"` even though the actual image endpoint calls succeed. Skip the `max_tokens` injection when `model_info.mode == "image_generation"`. `messages` still gets injected (downstream `_filter_model_params` already strips it for non-chat handlers). * Switch to allow-list with per-deployment override Per @krrishdholakia review: deny-listing image_generation only re-introduces the same bug for every other non-chat mode (embedding, audio_*, rerank, video_generation, ocr, search, moderation, ...). Replace the single image_generation skip with `_MAX_TOKEN_SUPPORT_MODES = {chat, completion, responses}`. Missing `mode` is treated as chat for backward compatibility. New modes are safe by default. Add `model_info.health_check_supports_max_tokens` as an operator escape hatch — True forces injection on a non-listed deployment (operator wants to bound probe tokens), False suppresses it on a chat-style deployment behind a strict-schema provider. Tests: parametrize over 3 chat-style + 10 non-chat modes, plus override on/off and the no-mode legacy path. 
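
The allow-list with per-deployment override described in the health-check commit above can be sketched as a single decision function. The function name is hypothetical; `_MAX_TOKEN_SUPPORT_MODES` and `health_check_supports_max_tokens` are the names given in the commit message.

```python
from typing import Any, Dict

_MAX_TOKEN_SUPPORT_MODES = {"chat", "completion", "responses"}


def should_inject_max_tokens_sketch(model_info: Dict[str, Any]) -> bool:
    """Hypothetical sketch of the decision described in the commit."""
    override = model_info.get("health_check_supports_max_tokens")
    if override is not None:
        # Operator escape hatch: True forces injection, False suppresses it.
        return bool(override)
    mode = model_info.get("mode")
    if mode is None:
        # A missing mode is treated as chat for backward compatibility.
        return True
    # Modes outside the allow-list are skipped, so new modes are safe by default.
    return mode in _MAX_TOKEN_SUPPORT_MODES
```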
* fix(http_handler): handle RequestNotRead in MaskedHTTPStatusError for multipart uploads (#26718) Squash-merged by litellm-agent from dawidkulpa's PR. * fix(ollama): guard against double 'ollama/' prefix in live model listing Greptile flagged that Ollama servers can return names that already start with 'ollama/'. Check the prefix before prepending so we don't produce 'ollama/ollama/...'. Adds a regression test. * Fix Ollama empty reasoning stream chunks Co-authored-by: Yassin Kortam <yassin@berri.ai> --------- Co-authored-by: James Myatt <james@jamesmyatt.co.uk> Co-authored-by: VHash <225398745+vhash0@users.noreply.github.com> Co-authored-by: hayden <sewhan.kim+@a-bly.com> Co-authored-by: dawidkulpa <84176950+dawidkulpa@users.noreply.github.com> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Claude <claude@anthropic.com> Co-authored-by: Yassin Kortam <yassin@berri.ai> * Ishaan - May 13th Staging LiteLLM (#27877) * fix: strip Gemini thought-signature from tool_use.id in non-streaming path; example websearch config (#27873) - adapters/transformation.py: mirror the streaming path and strip the `__thought__<b64>` suffix off `tool_call.id` before building the AnthropicResponseContentBlockToolUse. Base64's `+ / =` characters violate Anthropic's `^[a-zA-Z0-9_-]+$` tool_use.id pattern, so when a conversation that flowed through Gemini is later replayed to an Anthropic-native provider (Bedrock or Anthropic API) the request 400s. - example_config_yaml/websearch_interception_config.yaml: register the interceptor under `callbacks:` not `success_callback:`. `success_callback` does not run pre-request hooks, so the tool-conversion step never fires on `/v1/messages` and the raw `web_search_20250305` tool is forwarded to Bedrock, which 400s. - adds a unit test pinning the non-streaming strip behavior and the surviving `^[a-zA-Z0-9_-]+$` shape of the resulting id. 
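A minimal sketch of the suffix strip described above, assuming the signature is appended after a literal `__thought__` separator as the commit message states. The helper name `strip_thought_signature` and the sample id are hypothetical; the regular expression is the tool_use.id pattern quoted in the message.

```python
import re

THOUGHT_SEPARATOR = "__thought__"
# Pattern quoted in the commit message for Anthropic tool_use ids.
ANTHROPIC_TOOL_ID_PATTERN = re.compile(r"^[a-zA-Z0-9_-]+$")


def strip_thought_signature(tool_call_id: str) -> str:
    """Drop everything from the first '__thought__' separator onward."""
    return tool_call_id.split(THOUGHT_SEPARATOR, 1)[0]


# Hypothetical id carrying a base64 suffix with '+', '/' and '='.
raw_id = "call_abc123__thought__Zm9v+/="
clean_id = strip_thought_signature(raw_id)
```

Here `raw_id` fails the pattern because of the base64 characters, while `clean_id` is `call_abc123` and passes it.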
Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com> * Fix/azure image edit auth header (#27863) * fix(azure/image_edit): use api-key header instead of Authorization Bearer Delegate `AzureImageEditConfig.validate_environment` to `BaseAzureLLM._base_validate_azure_environment` so the image-edit route follows the same auth resolution as every other Azure provider: - prefer the Azure-native `api-key` header when an API key is available - fall back to `Authorization: Bearer <azure_ad_token>` only for AAD auth The previous implementation unconditionally set `Authorization: Bearer <api_key>`, which is the OpenAI-direct convention and is rejected by Azure OpenAI / APIM-fronted deployments with `401 Access denied due to missing subscription key`. Adds regression tests covering api_key kwarg, litellm_params.api_key, and the AAD-token fallback path. Co-authored-by: Cursor <cursoragent@cursor.com> * docs(azure/image_edit): pin api-key precedence semantics + add regression test Address review feedback that the move to ``BaseAzureLLM._base_validate_azure_environment`` changed the relative priority of the positional ``api_key`` kwarg vs. ``litellm_params["api_key"]``. The new behavior — ``litellm_params["api_key"]`` wins, positional only fills in when ``litellm_params["api_key"]`` is empty — is intentional and matches every other Azure ``validate_environment``: ``AzureVideosConfig`` uses the exact same merge logic, while ``AzureVectorStoresConfig`` and ``AzureResponsesAPIConfig`` don't accept a positional ``api_key`` at all. The old ``or`` chain (positional wins) was the outlier and was part of the same OpenAI-vs-Azure convention drift that produced the original ``Authorization: Bearer`` bug. The only production caller (``llm_http_handler.image_edit``) sources both values from the same ``litellm_params.api_key``, so this change is behaviorally a no-op there. 
Document the precedence in the docstring and lock it in with an explicit test so future refactors can't quietly re-invert it. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: ryan-crabbe-berri <ryan@berri.ai> Co-authored-by: Adam Kirstein <adam.kirstein@disney.com> Co-authored-by: Cursor <cursoragent@cursor.com> * test(azure/image_edit): expect api-key header instead of Authorization Bearer PR #27863 fixed Azure image edit to use the Azure-native api-key header instead of OpenAI's Authorization: Bearer convention, but did not update test_azure_image_edit_litellm_sdk to match. The test still asserted 'Authorization' in headers, which now fails since the new code routes through BaseAzureLLM._base_validate_azure_environment and emits api-key when an api_key is provided. Update the assertion to pin the correct Azure behavior: api-key header present with the resolved key, and no Authorization header. --------- Co-authored-by: oss-agent-shin <ext-agent-shin@berri.ai> Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com> Co-authored-by: Adam Kirstein <107421694+justalittleadam@users.noreply.github.com> Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: ryan-crabbe-berri <ryan@berri.ai> Co-authored-by: Adam Kirstein <adam.kirstein@disney.com> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com> * fix(fireworks_ai): strip `thinking_blocks` from chat messages before Fireworks API call (#27881) * fix(fireworks_ai): strip thinking_blocks from chat messages before API call Fireworks OpenAI-compatible ChatMessage schema uses additionalProperties:false and rejects Anthropic-style messages[].thinking_blocks (e.g. Claude Code replays), returning invalid_request_error. Remove the field in _transform_messages_helper alongside provider_specific_fields. Adds unit test test_transform_messages_helper_strips_thinking_blocks. 
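The message sanitization described above amounts to dropping two keys before the request is sent. The sketch below is an assumption about the shape of that step, not the body of `_transform_messages_helper`; the function name `strip_unsupported_fields` is hypothetical and messages are modeled as plain dicts.

```python
from typing import Any, Dict, List

# Fields the commit message says are removed before calling Fireworks.
_UNSUPPORTED_MESSAGE_FIELDS = ("provider_specific_fields", "thinking_blocks")


def strip_unsupported_fields(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return copies of the messages without fields a strict schema rejects."""
    return [
        {key: value for key, value in message.items() if key not in _UNSUPPORTED_MESSAGE_FIELDS}
        for message in messages
    ]
```

Returning new dicts leaves the caller's message list untouched, so a later replay to a provider that does accept `thinking_blocks` still has them.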
Co-authored-by: Cursor <cursoragent@cursor.com> * chore(fireworks_ai): drop inline comments from message sanitization Co-authored-by: Cursor <cursoragent@cursor.com> * docs(fireworks_ai): explain why provider_specific_fields and thinking_blocks are stripped Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> * fix: block client-side pricing injection via request body Authenticated clients could supply CustomPricingLiteLLMParams fields (input_cost_per_token, output_cost_per_token, etc.) in the request body. These were forwarded to register_model() in main.py, permanently mutating the shared global litellm.model_cost dict for all users on the instance. Adds all CustomPricingLiteLLMParams fields to _BANNED_REQUEST_BODY_PARAMS so is_request_body_safe() rejects them before they reach completion(). New pricing fields added to CustomPricingLiteLLMParams are auto-covered. Admin opt-in via allow_client_side_credentials or configurable_clientside_auth_params still works as before. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * chore(proxy): scrub remote-URL module loads from DB-overlay config When ``ProxyConfig`` merges DB-persisted ``litellm_settings`` / ``general_settings`` on top of the YAML config, the merged dict is later iterated by ``load_config`` which threads ``config_file_path`` (the YAML path) into ``get_instance_fn``. The runtime gate that refuses ``s3://`` / ``gcs://`` modules when ``config_file_path`` is ``None`` therefore can't distinguish a YAML-sourced value from a DB-sourced one: both look the same to ``get_instance_fn``. 
Strip ``s3://`` / ``gcs://`` entries from the DB-overlay value for every field whose contents reach ``get_instance_fn`` during config load: - litellm_settings: ``callbacks``, ``success_callback``, ``failure_callback``, ``audit_log_callbacks``, ``post_call_rules``, ``custom_provider_map[].custom_handler`` - general_settings: ``custom_auth``, ``custom_key_generate``, ``custom_key_update``, ``custom_sso``, ``custom_ui_sso_sign_in_handler``, ``litellm_jwtauth.custom_validate`` The YAML config-file load path is unchanged — the documented operator flow (``callbacks: ["s3://bucket/module.instance"]`` in ``config.yaml``) still works. Only DB-overlay writes (e.g. via ``/config/update``) are stripped. Adds 16 regression tests covering the scrub matrix. * chore(proxy): also scrub pass_through_endpoints[].target from DB overlay A pass-through endpoint's ``target`` field is passed through ``create_pass_through_route`` into ``get_instance_fn`` during config load. A PROXY_ADMIN persisting ``target: "s3://attacker/m.i"`` via the DB-overlay ``pass_through_endpoints`` write path was not covered by the previous scrub matrix, so the remote module load would still reach the loader because the YAML-load chain has ``config_file_path`` set. Walk each entry in ``general_settings.pass_through_endpoints`` and null out any ``target`` that starts with ``s3://`` or ``gcs://``. The entry itself is preserved so the path-registration helper can choose how to handle a missing target (the existing code skips the route when ``target is None``). Adds two regression tests. * fix(prometheus): emit `litellm_remaining_tokens_metric` for Bedrock and Vertex (#27705) * fix(prometheus): emit remaining_tokens/requests gauges for bedrock + vertex (LIT-2719) Bedrock and Vertex AI never return x-ratelimit-remaining-* response headers, so litellm_remaining_tokens_metric / litellm_remaining_requests_metric only fired for OpenAI / Azure / Anthropic deployments even when tpm/rpm was configured on the router. 
Add a provider-agnostic fallback in PrometheusLogger.async_log_success_event that asks Router.get_remaining_model_group_usage() for the same model_group and emits the gauges with configured_limit - current_usage when the upstream provider didn't populate the headers itself. Existing OpenAI / Azure / Anthropic flows are unchanged because the fallback short-circuits when both header values are already present. Tests: 8 new tests covering bedrock + vertex emission, header short-circuit, partial-header fill, llm_router=None, missing model_group, empty router result, and router exception swallowing. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * fix(prometheus): narrow except to ImportError, log router lookup failures via verbose_logger.exception Address greptile review: - The optional 'from litellm.proxy.proxy_server import llm_router' should guard against ImportError specifically, not all exceptions, so that unexpected errors (e.g. AttributeError from partially-initialized state) stay visible. - get_remaining_model_group_usage failures are now logged via verbose_logger.exception (with traceback) instead of debug, matching the PR description's intent and avoiding silent loss of router-cache errors in production. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * fix(prometheus): subtract in-flight delta in router-remaining fallback The router's TPM/RPM counter is incremented by Router.deployment_callback_on_success, which f…

…oped JWT auth (#28356) * fix(proxy): point /metrics 401 at the opt-out flag Operators upgrading past 35bbca60b0 (which made /metrics auth default-on) see "Malformed API Key passed in. Ensure Key has 'Bearer ' prefix." with no hint that litellm_settings.require_auth_for_metrics_endpoint: false restores the previous unauthenticated behavior. Append that discovery hint to the existing 401 body so a Prometheus scraper that breaks after upgrade has a clear migration path. No behavior change. * fix(proxy): bound budget reservation per request instead of pinning to remaining headroom reserve_budget_for_request fell back to reserving the entire remaining team/key/user headroom whenever a request omitted max_tokens, which pinned the spend counter at max_budget for the duration of the in-flight request and false-positive-blocked every concurrent or back-to-back request until the success callback reconciled. Surfaced as an integration-test team being budget-blocked at its $2000 cap while DB spend was $0.144. Switch the missing-max_tokens path to a fixed default of 16384 output tokens (mirrors parallel_request_limiter_v3's DEFAULT_MAX_TOKENS_ESTIMATE precedent), and clamp explicit max_tokens at the model's max_output_tokens for reservation accounting only. The outbound request body is unchanged, so providers see whatever the caller actually sent; only the local integer used to compute reservation cost is bounded. This also prevents a hostile max_tokens=999999999 from inflating one request's reservation up to the entire team headroom. For Opus 4.7 (output $25/M, max_output 128K) on a $2000 budget the worst-case per-request reservation drops from "everything left" to $3.20, raising admittable concurrency from 1 to ~625. * fix(proxy): reserve per-image cost for image-generation requests Image-generation routes (dall-e-3, flux, etc.) have no per-token output cost so they fell through to the no-reservation read-time-only path. 
Concurrent image requests against a depleted budget could all pass common_checks (counter exactly at max_budget passes the strict-`>` gate) and reach the provider before reconciliation caught up. Add per-image reservation in _estimate_request_max_cost_for_model: when the model has a per-image cost field, reserve `n × cost_per_image` upfront. The atomic counter increment serializes concurrent admissions, so the second request sees the post-first-reservation counter and raises BudgetExceededError instead of silently leaking through. Both `output_cost_per_image` and `input_cost_per_image` are honored — naming is inconsistent across providers (OpenAI dall-e-3 uses input_cost_per_image, aiml/dall-e-3 uses output_cost_per_image for the same per-generated-image price). Per-pixel pricing (DALL-E 2 size variants) and TTS/STT routes still fall through to read-time enforcement; those are follow-ups. * fix(proxy): gate image-gen reservation strictly on model mode The previous detection treated any model with input_cost_per_image or output_cost_per_image as image generation. Several chat and embedding models carry those fields to price multimodal vision input, not generated images: - gemini-3.1-pro-preview (mode=chat) has output_cost_per_image=0.00012 alongside input/output token pricing. - azure/gpt-realtime-* (mode=chat) has input_cost_per_image=5e-6. - amazon.titan-embed-image-v1 (mode=embedding) has input_cost_per_image=6e-5. For these models the image-gen branch fired first and reserved a fraction of a cent per request, short-circuiting the token-priced path entirely. Long Gemini chats reserved 1 × $0.00012 instead of the true token cost. Gate strictly on mode in {"image_generation", "image_edit"}. All 197 real image_generation entries and all 31 image_edit entries (Flux Kontext, Stability inpaint/outpaint, etc.) carry the right mode, so the field-presence fallback was unnecessary. 
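The reservation rules from the three budget commits above can be sketched together. This is a simplified model that prices only the output side; `estimate_reservation` is a hypothetical name and not the signature of `_estimate_request_max_cost_for_model`. The 16384 default, the mode gate, and the cost field names come from the commit messages.

```python
from typing import Any, Mapping, Optional

DEFAULT_MAX_TOKENS_ESTIMATE = 16384
IMAGE_MODES = {"image_generation", "image_edit"}


def estimate_reservation(
    model_info: Mapping[str, Any],
    max_tokens: Optional[int] = None,
    n: int = 1,
) -> float:
    """Estimate the cost to reserve up front for one request (output side only)."""
    # Gate strictly on mode: chat or embedding models may carry per-image
    # cost fields for vision input and must not take this branch.
    if model_info.get("mode") in IMAGE_MODES:
        per_image = model_info.get("output_cost_per_image") or model_info.get("input_cost_per_image")
        return n * per_image if per_image else 0.0

    cost_per_token = model_info.get("output_cost_per_token")
    if not cost_per_token:
        return 0.0

    if max_tokens is None:
        # Fixed default instead of reserving the whole remaining headroom.
        tokens = DEFAULT_MAX_TOKENS_ESTIMATE
    else:
        # Clamp explicit values for accounting only; the outbound body is unchanged.
        cap = model_info.get("max_output_tokens")
        tokens = min(max_tokens, cap) if cap else max_tokens
    return tokens * cost_per_token
```

With the figures quoted above (output $25 per million tokens, 128K max output), a request that sends `max_tokens=999999999` reserves 128000 × 0.000025 = $3.20, and a $2000 budget admits 2000 / 3.20 = 625 such reservations.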
Adds regression tests for the chat-model-with-image-cost-field case and for image_edit reservation. * build(packaging): relax core runtime pins to ranges Backport of #27241 onto litellm_1.84.0rc2. The 12 entries in `[project.dependencies]` were exact `==` pins, a side effect of the Poetry -> uv migration. This forces every downstream package that lists litellm as a dependency to downgrade common runtime libraries (openai, pydantic, aiohttp, click, jsonschema, ...) to the exact versions we ship. Switch to lower-bounded ranges with upper bounds where the upstream package is pre-1.0 or has a known breaking-major-version policy. Reproducibility for our Docker proxy and CI continues to come from `uv.lock`, which is regenerated here as a metadata-only diff. Conflict resolution vs upstream merge: - The upstream merge commit also surfaced unrelated context entries (nvidia-riva-client, soundfile/stt-nvidia-riva extra) that exist in staging but not in rc2. Those are not part of #27241's intent and were dropped from the resolution; the rc2 uv.lock keeps its existing entry set, only the 12 specifier strings changed. - `uv lock --check` passes (392 packages resolved, no drift). * build(packaging): raise jinja2 floor to 3.1.6 Our `uv.lock` already resolves jinja2 to 3.1.6, so Docker / CI installs get that version. The `pyproject.toml` floor was lagging at 3.1.0, which means downstream consumers using `--resolution=lowest-direct` or older constraint files can land on 3.1.0-3.1.5 instead of the version we actually test against. Aligns the declared floor with the resolved version so external installers see the same baseline our test matrix exercises. `uv lock` diff is metadata-only (no resolved-version drift). * fix(mcp): forward extra_headers for OpenAPI MCP tools OpenAPI-generated tools only applied static closure headers and BYOK Authorization via ContextVar. 
Copy MCPServer.extra_headers from the incoming MCP request into _request_extra_headers (set in server.py before local tool dispatch), merge in openapi_to_mcp_generator via a small helper. OAuth2 M2M: do not forward caller Authorization from raw_headers (same rule as _prepare_mcp_server_headers for managed MCP). Adds TestRequestExtraHeaders and clarifies mcp_server_manager registration comment. Fixes #26794 Co-authored-by: Cursor <cursoragent@cursor.com> * refactor(mcp): access has_client_credentials on MCPServer directly Greptile: getattr default was redundant; property exists on MCPServer and mcp_server is non-None inside the extra_headers forwarding block. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(mcp): static headers win over forwarded headers in OpenAPI MCP Match the existing MCP invariant in merge_mcp_headers and the managed MCP path: operator-configured static headers always override caller-forwarded headers on name conflict, with case-insensitive comparison so different casing cannot bypass the precedence. _request_auth_header (BYOK) still overrides Authorization last. Addresses Veria review on PR #27383. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * fix(proxy): always merge caller-supplied tags into request metadata Caller-supplied tags (`x-litellm-tags` header, body `tags`, `metadata.tags`) were silently dropped unless the key/team had `metadata.allow_client_tags: true` set. Restore the documented behavior: tags from the request always flow into `metadata.tags` and union with any admin-configured static tags from key/team/project metadata. Removes the `allow_client_tags` opt-in flag from the pre-call pipeline. The flag was only ever read here; it has no schema or endpoint footprint, so leftover values in existing key metadata are inert. Test cleanup mirrors the simplification: drop the three tests that verified the strip-when-not-opted-in path, drop the `allow_client_tags` fixture lines from the merge/union tests. 
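The tag merge described above is a union of caller-supplied and admin-configured tags. The sketch below is illustrative only: `merge_request_tags` is a hypothetical helper, and treating the `x-litellm-tags` header as a comma-separated list is an assumption the commit message does not state.

```python
from typing import Iterable, List, Optional


def merge_request_tags(
    header_value: Optional[str],
    body_tags: Optional[Iterable[str]],
    metadata_tags: Optional[Iterable[str]],
    admin_tags: Optional[Iterable[str]],
) -> List[str]:
    """Union admin static tags with caller tags, keeping first-seen order."""
    caller_tags: List[str] = []
    if header_value:
        # Assumption: the header carries a comma-separated list of tags.
        caller_tags.extend(part.strip() for part in header_value.split(",") if part.strip())
    caller_tags.extend(body_tags or [])
    caller_tags.extend(metadata_tags or [])

    merged: List[str] = []
    seen = set()
    for tag in list(admin_tags or []) + caller_tags:
        if tag not in seen:
            seen.add(tag)
            merged.append(tag)
    return merged
```

No opt-in flag appears in the signature, which mirrors the removal of `allow_client_tags` from the pre-call pipeline.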
* docs(proxy): refresh stale comments referencing removed tag strip The tag-strip block was removed in the parent commit but two surrounding comments still referenced "tags without opt-in" and "runs AFTER the strip". Update them to describe the remaining user_api_key_* and _pipeline_managed_guardrails strip that the snapshot/merge ordering actually protects against. * chore: reject bare str at file-input sinks to prevent local-file read (#27762) Cherry-pick of #27762 onto litellm_1.84.0rc2. * chore: reject bare str at file-input sinks to prevent local-file read (#27667) * fix: use os.PathLike in ocr sink and check truthy reasoningSummary for bridge - ocr/main.py: widen Path check to os.PathLike for consistency with other sinks - main.py: bridge condition checks truthiness of reasoning_summary, not just None * fix: remove unused pathlib.Path import in ocr/main.py Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: ryan-crabbe-berri <ryan@berri.ai> Co-authored-by: stuxf <70670632+stuxf@users.noreply.github.com> * Strip SERVER_ROOT_PATH before lazy-feature prefix match LazyFeatureMiddleware compared the raw scope path against registered prefixes (e.g. /policies), so requests under a server root path like /api/v1/policies/... never matched, the feature never loaded, and the endpoint returned 404. Strip the configured root path before matching, normalizing trailing slashes and enforcing a component boundary so /api does not falsely match /apiv2. * Cache normalized SERVER_ROOT_PATH at middleware init SERVER_ROOT_PATH is a process-startup env var. Read it once in __init__ instead of calling get_server_root_path() + rstrip on every request that arrives before all lazy features have loaded. * chore(proxy): backport /key/regenerate ownership-rebind + premium-gate guards (#27793) Backport of #27793 onto litellm_1.84.0rc2. A non-admin caller could rebind their own key's user_id via /key/regenerate. 
_execute_virtual_key_regeneration had org/team guards but no user_id guard, and prepare_key_update_data did not strip the field — it survived model_dump(exclude_unset=True) into the Prisma update. On the next request, _return_user_api_key_auth_obj resolved the rebound user_id against litellm_usertable and returned PROXY_ADMIN whenever the target row's user_role was admin. /key/update had the equivalent guard inline at _validate_update_key_data; extract it to a shared helper _validate_caller_can_change_key_ownership and call from both /key/update and _execute_virtual_key_regeneration. Also tighten the premium gate that allowed the master-key rotation branch to skip the enterprise check. The previous predicate was a field-presence test, not an identity check. Verify the caller actually holds the master key via _is_master_key before allowing the non-premium path. Block explicit-null user_id and empty-string user_id as removal attempts; both 403-reject for non-admin callers. * fix(proxy): expose db status on public /health/readiness Backport of #27866 onto litellm_1.84.0rc2. External readiness probes consumed the legacy detailed payload's `db` field to drive alerting and pod-rotation decisions. Stripping the body to {"status": "healthy"} broke those probes silently — the HTTP code still flipped to 503, but probes checking body.db == "connected" treated the response as healthy. Add `db` back to the unauthenticated payload. The rest of the diagnostic fields (litellm_version, callbacks, cache, log_level) stay behind /health/readiness/details so the recon-leak gate from #26912 holds. Values match the legacy contract: "connected", "disconnected", "Not connected". The 503-on-DB-disconnect behavior from LIT-2607 is preserved. * fix(ui): fetch version + debug flag from /health/readiness/details The proxy moved `litellm_version`, `is_detailed_debug`, and other diagnostic fields off the public `/health/readiness` payload behind an auth-gated `/health/readiness/details` endpoint. 
The navbar version tag and the detailed-debug-mode banner stopped working because they were still reading those fields from the unauthed response, which no longer contains them. Replace `useHealthReadiness` with a `useHealthReadinessDetails` hook that takes an `accessToken` argument and sends a Bearer header to the auth-gated endpoint. The hook stays disabled while `accessToken` is falsy, so the navbar can keep rendering on the public model hub (where the token is null) without triggering an auth redirect or a 401-loop. * fix(ui): disable retries on readiness/details + cover token forwarding Two small follow-ups on the readiness/details migration: - Set `retry: false` on the query. The payload feeds a passive navbar tag and a debug banner; a 401 from an expired token shouldn't fan out into three retries against the proxy. - Add navbar specs that assert the `accessToken` prop is forwarded into the hook (matches the DebugWarningBanner spec). Without this, the navbar could silently regress to passing `undefined` and the existing tests wouldn't catch it. * chore: update Next.js build artifacts (2026-05-14 03:52 UTC, node v20.20.2) * Merge pull request #27898 from stuxf/chore/banned-params-extra-body-cover chore(proxy): cover extra_body + azure_ad_token in banned-params check (cherry picked from commit a6a9d8edf024a7d808ba18df4aace4815e5f5925) * Merge pull request #27801 from stuxf/chore/get-instance-fn-runtime-s3-gate chore(proxy): refuse remote-URL instance-fn loads outside config-file path (cherry picked from commit e3e5209f51a605d49f4c1ef9b010ed5fdd1812c6) * fix: block client-side pricing injection via request body Authenticated clients could supply CustomPricingLiteLLMParams fields (input_cost_per_token, output_cost_per_token, etc.) in the request body. These were forwarded to register_model() in main.py, permanently mutating the shared global litellm.model_cost dict for all users on the instance. 
Adds all CustomPricingLiteLLMParams fields to _BANNED_REQUEST_BODY_PARAMS so is_request_body_safe() rejects them before they reach completion(). New pricing fields added to CustomPricingLiteLLMParams are auto-covered. Admin opt-in via allow_client_side_credentials or configurable_clientside_auth_params still works as before. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: block SSRF fields in RAG ingest vector_store config aws_sts_endpoint, aws_web_identity_token, and aws_bedrock_runtime_endpoint in ingest_options.vector_store were passed directly to the Bedrock ingestion class, which reads them into boto3 STS client construction. Any authenticated caller could redirect AssumeRole calls to an attacker-controlled server, leaking the proxy's instance profile credentials. Calls is_request_body_safe() on ingest_options["vector_store"] before forwarding to litellm.aingest(). Same banned-params list and admin opt-in escape hatch (allow_client_side_credentials) as the /chat/completions path. ValueError from the safety check is caught and re-raised as HTTP 400. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: harden /key/update authorization checks (#27878) * fix: patch Host-header auth bypass in get_request_route Starlette reconstructs request.url from the Host header. A malformed Host like `localhost/?x=1` causes Starlette to build the full URL as `http://localhost/?x=1/health`, which url-parses to path="/". Since "/" is in LiteLLMRoutes.public_routes, all protected routes became reachable without authentication. Fix: read scope["path"] (set by uvicorn from the HTTP request line, not derivable from headers) instead of request.url.path. Sub-path deployments are handled via scope["app_root_path"] / scope["root_path"], mirroring Starlette's own base_url construction logic. 
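To make the Host-header issue above concrete, the sketch below first shows how a URL rebuilt from a malformed Host value parses to path `/`, then derives the route from the ASGI scope instead. `route_from_scope` is a hypothetical simplification of `get_request_route`, and the scope dicts are hand-built for illustration.

```python
from typing import Any, Mapping
from urllib.parse import urlparse

# A URL reconstructed as scheme + Host header + request path, with a hostile Host.
rebuilt_url = "http://" + "localhost/?x=1" + "/health"
spoofed_path = urlparse(rebuilt_url).path  # the real path ends up in the query string


def route_from_scope(scope: Mapping[str, Any]) -> str:
    """Derive the route from the ASGI scope, which headers cannot influence."""
    path = str(scope.get("path") or "/")
    root_path = str(scope.get("app_root_path") or scope.get("root_path") or "")
    # A bare "/" root is treated as no prefix, so nothing is stripped for it.
    root_path = root_path.rstrip("/")
    if root_path and (path == root_path or path.startswith(root_path + "/")):
        path = path[len(root_path):] or "/"
    return path
```

Because `scope["path"]` comes from the HTTP request line, the four Host variants listed in the commit message all resolve to the route that was actually requested.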
Affected variants confirmed fixed: Host: localhost/?x=1 Host: localhost:4000/?x=1 Host: localhost/#test Host: localhost:4000/#test Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * style: reduce comments in route fix Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: block credential fields in RAG ingest vector_store options Credential fields (vertex_credentials, aws_access_key_id, api_key, etc.) in ingest_options.vector_store are now rejected at the API boundary with a 400 error. Credentials must be configured server-side. Previously any authenticated user could supply a vertex_credentials dict with type=external_account pointing credential_source.file at an arbitrary path (e.g. /proc/1/environ) and token_url at an attacker-controlled server. google-auth's identity_pool.Credentials refresh() would read the file and POST its contents to the attacker. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: block /key/update self-escalation by assigned users Non-admin users who were assigned a key (created_by != caller) could update any non-budget field — models, rpm_limit, guardrails, etc. — without admin authorization, allowing privilege self-escalation. Gate: only the key creator (created_by == caller) may edit their own key without admin check; budget changes always require admin regardless of creator status. All other callers must pass _check_key_admin_access. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: block user-controlled api_base in RAG ingest vector_store options A user-supplied api_base in ingest_options.vector_store caused the server to forward its configured provider credentials (Gemini, OpenAI) to an attacker-controlled endpoint via SSRF. Add api_base to the blocked credential params set alongside api_key and the existing credential fields. 
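The boundary check described in the RAG ingest commits above can be sketched as a deny-list lookup. This is a simplified stand-in for the real `is_request_body_safe` path: `check_vector_store_options` is a hypothetical name, and the set below contains only the field names mentioned in these commit messages, not the full banned-params list.

```python
from typing import Any, Mapping

# Field names taken from the commit messages; the real list is longer.
BLOCKED_VECTOR_STORE_PARAMS = frozenset(
    {
        "api_base",
        "api_key",
        "vertex_credentials",
        "aws_access_key_id",
        "aws_sts_endpoint",
        "aws_web_identity_token",
        "aws_bedrock_runtime_endpoint",
    }
)


def check_vector_store_options(options: Mapping[str, Any]) -> None:
    """Raise ValueError if caller-supplied options carry a blocked field."""
    found = sorted(BLOCKED_VECTOR_STORE_PARAMS & set(options))
    if found:
        raise ValueError(
            "These fields must be configured server-side, not in the request: "
            + ", ".join(found)
        )
```

In the proxy, the endpoint would catch this `ValueError` and return HTTP 400, as the commit messages describe.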
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: restrict /utils/transform_request to PROXY_ADMIN and apply body safety check Any authenticated internal_user could POST arbitrary provider config (aws_sts_endpoint, api_base, etc.) to /utils/transform_request and have the server forward its credentials to an attacker-controlled endpoint. - Gate the endpoint on PROXY_ADMIN role (403 for all other roles) - Call is_request_body_safe() to reject banned params even for admins - Convert ValueError from safety check to HTTP 400 Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: apply banned-param check to /utils/transform_request Without is_request_body_safe(), any authenticated user could pass aws_sts_endpoint, api_base, or aws_web_identity_token to /utils/transform_request and have the server forward its configured provider credentials to an attacker-controlled endpoint during SDK credential resolution. Applies the same banned-param blocklist already used by LLM endpoints. Endpoint remains accessible to all authenticated users. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: block SSRF via api_base in /prompts/test dotprompt YAML frontmatter Any frontmatter key not in ["model","input","output"] flowed into optional_params and was merged into the LLM call data dict, bypassing is_request_body_safe. An attacker with any bearer key could set api_base in YAML to redirect the outbound LLM request — including the provider API key — to an attacker-controlled host. Fix: call is_request_body_safe on the constructed data dict after optional_params are merged, before invoking ProxyBaseLLMRequestProcessing. ValueError from the banned-param check is surfaced as HTTP 400. 
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * Update litellm/proxy/rag_endpoints/endpoints.py Co-authored-by: veria-ai[bot] <224490171+veria-ai[bot]@users.noreply.github.com> * fix: coerce nested config strings before banned-param check _NESTED_CONFIG_KEYS descent used isinstance(nested, dict) which silently skipped litellm_embedding_config when delivered as a JSON string via multipart/form-data. Banned params (api_base, aws_sts_endpoint, etc.) nested inside the stringified value were invisible to is_request_body_safe. _NESTED_METADATA_KEYS already used _coerce_metadata_to_dict which parses JSON strings before checking. Apply the same coercion to _NESTED_CONFIG_KEYS. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: replace substring match with prefix match in is_llm_api_route mapped_pass_through_routes used `_llm_passthrough_route in route` (substring) so any admin-only path whose URL contained a provider name (openai, anthropic, azure, bedrock, etc.) was misclassified as an LLM API route and bypassed the admin gate in non_proxy_admin_allowed_routes_check. Confirmed live: non-admin key could GET /credentials/by_name/openai (read masked provider API key) and DELETE /credentials/openai (delete credential). Fix: use exact match or startswith(prefix + "/") — the same pattern used everywhere else in RouteChecks — so only routes that actually start with a passthrough prefix are allowed through. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: stabilize PR #27878 test failures - key_management_endpoints: extend can_skip_admin_check to team keys so team members with /key/update permission can update non-budget fields. can_team_member_execute_key_management_endpoint already validates team membership + permission and raises if unauthorized; reaching the admin check on a team key means the caller was authorized. 
- test: set created_by on mock key in test_update_key_non_budget_fields_allowed_for_internal_user so caller_is_creator resolves correctly (MagicMock default ≠ user_id). - auth_utils.get_request_route: guard against non-dict request.scope (e.g. MagicMock in unit tests) to prevent a MagicMock leaking into UserAPIKeyAuth.request_route and failing Pydantic validation. - ci: assign test_multipart_bypass_repro.py to the proxy-runtime shard in test-unit-proxy-db.yml to satisfy the shard-coverage check. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix(lint): add explicit str() cast in get_request_route for MyPy scope.get() returns Any|None which MyPy cannot coerce to str implicitly. Wrap both scope.get() calls in str() to satisfy the type checker. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: guard bare-/ root_path strip + make total_spend migration idempotent auth_utils.get_request_route: when Starlette sets scope["app_root_path"] to "/" (e.g. behind some middleware), the old stripping logic would remove the leading slash from every path ("/team/new" → "team/new"), breaking route matching and causing auth to misclassify protected routes. Skip stripping when root_path is bare "/". migration: add IF NOT EXISTS to total_spend ALTER TABLE so the migration is safe to replay when a prior partial run already created the column. Without this guard, prisma migrate deploy fails on CI DBs that were partially migrated, causing all subsequent DB operations (including /team/new) to 500. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: require creator still owns key for personal-key bypass in /key/update caller_is_creator now requires both created_by == caller AND user_id == caller. Previously checking only created_by let a demoted admin who originally created a key for another user continue editing non-budget fields on it after reassignment, bypassing _check_key_admin_access. 
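The personal-key rule described above reduces to a small predicate. The sketch is a simplified reading of the commit messages and ignores the team-key branch; `can_skip_admin_check` mirrors a name used in the text, but this signature is hypothetical.

```python
from typing import Optional


def can_skip_admin_check(
    *,
    created_by: Optional[str],
    key_user_id: Optional[str],
    caller_user_id: str,
    changes_budget: bool,
) -> bool:
    """Return True only when the caller created the key and still owns it."""
    caller_is_creator = created_by == caller_user_id and key_user_id == caller_user_id
    # Budget changes always require the admin check, even for the creator.
    return caller_is_creator and not changes_budget
```

Requiring both fields to match the caller is what blocks a creator whose key has since been reassigned to another user.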
Adds regression test: creator whose key was reassigned is blocked (403). Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: extract auth checks to fix PLR0915 + broaden max_budget assertion internal_user_endpoints._update_single_user_helper exceeded 50 statements (PLR0915). Extract authorization checks into _check_user_update_authz helper to bring statement count under the limit. test_validate_max_budget: assert "negative" (substring of both the local "cannot be negative" and the CI "non-negative finite number" messages) so the test is stable regardless of which exact wording the function uses. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: veria-ai[bot] <224490171+veria-ai[bot]@users.noreply.github.com> * bump: version 0.4.71 → 0.4.72 * uv lock * feat(mcp): support OAuth passthrough discovery * fix(mcp): support OAuth browser auth * fix(mcp): refine upstream OAuth metadata fallback * feat(proxy): support issuer-scoped JWT auth * fix(mcp): validate oauth callback redirect sink * feat(proxy): support issuer-scoped JWT auth * test(mcp): align trusted proxy fixtures * style(mcp): satisfy black formatting * chore(ui): bump next to 16.2.6 * fix(mcp): address oauth passthrough review findings * test(mcp): split oauth passthrough regressions * fix(interactions): align openapi response fields * security: prevent forwarding litellm api keys to upstream mcp servers - Strip Authorization header from extra_headers for pass-through servers - Pass-through servers (auth_type=None with extra_headers: [Authorization]) must not receive the user's LiteLLM API key - Only OAuth2 M2M and pass-through servers skip Authorization header - Other headers (x-request-id, x-trace-id) are still forwarded normally - Fixes credential leakage / authentication bypass in MCP pass-through mode * fix(interactions): remove steps field not in google openapi 
spec

The steps field was added but is not present in the current Google Interactions OpenAPI specification. Revert to using only the fields that are actually defined in the spec.

* fix(mcp): forward Authorization in pass-through when x-litellm-api-key is admission

Commit 3753970cc9 widened the Authorization strip to cover all is_oauth_passthrough servers, protecting against the LiteLLM admission key leaking upstream when the caller used Authorization for admission, but also silently stripping legitimate upstream OAuth bearers when the caller used x-litellm-api-key for admission.

That broke transparent OAuth pass-through (EAI-506 V5/V6): standards-compliant MCP clients (OpenCode, Claude Code, mcp-inspector) complete PKCE against the upstream IdP and send the resulting token as plain Authorization: Bearer per the MCP spec; with the wider strip in place, that token never reaches the upstream and tools/list returns empty.

Narrow the strip: skip Authorization for pass-through servers only when the caller did NOT supply x-litellm-api-key. When x-litellm-api-key is present, admission is unambiguous and Authorization is free to carry the upstream OAuth bearer. The original security guarantee is preserved: a client that sends only Authorization (no x-litellm-api-key) still has it stripped, so the LiteLLM key cannot leak upstream via that path.

Tests:
- new: forwards Authorization when x-litellm-api-key is present
- new: still strips Authorization when only Authorization is present
- existing pass-through + M2M tests unchanged

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(interactions): align status enum with openapi spec

* fix(mcp,jwt): address greptile review concerns

- Cache _get_agent_object_permission via user_api_key_cache (sentinel for no-permission rows) so MCP requests from agent keys don't hit the DB on every tool-list / tool-call.
- Re-raise HTTPException in handle_sse_mcp so 401 + WWW-Authenticate challenges (and other HTTP errors) propagate to SSE clients instead of being swallowed as 500. - Normalise booleans in _validate_token_response so admin rules written as JSON-style "true" / "false" match upstream responses that return Python True / False. - Treat configured JWT issuer claim mappings as advisory: when a mapped field is absent or empty, leave the normalised claim unset instead of raising, matching the global litellm_jwtauth path. Co-authored-by: Claude <noreply@anthropic.com> * test: replace dall-e-3 with gpt-image-1 in health check and router tests (#27813) OpenAI returns 'The model dall-e-3 does not exist' for the test account, breaking test_openai_img_gen_health_check and test_image_generation. Switch to gpt-image-1, matching the existing TestOpenAIGPTImage1 pattern. (cherry picked from commit aee58db88057b274eab70388dce72eac31ea014f) * fix(tests): drop dall-e-only test classes; route live image tests via gpt-image-1 Second wave of failures from the 2026-05-12 DALL-E shutdown: - tests/image_gen_tests/test_image_edits.py::TestOpenAIImageEditDallE2 and tests/image_gen_tests/test_image_generation.py::TestOpenAIDalle3 are explicitly named for the deprecated models and can't pass; remove. gpt-image-1 coverage already exists in sibling classes. - tests/local_testing/test_router.py image gen tests use dall-e-3 only as a routing example; swap to gpt-image-1. - tests/local_testing/test_custom_callback_input.py image_generation success/failure paths swapped to gpt-image-1. (cherry picked from commit 945b10ded467e53fc3c9b8df0329dbc55591a56e) * test(fireworks): replace deprecated llama-v3p3-70b-instruct model Fireworks removed llama-v3p3-70b-instruct from serverless, so every live test using it now fails with NotFoundError ("Model not found, inaccessible, and/or not deployed"). 
Swap the 6 references (3 files) to the currently-served accounts/fireworks/models/deepseek-v3p1, the canonical model in Fireworks' current docs examples and present in LiteLLM's cost map.

test_get_model_params_fireworks_ai is a pure pricing-heuristic test (no network) asserting the >16b branch, so it uses llama-v3p1-70b-instruct instead to keep the "fireworks-ai-above-16b" assertion and branch coverage intact.

(cherry picked from commit 39a1d438f23f88d1c88f3e74930ab221b3e450de)

* test(fireworks): mock remaining live smoke tests

test_completion_fireworks_ai and test_completion_cost_fireworks_ai made real Fireworks calls and broke whenever Fireworks rotated its serverless catalog (no externally-verifiable model list exists). They also asserted nothing, just printed.

Mock the HTTP post and assert real behavior instead: the request is built with the right model/messages and the OpenAI-compatible response parses back; the cost path yields a non-zero cost against the local cost map. No network, no model dependency, stronger than the old smoke checks.

(cherry picked from commit b5db7ed37da21818c4defe030e3762447fe62e15)

* fix(tests): replace shut-down gpt-4o-audio-preview with gpt-audio-1.5 (#28281)

* fix(tests): replace shut-down gpt-4o-audio-preview with gpt-audio-1.5

OpenAI shut down gpt-4o-audio-preview on 2026-05-07, so the live audio calls in test_stream_chunk_builder_openai_audio_output_usage and test_standard_logging_payload_audio now hard-fail with a model-not-found error on every PR. The error was not "openai-internal", so the except block swallowed it and execution fell through to an unbound completion/response (UnboundLocalError).

Switch both tests to gpt-audio-1.5, OpenAI's recommended successor (GA, not deprecated, already present in the litellm cost map so the response_cost assertion still resolves). Also broaden the except to skip with the real error in the reason instead of crashing, so a transient upstream blip can't reintroduce the UnboundLocalError.
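The fall-through described in the audio-test commit above can be reproduced in isolation. The following is a minimal sketch under stated assumptions, not the test code from the repository: `Skipped` is a hypothetical stand-in for the exception that `pytest.skip()` raises, and `buggy_test_body`, `fixed_test_body`, and `failing_call` are illustrative names.

```python
class Skipped(Exception):
    """Hypothetical stand-in for the exception pytest.skip() raises."""


def buggy_test_body(call):
    try:
        response = call()
    except Exception as e:
        if "openai-internal" in str(e):
            raise Skipped(str(e))
        # Any other error is swallowed here, so control falls through...
    return response  # ...and this raises UnboundLocalError when call() failed.


def fixed_test_body(call):
    try:
        response = call()
    except Exception as e:
        # The handler always exits by raising, so `response` is bound below.
        raise Skipped(f"upstream unavailable: {e}") from e
    return response


def failing_call():
    # Assumed error text, modelled on the model-not-found failure in the commit.
    raise RuntimeError("model_not_found: The model does not exist")
```

The point of the sketch is the control flow: once every path through the `except` block raises, the final `return response` can only run after a successful call.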
* fix(tests): narrow audio-test skip to model-not-found, re-raise the rest Address review feedback: an unconditional skip on any exception would silently mask a litellm-internal regression in the audio path (broken param transformation, serialization, bad header) instead of failing CI. Skip only on the upstream-unavailable class (model_not_found / "does not exist" / openai-internal) and re-raise everything else, so genuine regressions still fail loudly. The UnboundLocalError is still fixed because the handler either skips or raises - it never falls through. * fix(tests): add budget_exceeded to expected Interaction status enum Staging added budget_exceeded to the Interaction OpenAPI status enum; the staging merge into this branch picked up the spec change but not the matching test update, so test_status_enum_values failed in CI. Align the test's expected list (exact-match by design) with the live spec. * fix(tests): mock HTTP fetch in test_img_url_token_counter The test parameterized a live third-party image URL (blog.purpureus.net) which now 404s, causing get_image_dimensions to fall through to its base64 decode path and crash with 'not enough values to unpack' on every PR run. Mock safe_get with a tiny 1x1 PNG so the URL branch is still exercised without any network dependency. * fix(tests): swap gpt-4o-audio-preview to gpt-audio-1.5 in test_gpt4o_audio OpenAI shut down gpt-4o-audio-preview on 2026-05-07, so both live tests in test_gpt4o_audio.py (test_audio_output_from_model and test_audio_input_to_model) hard-fail model_not_found on every PR. Swap the hardcoded model to OpenAI's successor gpt-audio-1.5 (same chat-completions audio surface; already in the litellm cost map). Mirror the narrowed-skip pattern from the prior audio fixes: skip on model_not_found / does-not-exist / openai-internal, re-raise everything else so genuine litellm regressions still fail CI loudly. 
(cherry picked from commit 92de7423efca5756a2cb1bcf3228812628f91960) * fix(tests): migrate realtime + rerank tests off shut-down upstream models (#28191) * fix(tests): use gpt-realtime in realtime guardrails test OpenAI shut down gpt-4o-realtime-preview-2024-12-17 on 2026-05-07, so the live OpenAI realtime guardrails integration test now fails with model_not_found (session.created never arrives, _wait_for_event times out). Point OPENAI_REALTIME_URL at the current GA model, gpt-realtime. Scope limited to this test: the pricing-catalog JSON keeps the retired entries intentionally (historical cost calc + separate Azure timeline), and the Azure realtime cost-calc test is unaffected. * fix(tests): mock nvidia_nim rerank instead of hitting EOL'd endpoint NVIDIA reached end-of-life for the hosted nvidia/llama-3.2-nv-rerankqa-1b-v2 rerank API on 2026-05-18 with no published replacement, so the live BaseLLMRerankTest.test_basic_rerank for nvidia_nim now returns HTTP 410 ("Gone"). NVIDIA's hosted catalog rotates on a schedule, so swapping in another live model would only defer the failure. Override test_basic_rerank in TestNvidiaNim to mock the sync/async HTTP transport (same pattern as test_nvidia_nim_rerank_ranking_endpoint in this file) and inject a fake NVIDIA_NIM_API_KEY via monkeypatch. The request/response transformation and cost calculation stay covered offline. Scope limited to nvidia_nim; other BaseLLMRerankTest providers untouched. * fix(tests): migrate remaining realtime tests off shut-down gpt-4o-realtime-preview OpenAI's 2026-05-07 shutdown removed the entire gpt-4o-realtime-preview family, including the undated 'gpt-4o-realtime-preview' alias (not just the dated snapshot fixed earlier). 
Three live tests still connected with the dead alias and failed with messages_received=1 (an error event instead of session.created): - test_openai_realtime_simple.py: get_model() -> gpt-realtime (drives TestOpenAIRealtime.test_realtime_connection / test_realtime_with_query_params) - test_openai_realtime.py: test_openai_realtime_direct_call_no_intent and test_openai_realtime_direct_call_with_intent -> openai/gpt-realtime (the with_intent test shares the same dead alias even though it was not in the failing set this run) Mocked unit tests (test_realtime_query_params_construction, test_realtime_query_params_use_normalized_model_name) are left as-is: they never hit the network and assert string plumbing only. Also fixes test_text_message_blocked_by_guardrail_no_ai_response, which now connects (the earlier URL swap worked) but tripped a model-wording-brittle assertion. The guardrail flow asks the model to voice the block message verbatim; gpt-4o-realtime-preview complied (output contained 'blocked'), gpt-realtime refuses verbatim-repeat instructions ('I'm sorry, but I can't repeat that message.'). Since the original user message is blocked before it reaches OpenAI, the refusal is still a safe outcome. Assertion #3 now accepts both voicing and refusal, and adds a hard check that the blocked phrase never leaks into AI output. (cherry picked from commit ce87c411bfb33a8b37acaa630a39e4e4c8685add) * fix(model_prices): register mistral/ministral-8b-2512 Mistral's API now returns model='ministral-8b-2512' when 'mistral-tiny' is requested, so test_completion_mistral_api fails with 'This model isn't mapped yet'. Adding the entry so completion_cost can resolve the cost for that response. 
Author: Claude <noreply@anthropic.com> * fix(mcp,auth): address greptile review concerns - handle_sse_mcp now calls _raise_preemptive_401_for_unauthenticated_servers so SSE clients to pass-through OAuth MCP servers receive the RFC 9728 401 + WWW-Authenticate challenge that the streamable-HTTP path already emits. - get_request_route strips a trailing slash from root_path before length-based prefix removal so non-canonical ASGI root_path values like "/litellm/" don't strip the leading slash from the returned route. - _mcp_oauth_user_api_key_auth's cookie JWT decode now passes options={"verify_aud": False} so a future revision of the UI session JWT containing an aud claim cannot silently downgrade the request to unauthenticated. Co-authored-by: Claude <claude@anthropic.com> * fix(tests): backfill local model_cost into remote-fetched map litellm.model_cost is loaded at import time from LITELLM_MODEL_COST_MAP_URL (pinned to main), so pricing entries that exist only in this branch (e.g. mistral/ministral-8b-2512, freshly added because Mistral's API now returns this id from mistral-tiny) are absent at test time and completion_cost lookups raise 'This model isn't mapped yet'. Backfill the in-tree backup into litellm.model_cost in the local_testing conftest so cassette-driven cost calculations resolve against the entries that ship with the branch under test. Fixes local_testing_part1 failures on test_completion_mistral_api and test_completion_mistral_api_modified_input. * fix(mcp,jwt): address greptile concurrency and code-quality concerns - _apply_issuer_claim_mappings now builds a new dict and reads from the original token, rather than mutating its input. The change is behaviour-preserving (caller passes a fresh jwt.decode result), but avoids the surprise-mutation pattern flagged by greptile. 
- is_network_error uses isinstance(exc, httpx.TransportError) instead of matching type(exc).__name__ against a hand-maintained string set, so ReadError / WriteError / ProxyError / etc. are also treated as transport-level failures and surfaced as HTTP 502. - fetch_upstream_oauth_protected_resource now coalesces concurrent discovery requests per (server_id, resource_url) through an asyncio.Lock so concurrent .well-known calls share a single upstream fetch + cache write. - Drop the redundant 'if trusted_ranges:' branch in get_mcp_client_ip; it is always true on the path that reaches it (the prior 'if not trusted_ranges:' early-returns). Co-authored-by: Claude <claude@anthropic.com> * fix(jwt,mcp): fall back to global JWKS on unknown issuer; prune fetch locks - handle_jwt._get_configured_issuer now returns None for tokens whose 'iss' is not in the configured issuers list, letting auth_jwt fall through to the legacy JWT_PUBLIC_KEY_URL path instead of hard-raising. This keeps existing tokens from non-configured IdPs working when an operator adds the new 'issuers' list to a live deployment. - discoverable_endpoints._prune_oauth_metadata_cache now also prunes entries in _OAUTH_METADATA_FETCH_LOCKS whose cache entry has been evicted and whose lock isn't currently held, bounding the locks dict to match the cache it guards. Co-authored-by: Claude <claude@anthropic.com> * fix(mcp,auth): restore client_ip in oauth2 target check, drop from delegate check The merge of staging into the PR branch (d42a66adb6) misplaced the client_ip=client_ip kwarg: it landed inside _target_servers_delegate_auth_to_upstream (which never accepted client_ip and isn't called with it), while the sibling _target_servers_use_oauth2 has client_ip in its signature but stopped passing it through to get_mcp_server_by_name. That left ruff flagging F821 on the undefined name and lint failing. 
Move client_ip back into _target_servers_use_oauth2's lookup (matching the call site that already forwards IPAddressUtils.get_mcp_client_ip) and drop it from _target_servers_delegate_auth_to_upstream so its body matches its signature again. * fix(mcp): respect client ip for delegated auth * fix(auth): address remaining greptile style findings - get_request_route: require root_path to match whole path segments before stripping, so '/apifoo' isn't truncated to 'foo' when root_path='/api'. - get_mcp_client_ip: collapse the two trusted-proxy validation branches into a single is_request_from_trusted_proxy call so the return value drives control flow instead of being discarded for the side-effect warning. Co-authored-by: Claude <claude@anthropic.com> * fix(jwt): strip internal _litellm_* claims in global JWKS auth path Prevents identity spoofing where a token signed by the global JWKS could inject _litellm_jwt_issuer and other _litellm_* claims that downstream getters trust. The issuer-scoped path already strips these via _apply_issuer_claim_mappings; mirror that behavior for the global fallback path. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(mcp): surface MCPUpstreamAuthError as 401 in SSE/HTTP transport handlers Both handle_sse_mcp and handle_streamable_http_mcp only caught HTTPException to preserve 401 + WWW-Authenticate challenges, but MCPUpstreamAuthError (raised when a pass-through server's upstream rejects a bearer token mid-session) inherits from Exception. It was falling through to the generic handler and surfacing as an opaque 500. Mirror the REST endpoint behavior: translate MCPUpstreamAuthError into an HTTPException(status_code=e.status_code) with the upstream www-authenticate header so standards-compliant MCP clients trigger the upstream OAuth flow. 
Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(mcp): add upstream auth pre-flight in SSE handler Mirror handle_streamable_http_mcp by calling _check_passthrough_upstream_auth after the cold-start 401 emitter so expired/invalid upstream tokens surface a proper 401 + WWW-Authenticate challenge before the SSE session commits 200 headers, instead of letting list_tools silently return [] when the upstream rejects the token. Co-authored-by: Claude <noreply@anthropic.com> * fix(mcp): tighten cold-start bypass against CSV paths + dedupe upstream auth probe - Return None from _parse_mcp_server_names_from_path for CSV multi-server paths (/mcp/a,b). The regex previously truncated at the first comma and silently passed a single server name to the cold-start gate. - Switch _is_mcp_passthrough_cold_start to all-targets semantics, matching _target_servers_use_oauth2: one non-passthrough target in a co-targeted set must not flip the anonymous-admission bypass open for the others. - Drop the redundant HTTPStatusError block in _extract_upstream_auth_failure - any HTTPStatusError carries a .response, so the preceding generic block already handles 401/403 detection. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(mcp,tests): sync stubs and cold-start assertions with delegate-check The merge of base-branch _target_servers_delegate_auth_to_upstream into process_mcp_request inserts an additional get_mcp_server_by_name(name) lookup ahead of the cold-start path, which breaks two test patterns: 1. lookup_by_name(name) side-effect stubs in TestMCPDelegateAuthToUpstream are called positionally by the delegate check, then again by the cold-start path with client_ip=... — raising TypeError: unexpected keyword argument 'client_ip'. Accept **_kwargs to match the real signature. 2. TestMCPPassthroughColdStartAdmission assertions count the lookup exactly once with client_ip=..., but the delegate check now adds a positional-only call ahead of it. 
Switch assert_called_once_with to assert_any_call for the cold-start invocation, and assert client_ip was *not* passed for the aggregate /mcp test where cold-start must not fire. Both updates align with CLAUDE.md guidance to keep monkeypatch stubs in sync with the real signature when an optional parameter is added. Co-authored-by: Claude <claude@anthropic.com> * fix(mcp): correct passthrough probe 401 + slashed-name cold start parser - _check_passthrough_upstream_auth now emits 'Bearer resource_metadata="..."' pointing at the gateway's oauth-protected-resource well-known URL, mirroring the pre-emptive 401 path. Pass-through servers don't use the gateway as an authorization server, so the previous 'authorization_uri=' challenge sent clients to the wrong metadata endpoint. - _parse_mcp_server_names_from_path now accepts server names that contain a single slash (e.g. custom_solutions/user_123), mirroring MCPRequestHandler._extract_target_server_names_from_path. Without this, the cold-start bypass missed slashed-name servers and the generic admission error propagated instead of the spec-compliant 401 challenge. - _is_mcp_passthrough_cold_start drops the unused scope parameter from its signature. Co-authored-by: Yassin Kortam <yassin@berri.ai> * style(mcp): format discoverable endpoints * refactor(mcp): dedupe MCPUpstreamAuthError->HTTPException + thread client_ip into delegate-auth gate Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(mcp): handle passthrough OAuth metadata and startup auth errors - discoverable_endpoints: For pass-through MCP servers, when upstream oauth-protected-resource returns a non-200/non-dict response, raise HTTP 502 instead of falling through to default gateway metadata. Falling through would direct MCP clients at the gateway, which is not the authorization server for pass-through configs. - mcp_server_manager: Wrap _get_tools_from_server in startup tool name mapping with try/except. 
Since _get_tools_from_server now re-raises MCPUpstreamAuthError, an upstream 401 from a pass-through server at startup (when no user token is present) would otherwise abort the loop and leave subsequent servers unmapped. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(mcp): restrict passthrough probe challenge to OAuth passthrough servers The probe filter previously matched any server with Authorization in extra_headers, including gateway-managed OAuth2 servers. Those would then receive the resource_metadata= WWW-Authenticate challenge meant for pass-through servers, instead of the authorization_uri= challenge pointing at the gateway AS metadata. Use srv.is_oauth_passthrough so only genuine pass-through servers get the resource-metadata challenge. Co-authored-by: Yassin Kortam <yassin@berri.ai> * test(proxy): cover issuer-scoped JWT auth * fix(mcp): use resource metadata for passthrough reauth * fix(mcp,tests): assert cold-start helper directly for aggregate /mcp Threading client_ip into _target_servers_delegate_auth_to_upstream made get_mcp_server_by_name(name, client_ip=...) also fire from the delegate-auth check, so the call_args_list assertion on client_ip-in-kwargs no longer uniquely signals a cold-start lookup. Patch _is_mcp_passthrough_cold_start and assert it is not invoked, which is the actual contract the test is pinning. * fix(mcp,jwt): drop unneeded async helper + suppress misleading unscoped JWT warning - _build_oauth_authorization_server_response: revert to sync (no awaits in body). The function only does dict construction and synchronous registry lookups; async added coroutine creation overhead per discovery call without need. - _build_decode_kwargs: accept has_issuer_config so the global path's 'JWT auth is unscoped' warning is suppressed when LiteLLM_JWTAuth.issuers provides per-issuer scoping. Previously the warning fired spuriously for admins who intentionally use only the new issuers config. 
* fix(jwt,mcp): clarify issuers fallthrough + add TTL on mcp permission cache

- LiteLLM_JWTAuth.issuers docs now state explicitly that unlisted issuers fall back to the global JWT_AUDIENCE/JWT_ISSUER path; the field is additive routing, not an allow-list. Matches actual control flow in handle_jwt.auth_jwt and the regression tests asserting backwards compatibility with the global JWKS path.
- MCPRequestHandler._get_{org,agent}_object_permission now pass ttl=DEFAULT_MANAGEMENT_OBJECT_IN_MEMORY_CACHE_TTL on async_set_cache, mirroring the auth_checks.py pattern so the cache TTL is explicit on both DualCache layers.

* fix(tests): align merged JWT and MCP cold-start assertions

Update the tests carried over from PR #28008 to match the assertions on the staging branch:

- tests/test_litellm/proxy/auth/test_handle_jwt.py: unknown issuers now fall back to the legacy JWT_PUBLIC_KEY_URL path (per litellm_feat/v1.84.0-mcp-gateway-jwt-auth's 'fall back to global JWKS on unknown issuer'), and mapped issuer claims that are absent no longer fail closed; they simply leave the normalised LiteLLM internal claim absent.
- tests/test_litellm/proxy/_experimental/mcp_server/auth/test_user_api_key_auth_mcp.py: the aggregate '/mcp' route still triggers the delegate-auth-to-upstream lookup once for the header-supplied server name; cold-start admission must NOT fire on top of that. Tighten the assertion to assert_called_once_with so a future regression that re-enters cold-start is caught.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* fix(jwt): guard litellm_jwtauth access in auth_jwt global path

JWTHandler() can be constructed without update_environment() being called (tests do this directly), in which case self.litellm_jwtauth does not exist. Accessing it raises AttributeError before getattr can fall back. Use the same safe pattern other call sites use.
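The AttributeError described in the last commit above, and one way to guard against it, can be illustrated with a small stand-in class. This is a minimal sketch, not the project's code: `FakeJWTHandler`, `issuers_unsafe`, and `issuers_safe` are hypothetical names, and only the attribute name `litellm_jwtauth`, the field name `issuers`, and the method name `update_environment` are taken from the commit messages.

```python
from types import SimpleNamespace
from typing import Any, Optional


class FakeJWTHandler:
    """Hypothetical stand-in: the attribute is only set by update_environment()."""

    def update_environment(self, litellm_jwtauth: Any) -> None:
        self.litellm_jwtauth = litellm_jwtauth


def issuers_unsafe(handler: FakeJWTHandler) -> Optional[list]:
    # `handler.litellm_jwtauth` is evaluated first, so a missing attribute
    # raises AttributeError before the getattr() default can apply.
    return getattr(handler.litellm_jwtauth, "issuers", None)


def issuers_safe(handler: FakeJWTHandler) -> Optional[list]:
    # Guard both lookups: the handler attribute and the field on it.
    jwtauth = getattr(handler, "litellm_jwtauth", None)
    return getattr(jwtauth, "issuers", None)
```

With a handler that never had `update_environment()` called, `issuers_safe` returns `None` while `issuers_unsafe` raises; after `handler.update_environment(SimpleNamespace(issuers=["a"]))` both return the list.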
* Gate MCP OAuth pass-through on delegate_auth_to_upstream flag

Sameer's review on #28356/#28008 flagged that the new pass-through behaviors (preemptive 401 challenges, /.well-known/oauth-protected-resource proxying, upstream 401/403 propagation as MCPUpstreamAuthError, and Authorization-stripping when no x-litellm-api-key is supplied) were implicitly enabled for every server with auth_type=none plus Authorization in extra_headers. Existing users doing static bearer pass-through for non-OAuth reasons would have silently regressed.

Make the detection rule explicit: extend the existing delegate_auth_to_upstream flag (previously oauth2-only) to also gate is_oauth_passthrough. Now requires flag + auth_type=None + Authorization in extra_headers, per Sameer's suggested detection rule. The UI toggle now appears for both modes (oauth2 PKCE passthrough and auth_type=none OAuth pass-through) with mode-appropriate copy.

Update test fixtures to set the flag where the test intent is to exercise OAuth pass-through behavior, and add negative tests covering the new default-false case.

* fix(mcp): route org object_permission lookup through shared auth helpers

Replace the bespoke litellm_organizationtable.find_unique + dedicated cache key in _get_org_object_permission with get_org_object + get_object_permission so MCP requests share the same user_api_key_cache entries as the rest of the proxy and no longer fragment org-row caching.

* fix(mcp): wrap get_object_permission call in shared try/except

Ensure exceptions from get_object_permission in _get_org_object_permission are caught and return None, preserving the original fail-safe semantics.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(jwt): validate issuer audience at config load + dedicated key-miss exception

- Move JWTIssuerConfig audience-required guard into a Pydantic model_validator so misconfiguration fails at startup instead of on the first request.
- Replace the string-match `No matching public key found` filter in get_public_key's multi-URL fallback with a dedicated NoMatchingJWTPublicKeyError; only that specific exception triggers continuation, every other error still surfaces. * fix(mcp): admit and forward Authorization for passthrough OAuth return For pass-through MCP servers (auth_type=none with delegate_auth_to_upstream) the RFC 9728 cold-start flow sends the client back with only "Authorization: Bearer <upstream-token>" after upstream OAuth discovery. Previously this path 1) was rejected in process_mcp_request because the oauth2_headers fallback only covered auth_type=oauth2 targets, and 2) had the Authorization header stripped by _prepare_mcp_server_headers when no x-litellm-api-key was present, treating the upstream token as a potential LiteLLM key leak. - Extend the elif oauth2_headers fallback to also admit anonymously when every target is a pass-through server. - Pass user_api_key_auth into _prepare_mcp_server_headers so it can forward Authorization for pass-through servers when admission did not consume the bearer as a LiteLLM key (api_key is unset). Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(mcp): consistent www-authenticate casing + SSE toolset scoping - Normalize the WWW-Authenticate header key emitted by _check_passthrough_upstream_auth to lowercase to match the other 401 emitters in the OAuth pass-through flow. - Mirror the streamable HTTP handler's toolset scoping in handle_sse_mcp: strip client-supplied x-mcp-toolset-id and apply _apply_toolset_scope before _check_passthrough_upstream_auth so the upstream probe list is derived from the fully-authorized server set. - Tighten _has_client_supplied_mcp_auth signature so mcp_server_auth_headers is Optional, matching its caller in process_mcp_request. 
Co-authored-by: Yassin Kortam <yassin@berri.ai> * security(mcp): strip Authorization in call_tool when LiteLLM admission used legacy header Mirror the OAuth pass-through admission check from _prepare_mcp_server_headers (list-tools path) in _call_regular_mcp_tool (tool-call path): when the server is OAuth pass-through and the caller did not supply x-litellm-api-key, Authorization on the inbound request may itself be the LiteLLM API key — so strip it before forwarding instead of leaking the gateway credential upstream. When x-litellm-api-key is present, admission is unambiguous and Authorization continues to carry the upstream OAuth bearer (transparent pass-through). * refactor(mcp): centralize caller Authorization strip decision Extracted the security-sensitive logic that decides whether the caller's Authorization header is forwarded to (or stripped from) an outgoing MCP request into a single helper, _should_strip_caller_authorization, in mcp_server_manager.py. Previously the same condition was duplicated across _call_regular_mcp_tool (mcp_server_manager.py) and _prepare_mcp_server_headers (server.py). Keeping two copies of this check risked future divergence and credential-leak / broken-passthrough bugs. Both call sites now share the helper, preserving exact behavior. Co-authored-by: Yassin Kortam <yassin@berri.ai> * log MCP OAuth discovery diagnostics for unmatched paths and non-transport upstream errors * fix(jwt): include issuer-normalized team id in get_all_jwt_team_ids The aggregator for team IDs only consulted the issuer-normalized claim for the plural (team_ids) path and fell back to the global config for the singular path. When an operator configures team_id_jwt_field only at the issuer level, get_team_id correctly returned the mapped value but get_all_jwt_team_ids silently dropped it, causing membership reconciliation to disagree with request routing. 
Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(mcp/jwt): dedupe cold-start path parser; reject conflicting audience flags - _parse_mcp_server_names_from_path now delegates to MCPRequestHandler._extract_target_server_names_from_path so the names used by the cold-start passthrough bypass cannot drift from the names used by downstream routing. - JWTIssuerConfig now rejects the combination of audience and disable_audience_validation=True at validation time instead of silently ignoring the flag. * fix(mcp): restrict passthrough cold-start bypass to 401 only The new elif passthrough cold-start branch reused is_auth_error which matches both 401 and 403. A 403 from user_api_key_auth indicates the LiteLLM key WAS recognized but is forbidden (e.g. over budget / rate limited); falling through to anonymous UserAPIKeyAuth() in that case bypasses spend and rate-limit controls on passthrough servers. Only trigger the cold-start anonymous admission on 401, which is the signal that the bearer is an upstream OAuth token rather than a recognized LiteLLM key. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(jwt/mcp): warn on unscoped JWT fallback; route agent permission lookup through shared helper - _build_decode_kwargs no longer suppresses the unscoped-fallback warning when LiteLLM_JWTAuth.issuers is set: tokens whose iss does not match any configured issuer still fall through to the global path, and that fallback is itself unscoped when JWT_AUDIENCE/JWT_ISSUER are absent. - _get_agent_object_permission now caches the agent_id -> object_permission_id mapping and delegates the permission lookup to the shared get_object_permission helper, so the agent path reuses the same cache entries as the org / team / key paths. 
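The 401-only rule from the cold-start commit above can be written as a small predicate. This is a sketch under stated assumptions rather than the gateway's actual logic: `admits_anonymous_cold_start` and its parameters are hypothetical names, and the all-targets condition reflects the "all-targets semantics" mentioned in an earlier commit of this log.

```python
from typing import List, Optional


def admits_anonymous_cold_start(
    auth_error_status: Optional[int],
    targets_are_passthrough: List[bool],
) -> bool:
    """Decide whether a failed admission may fall back to anonymous access.

    auth_error_status: HTTP status raised by key authentication (assumed input).
    targets_are_passthrough: one flag per targeted MCP server (assumed input).
    """
    # 403 means the key was recognized but is forbidden (for example over
    # budget), so only 401 may trigger the anonymous cold-start path.
    if auth_error_status != 401:
        return False
    # Every target must be a pass-through server; an empty set never qualifies.
    return bool(targets_are_passthrough) and all(targets_are_passthrough)
```

Under this sketch, `admits_anonymous_cold_start(403, [True])` is `False`, which is the case the commit describes as previously bypassing spend and rate-limit controls.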
* fix(mcp): fabricate resource_metadata challenge when upstream 401 omits WWW-Authenticate

When an upstream pass-through MCP server returns 401 without a WWW-Authenticate header (non-compliant per RFC 7235 §3.1), to_http_exception() now produces a synthetic Bearer challenge pointing at the gateway's standard-pattern oauth-protected-resource well-known endpoint for that server. This keeps MCP clients on the RFC 9728 discovery flow instead of receiving a bare 401 with no recovery hint.

* fix(jwt): make _get_decode_options explicitly control verify_iss

Previously, _get_decode_options only set verify_aud based on whether audience was provided. The issuer JWT path relied on always passing issuer=issuer_config.issuer to trigger PyJWT's default verify_iss=True, making the helper's behavior implicitly dependent on caller behavior. Now _get_decode_options accepts issuer as well, mirroring the verify_aud handling and matching the dimensions handled by _build_decode_kwargs.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(mcp): emit absolute resource_metadata URI in fabricated 401 challenge

Per RFC 9728 §3.2 the resource_metadata Bearer challenge must be an absolute URI; strict MCP clients reject relative URIs and fail to initiate discovery. MCPUpstreamAuthError.to_http_exception now accepts the gateway base URL and prepends it when the upstream omitted WWW-Authenticate, and all four call sites (streamable HTTP, SSE, and the two REST tool-list paths) supply it.

* fix(mcp): correct 403 detail text and remove dead _list_tools_for_single_server duplicate

- MCPUpstreamAuthError.to_http_exception() now returns detail='Forbidden' for 403 upstream responses (and 'Unauthorized' for 401), matching the _check_passthrough_upstream_auth pre-flight probe.
- Remove the shadowed first definition of _list_tools_for_single_server in rest_endpoints.py; the second definition was the live one and the dead copy was a maintenance trap.
Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix: address potential bugs in auth_utils, mcp discoverable endpoints, and mcp auth

- auth_utils.get_request_route: return '/' instead of empty string when raw_path exactly equals root_path so downstream route allowlist checks still see a leading slash
- discoverable_endpoints.fetch_upstream_oauth_protected_resource: also cache negative results (no upstream metadata) for a shorter TTL so we don't re-fetch on every discovery request and so the per-key fetch lock can be pruned
- user_api_key_auth_mcp: guard the oauth2_headers 401 cold-start passthrough bypass with _has_client_supplied_mcp_auth, matching the parallel bypass in the no-Authorization branch so MCP-auth-bearing requests don't silently downgrade to anonymous admission

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* test(vertex): tolerate transient InternalServerError in google maps tool test

test_gemini_google_maps_tool_simple makes live calls to Vertex AI's Google Maps grounding backend, which intermittently returns 500 INTERNAL ("Please retry"), a transient upstream failure, not a LiteLLM bug. The test already passes on RateLimitError; treat InternalServerError the same way so transient Vertex-side failures don't fail CI.

* refactor(mcp): drop redundant has_client_credentials filter on passthrough probe

is_oauth_passthrough already requires auth_type in (None, MCPAuth.none), which is mutually exclusive with has_client_credentials (auth_type == MCPAuth.oauth2), so the extra guard was always True and only added confusion about whether a server could be both passthrough and M2M.
Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix: restore unreachable InternalServerError skip handler in vertex test

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* feat(mcp): add dedicated oauth_passthrough flag for non-oauth2 pass-through

Previously is_oauth_passthrough reused delegate_auth_to_upstream, a flag scoped to oauth2 servers (PKCE bypass), to gate OAuth pass-through for auth_type=none servers. Overloading it risked regressing existing deployments that set delegate_auth_to_upstream, since the same flag would silently start driving pass-through (discovery proxying, 401 challenges, upstream 401/403 propagation) on non-oauth2 servers.

Introduce a separate oauth_passthrough opt-in so the two behaviors never imply each other:

- MCPServer.is_oauth_passthrough now requires oauth_passthrough (not delegate_auth_to_upstream).
- Persist oauth_passthrough on LiteLLM_MCPServerTable (new column + migration) and wire it through config/DB load and API responses.
- UI splits the single toggle into two: "Delegate auth to upstream (PKCE passthrough)" for oauth2 and "OAuth pass-through" for auth_type=none servers forwarding Authorization.

Adds backend tests (property, round-trip, and a regression guard that delegate_auth_to_upstream alone never enables pass-through) and UI tests for the toggle split.

* fix(mcp): reconcile cold-start bypass with x-mcp-servers header and skip non-absolute WWW-Authenticate fabrication

- _parse_mcp_server_names_from_path now fails closed when the x-mcp-servers header introduces any target not present in the path-derived target set, closing a header/path mismatch where the cold-start passthrough bypass could otherwise admit anonymously while the header advertises a non-passthrough server.
- MCPUpstreamAuthError.to_http_exception no longer emits a relative resource_metadata URI when base_url is missing; per RFC 9728 3.2 the URI must be absolute, so we skip fabrication entirely rather than send a challenge strict MCP clients will reject.
Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(mcp): fabricate path-aware resource_metadata URI for upstream 401

When MCPUpstreamAuthError.to_http_exception fabricates a `WWW-Authenticate: Bearer resource_metadata=...` challenge (because the upstream 401 omitted one), the URL now matches the inbound MCP transport pattern the client originally used:

- /mcp/{server_name} -> /.well-known/oauth-protected-resource/mcp/{server_name}
- /{server_name}/mcp -> /.well-known/oauth-protected-resource/{server_name}/mcp

This mirrors the path-aware behaviour of _get_passthrough_resource_metadata_url in server.py so strict RFC 9728 §3.2 clients on legacy routes get a resource_metadata URI aligned with the resource pattern they originally targeted.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(jwt+mcp): tighten issuer-scoped claim type handling, RFC-quote authorization_uri, surface MCP upstream auth errors, defense-in-depth on decode options

- handle_jwt: when an issuer-scoped _litellm_team_ids claim exists but has an unexpected type, return [] instead of falling through to the global team_ids_jwt_field path (different claim semantically).
- handle_jwt: _get_decode_options/_decode_jwt_with_public_key now take an explicit disable_audience_validation flag; passing audience=None without it raises, so audience checks can't silently disappear if the model validator is ever bypassed. _auth_jwt_with_issuer forwards the flag from JWTIssuerConfig.
- mcp_server: quote the authorization_uri WWW-Authenticate parameter value (RFC 6750 / 9728 auth-param must be quoted-string), matching the pass-through path.
- mcp_server: in _fetch_and_filter_server_tools, re-raise MCPUpstreamAuthError so the outer streamable-HTTP handler can surface a proper 401 + WWW-Authenticate challenge instead of returning an empty tool list.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* chore(docker): align Dockerfile.non_root/Dockerfile.database to current wolfi-base SHA

The older sha256:3258be... pin has been intermittently returning 500/not-found from cgr.dev, breaking the test-server-root-path GitHub Action and the build_docker_database_image CircleCI job. Move both Dockerfiles onto the same sha256:31da65... digest already in use by Dockerfile, gateway/Dockerfile, backend/Dockerfile, and migrations/Dockerfile so the base image is consistent across the repo.

* ci(docker): bump wolfi-b…
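
Several MCP entries in the commit log above describe one behaviour from different angles: when an upstream 401 carries no WWW-Authenticate header, the gateway fabricates a Bearer challenge whose resource_metadata URI is absolute, follows the inbound route pattern, and is skipped entirely when no base URL is known. The following is a minimal sketch of that decision, not LiteLLM's implementation. The function names, parameters, and the `https://gw.example.com` host are hypothetical, and passing an existing upstream challenge through unchanged is an assumption inferred from the wording "because the upstream 401 omitted one".

```python
from typing import Optional
from urllib.parse import quote

# Well-known prefix quoted in the commit log above.
WELL_KNOWN_PREFIX = "/.well-known/oauth-protected-resource"


def build_resource_metadata_url(
    base_url: Optional[str], inbound_path: str, server_name: str
) -> Optional[str]:
    """Hypothetical helper: return an absolute metadata URL, or None."""
    # Without an absolute base URL, skip fabrication rather than emit
    # a relative URI that strict clients would reject.
    if not base_url or not base_url.startswith(("http://", "https://")):
        return None
    name = quote(server_name, safe="")
    # Mirror the route pattern the client used:
    #   /{server_name}/mcp -> .../{server_name}/mcp
    #   /mcp/{server_name} -> .../mcp/{server_name}
    if inbound_path.rstrip("/").endswith(f"/{server_name}/mcp"):
        resource_path = f"/{name}/mcp"
    else:
        resource_path = f"/mcp/{name}"
    return f"{base_url.rstrip('/')}{WELL_KNOWN_PREFIX}{resource_path}"


def fabricate_challenge(
    status_code: int,
    upstream_www_authenticate: Optional[str],
    base_url: Optional[str],
    inbound_path: str,
    server_name: str,
) -> Optional[str]:
    """Hypothetical helper: choose the WWW-Authenticate value to return."""
    # Assumption: an upstream-provided challenge is forwarded unchanged.
    if upstream_www_authenticate:
        return upstream_www_authenticate
    # Only a 401 gets a fabricated discovery hint in this sketch.
    if status_code != 401:
        return None
    url = build_resource_metadata_url(base_url, inbound_path, server_name)
    if url is None:
        return None
    # The parameter value is emitted as a quoted-string.
    return f'Bearer resource_metadata="{url}"'
```

Returning None instead of a relative URI keeps the fallback a bare 401, which matches the log's stated choice to skip fabrication when base_url is missing.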

Sameerlite and others added 13 commits May 11, 2026 10:42

Fix reasoning summary alias stripping

eed6985

Fix GPT-5 reasoning summary alias stripping

0ac923c

Fix reasoningSummary for gpt-5 series as well

22e9fd1

Fix greptile issue

e74329d

Simplify GPT-5 responses bridge condition

57ed2da

Fix GPT-5 reasoning summary strip test path

1628886

dummy change

7524c40

fix black and github mock test

aa1f57f

Preserve reasoning summary without effort

b150816

Merge pull request #27658 from BerriAI/litellm_internal_staging

79618b1

merge main

Merge pull request #27618 from BerriAI/litellm_reasoning_summary_chat_bridge

5833d3e

fix(openai): route reasoningSummary for gpt-5.4+ chat without tools to Responses API

greptile-apps Bot reviewed May 11, 2026

View reviewed changes

Comment thread litellm/litellm_core_utils/audio_utils/utils.py

Comment thread litellm/litellm_core_utils/audio_utils/utils.py

greptile-apps Bot reviewed May 11, 2026

View reviewed changes

Comment thread litellm/litellm_core_utils/audio_utils/utils.py

oss-pr-review-agent-shin Bot changed the base branch from litellm_internal_staging to litellm_agent_oss_staging_05_11_2026 May 11, 2026 19:35

oss-pr-review-agent-shin Bot merged this pull request into BerriAI:litellm_agent_oss_staging_05_11_2026 May 11, 2026
42 checks passed

krrish-berri-2 mentioned this pull request May 12, 2026

chore: reject bare str at file-input sinks to prevent local-file read #27762

Merged

3 tasks

chore: reject bare str at file-input sinks to prevent local-file read #27667
oss-pr-review-agent-shin[bot] merged 13 commits into BerriAI:litellm_agent_oss_staging_05_11_2026 from stuxf:chore/file-input-reject-bare-str

stuxf commented May 11, 2026

codecov Bot commented May 11, 2026 (edited)

greptile-apps Bot commented May 11, 2026 (edited)

stuxf commented May 11, 2026

oss-pr-review-agent-shin Bot commented May 11, 2026

stuxf commented May 11, 2026

3 participants

Greptile Summary

Confidence Score: 5/5
