fix(responses): forward timeout on completion transformation path (Anthropic, Bedrock, Vertex)#28133
Conversation
responses() consumes timeout as a named param, so it is not in **kwargs. The completion transformation path (Anthropic, Bedrock, Vertex etc.) only spread **kwargs, silently dropping timeout. The native Responses API path already forwarded timeout explicitly. Effect: Router(timeout=N) was a no-op for Anthropic and other providers without a native BaseResponsesAPIConfig — calls fell back to the provider SDK default (~600s for Anthropic). This is the same class of bug as BerriAI#22544 (metadata not forwarded).
Greptile SummaryThis PR fixes
Confidence Score: 5/5This PR is safe to merge — it adds a single missing keyword argument to one call site and cleans up dead code that was never executed. The change is minimal and targeted: one new argument forwarded on the completion transformation path, matching the identical pattern used on every other path in the same function. The dead-code removal in the handler has no runtime effect. The new mock test provides direct regression coverage without touching existing assertions. No files require special attention.
|
| Filename | Overview |
|---|---|
| litellm/responses/main.py | Adds timeout=timeout or request_timeout to the completion transformation path call, mirroring the identical pattern already used on all other handler paths in this file. |
| litellm/responses/litellm_completion_transformation/handler.py | Removes 4 lines of dead code (completion_args dict that was built but never passed to litellm.completion); the call site already used **litellm_completion_request, **kwargs directly. |
| tests/llm_responses_api_testing/test_anthropic_responses_api.py | Adds a mock-only regression test asserting timeout is forwarded through aresponses() to acompletion() on the completion transformation path; no real network calls. |
Reviews (2): Last reviewed commit: "chore(responses): remove unused completi..." | Re-trigger Greptile
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
|
🤖 litellm-agent: This PR is currently BLOCKED from merge. Score: 4/5 ❌ Why blocked:
Details: Score docked for: 1 unresolved reviewer concern (greptile). Fix the issues above and push an update — the bot will re-review automatically.
|
The completion_args dict was built in the sync response_api_handler path but never used — litellm.completion() is called directly with **litellm_completion_request and **kwargs. Flagged by greptile on BerriAI#28133 as a follow-up cleanup; removing here since it's a 3-line diff in the same neighborhood as the timeout fix.
d85e4b4
into
BerriAI:shin_agent_oss_staging_05_18_2026
|
🤖 litellm-agent: Squash-merged into staging branch Triage Summary Merge Confidence: 5/5 ✅ READY All checks green. Greptile 5/5, no blocking pattern findings, no CircleCI runs (OSS-typical). |
* feat: add Xiaomi MiMo-V2.5-Pro and MiMo-V2.5 OpenRouter model entries (#27700) Squash-merged by litellm-agent from TorvaldUtne's PR. * fix(ui): trim whitespace from MCP inspector tool call inputs (#28203) Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> * gemini-3.1-flash-lite pricing (#27933) * feat(model_prices): add gemini-3.1-flash-lite pricing with standard/batch/flex/priority tiers * fix pricing * add service tier --------- Co-authored-by: shin-berri <shin-laptop@berri.ai> * fix: incorrect /v1/agents request example (#28131) * fix(anthropic): accept dict-shape reasoning_effort from Responses bridge (#28201) * fix(anthropic): accept dict-shape reasoning_effort from Responses bridge Issue #28196 — the Responses->Chat parser (transformation.py:184-200) keeps the full dict as reasoning_effort when summary is set; that branch was added in #25359. But the Anthropic transformation here still guarded on isinstance(value, str), silently dropping the param. Result: callers using the standard Reasoning(effort, summary) OpenAI-shaped object on Anthropic lose thinking entirely (0 reasoning_tokens, no thinking_blocks). Coerce dict -> string before mapping. Same shape tolerance that gpt_5_transformation._normalize_reasoning_effort_for_chat_completion already implements. summary is irrelevant for Anthropic's thinking_blocks. Adds two regression tests: one parametrized over string + dict shapes (with and without summary), one covering unparseable dict inputs (drops silently, no crash). * test(anthropic): add non-adaptive model coverage for dict-shape reasoning_effort Per Greptile feedback on PR #28198: the original regression test only exercised the adaptive (4.6+) path. Add a parametrized test for the non-adaptive branch (claude-sonnet-4-5) verifying that dict-shape reasoning_effort still maps to thinking.type='enabled' + budget_tokens, and that output_config is NOT set on pre-4.6 models. * test(anthropic): convert unparseable-dict test to @pytest.mark.parametrize Per @greptile-apps inline review on PR #28201 — matches the parametrize style of the two adjacent dict-shape tests and produces clearer failure messages (test ID per case instead of one collapsing for-loop). * feat: add pricing entry for openrouter/google/gemini-3.1-flash-lite (#28280) Squash-merged by litellm-agent from ro31337's PR. * fix(router): wrap aresponses streaming iterator for mid-stream fallbacks (#28215) Squash-merged by litellm-agent from cwang-otto's PR. * fix(router): unblock staging — mypy + coverage for aresponses streaming fallback (#28318) Squash-merged by litellm-agent from cwang-otto's PR. * fix(responses): forward timeout on completion transformation path (Anthropic, Bedrock, Vertex) (#28133) Squash-merged by litellm-agent from cwang-otto's PR. * feat(ui): add pause/resume Switch to the models table (#28151) Squash-merged by litellm-agent from Cyberfilo's PR. * fix(responses): merge sync completion kwargs to avoid duplicate keys Double-splatting litellm_completion_request and kwargs raised TypeError when metadata or service_tier were set. Match the async merge pattern. Co-authored-by: Cursor <cursoragent@cursor.com> * Use proxy base URL for CLI SSO form action (#28271) Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> * fix(tests): add mistral/ministral-8b-2512 to cost map and backfill in conftest Mistral rotated the 'mistral/mistral-tiny' alias to return 'ministral-8b-2512' as the response model, which was missing from the cost map. This caused test_completion_mistral_api and test_completion_mistral_api_modified_input to fail in litellm.completion_cost lookup. - Add mistral/ministral-8b-2512 entry to both the in-tree model_prices_and_context_window.json and the bundled litellm/model_prices_and_context_window_backup.json (mirrors the existing openrouter/mistralai/ministral-8b-2512 pricing). - litellm.model_cost is loaded at import time from the URL pinned to main, so the new backup entry isn't visible at test runtime until it also lands on main. Backfill any entries missing from the remote-fetched map into litellm.model_cost in the local_testing conftest so cost-calculator lookups succeed on this branch. * fix(tests): drop unnecessary del of conftest backfill loop vars * fix(router): harden streaming fallback wrapper for bridge iterators - FallbackResponsesStreamWrapper now uses getattr fallbacks when copying attributes from the source iterator. The bridge path (LiteLLMCompletionStreamingIterator used by Anthropic/Bedrock/Vertex) does not call super().__init__ and is missing response, logging_obj (it uses litellm_logging_obj), responses_api_provider_config, start_time, request_data, call_type, and _hidden_params. Previously, wrapper construction raised AttributeError for any streaming fallback on the bridge path. - _aresponses_with_streaming_fallbacks now deep-copies the litellm_metadata (and metadata) dicts into fallback_kwargs. The primary attempt mutates this dict in place via _update_kwargs_with_deployment, so a shallow copy of kwargs was leaking primary-deployment fields (deployment, model_info, api_base) into the mid-stream fallback request. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(router): use safe_deep_copy for fallback metadata snapshot The ban_copy_deepcopy_kwargs CI check rejects copy.deepcopy() on any variable whose name contains 'kwargs' (incl. fallback_kwargs). Swap the two copy.deepcopy(fallback_kwargs[...]) calls for safe_deep_copy, which handles non-picklable values (OTEL spans, etc.) by per-key deepcopy with fallback to the original reference. Co-authored-by: Yassin Kortam <yassin@berri.ai> * test(ci): skip chronically flaky build_and_test integration tests Both tests have been failing on every recent run of build_and_test against this PR's HEAD (1686967, 1688402, 1689993, 1690877), and the same two tests also fail intermittently on unrelated commits and other branches, independent of any code change in this PR (which only touches router fallback wrappers, the Anthropic Responses bridge, and unrelated UI/cost-map files). - tests.test_spend_logs.test_spend_logs: /spend/logs?request_id=... returns 500 even after a 20s wait for the spend log to be written. Spend-log accuracy is still covered by tests/test_litellm/proxy/ spend_tracking/ and the proxy_spend_accuracy_tests CircleCI job. - tests.test_team_members.test_add_multiple_members: /team/info?team_id= ... intermittently returns 404/400 mid-loop after add_team_member calls in the same fixture-created team. Single-member coverage in test_add_single_member already exercises the same endpoints, and team-member CRUD has dedicated unit coverage under tests/test_litellm/proxy/management_endpoints/. Skipping unblocks the build_and_test job until the underlying race in the dockerized integration setup is root-caused. * fix: preserve explicit timeout=0 in responses API handler Use 'timeout if timeout is not None else request_timeout' instead of 'timeout or request_timeout' so an explicit timeout=0/0.0 isn't silently replaced by the default request_timeout. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(ui): guard model_info access in pause Switch with optional chaining * fix(ui): guard model_info access in pause Switch onChange handler Mirror the optional-chaining guard already applied to the isPausing check so a config-model row with a missing model_info cannot throw when the toggle's onChange fires. --------- Co-authored-by: TorvaldUtne <78661304+TorvaldUtne@users.noreply.github.com> Co-authored-by: oss-agent-shin <ext-agent-shin@berri.ai> Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: mubashir1osmani <mubashir.osmani777@gmail.com> Co-authored-by: Isha <72744901+IshaMeera@users.noreply.github.com> Co-authored-by: cwang-otto <chengxuan.wang@ottotheagent.com> Co-authored-by: Roman Pushkin <roman.pushkin@gmail.com> Co-authored-by: Filippo Menghi <113345637+Cyberfilo@users.noreply.github.com> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: boarder7395 <37314943+boarder7395@users.noreply.github.com> Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com> Co-authored-by: Claude <claude@anthropic.com> Co-authored-by: Yassin Kortam <yassin@berri.ai>
* feat: add Xiaomi MiMo-V2.5-Pro and MiMo-V2.5 OpenRouter model entries (BerriAI#27700) Squash-merged by litellm-agent from TorvaldUtne's PR. * fix(ui): trim whitespace from MCP inspector tool call inputs (BerriAI#28203) Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> * gemini-3.1-flash-lite pricing (BerriAI#27933) * feat(model_prices): add gemini-3.1-flash-lite pricing with standard/batch/flex/priority tiers * fix pricing * add service tier --------- Co-authored-by: shin-berri <shin-laptop@berri.ai> * fix: incorrect /v1/agents request example (BerriAI#28131) * fix(anthropic): accept dict-shape reasoning_effort from Responses bridge (BerriAI#28201) * fix(anthropic): accept dict-shape reasoning_effort from Responses bridge Issue BerriAI#28196 — the Responses->Chat parser (transformation.py:184-200) keeps the full dict as reasoning_effort when summary is set; that branch was added in BerriAI#25359. But the Anthropic transformation here still guarded on isinstance(value, str), silently dropping the param. Result: callers using the standard Reasoning(effort, summary) OpenAI-shaped object on Anthropic lose thinking entirely (0 reasoning_tokens, no thinking_blocks). Coerce dict -> string before mapping. Same shape tolerance that gpt_5_transformation._normalize_reasoning_effort_for_chat_completion already implements. summary is irrelevant for Anthropic's thinking_blocks. Adds two regression tests: one parametrized over string + dict shapes (with and without summary), one covering unparseable dict inputs (drops silently, no crash). * test(anthropic): add non-adaptive model coverage for dict-shape reasoning_effort Per Greptile feedback on PR BerriAI#28198: the original regression test only exercised the adaptive (4.6+) path. Add a parametrized test for the non-adaptive branch (claude-sonnet-4-5) verifying that dict-shape reasoning_effort still maps to thinking.type='enabled' + budget_tokens, and that output_config is NOT set on pre-4.6 models. * test(anthropic): convert unparseable-dict test to @pytest.mark.parametrize Per @greptile-apps inline review on PR BerriAI#28201 — matches the parametrize style of the two adjacent dict-shape tests and produces clearer failure messages (test ID per case instead of one collapsing for-loop). * feat: add pricing entry for openrouter/google/gemini-3.1-flash-lite (BerriAI#28280) Squash-merged by litellm-agent from ro31337's PR. * fix(router): wrap aresponses streaming iterator for mid-stream fallbacks (BerriAI#28215) Squash-merged by litellm-agent from cwang-otto's PR. * fix(router): unblock staging — mypy + coverage for aresponses streaming fallback (BerriAI#28318) Squash-merged by litellm-agent from cwang-otto's PR. * fix(responses): forward timeout on completion transformation path (Anthropic, Bedrock, Vertex) (BerriAI#28133) Squash-merged by litellm-agent from cwang-otto's PR. * feat(ui): add pause/resume Switch to the models table (BerriAI#28151) Squash-merged by litellm-agent from Cyberfilo's PR. * fix(responses): merge sync completion kwargs to avoid duplicate keys Double-splatting litellm_completion_request and kwargs raised TypeError when metadata or service_tier were set. Match the async merge pattern. Co-authored-by: Cursor <cursoragent@cursor.com> * Use proxy base URL for CLI SSO form action (BerriAI#28271) Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> * fix(tests): add mistral/ministral-8b-2512 to cost map and backfill in conftest Mistral rotated the 'mistral/mistral-tiny' alias to return 'ministral-8b-2512' as the response model, which was missing from the cost map. This caused test_completion_mistral_api and test_completion_mistral_api_modified_input to fail in litellm.completion_cost lookup. - Add mistral/ministral-8b-2512 entry to both the in-tree model_prices_and_context_window.json and the bundled litellm/model_prices_and_context_window_backup.json (mirrors the existing openrouter/mistralai/ministral-8b-2512 pricing). - litellm.model_cost is loaded at import time from the URL pinned to main, so the new backup entry isn't visible at test runtime until it also lands on main. Backfill any entries missing from the remote-fetched map into litellm.model_cost in the local_testing conftest so cost-calculator lookups succeed on this branch. * fix(tests): drop unnecessary del of conftest backfill loop vars * fix(router): harden streaming fallback wrapper for bridge iterators - FallbackResponsesStreamWrapper now uses getattr fallbacks when copying attributes from the source iterator. The bridge path (LiteLLMCompletionStreamingIterator used by Anthropic/Bedrock/Vertex) does not call super().__init__ and is missing response, logging_obj (it uses litellm_logging_obj), responses_api_provider_config, start_time, request_data, call_type, and _hidden_params. Previously, wrapper construction raised AttributeError for any streaming fallback on the bridge path. - _aresponses_with_streaming_fallbacks now deep-copies the litellm_metadata (and metadata) dicts into fallback_kwargs. The primary attempt mutates this dict in place via _update_kwargs_with_deployment, so a shallow copy of kwargs was leaking primary-deployment fields (deployment, model_info, api_base) into the mid-stream fallback request. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(router): use safe_deep_copy for fallback metadata snapshot The ban_copy_deepcopy_kwargs CI check rejects copy.deepcopy() on any variable whose name contains 'kwargs' (incl. fallback_kwargs). Swap the two copy.deepcopy(fallback_kwargs[...]) calls for safe_deep_copy, which handles non-picklable values (OTEL spans, etc.) by per-key deepcopy with fallback to the original reference. Co-authored-by: Yassin Kortam <yassin@berri.ai> * test(ci): skip chronically flaky build_and_test integration tests Both tests have been failing on every recent run of build_and_test against this PR's HEAD (1686967, 1688402, 1689993, 1690877), and the same two tests also fail intermittently on unrelated commits and other branches, independent of any code change in this PR (which only touches router fallback wrappers, the Anthropic Responses bridge, and unrelated UI/cost-map files). - tests.test_spend_logs.test_spend_logs: /spend/logs?request_id=... returns 500 even after a 20s wait for the spend log to be written. Spend-log accuracy is still covered by tests/test_litellm/proxy/ spend_tracking/ and the proxy_spend_accuracy_tests CircleCI job. - tests.test_team_members.test_add_multiple_members: /team/info?team_id= ... intermittently returns 404/400 mid-loop after add_team_member calls in the same fixture-created team. Single-member coverage in test_add_single_member already exercises the same endpoints, and team-member CRUD has dedicated unit coverage under tests/test_litellm/proxy/management_endpoints/. Skipping unblocks the build_and_test job until the underlying race in the dockerized integration setup is root-caused. * fix: preserve explicit timeout=0 in responses API handler Use 'timeout if timeout is not None else request_timeout' instead of 'timeout or request_timeout' so an explicit timeout=0/0.0 isn't silently replaced by the default request_timeout. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(ui): guard model_info access in pause Switch with optional chaining * fix(ui): guard model_info access in pause Switch onChange handler Mirror the optional-chaining guard already applied to the isPausing check so a config-model row with a missing model_info cannot throw when the toggle's onChange fires. --------- Co-authored-by: TorvaldUtne <78661304+TorvaldUtne@users.noreply.github.com> Co-authored-by: oss-agent-shin <ext-agent-shin@berri.ai> Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: mubashir1osmani <mubashir.osmani777@gmail.com> Co-authored-by: Isha <72744901+IshaMeera@users.noreply.github.com> Co-authored-by: cwang-otto <chengxuan.wang@ottotheagent.com> Co-authored-by: Roman Pushkin <roman.pushkin@gmail.com> Co-authored-by: Filippo Menghi <113345637+Cyberfilo@users.noreply.github.com> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: boarder7395 <37314943+boarder7395@users.noreply.github.com> Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com> Co-authored-by: Claude <claude@anthropic.com> Co-authored-by: Yassin Kortam <yassin@berri.ai>
* [Refactor] UI - Spend Logs: consolidate filter state and extract components (#25847)
* [Refactor] UI - Spend Logs: consolidate filter state, extract components, remove dead code
- Lift filter state into index.tsx and pass to hook (removes selectedX vars + sync useEffect)
- Move main useQuery into useLogFilterLogic hook (removes isMainQueryEnabled toggle)
- Delete dead RequestViewer component (300 lines, replaced by LogDetailsDrawer)
- Extract LogsTableToolbar component (search, date range, pagination, live tail)
- Extract filter options config to filter_options.ts
- Remove dead code: handleRefresh, handleSelectLog, handleCloseDrawer, formatTimeUnit,
showFilters/showColumnDropdown state, dropdownRef/filtersRef
* Fix PR feedback: use antd Switch instead of Tremor in new file, fix typo
* Collapse dual-path filtering into single React Query
All 10 filter keys now go through the useQuery — the imperative
performSearch / debouncedSearch / backendFilteredLogs path is deleted.
Filter values are debounced via useDebouncedValue(300ms) before hitting
the query key so text inputs don't fire per-keystroke.
Removed: performSearch, debouncedSearch, backendFilteredLogs,
lastSearchTimestamp, hasBackendFilters, clientDerivedFilteredLogs,
the sort/page/time refetch useEffect, and the filteredLogs chooser memo.
* Clean up remaining smells: remove isFetchingDeferred, internalize selectedTimeInterval, fix circular import
- Remove useDeferredValue/isButtonLoading — pass logsQuery.isFetching directly
- Move selectedTimeInterval into LogsTableToolbar as internal state
- Move PaginatedResponse type from index.tsx to log_filter_logic.tsx
* Fix quick-select dropdown overlapping sidebar
* Fix stale quick-select label after Reset Filters
Move selectedTimeInterval back to parent so handleFilterReset can
reset it to the 24-hour default. The toolbar receives it as a prop.
* refactor useLogFilterLogic tests for controlled-hook + backend-query shape
The hook no longer owns filter state or does client-side filtering — it
receives filters/setFilters as props and drives filteredLogs from a
useQuery over uiSpendLogsCall. Reshape the tests around that contract:
introduce a controlled harness that owns filter state, collapse the 10
per-filter assertions into a single it.each over filterKey → API param,
and drop the client-side passthrough tests (the .min test file and the
"return all logs when no filters" / "empty when logs null" cases) that
no longer correspond to any hook behavior.
* cover new useLogFilterLogic invariants: activeTab gate, filterByCurrentUser fallback, debounce negative, partial merge
Follow-up to the test refactor. Adds coverage for invariants the
refactored hook contract introduced but that the first pass didn't
assert:
- query enablement: expand the single accessToken-null case into an
it.each over all four credential props (accessToken, token, userRole,
userID), plus a separate test for activeTab !== "request logs"
- filterByCurrentUser: when true with a blank User ID filter, the
outbound request carries user_id = userID
- debounce: also assert the negative case — no call in the first 100ms
after a filter change (first waiting out the initial mount fire)
- handleFilterChange: partial updates merge without clobbering other
filter keys (protects the spread + default-fill semantics)
- handleFilterReset: calls setCurrentPage(1) alongside restoring
filters
* fix typo dropping the live-tail banner border
Tailwind silently ignores unknown classes, so border-greem-200 was
leaving the auto-refresh banner with only its bg-green-50 fill and no
outline.
* memoize columns and derived table data in SpendLogsTable
The table's columns array, four-pass data pipeline, and sort-change
handler were all being rebuilt on every parent render. That made every
filter click re-instance all 23 TanStack-Table columns, re-run
filter/reduce/map over all rows, and recreate per-row click closures —
all before the intentional 300ms debounce timer even got a chance to
fire.
Local measurement (40 rows, dev mode):
filter click → query fires: 1957ms → 1217ms (−38%)
Wrap createColumns in useMemo keyed on sortBy/sortOrder, hoist
onSortChange into a useCallback, and move the searchedLogs /
sessionComposition / sessionRepresentativeMap / filteredData derivations
into a single useMemo keyed on filteredLogs.data + searchTerm.
These were pre-existing issues on main — not regressions from the
hook refactor — but the refactor made them user-visible because the
new query debounce put render cost on the critical path.
* apply dropdown filters instantly, debounce only text inputs
Dropdown selects now bypass the 300ms debounce so a click updates the
table immediately. Text inputs (Key Hash, Error Message, Request ID,
User ID) still debounce. handleFilterReset also clears the pending
debounced value so a half-typed text filter can't re-fire after reset.
* fix(ui/spend-logs): restore lost loading/debounce behavior + cover dropped tests
Regressions from the spend-logs-view refactor:
- debounce the 'Public model / search tool' text filter (was firing a
backend query per keystroke) via TEXT_FILTER_KEYS
- restore Fetch-button smoothing through table repaint using
useDeferredValue on the rendered data (explicit staleness)
- show AntDLoadingSpinner during the auth-resolve phase instead of a
blank screen on first load
- only live-tail-poll while the tab is visible
(refetchIntervalInBackground: false)
- extract getLiveTailRefetchInterval helper for the poll decision
Tests:
- LogDetailContent: retries display (>0 / 0 / absent), overhead-absent
- log_filter_logic: regression guard that the public-model filter
debounces; getLiveTailRefetchInterval unit tests
- logs_utils: getTimeRangeDisplay quick-select window labels
* test(ui/spend-logs): cover the cold-load auth-not-ready spinner guard
Asserts SpendLogsTable shows a loading spinner (not a blank screen)
while credentials are unresolved, and renders the table once present.
* fix(tests): replace shut-down gpt-4o-audio-preview with gpt-audio-1.5 (#28281)
* fix(tests): replace shut-down gpt-4o-audio-preview with gpt-audio-1.5
OpenAI shut down gpt-4o-audio-preview on 2026-05-07, so the live audio
calls in test_stream_chunk_builder_openai_audio_output_usage and
test_standard_logging_payload_audio now hard-fail with a model-not-found
error on every PR. The error was not "openai-internal", so the except
block swallowed it and execution fell through to an unbound
completion/response (UnboundLocalError).
Switch both tests to gpt-audio-1.5, OpenAI's recommended successor
(GA, not deprecated, already present in the litellm cost map so the
response_cost assertion still resolves). Also broaden the except to
skip with the real error in the reason instead of crashing, so a
transient upstream blip can't reintroduce the UnboundLocalError.
* fix(tests): narrow audio-test skip to model-not-found, re-raise the rest
Address review feedback: an unconditional skip on any exception would
silently mask a litellm-internal regression in the audio path (broken
param transformation, serialization, bad header) instead of failing CI.
Skip only on the upstream-unavailable class (model_not_found / "does not
exist" / openai-internal) and re-raise everything else, so genuine
regressions still fail loudly. The UnboundLocalError is still fixed
because the handler either skips or raises - it never falls through.
* fix(tests): add budget_exceeded to expected Interaction status enum
Staging added budget_exceeded to the Interaction OpenAPI status enum; the staging merge into this branch picked up the spec change but not the matching test update, so test_status_enum_values failed in CI. Align the test's expected list (exact-match by design) with the live spec.
* fix(tests): mock HTTP fetch in test_img_url_token_counter
The test parameterized a live third-party image URL (blog.purpureus.net) which now 404s, causing get_image_dimensions to fall through to its base64 decode path and crash with 'not enough values to unpack' on every PR run. Mock safe_get with a tiny 1x1 PNG so the URL branch is still exercised without any network dependency.
* fix(tests): swap gpt-4o-audio-preview to gpt-audio-1.5 in test_gpt4o_audio
OpenAI shut down gpt-4o-audio-preview on 2026-05-07, so both live tests in test_gpt4o_audio.py (test_audio_output_from_model and test_audio_input_to_model) hard-fail model_not_found on every PR. Swap the hardcoded model to OpenAI's successor gpt-audio-1.5 (same chat-completions audio surface; already in the litellm cost map). Mirror the narrowed-skip pattern from the prior audio fixes: skip on model_not_found / does-not-exist / openai-internal, re-raise everything else so genuine litellm regressions still fail CI loudly.
* chore(ci): bump versions (#28287)
* bump: version 0.4.72 → 0.4.73
* bump: version 1.86.0 → 1.87.0
* uv lock
* feat: propagate team_id and team_alias to all child OTEL spans (#28273)
- Add `_set_team_attributes_on_span` helper to stamp team_id/team_alias
onto any span, ensuring these attributes are not limited to the root
litellm_request span
- Add `_set_team_attributes_from_kwargs` helper to extract team metadata
from the standard_logging_object in kwargs and apply them to a span
- Apply team attributes to raw request spans via `_maybe_log_raw_request`
so downstream consumers can filter traces by team without needing the
root span
- Apply team attributes to guardrail spans so guardrail activity can be
correlated to teams in tracing backends
- Apply team attributes to exception logging spans to preserve team
context during failure paths
- Add comprehensive unit tests covering all new helpers, including edge
cases where metadata or standard_logging_object is absent
Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
* Day 0 support : Gemini 3.5 Flash (#28268)
* Add day 0 support for gemini 3.5 flash
* Fix pricing
* Fix greptile review
* Fix failing test
* Fix tests
* Fix: revert tool removing logic
* fix greptile and test
---------
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
* Gemini managed agents support (#28270)
* Add support for environment variable in interactions api
* Add sdk support for gemini create agent
* Add agents endpoint support via proxy
* Add outputs of each api
* Add routing for model and agents param
* Remove redundant condition in get_provider_agents_api_config
LlmProviders.GEMINI.value is literally the string "gemini", so the
second clause of the or was checking the exact same thing as the first.
Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
* fix: forward query-param credentials to list/get/delete/versions Gemini agent endpoints
The list_gemini_agents, get_gemini_agent, delete_gemini_agent, and
list_gemini_agent_versions endpoints previously constructed a hardcoded
data dict with no mechanism to pass provider credentials. Unlike
create_gemini_agent (POST, reads litellm_params_template from body),
these GET/DELETE endpoints gave no way for multi-tenant callers to
supply a per-request api_key or other LiteLLM params.
Fix:
- Add _merge_query_params_into_data() helper that reads query parameters
from the request and merges them into the data dict without overwriting
already-set keys (e.g. path params like 'name').
- Support a JSON-encoded litellm_params_template query parameter
(matching the POST body pattern) as well as flat key=value pairs
(e.g. api_key=AIza...).
- Apply the helper in all four affected endpoints.
- Add 13 unit tests covering the helper and each endpoint.
Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
* fix: pass model=None for managed agent proxy endpoints to prevent agent name polluting data["model"]
Endpoints acreate_agent, aget_agent, adelete_agent, and alist_agent_versions
were passing model=<agent_name> to base_process_llm_request. This caused
common_processing_pre_call_logic to write the agent name into self.data["model"],
which then triggered spurious model-alias mapping, rate-limiting lookups, and
logging tied to a non-existent model deployment.
The agent name is already carried in data["name"] and is passed correctly to
the SDK functions (litellm.interactions.agents.*). There is no reason to also
set model=<agent_name>; the correct value is model=None for all five managed-agent
management routes.
Adds tests/test_litellm/proxy/google_endpoints/test_managed_agents_model_param.py
to verify all five managed-agent endpoints pass model=None.
Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
* fix: address greptile P1/P2 review comments
P1 (router.py): Restore fallback/retry support for acreate_interaction
and create_interaction. Both were silently moved to _init_interactions_api_endpoints
(direct call, no fallbacks). Moved them back to _ageneric_api_call_with_fallbacks
so users with configured fallback models keep retry behaviour.
P1 security (agents_endpoints.py): Remove flat query-param credential
path (e.g. ?api_key=AIza...) from _merge_query_params_into_data.
Credentials in URL query strings appear verbatim in server access logs,
CDN edge logs, and browser history. Only the JSON-encoded
litellm_params_template query param (matching the POST body pattern) is
retained.
P2 (interactions/http_handler.py): Extract _BaseHTTPHandler with shared
_handle_error, _sync_client, and _async_client helpers. InteractionsHTTPHandler
now extends _BaseHTTPHandler. The _async_client reads the provider from
litellm_params instead of hardcoding GEMINI.
P2 (interactions/agents/http_handler.py): AgentsHTTPHandler now extends
InteractionsHTTPHandler (which inherits _BaseHTTPHandler) so all shared
HTTP infrastructure is reused rather than duplicated. Removes the
hardcoded LlmProviders.GEMINI from the async client path.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix: address CI failures from greptile review fixes
- black: format interactions/agents/main.py and utils.py
- tests: update test_gemini_agents_endpoints.py to match new
_merge_query_params_into_data behaviour (flat credential params are
rejected; only JSON-encoded litellm_params_template is accepted)
- ci: add test_gemini_agents_endpoints.py to endpoints-and-responses
shard in test-unit-proxy-db.yml so assert-shard-coverage passes
- tests: add _initialize_managed_agents_endpoints and
_init_managed_agents_api_endpoints test coverage so router_code_coverage
passes; also fix TestRouterCreateInteractionRouting to reflect that
acreate_interaction now correctly routes through
_ageneric_api_call_with_fallbacks (restoring fallback support)
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix: remove InteractionsHTTPHandler._handle_error override to fix type errors
AgentsHTTPHandler extends InteractionsHTTPHandler and calls
self._handle_error(provider_config=agents_api_config) where
agents_api_config is BaseAgentsAPIConfig. Python MRO resolved _handle_error
to InteractionsHTTPHandler._handle_error which expected BaseInteractionsAPIConfig,
causing 10 mypy arg-type errors in interactions/agents/http_handler.py.
Removing the redundant override lets both classes inherit _BaseHTTPHandler._handle_error
(provider_config: Any) which is structurally correct for both config types.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix: agent-only interactions and managed agents provider routing
Resolve None custom_llm_provider in agents HTTP client lookup and set
custom_llm_provider on GenericLiteLLMParams for all agent CRUD paths.
Stop mapping agent names to proxy model routing; route interactions
through _init_interactions_api_endpoints with fallbacks only when model
is set. Consolidate duplicate router elif branches for interaction APIs.
Co-authored-by: Cursor <cursoragent@cursor.com>
* Fix greptile review
* test(agents): add unit tests for managed agents SDK and HTTP handler
Adds coverage for the new `litellm.interactions.agents` surface area:
- main.py: sync/async entry points (create/list/get/delete/list_versions),
provider config lookup, logging-obj helper, async error wrapping
- http_handler.py: every CRUD method (sync + async paths), `_is_async`
dispatch branches, and provider error mapping through GeminiAgentsConfig
- utils.py: get_provider_agents_api_config for supported / unsupported
providers
Brings patch coverage on these files from <25% to ~100% so codecov/patch
is satisfied.
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* docs(gemini-agents): fix misleading credential-passing examples in GET/DELETE docstrings (#28293)
The four GET/DELETE endpoint docstrings (list_gemini_agents,
get_gemini_agent, delete_gemini_agent, list_gemini_agent_versions)
documented passing per-request credentials as flat query parameters
(e.g. ?api_key=AIza...). However, _merge_query_params_into_data only
reads the JSON-encoded litellm_params_template query parameter and
intentionally ignores flat params (URL query strings appear verbatim
in access logs, browser history, and Referer headers).
Callers following the documented curl examples would have their
credentials silently dropped and hit auth failures against Gemini.
Update the examples to use the supported JSON-encoded
litellm_params_template query parameter, matching _merge_query_params_into_data's own docstring.
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* refactor(agents): rename provider-agnostic agent response types
Move GeminiAgent{ListResponse,DeleteResult,VersionsResponse} to
provider-neutral names (AgentListResponse, AgentDeleteResult,
AgentVersionsResponse) so the BaseAgentsAPIConfig interface no longer
references Gemini-specific type names.
* fix(gemini-agents): close veria-flagged credential-escalation gaps
Two high-severity findings from the veria-ai PR review are addressed:
1. **api_base override could leak the shared Gemini key**
GeminiAgentsConfig.validate_environment falls back to GOOGLE_API_KEY /
GEMINI_API_KEY when no api_key is supplied. Combined with caller-controlled
api_base on the proxy CRUD endpoints, an authenticated user could redirect
the outbound request to an attacker-controlled host and capture the
operator's shared Gemini key from the x-goog-api-key header. The config
now refuses env-fallback whenever api_base is explicitly overridden.
2. **Managed-agent CRUD exposed to ordinary LLM keys**
The new /v1beta/agents routes live in google_routes (i.e. llm_api_routes),
so any non-admin LLM key can reach them. Unlike /v1beta/models/...:
generateContent these endpoints are NOT model-routed and have no
model_list-supplied credentials, so env-fallback would let any LLM key
list / create / delete agents inside the operator's Gemini project. Each
endpoint now calls _enforce_caller_supplied_provider_key, which requires
non-admin callers to supply their own Gemini api_key via
litellm_params_template. Proxy admins keep the env-fallback convenience.
Tests cover non-admin rejection, admin allow-through, the api_base override
guard, and SDK env-fallback when api_base is not overridden.
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* test(router): restore strict assert_called_once_with on interactions default-provider test
---------
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* feat(gemini): add gemini-3.1-flash-lite model cost map (#28320)
* feat(gemini): add gemini-3.1-flash-lite model cost map entries
Co-authored-by: Cursor <cursoragent@cursor.com>
* Update model_prices_and_context_window.json
* Update source URL for model pricing information
* Sync source URL for gemini-3.1-flash-lite in backup JSON
* fix(model_cost_map): add mistral/ministral-8b-2512 entry
Mistral rotated the 'mistral/mistral-tiny' alias to return
'ministral-8b-2512' as the response model, which is not in the cost map.
This caused test_completion_mistral_api and
test_completion_mistral_api_modified_input to fail in
completion_cost lookup. Add the entry mirroring the existing
openrouter/mistralai/ministral-8b-2512 pricing.
* test(cost_calculator): assert output_cost_per_reasoning_token for gemini-3.1-flash-lite
* fix(tests): backfill local backup entries into runtime model_cost
litellm.model_cost is loaded from LITELLM_MODEL_COST_MAP_URL (pinned to
main) at import time, so any pricing entries added to the in-tree backup
on this branch aren't visible at test runtime until they also land on
main. The Mistral cassette currently returns model=ministral-8b-2512
and the cost-calculator lookup in test_completion_mistral_api /
test_completion_mistral_api_modified_input fails despite the entry
existing in the local backup. Backfill missing backup entries into
litellm.model_cost in the local_testing conftest so these lookups
succeed against the cassette state the branch is being tested with.
* fix(tests): guard conftest backfill against empty local cost map
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
* fix(spend_counter): seed Redis counter via SET NX to prevent cross-pod double-seed (#27854)
* fix(spend_counter): seed Redis counter via SET NX to prevent cross-pod double-seed
Symptom
-------
Customers on multi-pod deployments see team `spend` jump to ~2x (or N x
the pod count) shortly after a Redis cache miss / TTL expiry, triggering
spurious "Budget Crossed" alerts and blocked requests until the value is
manually reset.
Root cause
----------
`SpendCounterReseed.coalesced` warmed the primary spend counter by
calling `redis.async_increment(key, value=db_spend, refresh_ttl=True)`,
which lowers to Redis `INCRBYFLOAT`. That is additive, not idempotent.
The per-counter `asyncio.Lock` only coalesces seeders inside one
process. With N pods sharing one Redis, on a cold key (cold start, TTL
expiry, manual delete) every pod independently passes its lock + Redis
re-check, reads the same `db_spend`, and issues `INCRBYFLOAT db_spend`.
Final value: N x db_spend.
Fix
---
Use `redis.async_set_cache(key, value=db_spend, nx=True)` for the seed.
SET NX is atomic across pods: exactly one writer initializes the key;
losers read the winner's value via `async_get_cache`. This is the same
idiom already used by `coalesced_window` in the same file, so the two
seed paths are now consistent.
Per-request deltas continue to use `INCRBYFLOAT` (correct - additive
behaviour is what we want for increments, not for initial seed).
Verification
------------
Live two-process repro against the same Postgres + Redis (DB
spend = 506):
Unpatched: 4/4 runs -> Redis counter = ~1012 (~2 x db_spend)
Patched: 12/12 runs -> Redis counter = ~506
Unit tests (`test_proxy_server.py`):
- New `test_primary_spend_counter_redis_concurrent_seed_does_not_double_seed`
patches `_get_lock` to return a fresh lock per caller (otherwise the
per-process lock masks the race), races two `coalesced` calls, and
asserts final = 506 with exactly one of two SET NX attempts winning.
- 4 existing tests updated for the new seed contract (SET NX for the
seed, INCRBYFLOAT only for the per-request delta).
- Full `spend_counter or reseed or budget` slice: 22 passed.
Co-authored-by: Cursor <cursoragent@cursor.com>
* test(spend_counter): make SET NX mock atomic so loser branch is exercised
Greptile flagged that `redis_set_cache` in
test_primary_spend_counter_redis_concurrent_seed_does_not_double_seed
placed `await asyncio.sleep(0)` AFTER the NX membership check. Both
concurrent tasks observed an empty `redis_store`, passed the guard, and
both returned True - so the loser branch (else: read back winner's value)
was never exercised.
Fix the mock to model real atomic Redis SET NX:
- Yield BEFORE the membership check so two concurrent callers interleave
the way real SET NX does (first to resume runs check + write atomically
and wins; second resumes after the key exists and loses).
- Track set_cache return values; assert sorted([loser, winner]) so we
know exactly one task wins and one loses.
- Track async_get_cache calls that happen AFTER at least one SET NX has
completed; assert at least one such read - that is the loser-path
fallback (`current_value = float(cached)` when seeded is False).
Verified by temporarily reverting the mock to the old order: the test
now fails with `expected exactly one SET NX winner and one loser, got
[True, True]`, exactly the failure mode Greptile described.
No production code change.
Co-authored-by: Cursor <cursoragent@cursor.com>
* test(spend_counter): mock async_set_cache to populate redis_store in concurrent read+write test
`test_concurrent_read_and_write_paths_share_one_db_query` mocks
`async_increment` to populate the in-memory `redis_store`, but did not
mock `async_set_cache`. After the SET-NX seed change in `coalesced()`,
the seed step writes via `async_set_cache(nx=True)` (default AsyncMock,
no `redis_store` write), so the simulated Redis stays empty after the
first reseed. The second `get_current_spend` then sees a clean Redis
miss, re-enters the DB read path, and the test fails with
`expected 1 DB query, got 2`.
Fix: add a `redis_set_cache` side_effect that updates `redis_store` on
`nx=True` (and rejects when the key already exists), matching the
pattern used by the four sibling tests fixed in this branch's first
commit. Pre-existing assertions are unchanged.
Full `tests/test_litellm/proxy/test_proxy_server.py`: 158 passed.
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(proxy): normalize batch file IDs before ManagedObjectTable write (#28339)
* fix(proxy): normalize batch file IDs before ManagedObjectTable write
Run post_call_success_hook before update_batch_in_database on retrieve/cancel,
and ensure_batch_response_managed_file_ids so file_object never stores raw
provider output_file_id or error_file_id.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(proxy): address Greptile review on batch file ID normalization
Remove redundant resolve_* calls after update_batch_in_database and rename
loop variable to avoid shadowing hidden_params unified_file_id.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(tests): add mistral/ministral-8b-2512 to cost map and backfill in conftest
Mistral rotated the 'mistral/mistral-tiny' alias to return
'ministral-8b-2512' as the response model, which was missing from the
cost map. This caused test_completion_mistral_api and
test_completion_mistral_api_modified_input to fail in
litellm.completion_cost lookup.
- Add mistral/ministral-8b-2512 entry to both the in-tree
model_prices_and_context_window.json and the bundled
litellm/model_prices_and_context_window_backup.json (mirrors the
existing openrouter/mistralai/ministral-8b-2512 pricing).
- litellm.model_cost is loaded at import time from the URL pinned to
main, so the new backup entry isn't visible at test runtime until
it also lands on main. Backfill any entries missing from the
remote-fetched map into litellm.model_cost in the local_testing
conftest so cost-calculator lookups succeed on this branch.
* fix(tests): drop unnecessary del of conftest backfill loop vars
* fix: resolve batch response file IDs even when status unchanged
The status-unchanged early return in update_batch_in_database was
skipping ensure_batch_response_managed_file_ids, leaving raw provider
input_file_id (and other raw IDs) in the user-facing response when
polling an in-progress batch. Move the in-place file ID normalization
above the early return so the response always carries unified managed
IDs while still skipping the DB write when nothing changed.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* test(batches): cover ensure_batch_response_managed_file_ids branches
Add tests for the previously-uncovered paths in
ensure_batch_response_managed_file_ids: error_file_id normalization,
swallowed conversion errors, UserAPIKeyAuth fallback from
db_batch_object, model_name resolution from unified_file_id, and early
returns when managed_files_obj, model_id, or auth context are missing.
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
Co-authored-by: Claude <noreply@anthropic.com>
* fix(router): use forwarded model_id for native Azure container IDs (#27921)
* fix(router): use forwarded model_id for native Azure container IDs in _init_containers_api_endpoints
Azure code-interpreter containers return provider-native IDs (cntr_ + hex)
that carry no LiteLLM routing payload, so _decode_container_id returns
model_id=None. The router was falling through to call the handler directly,
bypassing _ageneric_api_call_with_fallbacks and leaving api_base=None for
Azure deployments. Fall back to the model_id forwarded from the proxy
ownership check so deployment credentials are always applied.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(azure-containers): strip /openai/responses path from api_base in AzureContainerConfig.get_complete_url
When a deployment's api_base is the responses endpoint URL
(e.g. .../openai/responses?api-version=...), AzureContainerConfig was
appending /openai/containers on top of it, producing the broken path
.../openai/responses/openai/containers. Azure returns 404 for that URL
while the correct path is .../openai/containers.
Strip any /openai/responses suffix from api_base before constructing
the containers URL so the resource root is always used as the starting point.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(azure-containers): prefer api-version from api_base URL over deployment's api_version
The deployment's api_version (e.g. 2024-08-01-preview) targets the chat/responses
API and is too old for the containers API, which requires 2025-04-01-preview.
The responses endpoint api_base already carries the correct api-version in its
query string. Extract it and use it for the containers URL, overriding the
stale deployment-level version.
Fixes DELETE and file-upload operations returning 404 due to wrong api-version.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(containers): pass params=None instead of params={} to httpx to preserve api-version
httpx erases a URL's query-string when params={} (empty dict) is passed,
silently stripping ?api-version=2025-04-01-preview from every container
POST/DELETE request. Azure's GET endpoints tolerate a missing api-version;
POST (upload) and DELETE are strict, so those returned 404.
Fix: use `params or None` in container_handler._async_handle and
llm_http_handler.async_container_delete_handler (and all sibling container
handlers) so that an empty params dict falls back to None, leaving httpx to
preserve the URL's existing query string intact.
Adds a regression test that directly documents the httpx behaviour.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(router): remove elif model_id branch from _init_containers_api_endpoints
Two reviewer findings addressed:
1. Truncated comment on the model_id fallback line — now complete.
2. Security: the elif branch that fired when container_id was absent allowed
any authenticated caller to supply model_id in a POST /v1/containers body
and route the request through an arbitrary deployment UUID, bypassing the
model-level access checks that only validate `model`. Removed the elif
branch; operations without container_id (create, list) route by the
caller-supplied `model` field as before. model_id forwarding is kept only
inside the container_id block, where the proxy ownership check has already
validated the container before forwarding the deployment ID.
Adds a regression test pinning the security boundary: no-container-id path
calls original_function directly even when model_id is in kwargs.
Co-authored-by: Cursor <cursoragent@cursor.com>
* test(containers): validate proxy-to-router model_id forwarding for managed IDs
Add test_regression_get_container_forwarding_params_sets_model_id_for_managed_id
to verify that get_container_forwarding_params (the proxy-side half of the Azure
routing fix) correctly extracts and forwards model_id from a LiteLLM-managed
encoded container ID.
This closes the gap identified by Greptile P1: the previous regression test
only injected model_id as a direct kwarg, validating the router in isolation.
The new test exercises the actual proxy-to-router data flow through
ownership.get_container_forwarding_params, confirming that kwargs["model_id"]
is populated before _init_containers_api_endpoints is reached.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(azure-containers): tighten endpoint-path strip to endswith match
Use path.endswith() instead of path.find() for _AZURE_ENDPOINT_PATHS so
the suffix strip only fires when api_base actually ends with one of the
endpoint-specific path suffixes. This is the more precise check greptile
flagged on the original find()-based implementation.
* Fix sync container handler to preserve URL query string
Mirror the async path fix: pass None instead of an empty params dict so
httpx does not strip the URL's existing query string (e.g.
?api-version=...), which is required for Azure container routing.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(azure-containers): strip trailing slash before endpoint suffix match
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(containers): recover model_id from stored encoded id for native Azure container IDs
get_container_forwarding_params previously only set model_id when the
user-supplied container_id was a LiteLLM-managed encoded id. For native
upstream IDs (e.g. Azure 'cntr_<hex>') the decode fails and model_id was
never forwarded — making the router-side fallback in
_init_containers_api_endpoints unreachable in production.
Fall back to the stored 'unified_object_id' on the ownership row, which
is the encoded form captured at create time when the router selected a
specific deployment. Decoding that yields the deployment model_id and
restores router-based credential application (api_base, api_key) for
retrieve/delete and container-file operations on native IDs.
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(ui): restore log filter loading indicator (#28282)
When a new filter is applied to spend logs, React Query's keepPreviousData
left stale rows on screen for 10–15s with no indication that a fetch was
in progress. The previous custom isFilteringResults flag was removed in
the #25847 toolbar refactor and only partially restored on the Fetch
button. Use React Query's isPlaceholderData to discriminate a real
filter change (queryKey changed, data not yet arrived) from a same-key
live-tail refetch, and feed it into the existing isLoading prop on the
toolbar pagination text and the table body. Live-tail polls still keep
previous rows without flicker.
Co-authored-by: Ryan <ryan@Ryans-MBP.localdomain>
* test(e2e): migrate runner to uv, add All Proxy Models key test (#28313)
* chore(e2e): migrate runner to uv, add All Proxy Models key test
Switches the local e2e runner (run_e2e.sh) from poetry to uv to match
the rest of the repo and CI. Adds a Playwright test for creating an
admin key with no team selected (all-proxy-models flow), a SLOWMO env
hook for headed debugging, and a MIGRATION_TRACKING.md doc that maps
the manual UI QA checklist to e2e tests so future migration work has
a single source of truth.
* chore(e2e): address greptile feedback
- Remove MIGRATION_TRACKING.md (docs belong in litellm-docs repo)
- playwright.config.ts: fall back to 0 when SLOWMO is non-numeric
(parseInt returns NaN, which Playwright accepts silently)
- run_e2e.sh: add --frozen to uv sync for CI determinism
* feat(ui): team passthrough routes create parity + edit load fix (#28098)
* feat(ui): team allowed_passthrough_routes create parity + edit load fix
Add the Allowed Pass Through Routes selector to the create-team modal
(previously only on the edit form), and fix the edit form silently
dropping the field: it lives under team metadata, so initialValues must
read info.metadata.allowed_passthrough_routes — otherwise the selector
renders empty and saving wipes admin-set routes. Both selectors are
gated to premium proxy admins, mirroring the server-side gate.
Resolves LIT-3019
* fix(ui): persist team allowed_passthrough_routes edits on save
The edit form loaded the selector but the save path never wrote it back:
allowed_passthrough_routes stayed in the raw metadata JSON textarea and
parsedMetadata (from that textarea) always won, so selector edits were
silently discarded. Strip it from the textarea initialValues and overlay
values.allowed_passthrough_routes into updateData.metadata, mirroring how
guardrails is handled.
Resolves LIT-3019
* fix(ui): preserve team passthrough routes for non-proxy-admins on save
Only proxy admins may set allowed_passthrough_routes (server-side gate).
For non-proxy-admins, write the team's stored value back into metadata
instead of the form value, so saving an unrelated setting can't silently
wipe routes; omit the key entirely when the team never had any.
Resolves LIT-3019
* fix(mcp): JWT on tools/list and REST tools/call server resolution (#28227)
* fix(mcp): JWT on tools/list, REST server_id resolution, tool_server_mismatch
Sign outbound MCP JWTs for list_mcp_tools and inject headers on the tools/list
path. Resolve server_id on /mcp-rest/tools/call and return 403 tool_server_mismatch
when the tool does not belong to the requested server. Default missing arguments to {}.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(mcp): restrict list JWTs to mcp:tools/list and default REST arguments to {}
- List-only JWTs (call_type=list_mcp_tools) no longer carry the broad
mcp:tools/call scope. _build_scope() now emits only mcp:tools/list
when no tool name is provided, mirroring the existing least-privilege
rule that tool-call JWTs omit mcp:tools/list.
- REST /tools/call now defaults a missing 'arguments' field to {} so
execute_mcp_tool() and downstream **arguments / .keys() calls don't
receive None and crash with TypeError/AttributeError.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(mcp): validate tool/server in call_tool; skip JWT signer when not configured or static auth present
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(mcp): align tests and mypy with user_api_key_auth on tools/list
Update mocks for the new _get_tools_from_server parameter, mock server
registry in REST access-denied test, and narrow static_headers for mypy.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(test): accept user_api_key_auth in get_tools_from_mcp_servers mock
The side_effect for the all-servers case did not accept the new kwarg,
so tools/list returned an empty list.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(mcp): fail fast for unknown tools when server mapping exists
Server-name fallback in call_tool must not open an upstream session when
the tool is absent from a populated mapping. Update the HTTP transport test
to register a known tool before asserting not-found behavior.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix mypy
* Fix mypy
* fix(mcp): preserve tools/call scope on missing tool name; pass user_api_key_auth in list_tools
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(mcp): match alias/server_name in _resolve_mcp_server_for_tool_call
The registry lookup in _resolve_mcp_server_for_tool_call previously only
compared candidate.name against the provided server_name, but tool name
prefixes can be derived from a server's alias or server_name (see
get_server_prefix). When the tool→server mapping is empty/stale (cold
start, dynamic tools), the lookup would fail for alias-configured
servers even though get_mcp_server_by_name (used by the REST path)
matches alias, server_name, and name.
Match the same priority of identifiers in both the registry pass and
the unprefixed fallback so the MCP protocol call_tool path is
consistent with the REST path.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(mcp): reuse proxy_logging DualCache in inject_mcp_jwt_headers_for_upstream
Instead of allocating a fresh DualCache() on every tools/list invocation,
prefer the shared proxy_logging_obj.internal_usage_cache.dual_cache when
available. The cache argument is currently unused by MCPJWTSigner, but
sharing the proxy's cache avoids per-call allocation overhead and matches
the cache identity used elsewhere in the proxy hook plumbing — so any
future per-request state stored in cache will survive across list calls.
Co-authored-by: Claude <noreply@anthropic.com>
* fix(mcp): return 403 ip_filtering for IP-restricted servers in tools/call name lookup
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(test): accept user_api_key_auth kwarg in list_tools mocks
The proxy-infra job was failing on four TestMCPServerManager tests because
the mock_get_tools_from_server stubs did not accept the new
user_api_key_auth keyword argument that list_tools now forwards to
_get_tools_from_server. Add the kwarg to each stub so list_tools can call
through cleanly.
Co-authored-by: Claude <claude@anthropic.com>
* fix(mcp): skip JWT injection when per-user mcp_auth_header is set
MCPClient._get_auth_headers() applies extra_headers AFTER writing
Authorization from auth_value, so an injected JWT silently overwrites
the user's per-server OAuth token. Guard the JWT signer with
'not mcp_auth_header' so per-user OAuth (and any dict-form per-user
auth) takes precedence, mirroring the existing static_headers guard.
Adds a regression test that the signer's inject helper is not called
when mcp_auth_header is supplied.
* fix(mcp): skip JWT injection when extra_headers already has Authorization
When a server uses per-user OAuth tokens, the resolved token is passed
into _get_tools_from_server via extra_headers. The JWT injection guard
only checked mcp_auth_header and the server's static headers, so the
signer would silently overwrite the user's OAuth Authorization header.
Add a check for an existing Authorization entry in extra_headers so
caller-supplied per-user OAuth tokens take precedence over JWT signing.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* test(mcp): cover JWT signer + tool-call resolution branches
Adds unit tests for the new MCPServerManager helpers (_resolve_mcp_server_for_tool_call,
_resolve_oauth2_headers_for_tool_call) and the new MCPJWTSigner paths
(_build_scope call_type branches and inject_mcp_jwt_headers_for_upstream).
Brings patch coverage above the auto target without changing behavior.
Co-authored-by: Claude <claude@anthropic.com>
* fix(mcp): retry tool-server lookup with prefixed name in REST mismatch check
When the REST /mcp-rest/tools/call path sends a raw tool name plus
requested_server_id, _get_mcp_server_from_tool_name(name) can return
None if the mapping only stores the prefixed form. That bypassed the
tool_server_mismatch 403 guard and let the call fall through to
trusting requested_server.
Retry the lookup with every known prefix of the requested server so
the mismatch check fires whenever the tool is actually registered.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(mcp): always reject unknown tools in server-name fallback
Defense-in-depth: _resolve_mcp_server_for_tool_call previously skipped
the unknown-tool check whenever the per-server mapping had no entries
yet (cold start, OAuth2 lazy listing, or upstream listing failure),
allowing arbitrary tool names to reach upstream servers.
Tighten the check so the server-name fallback always rejects tool
names not present in the mapping. Callers must call list_tools first
(standard MCP flow) before tools/call can resolve. Removes the
now-unused _mapping_has_tools_for_server helper and adds an
explicit empty-mapping rejection test alongside the existing
populated-mapping rejection test.
Co-authored-by: Sameer Kankute <sameer@berri.ai>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Claude (greptile subagent) <claude-greptile-bot@anthropic.com>
* feat(interactions): migrate to Google Interactions API steps schema (May 2026) (#28153)
* feat(interactions): migrate to Google Interactions API steps schema (May 2026)
Default to Api-Revision: 2026-05-20 (new `steps` schema). Add
`litellm.use_legacy_interactions_schema` global flag that sends
Api-Revision: 2026-05-07 for operators who need the legacy `outputs`
schema until June 8, 2026.
- Inject Api-Revision header in GoogleAIStudioInteractionsConfig.validate_environment()
- Auto-coalesce response_mime_type → response_format and image_config migration on new schema
- Add steps field to InteractionsAPIResponse and InteractionsAPIStreamingResponse
- Add StepStart/StepDelta/StepStop/InteractionCreated/etc. SSE event types
- Update streaming completion detection to handle interaction.completed event
- Bridge transformer populates both outputs and steps fields
- Bridge streaming iterator emits new-schema events by default
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(interactions): address greptile review feedback
- Avoid mutating caller's generation_config dict by shallow-copying
before popping image_config, preventing silent failures on retries
- Skip schema key in response_format when response_format is None to
avoid sending schema: null to the Google Interactions API
- Remove delta field from step.stop events (new schema only); the
StepStop model has no delta field and sending it duplicates already-
streamed text and breaks spec-conformant clients
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(proxy): parse use_legacy_interactions_schema string values safely
bool("false") returns True in Python, so quoted YAML values like
"false" or "False" silently activated the legacy Interactions API
schema. Match the env-var parsing pattern in litellm/__init__.py by
treating string inputs as true only when they equal "true" (case
insensitive).
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(interactions): only set object/id/delta on step.stop for legacy schema
StepStop (new schema) has no object, id, or delta fields. Setting them
unconditionally caused spec-breaking extra fields on new-schema step.stop
events in all four construction sites (sync/async × main-loop/StopIteration).
Legacy content.stop still receives id, object, and delta unchanged.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(interactions): stabilize streaming bridge schema, dict aliasing, and lost first delta
- Capture use_legacy_interactions_schema once at iterator construction so
all events emitted by a single stream use a consistent schema, even if
the global flag is mutated mid-stream.
- Check for the buffered interaction.complete/completed event before the
finished check in __next__/__anext__ so the final completion event
(which carries the full collected text in steps) is not dropped after
self.finished is set.
- Copy text content entries before appending to both outputs and the
steps content list to avoid shared mutable dict aliasing between the
two response fields.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix tests
* fix greptile review
* fix(interactions): address Greptile P1 review on schema coalescing and legacy deltas
Skip response_mime_type merge when response_format is already a list, avoid
in-place list mutation on image_config append, and restore delta.type on
legacy content.delta events.
Co-authored-by: Cursor <cursoragent@cursor.com>
* style(interactions): black-format gemini transformation.py
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
Co-authored-by: Claude <noreply@anthropic.com>
* test(ui-e2e): admin key creation with a specific proxy model (#28365)
* test(ui-e2e): add admin key creation with a specific proxy model
Adds Playwright coverage for creating a key (no team) scoped to a single
proxy model, complementing the existing All-Proxy-Models test. Uses a
DOM-dispatched click on the antd dropdown option since the popup
animation can render the option outside the viewport.
* test(ui-e2e): verify scoped key works against mock /chat/completions
Extend the "Create a key with a specific proxy model" test to extract
the new key from the success modal and POST to /chat/completions for
the scoped model, asserting 200 and the mock response body. Without
this the test could pass even if the model selection failed to register.
* fix(vertex_ai): omit function_call id on Vertex Gemini 3.5+ tool turns (#28324)
* fix(vertex_ai): omit function_call id on Vertex Gemini 3.5+ tool turns
Vertex AI rejects `id` on function_call/function_response parts; only Google AI Studio accepts it for Gemini 3.5+ strict tool matching.
Co-authored-by: Cursor <cursoragent@cursor.com>
* Update litellm/llms/vertex_ai/gemini/vertex_and_google_ai_studio_gemini.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* fix(vertex_ai): forward custom_llm_provider in context caching
Pass custom_llm_provider through to _gemini_convert_messages_with_history
in the context caching path so Gemini 3.5+ tool-call `id` forwarding
behaves consistently between cached and non-cached completions on Google
AI Studio.
Co-authored-by: Claude <claude@anthropic.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Claude <claude@anthropic.com>
* feat(mcp): allow native MCP OAuth support for cursor (#28327)
* feat(mcp): allow native MCP OAuth redirect URIs (cursor://)
Discoverable OAuth /authorize rejected cursor:// callbacks because
validate_trusted_redirect_uri only accepted http/https. Add an
allowlisted native path with a built-in Cursor default and optional
MCP_TRUSTED_NATIVE_REDIRECT_URIS env for other clients.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(mcp): address Greptile native redirect URI review
Lowercase paths in normalizer so env allowlist entries match case-
insensitively. Tighten wildcard prefix matching to reject sibling
paths (e.g. callback-2) unless the prefix ends with /.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(mcp): reject query params on native OAuth redirect URIs
Greptile: normalization stripped query strings before allowlist compare,
so cursor://.../callback?injected=... could pass validation. Reject any
native redirect_uri with a query component (same as fragments).
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(model_cost_map): add mistral/ministral-8b-2512 entry
Mistral rotated the 'mistral/mistral-tiny' alias to return
'ministral-8b-2512' as the response model, which is not in the cost map.
This caused test_completion_mistral_api and
test_completion_mistral_api_modified_input to fail in
completion_cost lookup. Add the entry mirroring the existing
openrouter/mistralai/ministral-8b-2512 pricing.
* fix(mcp): lowercase default native redirect URIs
Make _parse_trusted_native_redirect_uris apply the same lowercasing
to built-in defaults as it does to env-var entries.
* fix(tests): backfill local model_cost into remote-fetched map
litellm.model_cost is loaded at import time from the URL pinned to main,
so pricing entries that exist only in this branch (e.g.
mistral/ministral-8b-2512, freshly added because Mistral now returns this
id from mistral-tiny) are absent at test time and completion_cost lookups
raise. Backfill the in-tree backup so cassette-driven cost calculations
resolve against the entries that ship with the branch under test.
Fixes the local_testing_part1 failures on test_completion_mistral_api and
test_completion_mistral_api_modified_input.
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
Co-authored-by: Claude <claude@anthropic.com>
* fix(interactions): never drop streamed text deltas; always emit terminal completion (#28394)
* fix(interactions): never drop streamed text deltas; always emit terminal completion
The interactions streaming bridge had two bugs flagged by Greptile on PR #28153:
1. The first OutputTextDeltaEvent (and the second, when no ResponseCreatedEvent
precedes the deltas) was consumed to emit a synthetic interaction.created /
step.start event, but the chunk's text payload was never forwarded as a
step.delta. The text only reappeared in the terminal step.stop, which
defeats the purpose of incremental streaming.
2. When the upstream Responses API stream ended via StopIteration without a
ResponseCompletedEvent, the iterator emitted step.stop but never the
terminal interaction.completed event carrying the full collected text.
This refactors the iterator to translate each upstream chunk into a list of
events (instead of a single event) and buffers them in a deque. A text delta
now expands into [interaction.created, step.start, step.delta] on the first
chunk so no token is dropped, and the StopIteration / StopAsyncIteration
fallback always flushes a terminal interaction.completed event when one
hasn't already been sent.
Both behaviors are covered by new unit tests:
- test_no_text_token_is_dropped_during_streaming
- test_response_created_then_text_delta_emits_step_start_and_delta
- test_stop_iteration_fallback_emits_completion_event
- test_response_completed_emits_stop_then_completion (no double-emit)
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* fix(interactions): correlate EOF terminal events with stream's interaction id
The StopIteration fallback path previously built the terminal step.stop /
interaction.completed events with id=None (legacy content.stop) and a
memory-address fallback string (interaction.completed), neither of which
matched the item_id used by the earlier interaction.created / step.start /
step.delta events in the same stream. Downstream consumers correlating
events by id would see a mismatch.
Persist the interaction id derived from the first upstream chunk (item_id
on an OutputTextDeltaEvent, or response.id on a ResponseCreatedEvent) and
reuse it when flushing the terminal events on EOF.
Author: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
* ci(windows): raise UV_HTTP_TIMEOUT to 300s for uv sync
The using_litellm_on_windows job has been hitting flaky PyPI download
timeouts during 'uv sync --frozen --group dev' — different packages on
each rerun (six, pydantic-core), all surfacing the same uv error:
Failed to download distribution due to network timeout.
Try increasing UV_HTTP_TIMEOUT (current value: 30s).
uv's default 30s per-request timeout is too tight for the Windows runner
on this project (50+ deps, several multi-MB wheels), so bump it to 300s
to let slow individual downloads complete instead of failing the build.
* fix(interactions): correlate ResponseCompletedEvent terminal events with stream's interaction id
When a stream starts directly with OutputTextDeltaEvent (no preceding
ResponseCreatedEvent), interaction.created carries item_id while
interaction.completed previously carried response.id from
ResponseCompletedEvent. The two ids can differ, leaving consumers that
correlate events by id unable to match the start and completion events.
Fall back to self._interaction_id (set on the first chunk that derives
an id) before response.id, mirroring the EOF terminal path.
---------
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* fix(proxy): expose Prisma idle/connect timeout + extra DB URL params (#28395)
* fix(proxy): expose Prisma idle/connect timeout + extra DB URL params
Operators have reported large numbers of idle Prisma connections that
never get closed. The proxy already forwards `connection_limit` and
`pool_timeout` to the DATABASE_URL, but had no knob for capping idle
or slow connections. Add three new `general_settings` keys that thread
through to the DATABASE_URL / DIRECT_URL query string:
- `database_connect_timeout` -> Prisma `connect_timeout`
- `database_socket_timeout` -> Prisma `socket_timeout` (the main
knob for closing idle connections from the LiteLLM side)
- `database_extra_connection_params` -> untyped passthrough dict for
any other Prisma URL param (`pgbouncer`, `statement_cache_size`,
`sslmode`, ...); keys here override LiteLLM defaults.
Refactors the duplicated DATABASE_URL/DIRECT_URL param dicts into a
single `_build_db_connection_url_params` helper.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Update litellm/proxy/proxy_cli.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
---------
Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Litellm oss staging 1 (#28337)
* feat: add Xiaomi MiMo-V2.5-Pro and MiMo-V2.5 OpenRouter model entries (#27700)
Squash-merged by litellm-agent from TorvaldUtne's PR.
* fix(ui): trim whitespace from MCP inspector tool call inputs (#28203)
Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
* gemini-3.1-flash-lite pricing (#27933)
* feat(model_prices): add gemini-3.1-flash-lite pricing with standard/batch/flex/priority tiers
* fix pricing
* add service tier
---------
Co-authored-by: shin-berri <shin-laptop@berri.ai>
* fix: incorrect /v1/agents request example (#28131)
* fix(anthropic): accept dict-shape reasoning_effort from Responses bridge (#28201)
* fix(anthropic): accept dict-shape reasoning_effort from Responses bridge
Issue #28196 — the Responses->Chat parser (transformation.py:184-200) keeps the full dict as reasoning_effort when summary is set; that branch was added in #25359. But the Anthropic transformation here still guarded on isinstance(value, str), silently dropping the param. Result: callers using the standard Reasoning(effort, summary) OpenAI-shaped object on Anthropic lose thinking entirely (0 reasoning_tokens, no thinking_blocks).
Coerce dict -> string before mapping. Same shape tolerance that gpt_5_transformation._normalize_reasoning_effort_for_chat_completion already implements. summary is irrelevant for Anthropic's thinking_blocks.
Adds two regression tests: one parametrized over string + dict shapes (with and without summary), one covering unparseable dict inputs (drops silently, no crash).
* test(anthropic): add non-adaptive model coverage for dict-shape reasoning_effort
Per Greptile feedback on PR #28198: the original regression test only exercised the adaptive (4.6+) path. Add a parametrized test for the non-adaptive branch (claude-sonnet-4-5) verifying that dict-shape reasoning_effort still maps to thinking.type='enabled' + budget_tokens, and that output_config is NOT set on pre-4.6 models.
* test(anthropic): convert unparseable-dict test to @pytest.mark.parametrize
Per @greptile-apps inline review on PR #28201 — matches the parametrize style of the two adjacent dict-shape tests and produces clearer failure messages (test ID per case instead of one collapsing for-loop).
* feat: add pricing entry for openrouter/google/gemini-3.1-flash-lite (#28280)
Squash-merged by litellm-agent from ro31337's PR.
* fix(router): wrap aresponses streaming iterator for mid-stream fallbacks (#28215)
Squash-merged by litellm-agent from cwang-otto's PR.
* fix(router): unblock staging — mypy + coverage for aresponses streaming fallback (#28318)
Squash-merged by litellm-agent from cwang-otto's PR.
* fix(responses): forward timeout on completion transformation path (Anthropic, Bedrock, Vertex) (#28133)
Squash-merged by litellm-agent from cwang-otto's PR.
* feat(ui): add pause/resume Switch to the models table (#28151)
Squash-merged by litellm-agent from Cyberfilo's PR.
* fix(responses): merge sync completion kwargs to avoid duplicate keys
Double-splatting litellm_completion_request and kwargs raised TypeError
when metadata or service_tier were set. Match the async merge pattern.
Co-authored-by: Cursor <cursoragent@cursor.com>
* Use proxy base URL for CLI SSO form action (#28271)
Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
* fix(tests): add mistral/ministral-8b-2512 to cost map and backfill in conftest
Mistral rotated the 'mistral/mistral-tiny' alias to return
'ministral-8b-2512' as the response model, which was missing from the
cost map. This caused test_completion_mistral_api and
test_completion_mistral_api_modified_input to fail in
litellm.completion_cost lookup.
- Add mistral/ministral-8b-2512 entry to both the in-tree
model_prices_and_context_window.json and the bundled
litellm/model_prices_and_context_window_backup.json (mirrors the
existing openrouter/mistralai/ministral-8b-2512 pricing).
- litellm.model_cost is loaded at import time from the URL pinned to
main, so the new backup entry isn't visible at test runtime until
it also lands on main. Backfill any entries missing from the
remote-fetched map into litellm.model_cost in the local_testing
conftest so cost-calculator lookups succeed on this branch.
* fix(tests): drop unnecessary del of conftest backfill loop vars
* fix(router): harden streaming fallback wrapper for bridge iterators
- FallbackResponsesStreamWrapper now uses getattr fallbacks when copying
attributes from the source iterator. The bridge path
(LiteLLMCompletionStreamingIterator used by Anthropic/Bedrock/Vertex)
does not call super().__init__ and is missing response, logging_obj
(it uses litellm_logging_obj), responses_api_provider_config,
start_time, request_data, call_type, and _hidden_params. Previously,
wrapper construction raised AttributeError for any streaming fallback
on the bridge path.
- _aresponses_with_streaming_fallbacks now deep-copies the
litellm_metadata (and metadata) dicts into fallback_kwargs. The
primary attempt mutates this dict in place via
_update_kwargs_with_deployment, so a shallow copy of kwargs was
leaking primary-deployment fields (deployment, model_info, api_base)
into the mid-stream fallback request.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(router): use safe_deep_copy for fallback metadata snapshot
The ban_copy_deepcopy_kwargs CI check rejects copy.deepcopy() on any
variable whose name contains 'kwargs' (incl. fallback_kwargs). Swap
the two copy.deepcopy(fallback_kwargs[...]) calls for safe_deep_copy,
which handles non-picklable values (OTEL spans, etc.) by per-key
deepcopy with fallback to the original reference.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* test(ci): skip chronically flaky build_and_test integration tests
Both tests have been failing on every recent run of build_and_test
against this PR's HEAD (1686967, 1688402, 1689993, 1690877), and the
same two tests also fail intermittently on unrelated commits and other
branches, independent of any code change in this PR (which only touches
router fallback wrappers, the Anthropic Responses bridge, and unrelated
UI/cost-map files).
- tests.test_spend_logs.test_spend_logs: /spend/logs?request_id=...
returns 500 even after a 20s wait for the spend log to be written.
Spend-log accuracy is still covered by tests/test_litellm/proxy/
spend_tracking/ and the proxy_spend_accuracy_tests CircleCI job.
- tests.test_team_members.test_add_multiple_members: /team/info?team_id=
... intermittently returns 404/400 mid-loop after add_team_member
calls in the same fixture-created team. Single-member coverage in
test_add_single_member already exercises the same endpoints, and
team-member CRUD has dedicated unit coverage under
tests/test_litellm/proxy/management_endpoints/.
Skipping unblocks the build_and_test job until the underlying race in
the dockerized integration setup is root-caused.
* fix: preserve explicit timeout=0 in responses API handler
Use 'timeout if timeout is not None else request_timeout' instead of
'timeout or request_timeout' so an explicit timeout=0/0.0 isn't silently
replaced by the default request_timeout.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(ui): guard model_info access in pause Switch with optional chaining
* fix(ui): guard model_info access in pause Switch onChange handler
Mirror the optional-chaining guard already applied to the isPausing
c…
* fix(llm_http_handler): forward kwargs['model_info'] to litellm_params for /v1/messages
Router._update_kwargs_with_deployment stamps the selected deployment's
model_info on kwargs['model_info'] before dispatching the request.
Downstream cooldown / success callbacks (deployment_callback_on_failure,
deployment_callback_on_success) look up the deployment id via
kwargs['litellm_params']['model_info']['id'].
async_anthropic_messages_handler constructs its own litellm_params dict
when calling logging_obj.update_from_kwargs and never forwarded
model_info. As a result, /v1/messages requests dispatched through the
Router had an empty model_info on litellm_params, the deployment id was
not discoverable, and cooldown / success tracking were silently skipped
for this call type.
Forward kwargs['model_info'] into the litellm_params dict so the
existing Router callbacks can identify the deployment.
* merge main (#29486)
* [Refactor] UI - Spend Logs: consolidate filter state and extract components (#25847)
* [Refactor] UI - Spend Logs: consolidate filter state, extract components, remove dead code
- Lift filter state into index.tsx and pass to hook (removes selectedX vars + sync useEffect)
- Move main useQuery into useLogFilterLogic hook (removes isMainQueryEnabled toggle)
- Delete dead RequestViewer component (300 lines, replaced by LogDetailsDrawer)
- Extract LogsTableToolbar component (search, date range, pagination, live tail)
- Extract filter options config to filter_options.ts
- Remove dead code: handleRefresh, handleSelectLog, handleCloseDrawer, formatTimeUnit,
showFilters/showColumnDropdown state, dropdownRef/filtersRef
* Fix PR feedback: use antd Switch instead of Tremor in new file, fix typo
* Collapse dual-path filtering into single React Query
All 10 filter keys now go through the useQuery — the imperative
performSearch / debouncedSearch / backendFilteredLogs path is deleted.
Filter values are debounced via useDebouncedValue(300ms) before hitting
the query key so text inputs don't fire per-keystroke.
Removed: performSearch, debouncedSearch, backendFilteredLogs,
lastSearchTimestamp, hasBackendFilters, clientDerivedFilteredLogs,
the sort/page/time refetch useEffect, and the filteredLogs chooser memo.
* Clean up remaining smells: remove isFetchingDeferred, internalize selectedTimeInterval, fix circular import
- Remove useDeferredValue/isButtonLoading — pass logsQuery.isFetching directly
- Move selectedTimeInterval into LogsTableToolbar as internal state
- Move PaginatedResponse type from index.tsx to log_filter_logic.tsx
* Fix quick-select dropdown overlapping sidebar
* Fix stale quick-select label after Reset Filters
Move selectedTimeInterval back to parent so handleFilterReset can
reset it to the 24-hour default. The toolbar receives it as a prop.
* refactor useLogFilterLogic tests for controlled-hook + backend-query shape
The hook no longer owns filter state or does client-side filtering — it
receives filters/setFilters as props and drives filteredLogs from a
useQuery over uiSpendLogsCall. Reshape the tests around that contract:
introduce a controlled harness that owns filter state, collapse the 10
per-filter assertions into a single it.each over filterKey → API param,
and drop the client-side passthrough tests (the .min test file and the
"return all logs when no filters" / "empty when logs null" cases) that
no longer correspond to any hook behavior.
* cover new useLogFilterLogic invariants: activeTab gate, filterByCurrentUser fallback, debounce negative, partial merge
Follow-up to the test refactor. Adds coverage for invariants the
refactored hook contract introduced but that the first pass didn't
assert:
- query enablement: expand the single accessToken-null case into an
it.each over all four credential props (accessToken, token, userRole,
userID), plus a separate test for activeTab !== "request logs"
- filterByCurrentUser: when true with a blank User ID filter, the
outbound request carries user_id = userID
- debounce: also assert the negative case — no call in the first 100ms
after a filter change (first waiting out the initial mount fire)
- handleFilterChange: partial updates merge without clobbering other
filter keys (protects the spread + default-fill semantics)
- handleFilterReset: calls setCurrentPage(1) alongside restoring
filters
* fix typo dropping the live-tail banner border
Tailwind silently ignores unknown classes, so border-greem-200 was
leaving the auto-refresh banner with only its bg-green-50 fill and no
outline.
* memoize columns and derived table data in SpendLogsTable
The table's columns array, four-pass data pipeline, and sort-change
handler were all being rebuilt on every parent render. That made every
filter click re-instance all 23 TanStack-Table columns, re-run
filter/reduce/map over all rows, and recreate per-row click closures —
all before the intentional 300ms debounce timer even got a chance to
fire.
Local measurement (40 rows, dev mode):
filter click → query fires: 1957ms → 1217ms (−38%)
Wrap createColumns in useMemo keyed on sortBy/sortOrder, hoist
onSortChange into a useCallback, and move the searchedLogs /
sessionComposition / sessionRepresentativeMap / filteredData derivations
into a single useMemo keyed on filteredLogs.data + searchTerm.
These were pre-existing issues on main — not regressions from the
hook refactor — but the refactor made them user-visible because the
new query debounce put render cost on the critical path.
* apply dropdown filters instantly, debounce only text inputs
Dropdown selects now bypass the 300ms debounce so a click updates the
table immediately. Text inputs (Key Hash, Error Message, Request ID,
User ID) still debounce. handleFilterReset also clears the pending
debounced value so a half-typed text filter can't re-fire after reset.
* fix(ui/spend-logs): restore lost loading/debounce behavior + cover dropped tests
Regressions from the spend-logs-view refactor:
- debounce the 'Public model / search tool' text filter (was firing a
backend query per keystroke) via TEXT_FILTER_KEYS
- restore Fetch-button smoothing through table repaint using
useDeferredValue on the rendered data (explicit staleness)
- show AntDLoadingSpinner during the auth-resolve phase instead of a
blank screen on first load
- only live-tail-poll while the tab is visible
(refetchIntervalInBackground: false)
- extract getLiveTailRefetchInterval helper for the poll decision
Tests:
- LogDetailContent: retries display (>0 / 0 / absent), overhead-absent
- log_filter_logic: regression guard that the public-model filter
debounces; getLiveTailRefetchInterval unit tests
- logs_utils: getTimeRangeDisplay quick-select window labels
* test(ui/spend-logs): cover the cold-load auth-not-ready spinner guard
Asserts SpendLogsTable shows a loading spinner (not a blank screen)
while credentials are unresolved, and renders the table once present.
* fix(tests): replace shut-down gpt-4o-audio-preview with gpt-audio-1.5 (#28281)
* fix(tests): replace shut-down gpt-4o-audio-preview with gpt-audio-1.5
OpenAI shut down gpt-4o-audio-preview on 2026-05-07, so the live audio
calls in test_stream_chunk_builder_openai_audio_output_usage and
test_standard_logging_payload_audio now hard-fail with a model-not-found
error on every PR. The error was not "openai-internal", so the except
block swallowed it and execution fell through to an unbound
completion/response (UnboundLocalError).
Switch both tests to gpt-audio-1.5, OpenAI's recommended successor
(GA, not deprecated, already present in the litellm cost map so the
response_cost assertion still resolves). Also broaden the except to
skip with the real error in the reason instead of crashing, so a
transient upstream blip can't reintroduce the UnboundLocalError.
* fix(tests): narrow audio-test skip to model-not-found, re-raise the rest
Address review feedback: an unconditional skip on any exception would
silently mask a litellm-internal regression in the audio path (broken
param transformation, serialization, bad header) instead of failing CI.
Skip only on the upstream-unavailable class (model_not_found / "does not
exist" / openai-internal) and re-raise everything else, so genuine
regressions still fail loudly. The UnboundLocalError is still fixed
because the handler either skips or raises - it never falls through.
* fix(tests): add budget_exceeded to expected Interaction status enum
Staging added budget_exceeded to the Interaction OpenAPI status enum; the staging merge into this branch picked up the spec change but not the matching test update, so test_status_enum_values failed in CI. Align the test's expected list (exact-match by design) with the live spec.
* fix(tests): mock HTTP fetch in test_img_url_token_counter
The test parameterized a live third-party image URL (blog.purpureus.net) which now 404s, causing get_image_dimensions to fall through to its base64 decode path and crash with 'not enough values to unpack' on every PR run. Mock safe_get with a tiny 1x1 PNG so the URL branch is still exercised without any network dependency.
* fix(tests): swap gpt-4o-audio-preview to gpt-audio-1.5 in test_gpt4o_audio
OpenAI shut down gpt-4o-audio-preview on 2026-05-07, so both live tests in test_gpt4o_audio.py (test_audio_output_from_model and test_audio_input_to_model) hard-fail model_not_found on every PR. Swap the hardcoded model to OpenAI's successor gpt-audio-1.5 (same chat-completions audio surface; already in the litellm cost map). Mirror the narrowed-skip pattern from the prior audio fixes: skip on model_not_found / does-not-exist / openai-internal, re-raise everything else so genuine litellm regressions still fail CI loudly.
* chore(ci): bump versions (#28287)
* bump: version 0.4.72 → 0.4.73
* bump: version 1.86.0 → 1.87.0
* uv lock
* feat: propagate team_id and team_alias to all child OTEL spans (#28273)
- Add `_set_team_attributes_on_span` helper to stamp team_id/team_alias
onto any span, ensuring these attributes are not limited to the root
litellm_request span
- Add `_set_team_attributes_from_kwargs` helper to extract team metadata
from the standard_logging_object in kwargs and apply them to a span
- Apply team attributes to raw request spans via `_maybe_log_raw_request`
so downstream consumers can filter traces by team without needing the
root span
- Apply team attributes to guardrail spans so guardrail activity can be
correlated to teams in tracing backends
- Apply team attributes to exception logging spans to preserve team
context during failure paths
- Add comprehensive unit tests covering all new helpers, including edge
cases where metadata or standard_logging_object is absent
Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
* Day 0 support : Gemini 3.5 Flash (#28268)
* Add day 0 support for gemini 3.5 flash
* Fix pricing
* Fix greptile review
* Fix failing test
* Fix tests
* Fix: revert tool removing logic
* fix greptile and test
---------
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
* Gemini managed agents support (#28270)
* Add support for environment variable in interactions api
* Add sdk support for gemini create agent
* Add agents endpoint support via proxy
* Add outputs of each api
* Add routing for model and agents param
* Remove redundant condition in get_provider_agents_api_config
LlmProviders.GEMINI.value is literally the string "gemini", so the
second clause of the or was checking the exact same thing as the first.
Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
* fix: forward query-param credentials to list/get/delete/versions Gemini agent endpoints
The list_gemini_agents, get_gemini_agent, delete_gemini_agent, and
list_gemini_agent_versions endpoints previously constructed a hardcoded
data dict with no mechanism to pass provider credentials. Unlike
create_gemini_agent (POST, reads litellm_params_template from body),
these GET/DELETE endpoints gave no way for multi-tenant callers to
supply a per-request api_key or other LiteLLM params.
Fix:
- Add _merge_query_params_into_data() helper that reads query parameters
from the request and merges them into the data dict without overwriting
already-set keys (e.g. path params like 'name').
- Support a JSON-encoded litellm_params_template query parameter
(matching the POST body pattern) as well as flat key=value pairs
(e.g. api_key=AIza...).
- Apply the helper in all four affected endpoints.
- Add 13 unit tests covering the helper and each endpoint.
Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
* fix: pass model=None for managed agent proxy endpoints to prevent agent name polluting data["model"]
Endpoints acreate_agent, aget_agent, adelete_agent, and alist_agent_versions
were passing model=<agent_name> to base_process_llm_request. This caused
common_processing_pre_call_logic to write the agent name into self.data["model"],
which then triggered spurious model-alias mapping, rate-limiting lookups, and
logging tied to a non-existent model deployment.
The agent name is already carried in data["name"] and is passed correctly to
the SDK functions (litellm.interactions.agents.*). There is no reason to also
set model=<agent_name>; the correct value is model=None for all five managed-agent
management routes.
Adds tests/test_litellm/proxy/google_endpoints/test_managed_agents_model_param.py
to verify all five managed-agent endpoints pass model=None.
Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
* fix: address greptile P1/P2 review comments
P1 (router.py): Restore fallback/retry support for acreate_interaction
and create_interaction. Both were silently moved to _init_interactions_api_endpoints
(direct call, no fallbacks). Moved them back to _ageneric_api_call_with_fallbacks
so users with configured fallback models keep retry behaviour.
P1 security (agents_endpoints.py): Remove flat query-param credential
path (e.g. ?api_key=AIza...) from _merge_query_params_into_data.
Credentials in URL query strings appear verbatim in server access logs,
CDN edge logs, and browser history. Only the JSON-encoded
litellm_params_template query param (matching the POST body pattern) is
retained.
P2 (interactions/http_handler.py): Extract _BaseHTTPHandler with shared
_handle_error, _sync_client, and _async_client helpers. InteractionsHTTPHandler
now extends _BaseHTTPHandler. The _async_client reads the provider from
litellm_params instead of hardcoding GEMINI.
P2 (interactions/agents/http_handler.py): AgentsHTTPHandler now extends
InteractionsHTTPHandler (which inherits _BaseHTTPHandler) so all shared
HTTP infrastructure is reused rather than duplicated. Removes the
hardcoded LlmProviders.GEMINI from the async client path.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix: address CI failures from greptile review fixes
- black: format interactions/agents/main.py and utils.py
- tests: update test_gemini_agents_endpoints.py to match new
_merge_query_params_into_data behaviour (flat credential params are
rejected; only JSON-encoded litellm_params_template is accepted)
- ci: add test_gemini_agents_endpoints.py to endpoints-and-responses
shard in test-unit-proxy-db.yml so assert-shard-coverage passes
- tests: add _initialize_managed_agents_endpoints and
_init_managed_agents_api_endpoints test coverage so router_code_coverage
passes; also fix TestRouterCreateInteractionRouting to reflect that
acreate_interaction now correctly routes through
_ageneric_api_call_with_fallbacks (restoring fallback support)
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix: remove InteractionsHTTPHandler._handle_error override to fix type errors
AgentsHTTPHandler extends InteractionsHTTPHandler and calls
self._handle_error(provider_config=agents_api_config) where
agents_api_config is BaseAgentsAPIConfig. Python MRO resolved _handle_error
to InteractionsHTTPHandler._handle_error which expected BaseInteractionsAPIConfig,
causing 10 mypy arg-type errors in interactions/agents/http_handler.py.
Removing the redundant override lets both classes inherit _BaseHTTPHandler._handle_error
(provider_config: Any) which is structurally correct for both config types.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix: agent-only interactions and managed agents provider routing
Resolve None custom_llm_provider in agents HTTP client lookup and set
custom_llm_provider on GenericLiteLLMParams for all agent CRUD paths.
Stop mapping agent names to proxy model routing; route interactions
through _init_interactions_api_endpoints with fallbacks only when model
is set. Consolidate duplicate router elif branches for interaction APIs.
Co-authored-by: Cursor <cursoragent@cursor.com>
* Fix greptile review
* test(agents): add unit tests for managed agents SDK and HTTP handler
Adds coverage for the new `litellm.interactions.agents` surface area:
- main.py: sync/async entry points (create/list/get/delete/list_versions),
provider config lookup, logging-obj helper, async error wrapping
- http_handler.py: every CRUD method (sync + async paths), `_is_async`
dispatch branches, and provider error mapping through GeminiAgentsConfig
- utils.py: get_provider_agents_api_config for supported / unsupported
providers
Brings patch coverage on these files from <25% to ~100% so codecov/patch
is satisfied.
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* docs(gemini-agents): fix misleading credential-passing examples in GET/DELETE docstrings (#28293)
The four GET/DELETE endpoint docstrings (list_gemini_agents,
get_gemini_agent, delete_gemini_agent, list_gemini_agent_versions)
documented passing per-request credentials as flat query parameters
(e.g. ?api_key=AIza...). However, _merge_query_params_into_data only
reads the JSON-encoded litellm_params_template query parameter and
intentionally ignores flat params (URL query strings appear verbatim
in access logs, browser history, and Referer headers).
Callers following the documented curl examples would have their
credentials silently dropped and hit auth failures against Gemini.
Update the examples to use the supported JSON-encoded
litellm_params_template query parameter, matching _merge_query_params_into_data's own docstring.
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* refactor(agents): rename provider-agnostic agent response types
Move GeminiAgent{ListResponse,DeleteResult,VersionsResponse} to
provider-neutral names (AgentListResponse, AgentDeleteResult,
AgentVersionsResponse) so the BaseAgentsAPIConfig interface no longer
references Gemini-specific type names.
* fix(gemini-agents): close veria-flagged credential-escalation gaps
Two high-severity findings from the veria-ai PR review are addressed:
1. **api_base override could leak the shared Gemini key**
GeminiAgentsConfig.validate_environment falls back to GOOGLE_API_KEY /
GEMINI_API_KEY when no api_key is supplied. Combined with caller-controlled
api_base on the proxy CRUD endpoints, an authenticated user could redirect
the outbound request to an attacker-controlled host and capture the
operator's shared Gemini key from the x-goog-api-key header. The config
now refuses env-fallback whenever api_base is explicitly overridden.
2. **Managed-agent CRUD exposed to ordinary LLM keys**
The new /v1beta/agents routes live in google_routes (i.e. llm_api_routes),
so any non-admin LLM key can reach them. Unlike /v1beta/models/...:
generateContent these endpoints are NOT model-routed and have no
model_list-supplied credentials, so env-fallback would let any LLM key
list / create / delete agents inside the operator's Gemini project. Each
endpoint now calls _enforce_caller_supplied_provider_key, which requires
non-admin callers to supply their own Gemini api_key via
litellm_params_template. Proxy admins keep the env-fallback convenience.
Tests cover non-admin rejection, admin allow-through, the api_base override
guard, and SDK env-fallback when api_base is not overridden.
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* test(router): restore strict assert_called_once_with on interactions default-provider test
---------
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* feat(gemini): add gemini-3.1-flash-lite model cost map (#28320)
* feat(gemini): add gemini-3.1-flash-lite model cost map entries
Co-authored-by: Cursor <cursoragent@cursor.com>
* Update model_prices_and_context_window.json
* Update source URL for model pricing information
* Sync source URL for gemini-3.1-flash-lite in backup JSON
* fix(model_cost_map): add mistral/ministral-8b-2512 entry
Mistral rotated the 'mistral/mistral-tiny' alias to return
'ministral-8b-2512' as the response model, which is not in the cost map.
This caused test_completion_mistral_api and
test_completion_mistral_api_modified_input to fail in
completion_cost lookup. Add the entry mirroring the existing
openrouter/mistralai/ministral-8b-2512 pricing.
* test(cost_calculator): assert output_cost_per_reasoning_token for gemini-3.1-flash-lite
* fix(tests): backfill local backup entries into runtime model_cost
litellm.model_cost is loaded from LITELLM_MODEL_COST_MAP_URL (pinned to
main) at import time, so any pricing entries added to the in-tree backup
on this branch aren't visible at test runtime until they also land on
main. The Mistral cassette currently returns model=ministral-8b-2512
and the cost-calculator lookup in test_completion_mistral_api /
test_completion_mistral_api_modified_input fails despite the entry
existing in the local backup. Backfill missing backup entries into
litellm.model_cost in the local_testing conftest so these lookups
succeed against the cassette state the branch is being tested with.
* fix(tests): guard conftest backfill against empty local cost map
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
* fix(spend_counter): seed Redis counter via SET NX to prevent cross-pod double-seed (#27854)
* fix(spend_counter): seed Redis counter via SET NX to prevent cross-pod double-seed
Symptom
-------
Customers on multi-pod deployments see team `spend` jump to ~2x (or N x
the pod count) shortly after a Redis cache miss / TTL expiry, triggering
spurious "Budget Crossed" alerts and blocked requests until the value is
manually reset.
Root cause
----------
`SpendCounterReseed.coalesced` warmed the primary spend counter by
calling `redis.async_increment(key, value=db_spend, refresh_ttl=True)`,
which lowers to Redis `INCRBYFLOAT`. That is additive, not idempotent.
The per-counter `asyncio.Lock` only coalesces seeders inside one
process. With N pods sharing one Redis, on a cold key (cold start, TTL
expiry, manual delete) every pod independently passes its lock + Redis
re-check, reads the same `db_spend`, and issues `INCRBYFLOAT db_spend`.
Final value: N x db_spend.
Fix
---
Use `redis.async_set_cache(key, value=db_spend, nx=True)` for the seed.
SET NX is atomic across pods: exactly one writer initializes the key;
losers read the winner's value via `async_get_cache`. This is the same
idiom already used by `coalesced_window` in the same file, so the two
seed paths are now consistent.
Per-request deltas continue to use `INCRBYFLOAT` (correct - additive
behaviour is what we want for increments, not for initial seed).
Verification
------------
Live two-process repro against the same Postgres + Redis (DB
spend = 506):
Unpatched: 4/4 runs -> Redis counter = ~1012 (~2 x db_spend)
Patched: 12/12 runs -> Redis counter = ~506
Unit tests (`test_proxy_server.py`):
- New `test_primary_spend_counter_redis_concurrent_seed_does_not_double_seed`
patches `_get_lock` to return a fresh lock per caller (otherwise the
per-process lock masks the race), races two `coalesced` calls, and
asserts final = 506 with exactly one of two SET NX attempts winning.
- 4 existing tests updated for the new seed contract (SET NX for the
seed, INCRBYFLOAT only for the per-request delta).
- Full `spend_counter or reseed or budget` slice: 22 passed.
Co-authored-by: Cursor <cursoragent@cursor.com>
* test(spend_counter): make SET NX mock atomic so loser branch is exercised
Greptile flagged that `redis_set_cache` in
test_primary_spend_counter_redis_concurrent_seed_does_not_double_seed
placed `await asyncio.sleep(0)` AFTER the NX membership check. Both
concurrent tasks observed an empty `redis_store`, passed the guard, and
both returned True - so the loser branch (else: read back winner's value)
was never exercised.
Fix the mock to model real atomic Redis SET NX:
- Yield BEFORE the membership check so two concurrent callers interleave
the way real SET NX does (first to resume runs check + write atomically
and wins; second resumes after the key exists and loses).
- Track set_cache return values; assert sorted([loser, winner]) so we
know exactly one task wins and one loses.
- Track async_get_cache calls that happen AFTER at least one SET NX has
completed; assert at least one such read - that is the loser-path
fallback (`current_value = float(cached)` when seeded is False).
Verified by temporarily reverting the mock to the old order: the test
now fails with `expected exactly one SET NX winner and one loser, got
[True, True]`, exactly the failure mode Greptile described.
No production code change.
Co-authored-by: Cursor <cursoragent@cursor.com>
* test(spend_counter): mock async_set_cache to populate redis_store in concurrent read+write test
`test_concurrent_read_and_write_paths_share_one_db_query` mocks
`async_increment` to populate the in-memory `redis_store`, but did not
mock `async_set_cache`. After the SET-NX seed change in `coalesced()`,
the seed step writes via `async_set_cache(nx=True)` (default AsyncMock,
no `redis_store` write), so the simulated Redis stays empty after the
first reseed. The second `get_current_spend` then sees a clean Redis
miss, re-enters the DB read path, and the test fails with
`expected 1 DB query, got 2`.
Fix: add a `redis_set_cache` side_effect that updates `redis_store` on
`nx=True` (and rejects when the key already exists), matching the
pattern used by the four sibling tests fixed in this branch's first
commit. Pre-existing assertions are unchanged.
Full `tests/test_litellm/proxy/test_proxy_server.py`: 158 passed.
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(proxy): normalize batch file IDs before ManagedObjectTable write (#28339)
* fix(proxy): normalize batch file IDs before ManagedObjectTable write
Run post_call_success_hook before update_batch_in_database on retrieve/cancel,
and ensure_batch_response_managed_file_ids so file_object never stores raw
provider output_file_id or error_file_id.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(proxy): address Greptile review on batch file ID normalization
Remove redundant resolve_* calls after update_batch_in_database and rename
loop variable to avoid shadowing hidden_params unified_file_id.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(tests): add mistral/ministral-8b-2512 to cost map and backfill in conftest
Mistral rotated the 'mistral/mistral-tiny' alias to return
'ministral-8b-2512' as the response model, which was missing from the
cost map. This caused test_completion_mistral_api and
test_completion_mistral_api_modified_input to fail in
litellm.completion_cost lookup.
- Add mistral/ministral-8b-2512 entry to both the in-tree
model_prices_and_context_window.json and the bundled
litellm/model_prices_and_context_window_backup.json (mirrors the
existing openrouter/mistralai/ministral-8b-2512 pricing).
- litellm.model_cost is loaded at import time from the URL pinned to
main, so the new backup entry isn't visible at test runtime until
it also lands on main. Backfill any entries missing from the
remote-fetched map into litellm.model_cost in the local_testing
conftest so cost-calculator lookups succeed on this branch.
* fix(tests): drop unnecessary del of conftest backfill loop vars
* fix: resolve batch response file IDs even when status unchanged
The status-unchanged early return in update_batch_in_database was
skipping ensure_batch_response_managed_file_ids, leaving raw provider
input_file_id (and other raw IDs) in the user-facing response when
polling an in-progress batch. Move the in-place file ID normalization
above the early return so the response always carries unified managed
IDs while still skipping the DB write when nothing changed.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* test(batches): cover ensure_batch_response_managed_file_ids branches
Add tests for the previously-uncovered paths in
ensure_batch_response_managed_file_ids: error_file_id normalization,
swallowed conversion errors, UserAPIKeyAuth fallback from
db_batch_object, model_name resolution from unified_file_id, and early
returns when managed_files_obj, model_id, or auth context are missing.
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
Co-authored-by: Claude <noreply@anthropic.com>
* fix(router): use forwarded model_id for native Azure container IDs (#27921)
* fix(router): use forwarded model_id for native Azure container IDs in _init_containers_api_endpoints
Azure code-interpreter containers return provider-native IDs (cntr_ + hex)
that carry no LiteLLM routing payload, so _decode_container_id returns
model_id=None. The router was falling through to call the handler directly,
bypassing _ageneric_api_call_with_fallbacks and leaving api_base=None for
Azure deployments. Fall back to the model_id forwarded from the proxy
ownership check so deployment credentials are always applied.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(azure-containers): strip /openai/responses path from api_base in AzureContainerConfig.get_complete_url
When a deployment's api_base is the responses endpoint URL
(e.g. .../openai/responses?api-version=...), AzureContainerConfig was
appending /openai/containers on top of it, producing the broken path
.../openai/responses/openai/containers. Azure returns 404 for that URL
while the correct path is .../openai/containers.
Strip any /openai/responses suffix from api_base before constructing
the containers URL so the resource root is always used as the starting point.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(azure-containers): prefer api-version from api_base URL over deployment's api_version
The deployment's api_version (e.g. 2024-08-01-preview) targets the chat/responses
API and is too old for the containers API, which requires 2025-04-01-preview.
The responses endpoint api_base already carries the correct api-version in its
query string. Extract it and use it for the containers URL, overriding the
stale deployment-level version.
Fixes DELETE and file-upload operations returning 404 due to wrong api-version.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(containers): pass params=None instead of params={} to httpx to preserve api-version
httpx erases a URL's query-string when params={} (empty dict) is passed,
silently stripping ?api-version=2025-04-01-preview from every container
POST/DELETE request. Azure's GET endpoints tolerate a missing api-version;
POST (upload) and DELETE are strict, so those returned 404.
Fix: use `params or None` in container_handler._async_handle and
llm_http_handler.async_container_delete_handler (and all sibling container
handlers) so that an empty params dict falls back to None, leaving httpx to
preserve the URL's existing query string intact.
Adds a regression test that directly documents the httpx behaviour.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(router): remove elif model_id branch from _init_containers_api_endpoints
Two reviewer findings addressed:
1. Truncated comment on the model_id fallback line — now complete.
2. Security: the elif branch that fired when container_id was absent allowed
any authenticated caller to supply model_id in a POST /v1/containers body
and route the request through an arbitrary deployment UUID, bypassing the
model-level access checks that only validate `model`. Removed the elif
branch; operations without container_id (create, list) route by the
caller-supplied `model` field as before. model_id forwarding is kept only
inside the container_id block, where the proxy ownership check has already
validated the container before forwarding the deployment ID.
Adds a regression test pinning the security boundary: no-container-id path
calls original_function directly even when model_id is in kwargs.
Co-authored-by: Cursor <cursoragent@cursor.com>
* test(containers): validate proxy-to-router model_id forwarding for managed IDs
Add test_regression_get_container_forwarding_params_sets_model_id_for_managed_id
to verify that get_container_forwarding_params (the proxy-side half of the Azure
routing fix) correctly extracts and forwards model_id from a LiteLLM-managed
encoded container ID.
This closes the gap identified by Greptile P1: the previous regression test
only injected model_id as a direct kwarg, validating the router in isolation.
The new test exercises the actual proxy-to-router data flow through
ownership.get_container_forwarding_params, confirming that kwargs["model_id"]
is populated before _init_containers_api_endpoints is reached.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(azure-containers): tighten endpoint-path strip to endswith match
Use path.endswith() instead of path.find() for _AZURE_ENDPOINT_PATHS so
the suffix strip only fires when api_base actually ends with one of the
endpoint-specific path suffixes. This is the more precise check greptile
flagged on the original find()-based implementation.
* Fix sync container handler to preserve URL query string
Mirror the async path fix: pass None instead of an empty params dict so
httpx does not strip the URL's existing query string (e.g.
?api-version=...), which is required for Azure container routing.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(azure-containers): strip trailing slash before endpoint suffix match
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(containers): recover model_id from stored encoded id for native Azure container IDs
get_container_forwarding_params previously only set model_id when the
user-supplied container_id was a LiteLLM-managed encoded id. For native
upstream IDs (e.g. Azure 'cntr_<hex>') the decode fails and model_id was
never forwarded — making the router-side fallback in
_init_containers_api_endpoints unreachable in production.
Fall back to the stored 'unified_object_id' on the ownership row, which
is the encoded form captured at create time when the router selected a
specific deployment. Decoding that yields the deployment model_id and
restores router-based credential application (api_base, api_key) for
retrieve/delete and container-file operations on native IDs.
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(ui): restore log filter loading indicator (#28282)
When a new filter is applied to spend logs, React Query's keepPreviousData
left stale rows on screen for 10–15s with no indication that a fetch was
in progress. The previous custom isFilteringResults flag was removed in
the #25847 toolbar refactor and only partially restored on the Fetch
button. Use React Query's isPlaceholderData to discriminate a real
filter change (queryKey changed, data not yet arrived) from a same-key
live-tail refetch, and feed it into the existing isLoading prop on the
toolbar pagination text and the table body. Live-tail polls still keep
previous rows without flicker.
Co-authored-by: Ryan <ryan@Ryans-MBP.localdomain>
* test(e2e): migrate runner to uv, add All Proxy Models key test (#28313)
* chore(e2e): migrate runner to uv, add All Proxy Models key test
Switches the local e2e runner (run_e2e.sh) from poetry to uv to match
the rest of the repo and CI. Adds a Playwright test for creating an
admin key with no team selected (all-proxy-models flow), a SLOWMO env
hook for headed debugging, and a MIGRATION_TRACKING.md doc that maps
the manual UI QA checklist to e2e tests so future migration work has
a single source of truth.
* chore(e2e): address greptile feedback
- Remove MIGRATION_TRACKING.md (docs belong in litellm-docs repo)
- playwright.config.ts: fall back to 0 when SLOWMO is non-numeric
(parseInt returns NaN, which Playwright accepts silently)
- run_e2e.sh: add --frozen to uv sync for CI determinism
* feat(ui): team passthrough routes create parity + edit load fix (#28098)
* feat(ui): team allowed_passthrough_routes create parity + edit load fix
Add the Allowed Pass Through Routes selector to the create-team modal
(previously only on the edit form), and fix the edit form silently
dropping the field: it lives under team metadata, so initialValues must
read info.metadata.allowed_passthrough_routes — otherwise the selector
renders empty and saving wipes admin-set routes. Both selectors are
gated to premium proxy admins, mirroring the server-side gate.
Resolves LIT-3019
* fix(ui): persist team allowed_passthrough_routes edits on save
The edit form loaded the selector but the save path never wrote it back:
allowed_passthrough_routes stayed in the raw metadata JSON textarea and
parsedMetadata (from that textarea) always won, so selector edits were
silently discarded. Strip it from the textarea initialValues and overlay
values.allowed_passthrough_routes into updateData.metadata, mirroring how
guardrails is handled.
Resolves LIT-3019
* fix(ui): preserve team passthrough routes for non-proxy-admins on save
Only proxy admins may set allowed_passthrough_routes (server-side gate).
For non-proxy-admins, write the team's stored value back into metadata
instead of the form value, so saving an unrelated setting can't silently
wipe routes; omit the key entirely when the team never had any.
Resolves LIT-3019
* fix(mcp): JWT on tools/list and REST tools/call server resolution (#28227)
* fix(mcp): JWT on tools/list, REST server_id resolution, tool_server_mismatch
Sign outbound MCP JWTs for list_mcp_tools and inject headers on the tools/list
path. Resolve server_id on /mcp-rest/tools/call and return 403 tool_server_mismatch
when the tool does not belong to the requested server. Default missing arguments to {}.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(mcp): restrict list JWTs to mcp:tools/list and default REST arguments to {}
- List-only JWTs (call_type=list_mcp_tools) no longer carry the broad
mcp:tools/call scope. _build_scope() now emits only mcp:tools/list
when no tool name is provided, mirroring the existing least-privilege
rule that tool-call JWTs omit mcp:tools/list.
- REST /tools/call now defaults a missing 'arguments' field to {} so
execute_mcp_tool() and downstream **arguments / .keys() calls don't
receive None and crash with TypeError/AttributeError.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(mcp): validate tool/server in call_tool; skip JWT signer when not configured or static auth present
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(mcp): align tests and mypy with user_api_key_auth on tools/list
Update mocks for the new _get_tools_from_server parameter, mock server
registry in REST access-denied test, and narrow static_headers for mypy.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(test): accept user_api_key_auth in get_tools_from_mcp_servers mock
The side_effect for the all-servers case did not accept the new kwarg,
so tools/list returned an empty list.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(mcp): fail fast for unknown tools when server mapping exists
Server-name fallback in call_tool must not open an upstream session when
the tool is absent from a populated mapping. Update the HTTP transport test
to register a known tool before asserting not-found behavior.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix mypy
* Fix mypy
* fix(mcp): preserve tools/call scope on missing tool name; pass user_api_key_auth in list_tools
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(mcp): match alias/server_name in _resolve_mcp_server_for_tool_call
The registry lookup in _resolve_mcp_server_for_tool_call previously only
compared candidate.name against the provided server_name, but tool name
prefixes can be derived from a server's alias or server_name (see
get_server_prefix). When the tool→server mapping is empty/stale (cold
start, dynamic tools), the lookup would fail for alias-configured
servers even though get_mcp_server_by_name (used by the REST path)
matches alias, server_name, and name.
Match the same priority of identifiers in both the registry pass and
the unprefixed fallback so the MCP protocol call_tool path is
consistent with the REST path.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(mcp): reuse proxy_logging DualCache in inject_mcp_jwt_headers_for_upstream
Instead of allocating a fresh DualCache() on every tools/list invocation,
prefer the shared proxy_logging_obj.internal_usage_cache.dual_cache when
available. The cache argument is currently unused by MCPJWTSigner, but
sharing the proxy's cache avoids per-call allocation overhead and matches
the cache identity used elsewhere in the proxy hook plumbing — so any
future per-request state stored in cache will survive across list calls.
Co-authored-by: Claude <noreply@anthropic.com>
* fix(mcp): return 403 ip_filtering for IP-restricted servers in tools/call name lookup
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(test): accept user_api_key_auth kwarg in list_tools mocks
The proxy-infra job was failing on four TestMCPServerManager tests because
the mock_get_tools_from_server stubs did not accept the new
user_api_key_auth keyword argument that list_tools now forwards to
_get_tools_from_server. Add the kwarg to each stub so list_tools can call
through cleanly.
Co-authored-by: Claude <claude@anthropic.com>
* fix(mcp): skip JWT injection when per-user mcp_auth_header is set
MCPClient._get_auth_headers() applies extra_headers AFTER writing
Authorization from auth_value, so an injected JWT silently overwrites
the user's per-server OAuth token. Guard the JWT signer with
'not mcp_auth_header' so per-user OAuth (and any dict-form per-user
auth) takes precedence, mirroring the existing static_headers guard.
Adds a regression test that the signer's inject helper is not called
when mcp_auth_header is supplied.
* fix(mcp): skip JWT injection when extra_headers already has Authorization
When a server uses per-user OAuth tokens, the resolved token is passed
into _get_tools_from_server via extra_headers. The JWT injection guard
only checked mcp_auth_header and the server's static headers, so the
signer would silently overwrite the user's OAuth Authorization header.
Add a check for an existing Authorization entry in extra_headers so
caller-supplied per-user OAuth tokens take precedence over JWT signing.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* test(mcp): cover JWT signer + tool-call resolution branches
Adds unit tests for the new MCPServerManager helpers (_resolve_mcp_server_for_tool_call,
_resolve_oauth2_headers_for_tool_call) and the new MCPJWTSigner paths
(_build_scope call_type branches and inject_mcp_jwt_headers_for_upstream).
Brings patch coverage above the auto target without changing behavior.
Co-authored-by: Claude <claude@anthropic.com>
* fix(mcp): retry tool-server lookup with prefixed name in REST mismatch check
When the REST /mcp-rest/tools/call path sends a raw tool name plus
requested_server_id, _get_mcp_server_from_tool_name(name) can return
None if the mapping only stores the prefixed form. That bypassed the
tool_server_mismatch 403 guard and let the call fall through to
trusting requested_server.
Retry the lookup with every known prefix of the requested server so
the mismatch check fires whenever the tool is actually registered.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(mcp): always reject unknown tools in server-name fallback
Defense-in-depth: _resolve_mcp_server_for_tool_call previously skipped
the unknown-tool check whenever the per-server mapping had no entries
yet (cold start, OAuth2 lazy listing, or upstream listing failure),
allowing arbitrary tool names to reach upstream servers.
Tighten the check so the server-name fallback always rejects tool
names not present in the mapping. Callers must call list_tools first
(standard MCP flow) before tools/call can resolve. Removes the
now-unused _mapping_has_tools_for_server helper and adds an
explicit empty-mapping rejection test alongside the existing
populated-mapping rejection test.
Co-authored-by: Sameer Kankute <sameer@berri.ai>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Claude (greptile subagent) <claude-greptile-bot@anthropic.com>
* feat(interactions): migrate to Google Interactions API steps schema (May 2026) (#28153)
* feat(interactions): migrate to Google Interactions API steps schema (May 2026)
Default to Api-Revision: 2026-05-20 (new `steps` schema). Add
`litellm.use_legacy_interactions_schema` global flag that sends
Api-Revision: 2026-05-07 for operators who need the legacy `outputs`
schema until June 8, 2026.
- Inject Api-Revision header in GoogleAIStudioInteractionsConfig.validate_environment()
- Auto-coalesce response_mime_type → response_format and image_config migration on new schema
- Add steps field to InteractionsAPIResponse and InteractionsAPIStreamingResponse
- Add StepStart/StepDelta/StepStop/InteractionCreated/etc. SSE event types
- Update streaming completion detection to handle interaction.completed event
- Bridge transformer populates both outputs and steps fields
- Bridge streaming iterator emits new-schema events by default
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(interactions): address greptile review feedback
- Avoid mutating caller's generation_config dict by shallow-copying
before popping image_config, preventing silent failures on retries
- Skip schema key in response_format when response_format is None to
avoid sending schema: null to the Google Interactions API
- Remove delta field from step.stop events (new schema only); the
StepStop model has no delta field and sending it duplicates already-
streamed text and breaks spec-conformant clients
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(proxy): parse use_legacy_interactions_schema string values safely
bool("false") returns True in Python, so quoted YAML values like
"false" or "False" silently activated the legacy Interactions API
schema. Match the env-var parsing pattern in litellm/__init__.py by
treating string inputs as true only when they equal "true" (case
insensitive).
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(interactions): only set object/id/delta on step.stop for legacy schema
StepStop (new schema) has no object, id, or delta fields. Setting them
unconditionally caused spec-breaking extra fields on new-schema step.stop
events in all four construction sites (sync/async × main-loop/StopIteration).
Legacy content.stop still receives id, object, and delta unchanged.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(interactions): stabilize streaming bridge schema, dict aliasing, and lost first delta
- Capture use_legacy_interactions_schema once at iterator construction so
all events emitted by a single stream use a consistent schema, even if
the global flag is mutated mid-stream.
- Check for the buffered interaction.complete/completed event before the
finished check in __next__/__anext__ so the final completion event
(which carries the full collected text in steps) is not dropped after
self.finished is set.
- Copy text content entries before appending to both outputs and the
steps content list to avoid shared mutable dict aliasing between the
two response fields.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix tests
* fix greptile review
* fix(interactions): address Greptile P1 review on schema coalescing and legacy deltas
Skip response_mime_type merge when response_format is already a list, avoid
in-place list mutation on image_config append, and restore delta.type on
legacy content.delta events.
Co-authored-by: Cursor <cursoragent@cursor.com>
* style(interactions): black-format gemini transformation.py
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
Co-authored-by: Claude <noreply@anthropic.com>
* test(ui-e2e): admin key creation with a specific proxy model (#28365)
* test(ui-e2e): add admin key creation with a specific proxy model
Adds Playwright coverage for creating a key (no team) scoped to a single
proxy model, complementing the existing All-Proxy-Models test. Uses a
DOM-dispatched click on the antd dropdown option since the popup
animation can render the option outside the viewport.
* test(ui-e2e): verify scoped key works against mock /chat/completions
Extend the "Create a key with a specific proxy model" test to extract
the new key from the success modal and POST to /chat/completions for
the scoped model, asserting 200 and the mock response body. Without
this the test could pass even if the model selection failed to register.
* fix(vertex_ai): omit function_call id on Vertex Gemini 3.5+ tool turns (#28324)
* fix(vertex_ai): omit function_call id on Vertex Gemini 3.5+ tool turns
Vertex AI rejects `id` on function_call/function_response parts; only Google AI Studio accepts it for Gemini 3.5+ strict tool matching.
Co-authored-by: Cursor <cursoragent@cursor.com>
* Update litellm/llms/vertex_ai/gemini/vertex_and_google_ai_studio_gemini.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* fix(vertex_ai): forward custom_llm_provider in context caching
Pass custom_llm_provider through to _gemini_convert_messages_with_history
in the context caching path so Gemini 3.5+ tool-call `id` forwarding
behaves consistently between cached and non-cached completions on Google
AI Studio.
Co-authored-by: Claude <claude@anthropic.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Claude <claude@anthropic.com>
* feat(mcp): allow native MCP OAuth support for cursor (#28327)
* feat(mcp): allow native MCP OAuth redirect URIs (cursor://)
Discoverable OAuth /authorize rejected cursor:// callbacks because
validate_trusted_redirect_uri only accepted http/https. Add an
allowlisted native path with a built-in Cursor default and optional
MCP_TRUSTED_NATIVE_REDIRECT_URIS env for other clients.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(mcp): address Greptile native redirect URI review
Lowercase paths in normalizer so env allowlist entries match case-
insensitively. Tighten wildcard prefix matching to reject sibling
paths (e.g. callback-2) unless the prefix ends with /.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(mcp): reject query params on native OAuth redirect URIs
Greptile: normalization stripped query strings before allowlist compare,
so cursor://.../callback?injected=... could pass validation. Reject any
native redirect_uri with a query component (same as fragments).
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(model_cost_map): add mistral/ministral-8b-2512 entry
Mistral rotated the 'mistral/mistral-tiny' alias to return
'ministral-8b-2512' as the response model, which is not in the cost map.
This caused test_completion_mistral_api and
test_completion_mistral_api_modified_input to fail in
completion_cost lookup. Add the entry mirroring the existing
openrouter/mistralai/ministral-8b-2512 pricing.
* fix(mcp): lowercase default native redirect URIs
Make _parse_trusted_native_redirect_uris apply the same lowercasing
to built-in defaults as it does to env-var entries.
* fix(tests): backfill local model_cost into remote-fetched map
litellm.model_cost is loaded at import time from the URL pinned to main,
so pricing entries that exist only in this branch (e.g.
mistral/ministral-8b-2512, freshly added because Mistral now returns this
id from mistral-tiny) are absent at test time and completion_cost lookups
raise. Backfill the in-tree backup so cassette-driven cost calculations
resolve against the entries that ship with the branch under test.
Fixes the local_testing_part1 failures on test_completion_mistral_api and
test_completion_mistral_api_modified_input.
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
Co-authored-by: Claude <claude@anthropic.com>
* fix(interactions): never drop streamed text deltas; always emit terminal completion (#28394)
* fix(interactions): never drop streamed text deltas; always emit terminal completion
The interactions streaming bridge had two bugs flagged by Greptile on PR #28153:
1. The first OutputTextDeltaEvent (and the second, when no ResponseCreatedEvent
precedes the deltas) was consumed to emit a synthetic interaction.created /
step.start event, but the chunk's text payload was never forwarded as a
step.delta. The text only reappeared in the terminal step.stop, which
defeats the purpose of incremental streaming.
2. When the upstream Responses API stream ended via StopIteration without a
ResponseCompletedEvent, the iterator emitted step.stop but never the
terminal interaction.completed event carrying the full collected text.
This refactors the iterator to translate each upstream chunk into a list of
events (instead of a single event) and buffers them in a deque. A text delta
now expands into [interaction.created, step.start, step.delta] on the first
chunk so no token is dropped, and the StopIteration / StopAsyncIteration
fallback always flushes a terminal interaction.completed event when one
hasn't already been sent.
Both behaviors are covered by new unit tests:
- test_no_text_token_is_dropped_during_streaming
- test_response_created_then_text_delta_emits_step_start_and_delta
- test_stop_iteration_fallback_emits_completion_event
- test_response_completed_emits_stop_then_completion (no double-emit)
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* fix(interactions): correlate EOF terminal events with stream's interaction id
The StopIteration fallback path previously built the terminal step.stop /
interaction.completed events with id=None (legacy content.stop) and a
memory-address fallback string (interaction.completed), neither of which
matched the item_id used by the earlier interaction.created / step.start /
step.delta events in the same stream. Downstream consumers correlating
events by id would see a mismatch.
Persist the interaction id derived from the first upstream chunk (item_id
on an OutputTextDeltaEvent, or response.id on a ResponseCreatedEvent) and
reuse it when flushing the terminal events on EOF.
Author: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
* ci(windows): raise UV_HTTP_TIMEOUT to 300s for uv sync
The using_litellm_on_windows job has been hitting flaky PyPI download
timeouts during 'uv sync --frozen --group dev' — different packages on
each rerun (six, pydantic-core), all surfacing the same uv error:
Failed to download distribution due to network timeout.
Try increasing UV_HTTP_TIMEOUT (current value: 30s).
uv's default 30s per-request timeout is too tight for the Windows runner
on this project (50+ deps, several multi-MB wheels), so bump it to 300s
to let slow individual downloads complete instead of failing the build.
* fix(interactions): correlate ResponseCompletedEvent terminal events with stream's interaction id
When a stream starts directly with OutputTextDeltaEvent (no preceding
ResponseCreatedEvent), interaction.created carries item_id while
interaction.completed previously carried response.id from
ResponseCompletedEvent. The two ids can differ, leaving consumers that
correlate events by id unable to match the start and completion events.
Fall back to self._interaction_id (set on the first chunk that derives
an id) before response.id, mirroring the EOF terminal path.
---------
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* fix(proxy): expose Prisma idle/connect timeout + extra DB URL params (#28395)
* fix(proxy): expose Prisma idle/connect timeout + extra DB URL params
Operators have reported large numbers of idle Prisma connections that
never get closed. The proxy already forwards `connection_limit` and
`pool_timeout` to the DATABASE_URL, but had no knob for capping idle
or slow connections. Add three new `general_settings` keys that thread
through to the DATABASE_URL / DIRECT_URL query string:
- `database_connect_timeout` -> Prisma `connect_timeout`
- `database_socket_timeout` -> Prisma `socket_timeout` (the main
knob for closing idle connections from the LiteLLM side)
- `database_extra_connection_params` -> untyped passthrough dict for
any other Prisma URL param (`pgbouncer`, `statement_cache_size`,
`sslmode`, ...); keys here override LiteLLM defaults.
Refactors the duplicated DATABASE_URL/DIRECT_URL param dicts into a
single `_build_db_connection_url_params` helper.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Update litellm/proxy/proxy_cli.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
---------
Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Litellm oss staging 1 (#28337)
* feat: add Xiaomi MiMo-V2.5-Pro and MiMo-V2.5 OpenRouter model entries (#27700)
Squash-merged by litellm-agent from TorvaldUtne's PR.
* fix(ui): trim whitespace from MCP inspector tool call inputs (#28203)
Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
* gemini-3.1-flash-lite pricing (#27933)
* feat(model_prices): add gemini-3.1-flash-lite pricing with standard/batch/flex/priority tiers
* fix pricing
* add service tier
---------
Co-authored-by: shin-berri <shin-laptop@berri.ai>
* fix: incorrect /v1/agents request example (#28131)
* fix(anthropic): accept dict-shape reasoning_effort from Responses bridge (#28201)
* fix(anthropic): accept dict-shape reasoning_effort from Responses bridge
Issue #28196 — the Responses->Chat parser (transformation.py:184-200) keeps the full dict as reasoning_effort when summary is set; that branch was added in #25359. But the Anthropic transformation here still guarded on isinstance(value, str), silently dropping the param. Result: callers using the standard Reasoning(effort, summary) OpenAI-shaped object on Anthropic lose thinking entirely (0 reasoning_tokens, no thinking_blocks).
Coerce dict -> string before mapping. Same shape tolerance that gpt_5_transformation._normalize_reasoning_effort_for_chat_completion already implements. summary is irrelevant for Anthropic's thinking_blocks.
Adds two regression tests: one parametrized over string + dict shapes (with and without summary), one covering unparseable dict inputs (drops silently, no crash).
* test(anthropic): add non-adaptive model coverage for dict-shape reasoning_effort
Per Greptile feedback on PR #28198: the original regression test only exercised the adaptive (4.6+) path. Add a parametrized test for the non-adaptive branch (claude-sonnet-4-5) verifying that dict-shape reasoning_effort still maps to thinking.type='enabled' + budget_tokens, and that output_config is NOT set on pre-4.6 models.
* test(anthropic): convert unparseable-dict test to @pytest.mark.parametrize
Per @greptile-apps inline review on PR #28201 — matches the parametrize style of the two adjacent dict-shape tests and produces clearer failure messages (test ID per case instead of one collapsing for-loop).
* feat: add pricing entry for openrouter/google/gemini-3.1-flash-lite (#28280)
Squash-merged by litellm-agent from ro31337's PR.
* fix(router): wrap aresponses streaming iterator for mid-stream fallbacks (#28215)
Squash-merged by litellm-agent from cwang-otto's PR.
* fix(router): unblock staging — mypy + coverage for aresponses streaming fallback (#28318)
Squash-merged by litellm-agent from cwang-otto's PR.
* fix(responses): forward timeout on completion transformation path (Anthropic, Bedrock, Vertex) (#28133)
Squash-merged by litellm-agent from cwang-otto's PR.
* feat(ui): add pause/resume Switch to the models table (#28151)
Squash-merged by litellm-agent from Cyberfilo's PR.
* fix(responses): merge sync completion kwargs to avoid duplicate keys
Double-splatting litellm_completion_request and kwargs raised TypeError
when metadata or service_tier were set. Match the async merge pattern.
Co-authored-by: Cursor <cursoragent@cursor.com>
* Use proxy base URL for CLI SSO form action (#28271)
Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
* fix(tests): add mistral/ministral-8b-2512 to cost map and backfill in conftest
Mistral rotated the 'mistral/mistral-tiny' alias to return
'ministral-8b-2512' as the response model, which was missing from the
cost map. This caused test_completion_mistral_api and
test_completion_mistral_api_modified_input to fail in
litellm.completion_cost lookup.
- Add mistral/ministral-8b-2512 entry to both the in-tree
model_prices_and_context_window.json and the bundled
litellm/model_prices_and_context_window_backup.json (mirrors the
existing openrouter/mistralai/ministral-8b-2512 pricing).
- litellm.model_cost is loaded at import time from the URL pinned to
main, so the new backup entry isn't visible at test runtime until
it also lands on main. Backfill any entries missing from the
remote-fetched map into litellm.model_cost in the local_testing
conftest so cost-calculator lookups succeed on this branch.
* fix(tests): drop unnecessary del of conftest backfill loop vars
* fix(router): harden streaming fallback wrapper for bridge iterators
- FallbackResponsesStreamWrapper now uses getattr fallbacks when copying
attributes from the source iterator. The bridge path
(LiteLLMCompletionStreamingIterator used by Anthropic/Bedrock/Vertex)
does not call super().__init__ and is missing response, logging_obj
(it uses litellm_logging_obj), responses_api_provider_config,
start_time, request_data, call_type, and _hidden_params. Previously,
wrapper construction raised AttributeError for any streaming fallback
on the bridge path.
- _aresponses_with_streaming_fallbacks now deep-copies the
litellm_metadata (and metadata) dicts into fallback_kwargs. The
primary attempt mutates this dict in place via
_update_kwargs_with_deployment, so a shallow copy of kwargs was
leaking primary-deployment fields (deployment, model_info, api_base)
into the mid-stream fallback request.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(router): use safe_deep_copy for fallback metadata snapshot
The ban_copy_deepcopy_kwargs CI check rejects copy.deepcopy() on any
variable whose name contains 'kwargs' (incl. fallback_kwargs). Swap
the two copy.deepcopy(fallback_kwargs[...]) calls for safe_deep_copy,
which handles non-picklable values (OTEL spans, etc.) by per-key
deepcopy with fallback to the original reference.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* test(ci): skip chronically flaky build_and_test integration tests
Both tests have been failing on every recent run of build_and_test
against this PR's HEAD (1686967, 1688402, 1689993, 1690877), and the
same two tests also fail intermittently on unrelated commits and other
branches, independent of any code change in this PR (which only touches
router fallback wrappers, the Anthropic Responses bridge, and unrelated
UI/cost-map files).
- tests.test_spend_logs.test_spend_logs: /spend/logs?request_id=...
returns 500 even after a 20s wait for the spend log to be written.
Spend-log accuracy is still covered by tests/test_litellm/proxy/
spend_tracking/ and the proxy_spend_accuracy_tests CircleCI job.
- tests.test_team_members.test_add_multiple_members: /team/info?team_id=
…
* fix(llm_http_handler): forward kwargs['model_info'] to litellm_params for /v1/messages
Router._update_kwargs_with_deployment stamps the selected deployment's
model_info on kwargs['model_info'] before dispatching the request.
Downstream cooldown / success callbacks (deployment_callback_on_failure,
deployment_callback_on_success) look up the deployment id via
kwargs['litellm_params']['model_info']['id'].
async_anthropic_messages_handler constructs its own litellm_params dict
when calling logging_obj.update_from_kwargs and never forwarded
model_info. As a result, /v1/messages requests dispatched through the
Router had an empty model_info on litellm_params, the deployment id was
not discoverable, and cooldown / success tracking were silently skipped
for this call type.
Forward kwargs['model_info'] into the litellm_params dict so the
existing Router callbacks can identify the deployment.
* merge main (#29486)
* [Refactor] UI - Spend Logs: consolidate filter state and extract components (#25847)
* [Refactor] UI - Spend Logs: consolidate filter state, extract components, remove dead code
- Lift filter state into index.tsx and pass to hook (removes selectedX vars + sync useEffect)
- Move main useQuery into useLogFilterLogic hook (removes isMainQueryEnabled toggle)
- Delete dead RequestViewer component (300 lines, replaced by LogDetailsDrawer)
- Extract LogsTableToolbar component (search, date range, pagination, live tail)
- Extract filter options config to filter_options.ts
- Remove dead code: handleRefresh, handleSelectLog, handleCloseDrawer, formatTimeUnit,
showFilters/showColumnDropdown state, dropdownRef/filtersRef
* Fix PR feedback: use antd Switch instead of Tremor in new file, fix typo
* Collapse dual-path filtering into single React Query
All 10 filter keys now go through the useQuery — the imperative
performSearch / debouncedSearch / backendFilteredLogs path is deleted.
Filter values are debounced via useDebouncedValue(300ms) before hitting
the query key so text inputs don't fire per-keystroke.
Removed: performSearch, debouncedSearch, backendFilteredLogs,
lastSearchTimestamp, hasBackendFilters, clientDerivedFilteredLogs,
the sort/page/time refetch useEffect, and the filteredLogs chooser memo.
* Clean up remaining smells: remove isFetchingDeferred, internalize selectedTimeInterval, fix circular import
- Remove useDeferredValue/isButtonLoading — pass logsQuery.isFetching directly
- Move selectedTimeInterval into LogsTableToolbar as internal state
- Move PaginatedResponse type from index.tsx to log_filter_logic.tsx
* Fix quick-select dropdown overlapping sidebar
* Fix stale quick-select label after Reset Filters
Move selectedTimeInterval back to parent so handleFilterReset can
reset it to the 24-hour default. The toolbar receives it as a prop.
* refactor useLogFilterLogic tests for controlled-hook + backend-query shape
The hook no longer owns filter state or does client-side filtering — it
receives filters/setFilters as props and drives filteredLogs from a
useQuery over uiSpendLogsCall. Reshape the tests around that contract:
introduce a controlled harness that owns filter state, collapse the 10
per-filter assertions into a single it.each over filterKey → API param,
and drop the client-side passthrough tests (the .min test file and the
"return all logs when no filters" / "empty when logs null" cases) that
no longer correspond to any hook behavior.
* cover new useLogFilterLogic invariants: activeTab gate, filterByCurrentUser fallback, debounce negative, partial merge
Follow-up to the test refactor. Adds coverage for invariants the
refactored hook contract introduced but that the first pass didn't
assert:
- query enablement: expand the single accessToken-null case into an
it.each over all four credential props (accessToken, token, userRole,
userID), plus a separate test for activeTab !== "request logs"
- filterByCurrentUser: when true with a blank User ID filter, the
outbound request carries user_id = userID
- debounce: also assert the negative case — no call in the first 100ms
after a filter change (first waiting out the initial mount fire)
- handleFilterChange: partial updates merge without clobbering other
filter keys (protects the spread + default-fill semantics)
- handleFilterReset: calls setCurrentPage(1) alongside restoring
filters
* fix typo dropping the live-tail banner border
Tailwind silently ignores unknown classes, so border-greem-200 was
leaving the auto-refresh banner with only its bg-green-50 fill and no
outline.
* memoize columns and derived table data in SpendLogsTable
The table's columns array, four-pass data pipeline, and sort-change
handler were all being rebuilt on every parent render. That made every
filter click re-instance all 23 TanStack-Table columns, re-run
filter/reduce/map over all rows, and recreate per-row click closures —
all before the intentional 300ms debounce timer even got a chance to
fire.
Local measurement (40 rows, dev mode):
filter click → query fires: 1957ms → 1217ms (−38%)
Wrap createColumns in useMemo keyed on sortBy/sortOrder, hoist
onSortChange into a useCallback, and move the searchedLogs /
sessionComposition / sessionRepresentativeMap / filteredData derivations
into a single useMemo keyed on filteredLogs.data + searchTerm.
These were pre-existing issues on main — not regressions from the
hook refactor — but the refactor made them user-visible because the
new query debounce put render cost on the critical path.
* apply dropdown filters instantly, debounce only text inputs
Dropdown selects now bypass the 300ms debounce so a click updates the
table immediately. Text inputs (Key Hash, Error Message, Request ID,
User ID) still debounce. handleFilterReset also clears the pending
debounced value so a half-typed text filter can't re-fire after reset.
* fix(ui/spend-logs): restore lost loading/debounce behavior + cover dropped tests
Regressions from the spend-logs-view refactor:
- debounce the 'Public model / search tool' text filter (was firing a
backend query per keystroke) via TEXT_FILTER_KEYS
- restore Fetch-button smoothing through table repaint using
useDeferredValue on the rendered data (explicit staleness)
- show AntDLoadingSpinner during the auth-resolve phase instead of a
blank screen on first load
- only live-tail-poll while the tab is visible
(refetchIntervalInBackground: false)
- extract getLiveTailRefetchInterval helper for the poll decision
Tests:
- LogDetailContent: retries display (>0 / 0 / absent), overhead-absent
- log_filter_logic: regression guard that the public-model filter
debounces; getLiveTailRefetchInterval unit tests
- logs_utils: getTimeRangeDisplay quick-select window labels
* test(ui/spend-logs): cover the cold-load auth-not-ready spinner guard
Asserts SpendLogsTable shows a loading spinner (not a blank screen)
while credentials are unresolved, and renders the table once present.
* fix(tests): replace shut-down gpt-4o-audio-preview with gpt-audio-1.5 (#28281)
* fix(tests): replace shut-down gpt-4o-audio-preview with gpt-audio-1.5
OpenAI shut down gpt-4o-audio-preview on 2026-05-07, so the live audio
calls in test_stream_chunk_builder_openai_audio_output_usage and
test_standard_logging_payload_audio now hard-fail with a model-not-found
error on every PR. The error was not "openai-internal", so the except
block swallowed it and execution fell through to an unbound
completion/response (UnboundLocalError).
Switch both tests to gpt-audio-1.5, OpenAI's recommended successor
(GA, not deprecated, already present in the litellm cost map so the
response_cost assertion still resolves). Also broaden the except to
skip with the real error in the reason instead of crashing, so a
transient upstream blip can't reintroduce the UnboundLocalError.
* fix(tests): narrow audio-test skip to model-not-found, re-raise the rest
Address review feedback: an unconditional skip on any exception would
silently mask a litellm-internal regression in the audio path (broken
param transformation, serialization, bad header) instead of failing CI.
Skip only on the upstream-unavailable class (model_not_found / "does not
exist" / openai-internal) and re-raise everything else, so genuine
regressions still fail loudly. The UnboundLocalError is still fixed
because the handler either skips or raises - it never falls through.
* fix(tests): add budget_exceeded to expected Interaction status enum
Staging added budget_exceeded to the Interaction OpenAPI status enum; the staging merge into this branch picked up the spec change but not the matching test update, so test_status_enum_values failed in CI. Align the test's expected list (exact-match by design) with the live spec.
* fix(tests): mock HTTP fetch in test_img_url_token_counter
The test parameterized a live third-party image URL (blog.purpureus.net) which now 404s, causing get_image_dimensions to fall through to its base64 decode path and crash with 'not enough values to unpack' on every PR run. Mock safe_get with a tiny 1x1 PNG so the URL branch is still exercised without any network dependency.
* fix(tests): swap gpt-4o-audio-preview to gpt-audio-1.5 in test_gpt4o_audio
OpenAI shut down gpt-4o-audio-preview on 2026-05-07, so both live tests in test_gpt4o_audio.py (test_audio_output_from_model and test_audio_input_to_model) hard-fail model_not_found on every PR. Swap the hardcoded model to OpenAI's successor gpt-audio-1.5 (same chat-completions audio surface; already in the litellm cost map). Mirror the narrowed-skip pattern from the prior audio fixes: skip on model_not_found / does-not-exist / openai-internal, re-raise everything else so genuine litellm regressions still fail CI loudly.
* chore(ci): bump versions (#28287)
* bump: version 0.4.72 → 0.4.73
* bump: version 1.86.0 → 1.87.0
* uv lock
* feat: propagate team_id and team_alias to all child OTEL spans (#28273)
- Add `_set_team_attributes_on_span` helper to stamp team_id/team_alias
onto any span, ensuring these attributes are not limited to the root
litellm_request span
- Add `_set_team_attributes_from_kwargs` helper to extract team metadata
from the standard_logging_object in kwargs and apply them to a span
- Apply team attributes to raw request spans via `_maybe_log_raw_request`
so downstream consumers can filter traces by team without needing the
root span
- Apply team attributes to guardrail spans so guardrail activity can be
correlated to teams in tracing backends
- Apply team attributes to exception logging spans to preserve team
context during failure paths
- Add comprehensive unit tests covering all new helpers, including edge
cases where metadata or standard_logging_object is absent
Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
* Day 0 support : Gemini 3.5 Flash (#28268)
* Add day 0 support for gemini 3.5 flash
* Fix pricing
* Fix greptile review
* Fix failing test
* Fix tests
* Fix: revert tool removing logic
* fix greptile and test
---------
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
* Gemini managed agents support (#28270)
* Add support for environment variable in interactions api
* Add sdk support for gemini create agent
* Add agents endpoint support via proxy
* Add outputs of each api
* Add routing for model and agents param
* Remove redundant condition in get_provider_agents_api_config
LlmProviders.GEMINI.value is literally the string "gemini", so the
second clause of the or was checking the exact same thing as the first.
Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
* fix: forward query-param credentials to list/get/delete/versions Gemini agent endpoints
The list_gemini_agents, get_gemini_agent, delete_gemini_agent, and
list_gemini_agent_versions endpoints previously constructed a hardcoded
data dict with no mechanism to pass provider credentials. Unlike
create_gemini_agent (POST, reads litellm_params_template from body),
these GET/DELETE endpoints gave no way for multi-tenant callers to
supply a per-request api_key or other LiteLLM params.
Fix:
- Add _merge_query_params_into_data() helper that reads query parameters
from the request and merges them into the data dict without overwriting
already-set keys (e.g. path params like 'name').
- Support a JSON-encoded litellm_params_template query parameter
(matching the POST body pattern) as well as flat key=value pairs
(e.g. api_key=AIza...).
- Apply the helper in all four affected endpoints.
- Add 13 unit tests covering the helper and each endpoint.
Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
* fix: pass model=None for managed agent proxy endpoints to prevent agent name polluting data["model"]
Endpoints acreate_agent, aget_agent, adelete_agent, and alist_agent_versions
were passing model=<agent_name> to base_process_llm_request. This caused
common_processing_pre_call_logic to write the agent name into self.data["model"],
which then triggered spurious model-alias mapping, rate-limiting lookups, and
logging tied to a non-existent model deployment.
The agent name is already carried in data["name"] and is passed correctly to
the SDK functions (litellm.interactions.agents.*). There is no reason to also
set model=<agent_name>; the correct value is model=None for all five managed-agent
management routes.
Adds tests/test_litellm/proxy/google_endpoints/test_managed_agents_model_param.py
to verify all five managed-agent endpoints pass model=None.
Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
* fix: address greptile P1/P2 review comments
P1 (router.py): Restore fallback/retry support for acreate_interaction
and create_interaction. Both were silently moved to _init_interactions_api_endpoints
(direct call, no fallbacks). Moved them back to _ageneric_api_call_with_fallbacks
so users with configured fallback models keep retry behaviour.
P1 security (agents_endpoints.py): Remove flat query-param credential
path (e.g. ?api_key=AIza...) from _merge_query_params_into_data.
Credentials in URL query strings appear verbatim in server access logs,
CDN edge logs, and browser history. Only the JSON-encoded
litellm_params_template query param (matching the POST body pattern) is
retained.
P2 (interactions/http_handler.py): Extract _BaseHTTPHandler with shared
_handle_error, _sync_client, and _async_client helpers. InteractionsHTTPHandler
now extends _BaseHTTPHandler. The _async_client reads the provider from
litellm_params instead of hardcoding GEMINI.
P2 (interactions/agents/http_handler.py): AgentsHTTPHandler now extends
InteractionsHTTPHandler (which inherits _BaseHTTPHandler) so all shared
HTTP infrastructure is reused rather than duplicated. Removes the
hardcoded LlmProviders.GEMINI from the async client path.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix: address CI failures from greptile review fixes
- black: format interactions/agents/main.py and utils.py
- tests: update test_gemini_agents_endpoints.py to match new
_merge_query_params_into_data behaviour (flat credential params are
rejected; only JSON-encoded litellm_params_template is accepted)
- ci: add test_gemini_agents_endpoints.py to endpoints-and-responses
shard in test-unit-proxy-db.yml so assert-shard-coverage passes
- tests: add _initialize_managed_agents_endpoints and
_init_managed_agents_api_endpoints test coverage so router_code_coverage
passes; also fix TestRouterCreateInteractionRouting to reflect that
acreate_interaction now correctly routes through
_ageneric_api_call_with_fallbacks (restoring fallback support)
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix: remove InteractionsHTTPHandler._handle_error override to fix type errors
AgentsHTTPHandler extends InteractionsHTTPHandler and calls
self._handle_error(provider_config=agents_api_config) where
agents_api_config is BaseAgentsAPIConfig. Python MRO resolved _handle_error
to InteractionsHTTPHandler._handle_error which expected BaseInteractionsAPIConfig,
causing 10 mypy arg-type errors in interactions/agents/http_handler.py.
Removing the redundant override lets both classes inherit _BaseHTTPHandler._handle_error
(provider_config: Any) which is structurally correct for both config types.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix: agent-only interactions and managed agents provider routing
Resolve None custom_llm_provider in agents HTTP client lookup and set
custom_llm_provider on GenericLiteLLMParams for all agent CRUD paths.
Stop mapping agent names to proxy model routing; route interactions
through _init_interactions_api_endpoints with fallbacks only when model
is set. Consolidate duplicate router elif branches for interaction APIs.
Co-authored-by: Cursor <cursoragent@cursor.com>
* Fix greptile review
* test(agents): add unit tests for managed agents SDK and HTTP handler
Adds coverage for the new `litellm.interactions.agents` surface area:
- main.py: sync/async entry points (create/list/get/delete/list_versions),
provider config lookup, logging-obj helper, async error wrapping
- http_handler.py: every CRUD method (sync + async paths), `_is_async`
dispatch branches, and provider error mapping through GeminiAgentsConfig
- utils.py: get_provider_agents_api_config for supported / unsupported
providers
Brings patch coverage on these files from <25% to ~100% so codecov/patch
is satisfied.
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* docs(gemini-agents): fix misleading credential-passing examples in GET/DELETE docstrings (#28293)
The four GET/DELETE endpoint docstrings (list_gemini_agents,
get_gemini_agent, delete_gemini_agent, list_gemini_agent_versions)
documented passing per-request credentials as flat query parameters
(e.g. ?api_key=AIza...). However, _merge_query_params_into_data only
reads the JSON-encoded litellm_params_template query parameter and
intentionally ignores flat params (URL query strings appear verbatim
in access logs, browser history, and Referer headers).
Callers following the documented curl examples would have their
credentials silently dropped and hit auth failures against Gemini.
Update the examples to use the supported JSON-encoded
litellm_params_template query parameter, matching _merge_query_params_into_data's own docstring.
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* refactor(agents): rename provider-agnostic agent response types
Move GeminiAgent{ListResponse,DeleteResult,VersionsResponse} to
provider-neutral names (AgentListResponse, AgentDeleteResult,
AgentVersionsResponse) so the BaseAgentsAPIConfig interface no longer
references Gemini-specific type names.
* fix(gemini-agents): close veria-flagged credential-escalation gaps
Two high-severity findings from the veria-ai PR review are addressed:
1. **api_base override could leak the shared Gemini key**
GeminiAgentsConfig.validate_environment falls back to GOOGLE_API_KEY /
GEMINI_API_KEY when no api_key is supplied. Combined with caller-controlled
api_base on the proxy CRUD endpoints, an authenticated user could redirect
the outbound request to an attacker-controlled host and capture the
operator's shared Gemini key from the x-goog-api-key header. The config
now refuses env-fallback whenever api_base is explicitly overridden.
2. **Managed-agent CRUD exposed to ordinary LLM keys**
The new /v1beta/agents routes live in google_routes (i.e. llm_api_routes),
so any non-admin LLM key can reach them. Unlike /v1beta/models/...:
generateContent these endpoints are NOT model-routed and have no
model_list-supplied credentials, so env-fallback would let any LLM key
list / create / delete agents inside the operator's Gemini project. Each
endpoint now calls _enforce_caller_supplied_provider_key, which requires
non-admin callers to supply their own Gemini api_key via
litellm_params_template. Proxy admins keep the env-fallback convenience.
Tests cover non-admin rejection, admin allow-through, the api_base override
guard, and SDK env-fallback when api_base is not overridden.
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* test(router): restore strict assert_called_once_with on interactions default-provider test
---------
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* feat(gemini): add gemini-3.1-flash-lite model cost map (#28320)
* feat(gemini): add gemini-3.1-flash-lite model cost map entries
Co-authored-by: Cursor <cursoragent@cursor.com>
* Update model_prices_and_context_window.json
* Update source URL for model pricing information
* Sync source URL for gemini-3.1-flash-lite in backup JSON
* fix(model_cost_map): add mistral/ministral-8b-2512 entry
Mistral rotated the 'mistral/mistral-tiny' alias to return
'ministral-8b-2512' as the response model, which is not in the cost map.
This caused test_completion_mistral_api and
test_completion_mistral_api_modified_input to fail in
completion_cost lookup. Add the entry mirroring the existing
openrouter/mistralai/ministral-8b-2512 pricing.
* test(cost_calculator): assert output_cost_per_reasoning_token for gemini-3.1-flash-lite
* fix(tests): backfill local backup entries into runtime model_cost
litellm.model_cost is loaded from LITELLM_MODEL_COST_MAP_URL (pinned to
main) at import time, so any pricing entries added to the in-tree backup
on this branch aren't visible at test runtime until they also land on
main. The Mistral cassette currently returns model=ministral-8b-2512
and the cost-calculator lookup in test_completion_mistral_api /
test_completion_mistral_api_modified_input fails despite the entry
existing in the local backup. Backfill missing backup entries into
litellm.model_cost in the local_testing conftest so these lookups
succeed against the cassette state the branch is being tested with.
* fix(tests): guard conftest backfill against empty local cost map
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
* fix(spend_counter): seed Redis counter via SET NX to prevent cross-pod double-seed (#27854)
* fix(spend_counter): seed Redis counter via SET NX to prevent cross-pod double-seed
Symptom
-------
Customers on multi-pod deployments see team `spend` jump to ~2x (or N x
the pod count) shortly after a Redis cache miss / TTL expiry, triggering
spurious "Budget Crossed" alerts and blocked requests until the value is
manually reset.
Root cause
----------
`SpendCounterReseed.coalesced` warmed the primary spend counter by
calling `redis.async_increment(key, value=db_spend, refresh_ttl=True)`,
which lowers to Redis `INCRBYFLOAT`. That is additive, not idempotent.
The per-counter `asyncio.Lock` only coalesces seeders inside one
process. With N pods sharing one Redis, on a cold key (cold start, TTL
expiry, manual delete) every pod independently passes its lock + Redis
re-check, reads the same `db_spend`, and issues `INCRBYFLOAT db_spend`.
Final value: N x db_spend.
Fix
---
Use `redis.async_set_cache(key, value=db_spend, nx=True)` for the seed.
SET NX is atomic across pods: exactly one writer initializes the key;
losers read the winner's value via `async_get_cache`. This is the same
idiom already used by `coalesced_window` in the same file, so the two
seed paths are now consistent.
Per-request deltas continue to use `INCRBYFLOAT` (correct - additive
behaviour is what we want for increments, not for initial seed).
Verification
------------
Live two-process repro against the same Postgres + Redis (DB
spend = 506):
Unpatched: 4/4 runs -> Redis counter = ~1012 (~2 x db_spend)
Patched: 12/12 runs -> Redis counter = ~506
Unit tests (`test_proxy_server.py`):
- New `test_primary_spend_counter_redis_concurrent_seed_does_not_double_seed`
patches `_get_lock` to return a fresh lock per caller (otherwise the
per-process lock masks the race), races two `coalesced` calls, and
asserts final = 506 with exactly one of two SET NX attempts winning.
- 4 existing tests updated for the new seed contract (SET NX for the
seed, INCRBYFLOAT only for the per-request delta).
- Full `spend_counter or reseed or budget` slice: 22 passed.
Co-authored-by: Cursor <cursoragent@cursor.com>
* test(spend_counter): make SET NX mock atomic so loser branch is exercised
Greptile flagged that `redis_set_cache` in
test_primary_spend_counter_redis_concurrent_seed_does_not_double_seed
placed `await asyncio.sleep(0)` AFTER the NX membership check. Both
concurrent tasks observed an empty `redis_store`, passed the guard, and
both returned True - so the loser branch (else: read back winner's value)
was never exercised.
Fix the mock to model real atomic Redis SET NX:
- Yield BEFORE the membership check so two concurrent callers interleave
the way real SET NX does (first to resume runs check + write atomically
and wins; second resumes after the key exists and loses).
- Track set_cache return values; assert sorted([loser, winner]) so we
know exactly one task wins and one loses.
- Track async_get_cache calls that happen AFTER at least one SET NX has
completed; assert at least one such read - that is the loser-path
fallback (`current_value = float(cached)` when seeded is False).
Verified by temporarily reverting the mock to the old order: the test
now fails with `expected exactly one SET NX winner and one loser, got
[True, True]`, exactly the failure mode Greptile described.
No production code change.
Co-authored-by: Cursor <cursoragent@cursor.com>
* test(spend_counter): mock async_set_cache to populate redis_store in concurrent read+write test
`test_concurrent_read_and_write_paths_share_one_db_query` mocks
`async_increment` to populate the in-memory `redis_store`, but did not
mock `async_set_cache`. After the SET-NX seed change in `coalesced()`,
the seed step writes via `async_set_cache(nx=True)` (default AsyncMock,
no `redis_store` write), so the simulated Redis stays empty after the
first reseed. The second `get_current_spend` then sees a clean Redis
miss, re-enters the DB read path, and the test fails with
`expected 1 DB query, got 2`.
Fix: add a `redis_set_cache` side_effect that updates `redis_store` on
`nx=True` (and rejects when the key already exists), matching the
pattern used by the four sibling tests fixed in this branch's first
commit. Pre-existing assertions are unchanged.
Full `tests/test_litellm/proxy/test_proxy_server.py`: 158 passed.
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(proxy): normalize batch file IDs before ManagedObjectTable write (#28339)
* fix(proxy): normalize batch file IDs before ManagedObjectTable write
Run post_call_success_hook before update_batch_in_database on retrieve/cancel,
and ensure_batch_response_managed_file_ids so file_object never stores raw
provider output_file_id or error_file_id.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(proxy): address Greptile review on batch file ID normalization
Remove redundant resolve_* calls after update_batch_in_database and rename
loop variable to avoid shadowing hidden_params unified_file_id.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(tests): add mistral/ministral-8b-2512 to cost map and backfill in conftest
Mistral rotated the 'mistral/mistral-tiny' alias to return
'ministral-8b-2512' as the response model, which was missing from the
cost map. This caused test_completion_mistral_api and
test_completion_mistral_api_modified_input to fail in
litellm.completion_cost lookup.
- Add mistral/ministral-8b-2512 entry to both the in-tree
model_prices_and_context_window.json and the bundled
litellm/model_prices_and_context_window_backup.json (mirrors the
existing openrouter/mistralai/ministral-8b-2512 pricing).
- litellm.model_cost is loaded at import time from the URL pinned to
main, so the new backup entry isn't visible at test runtime until
it also lands on main. Backfill any entries missing from the
remote-fetched map into litellm.model_cost in the local_testing
conftest so cost-calculator lookups succeed on this branch.
* fix(tests): drop unnecessary del of conftest backfill loop vars
* fix: resolve batch response file IDs even when status unchanged
The status-unchanged early return in update_batch_in_database was
skipping ensure_batch_response_managed_file_ids, leaving raw provider
input_file_id (and other raw IDs) in the user-facing response when
polling an in-progress batch. Move the in-place file ID normalization
above the early return so the response always carries unified managed
IDs while still skipping the DB write when nothing changed.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* test(batches): cover ensure_batch_response_managed_file_ids branches
Add tests for the previously-uncovered paths in
ensure_batch_response_managed_file_ids: error_file_id normalization,
swallowed conversion errors, UserAPIKeyAuth fallback from
db_batch_object, model_name resolution from unified_file_id, and early
returns when managed_files_obj, model_id, or auth context are missing.
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
Co-authored-by: Claude <noreply@anthropic.com>
* fix(router): use forwarded model_id for native Azure container IDs (#27921)
* fix(router): use forwarded model_id for native Azure container IDs in _init_containers_api_endpoints
Azure code-interpreter containers return provider-native IDs (cntr_ + hex)
that carry no LiteLLM routing payload, so _decode_container_id returns
model_id=None. The router was falling through to call the handler directly,
bypassing _ageneric_api_call_with_fallbacks and leaving api_base=None for
Azure deployments. Fall back to the model_id forwarded from the proxy
ownership check so deployment credentials are always applied.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(azure-containers): strip /openai/responses path from api_base in AzureContainerConfig.get_complete_url
When a deployment's api_base is the responses endpoint URL
(e.g. .../openai/responses?api-version=...), AzureContainerConfig was
appending /openai/containers on top of it, producing the broken path
.../openai/responses/openai/containers. Azure returns 404 for that URL
while the correct path is .../openai/containers.
Strip any /openai/responses suffix from api_base before constructing
the containers URL so the resource root is always used as the starting point.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(azure-containers): prefer api-version from api_base URL over deployment's api_version
The deployment's api_version (e.g. 2024-08-01-preview) targets the chat/responses
API and is too old for the containers API, which requires 2025-04-01-preview.
The responses endpoint api_base already carries the correct api-version in its
query string. Extract it and use it for the containers URL, overriding the
stale deployment-level version.
Fixes DELETE and file-upload operations returning 404 due to wrong api-version.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(containers): pass params=None instead of params={} to httpx to preserve api-version
httpx erases a URL's query-string when params={} (empty dict) is passed,
silently stripping ?api-version=2025-04-01-preview from every container
POST/DELETE request. Azure's GET endpoints tolerate a missing api-version;
POST (upload) and DELETE are strict, so those returned 404.
Fix: use `params or None` in container_handler._async_handle and
llm_http_handler.async_container_delete_handler (and all sibling container
handlers) so that an empty params dict falls back to None, leaving httpx to
preserve the URL's existing query string intact.
Adds a regression test that directly documents the httpx behaviour.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(router): remove elif model_id branch from _init_containers_api_endpoints
Two reviewer findings addressed:
1. Truncated comment on the model_id fallback line — now complete.
2. Security: the elif branch that fired when container_id was absent allowed
any authenticated caller to supply model_id in a POST /v1/containers body
and route the request through an arbitrary deployment UUID, bypassing the
model-level access checks that only validate `model`. Removed the elif
branch; operations without container_id (create, list) route by the
caller-supplied `model` field as before. model_id forwarding is kept only
inside the container_id block, where the proxy ownership check has already
validated the container before forwarding the deployment ID.
Adds a regression test pinning the security boundary: no-container-id path
calls original_function directly even when model_id is in kwargs.
Co-authored-by: Cursor <cursoragent@cursor.com>
* test(containers): validate proxy-to-router model_id forwarding for managed IDs
Add test_regression_get_container_forwarding_params_sets_model_id_for_managed_id
to verify that get_container_forwarding_params (the proxy-side half of the Azure
routing fix) correctly extracts and forwards model_id from a LiteLLM-managed
encoded container ID.
This closes the gap identified by Greptile P1: the previous regression test
only injected model_id as a direct kwarg, validating the router in isolation.
The new test exercises the actual proxy-to-router data flow through
ownership.get_container_forwarding_params, confirming that kwargs["model_id"]
is populated before _init_containers_api_endpoints is reached.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(azure-containers): tighten endpoint-path strip to endswith match
Use path.endswith() instead of path.find() for _AZURE_ENDPOINT_PATHS so
the suffix strip only fires when api_base actually ends with one of the
endpoint-specific path suffixes. This is the more precise check greptile
flagged on the original find()-based implementation.
* Fix sync container handler to preserve URL query string
Mirror the async path fix: pass None instead of an empty params dict so
httpx does not strip the URL's existing query string (e.g.
?api-version=...), which is required for Azure container routing.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(azure-containers): strip trailing slash before endpoint suffix match
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(containers): recover model_id from stored encoded id for native Azure container IDs
get_container_forwarding_params previously only set model_id when the
user-supplied container_id was a LiteLLM-managed encoded id. For native
upstream IDs (e.g. Azure 'cntr_<hex>') the decode fails and model_id was
never forwarded — making the router-side fallback in
_init_containers_api_endpoints unreachable in production.
Fall back to the stored 'unified_object_id' on the ownership row, which
is the encoded form captured at create time when the router selected a
specific deployment. Decoding that yields the deployment model_id and
restores router-based credential application (api_base, api_key) for
retrieve/delete and container-file operations on native IDs.
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(ui): restore log filter loading indicator (#28282)
When a new filter is applied to spend logs, React Query's keepPreviousData
left stale rows on screen for 10–15s with no indication that a fetch was
in progress. The previous custom isFilteringResults flag was removed in
the #25847 toolbar refactor and only partially restored on the Fetch
button. Use React Query's isPlaceholderData to discriminate a real
filter change (queryKey changed, data not yet arrived) from a same-key
live-tail refetch, and feed it into the existing isLoading prop on the
toolbar pagination text and the table body. Live-tail polls still keep
previous rows without flicker.
Co-authored-by: Ryan <ryan@Ryans-MBP.localdomain>
* test(e2e): migrate runner to uv, add All Proxy Models key test (#28313)
* chore(e2e): migrate runner to uv, add All Proxy Models key test
Switches the local e2e runner (run_e2e.sh) from poetry to uv to match
the rest of the repo and CI. Adds a Playwright test for creating an
admin key with no team selected (all-proxy-models flow), a SLOWMO env
hook for headed debugging, and a MIGRATION_TRACKING.md doc that maps
the manual UI QA checklist to e2e tests so future migration work has
a single source of truth.
* chore(e2e): address greptile feedback
- Remove MIGRATION_TRACKING.md (docs belong in litellm-docs repo)
- playwright.config.ts: fall back to 0 when SLOWMO is non-numeric
(parseInt returns NaN, which Playwright accepts silently)
- run_e2e.sh: add --frozen to uv sync for CI determinism
* feat(ui): team passthrough routes create parity + edit load fix (#28098)
* feat(ui): team allowed_passthrough_routes create parity + edit load fix
Add the Allowed Pass Through Routes selector to the create-team modal
(previously only on the edit form), and fix the edit form silently
dropping the field: it lives under team metadata, so initialValues must
read info.metadata.allowed_passthrough_routes — otherwise the selector
renders empty and saving wipes admin-set routes. Both selectors are
gated to premium proxy admins, mirroring the server-side gate.
Resolves LIT-3019
* fix(ui): persist team allowed_passthrough_routes edits on save
The edit form loaded the selector but the save path never wrote it back:
allowed_passthrough_routes stayed in the raw metadata JSON textarea and
parsedMetadata (from that textarea) always won, so selector edits were
silently discarded. Strip it from the textarea initialValues and overlay
values.allowed_passthrough_routes into updateData.metadata, mirroring how
guardrails is handled.
Resolves LIT-3019
* fix(ui): preserve team passthrough routes for non-proxy-admins on save
Only proxy admins may set allowed_passthrough_routes (server-side gate).
For non-proxy-admins, write the team's stored value back into metadata
instead of the form value, so saving an unrelated setting can't silently
wipe routes; omit the key entirely when the team never had any.
Resolves LIT-3019
* fix(mcp): JWT on tools/list and REST tools/call server resolution (#28227)
* fix(mcp): JWT on tools/list, REST server_id resolution, tool_server_mismatch
Sign outbound MCP JWTs for list_mcp_tools and inject headers on the tools/list
path. Resolve server_id on /mcp-rest/tools/call and return 403 tool_server_mismatch
when the tool does not belong to the requested server. Default missing arguments to {}.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(mcp): restrict list JWTs to mcp:tools/list and default REST arguments to {}
- List-only JWTs (call_type=list_mcp_tools) no longer carry the broad
mcp:tools/call scope. _build_scope() now emits only mcp:tools/list
when no tool name is provided, mirroring the existing least-privilege
rule that tool-call JWTs omit mcp:tools/list.
- REST /tools/call now defaults a missing 'arguments' field to {} so
execute_mcp_tool() and downstream **arguments / .keys() calls don't
receive None and crash with TypeError/AttributeError.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(mcp): validate tool/server in call_tool; skip JWT signer when not configured or static auth present
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(mcp): align tests and mypy with user_api_key_auth on tools/list
Update mocks for the new _get_tools_from_server parameter, mock server
registry in REST access-denied test, and narrow static_headers for mypy.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(test): accept user_api_key_auth in get_tools_from_mcp_servers mock
The side_effect for the all-servers case did not accept the new kwarg,
so tools/list returned an empty list.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(mcp): fail fast for unknown tools when server mapping exists
Server-name fallback in call_tool must not open an upstream session when
the tool is absent from a populated mapping. Update the HTTP transport test
to register a known tool before asserting not-found behavior.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix mypy
* Fix mypy
* fix(mcp): preserve tools/call scope on missing tool name; pass user_api_key_auth in list_tools
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(mcp): match alias/server_name in _resolve_mcp_server_for_tool_call
The registry lookup in _resolve_mcp_server_for_tool_call previously only
compared candidate.name against the provided server_name, but tool name
prefixes can be derived from a server's alias or server_name (see
get_server_prefix). When the tool→server mapping is empty/stale (cold
start, dynamic tools), the lookup would fail for alias-configured
servers even though get_mcp_server_by_name (used by the REST path)
matches alias, server_name, and name.
Match the same priority of identifiers in both the registry pass and
the unprefixed fallback so the MCP protocol call_tool path is
consistent with the REST path.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(mcp): reuse proxy_logging DualCache in inject_mcp_jwt_headers_for_upstream
Instead of allocating a fresh DualCache() on every tools/list invocation,
prefer the shared proxy_logging_obj.internal_usage_cache.dual_cache when
available. The cache argument is currently unused by MCPJWTSigner, but
sharing the proxy's cache avoids per-call allocation overhead and matches
the cache identity used elsewhere in the proxy hook plumbing — so any
future per-request state stored in cache will survive across list calls.
Co-authored-by: Claude <noreply@anthropic.com>
* fix(mcp): return 403 ip_filtering for IP-restricted servers in tools/call name lookup
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(test): accept user_api_key_auth kwarg in list_tools mocks
The proxy-infra job was failing on four TestMCPServerManager tests because
the mock_get_tools_from_server stubs did not accept the new
user_api_key_auth keyword argument that list_tools now forwards to
_get_tools_from_server. Add the kwarg to each stub so list_tools can call
through cleanly.
Co-authored-by: Claude <claude@anthropic.com>
* fix(mcp): skip JWT injection when per-user mcp_auth_header is set
MCPClient._get_auth_headers() applies extra_headers AFTER writing
Authorization from auth_value, so an injected JWT silently overwrites
the user's per-server OAuth token. Guard the JWT signer with
'not mcp_auth_header' so per-user OAuth (and any dict-form per-user
auth) takes precedence, mirroring the existing static_headers guard.
Adds a regression test that the signer's inject helper is not called
when mcp_auth_header is supplied.
* fix(mcp): skip JWT injection when extra_headers already has Authorization
When a server uses per-user OAuth tokens, the resolved token is passed
into _get_tools_from_server via extra_headers. The JWT injection guard
only checked mcp_auth_header and the server's static headers, so the
signer would silently overwrite the user's OAuth Authorization header.
Add a check for an existing Authorization entry in extra_headers so
caller-supplied per-user OAuth tokens take precedence over JWT signing.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* test(mcp): cover JWT signer + tool-call resolution branches
Adds unit tests for the new MCPServerManager helpers (_resolve_mcp_server_for_tool_call,
_resolve_oauth2_headers_for_tool_call) and the new MCPJWTSigner paths
(_build_scope call_type branches and inject_mcp_jwt_headers_for_upstream).
Brings patch coverage above the auto target without changing behavior.
Co-authored-by: Claude <claude@anthropic.com>
* fix(mcp): retry tool-server lookup with prefixed name in REST mismatch check
When the REST /mcp-rest/tools/call path sends a raw tool name plus
requested_server_id, _get_mcp_server_from_tool_name(name) can return
None if the mapping only stores the prefixed form. That bypassed the
tool_server_mismatch 403 guard and let the call fall through to
trusting requested_server.
Retry the lookup with every known prefix of the requested server so
the mismatch check fires whenever the tool is actually registered.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(mcp): always reject unknown tools in server-name fallback
Defense-in-depth: _resolve_mcp_server_for_tool_call previously skipped
the unknown-tool check whenever the per-server mapping had no entries
yet (cold start, OAuth2 lazy listing, or upstream listing failure),
allowing arbitrary tool names to reach upstream servers.
Tighten the check so the server-name fallback always rejects tool
names not present in the mapping. Callers must call list_tools first
(standard MCP flow) before tools/call can resolve. Removes the
now-unused _mapping_has_tools_for_server helper and adds an
explicit empty-mapping rejection test alongside the existing
populated-mapping rejection test.
Co-authored-by: Sameer Kankute <sameer@berri.ai>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Claude (greptile subagent) <claude-greptile-bot@anthropic.com>
* feat(interactions): migrate to Google Interactions API steps schema (May 2026) (#28153)
* feat(interactions): migrate to Google Interactions API steps schema (May 2026)
Default to Api-Revision: 2026-05-20 (new `steps` schema). Add
`litellm.use_legacy_interactions_schema` global flag that sends
Api-Revision: 2026-05-07 for operators who need the legacy `outputs`
schema until June 8, 2026.
- Inject Api-Revision header in GoogleAIStudioInteractionsConfig.validate_environment()
- Auto-coalesce response_mime_type → response_format and image_config migration on new schema
- Add steps field to InteractionsAPIResponse and InteractionsAPIStreamingResponse
- Add StepStart/StepDelta/StepStop/InteractionCreated/etc. SSE event types
- Update streaming completion detection to handle interaction.completed event
- Bridge transformer populates both outputs and steps fields
- Bridge streaming iterator emits new-schema events by default
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(interactions): address greptile review feedback
- Avoid mutating caller's generation_config dict by shallow-copying
before popping image_config, preventing silent failures on retries
- Skip schema key in response_format when response_format is None to
avoid sending schema: null to the Google Interactions API
- Remove delta field from step.stop events (new schema only); the
StepStop model has no delta field and sending it duplicates already-
streamed text and breaks spec-conformant clients
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(proxy): parse use_legacy_interactions_schema string values safely
bool("false") returns True in Python, so quoted YAML values like
"false" or "False" silently activated the legacy Interactions API
schema. Match the env-var parsing pattern in litellm/__init__.py by
treating string inputs as true only when they equal "true" (case
insensitive).
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(interactions): only set object/id/delta on step.stop for legacy schema
StepStop (new schema) has no object, id, or delta fields. Setting them
unconditionally caused spec-breaking extra fields on new-schema step.stop
events in all four construction sites (sync/async × main-loop/StopIteration).
Legacy content.stop still receives id, object, and delta unchanged.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(interactions): stabilize streaming bridge schema, dict aliasing, and lost first delta
- Capture use_legacy_interactions_schema once at iterator construction so
all events emitted by a single stream use a consistent schema, even if
the global flag is mutated mid-stream.
- Check for the buffered interaction.complete/completed event before the
finished check in __next__/__anext__ so the final completion event
(which carries the full collected text in steps) is not dropped after
self.finished is set.
- Copy text content entries before appending to both outputs and the
steps content list to avoid shared mutable dict aliasing between the
two response fields.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix tests
* fix greptile review
* fix(interactions): address Greptile P1 review on schema coalescing and legacy deltas
Skip response_mime_type merge when response_format is already a list, avoid
in-place list mutation on image_config append, and restore delta.type on
legacy content.delta events.
Co-authored-by: Cursor <cursoragent@cursor.com>
* style(interactions): black-format gemini transformation.py
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
Co-authored-by: Claude <noreply@anthropic.com>
* test(ui-e2e): admin key creation with a specific proxy model (#28365)
* test(ui-e2e): add admin key creation with a specific proxy model
Adds Playwright coverage for creating a key (no team) scoped to a single
proxy model, complementing the existing All-Proxy-Models test. Uses a
DOM-dispatched click on the antd dropdown option since the popup
animation can render the option outside the viewport.
* test(ui-e2e): verify scoped key works against mock /chat/completions
Extend the "Create a key with a specific proxy model" test to extract
the new key from the success modal and POST to /chat/completions for
the scoped model, asserting 200 and the mock response body. Without
this the test could pass even if the model selection failed to register.
* fix(vertex_ai): omit function_call id on Vertex Gemini 3.5+ tool turns (#28324)
* fix(vertex_ai): omit function_call id on Vertex Gemini 3.5+ tool turns
Vertex AI rejects `id` on function_call/function_response parts; only Google AI Studio accepts it for Gemini 3.5+ strict tool matching.
Co-authored-by: Cursor <cursoragent@cursor.com>
* Update litellm/llms/vertex_ai/gemini/vertex_and_google_ai_studio_gemini.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* fix(vertex_ai): forward custom_llm_provider in context caching
Pass custom_llm_provider through to _gemini_convert_messages_with_history
in the context caching path so Gemini 3.5+ tool-call `id` forwarding
behaves consistently between cached and non-cached completions on Google
AI Studio.
Co-authored-by: Claude <claude@anthropic.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Claude <claude@anthropic.com>
* feat(mcp): allow native MCP OAuth support for cursor (#28327)
* feat(mcp): allow native MCP OAuth redirect URIs (cursor://)
Discoverable OAuth /authorize rejected cursor:// callbacks because
validate_trusted_redirect_uri only accepted http/https. Add an
allowlisted native path with a built-in Cursor default and optional
MCP_TRUSTED_NATIVE_REDIRECT_URIS env for other clients.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(mcp): address Greptile native redirect URI review
Lowercase paths in normalizer so env allowlist entries match case-
insensitively. Tighten wildcard prefix matching to reject sibling
paths (e.g. callback-2) unless the prefix ends with /.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(mcp): reject query params on native OAuth redirect URIs
Greptile: normalization stripped query strings before allowlist compare,
so cursor://.../callback?injected=... could pass validation. Reject any
native redirect_uri with a query component (same as fragments).
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(model_cost_map): add mistral/ministral-8b-2512 entry
Mistral rotated the 'mistral/mistral-tiny' alias to return
'ministral-8b-2512' as the response model, which is not in the cost map.
This caused test_completion_mistral_api and
test_completion_mistral_api_modified_input to fail in
completion_cost lookup. Add the entry mirroring the existing
openrouter/mistralai/ministral-8b-2512 pricing.
* fix(mcp): lowercase default native redirect URIs
Make _parse_trusted_native_redirect_uris apply the same lowercasing
to built-in defaults as it does to env-var entries.
* fix(tests): backfill local model_cost into remote-fetched map
litellm.model_cost is loaded at import time from the URL pinned to main,
so pricing entries that exist only in this branch (e.g.
mistral/ministral-8b-2512, freshly added because Mistral now returns this
id from mistral-tiny) are absent at test time and completion_cost lookups
raise. Backfill the in-tree backup so cassette-driven cost calculations
resolve against the entries that ship with the branch under test.
Fixes the local_testing_part1 failures on test_completion_mistral_api and
test_completion_mistral_api_modified_input.
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
Co-authored-by: Claude <claude@anthropic.com>
* fix(interactions): never drop streamed text deltas; always emit terminal completion (#28394)
* fix(interactions): never drop streamed text deltas; always emit terminal completion
The interactions streaming bridge had two bugs flagged by Greptile on PR #28153:
1. The first OutputTextDeltaEvent (and the second, when no ResponseCreatedEvent
precedes the deltas) was consumed to emit a synthetic interaction.created /
step.start event, but the chunk's text payload was never forwarded as a
step.delta. The text only reappeared in the terminal step.stop, which
defeats the purpose of incremental streaming.
2. When the upstream Responses API stream ended via StopIteration without a
ResponseCompletedEvent, the iterator emitted step.stop but never the
terminal interaction.completed event carrying the full collected text.
This refactors the iterator to translate each upstream chunk into a list of
events (instead of a single event) and buffers them in a deque. A text delta
now expands into [interaction.created, step.start, step.delta] on the first
chunk so no token is dropped, and the StopIteration / StopAsyncIteration
fallback always flushes a terminal interaction.completed event when one
hasn't already been sent.
Both behaviors are covered by new unit tests:
- test_no_text_token_is_dropped_during_streaming
- test_response_created_then_text_delta_emits_step_start_and_delta
- test_stop_iteration_fallback_emits_completion_event
- test_response_completed_emits_stop_then_completion (no double-emit)
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* fix(interactions): correlate EOF terminal events with stream's interaction id
The StopIteration fallback path previously built the terminal step.stop /
interaction.completed events with id=None (legacy content.stop) and a
memory-address fallback string (interaction.completed), neither of which
matched the item_id used by the earlier interaction.created / step.start /
step.delta events in the same stream. Downstream consumers correlating
events by id would see a mismatch.
Persist the interaction id derived from the first upstream chunk (item_id
on an OutputTextDeltaEvent, or response.id on a ResponseCreatedEvent) and
reuse it when flushing the terminal events on EOF.
Author: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
* ci(windows): raise UV_HTTP_TIMEOUT to 300s for uv sync
The using_litellm_on_windows job has been hitting flaky PyPI download
timeouts during 'uv sync --frozen --group dev' — different packages on
each rerun (six, pydantic-core), all surfacing the same uv error:
Failed to download distribution due to network timeout.
Try increasing UV_HTTP_TIMEOUT (current value: 30s).
uv's default 30s per-request timeout is too tight for the Windows runner
on this project (50+ deps, several multi-MB wheels), so bump it to 300s
to let slow individual downloads complete instead of failing the build.
* fix(interactions): correlate ResponseCompletedEvent terminal events with stream's interaction id
When a stream starts directly with OutputTextDeltaEvent (no preceding
ResponseCreatedEvent), interaction.created carries item_id while
interaction.completed previously carried response.id from
ResponseCompletedEvent. The two ids can differ, leaving consumers that
correlate events by id unable to match the start and completion events.
Fall back to self._interaction_id (set on the first chunk that derives
an id) before response.id, mirroring the EOF terminal path.
---------
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* fix(proxy): expose Prisma idle/connect timeout + extra DB URL params (#28395)
* fix(proxy): expose Prisma idle/connect timeout + extra DB URL params
Operators have reported large numbers of idle Prisma connections that
never get closed. The proxy already forwards `connection_limit` and
`pool_timeout` to the DATABASE_URL, but had no knob for capping idle
or slow connections. Add three new `general_settings` keys that thread
through to the DATABASE_URL / DIRECT_URL query string:
- `database_connect_timeout` -> Prisma `connect_timeout`
- `database_socket_timeout` -> Prisma `socket_timeout` (the main
knob for closing idle connections from the LiteLLM side)
- `database_extra_connection_params` -> untyped passthrough dict for
any other Prisma URL param (`pgbouncer`, `statement_cache_size`,
`sslmode`, ...); keys here override LiteLLM defaults.
Refactors the duplicated DATABASE_URL/DIRECT_URL param dicts into a
single `_build_db_connection_url_params` helper.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Update litellm/proxy/proxy_cli.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
---------
Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Litellm oss staging 1 (#28337)
* feat: add Xiaomi MiMo-V2.5-Pro and MiMo-V2.5 OpenRouter model entries (#27700)
Squash-merged by litellm-agent from TorvaldUtne's PR.
* fix(ui): trim whitespace from MCP inspector tool call inputs (#28203)
Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
* gemini-3.1-flash-lite pricing (#27933)
* feat(model_prices): add gemini-3.1-flash-lite pricing with standard/batch/flex/priority tiers
* fix pricing
* add service tier
---------
Co-authored-by: shin-berri <shin-laptop@berri.ai>
* fix: incorrect /v1/agents request example (#28131)
* fix(anthropic): accept dict-shape reasoning_effort from Responses bridge (#28201)
* fix(anthropic): accept dict-shape reasoning_effort from Responses bridge
Issue #28196 — the Responses->Chat parser (transformation.py:184-200) keeps the full dict as reasoning_effort when summary is set; that branch was added in #25359. But the Anthropic transformation here still guarded on isinstance(value, str), silently dropping the param. Result: callers using the standard Reasoning(effort, summary) OpenAI-shaped object on Anthropic lose thinking entirely (0 reasoning_tokens, no thinking_blocks).
Coerce dict -> string before mapping. Same shape tolerance that gpt_5_transformation._normalize_reasoning_effort_for_chat_completion already implements. summary is irrelevant for Anthropic's thinking_blocks.
Adds two regression tests: one parametrized over string + dict shapes (with and without summary), one covering unparseable dict inputs (drops silently, no crash).
* test(anthropic): add non-adaptive model coverage for dict-shape reasoning_effort
Per Greptile feedback on PR #28198: the original regression test only exercised the adaptive (4.6+) path. Add a parametrized test for the non-adaptive branch (claude-sonnet-4-5) verifying that dict-shape reasoning_effort still maps to thinking.type='enabled' + budget_tokens, and that output_config is NOT set on pre-4.6 models.
* test(anthropic): convert unparseable-dict test to @pytest.mark.parametrize
Per @greptile-apps inline review on PR #28201 — matches the parametrize style of the two adjacent dict-shape tests and produces clearer failure messages (test ID per case instead of one collapsing for-loop).
* feat: add pricing entry for openrouter/google/gemini-3.1-flash-lite (#28280)
Squash-merged by litellm-agent from ro31337's PR.
* fix(router): wrap aresponses streaming iterator for mid-stream fallbacks (#28215)
Squash-merged by litellm-agent from cwang-otto's PR.
* fix(router): unblock staging — mypy + coverage for aresponses streaming fallback (#28318)
Squash-merged by litellm-agent from cwang-otto's PR.
* fix(responses): forward timeout on completion transformation path (Anthropic, Bedrock, Vertex) (#28133)
Squash-merged by litellm-agent from cwang-otto's PR.
* feat(ui): add pause/resume Switch to the models table (#28151)
Squash-merged by litellm-agent from Cyberfilo's PR.
* fix(responses): merge sync completion kwargs to avoid duplicate keys
Double-splatting litellm_completion_request and kwargs raised TypeError
when metadata or service_tier were set. Match the async merge pattern.
Co-authored-by: Cursor <cursoragent@cursor.com>
* Use proxy base URL for CLI SSO form action (#28271)
Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
* fix(tests): add mistral/ministral-8b-2512 to cost map and backfill in conftest
Mistral rotated the 'mistral/mistral-tiny' alias to return
'ministral-8b-2512' as the response model, which was missing from the
cost map. This caused test_completion_mistral_api and
test_completion_mistral_api_modified_input to fail in
litellm.completion_cost lookup.
- Add mistral/ministral-8b-2512 entry to both the in-tree
model_prices_and_context_window.json and the bundled
litellm/model_prices_and_context_window_backup.json (mirrors the
existing openrouter/mistralai/ministral-8b-2512 pricing).
- litellm.model_cost is loaded at import time from the URL pinned to
main, so the new backup entry isn't visible at test runtime until
it also lands on main. Backfill any entries missing from the
remote-fetched map into litellm.model_cost in the local_testing
conftest so cost-calculator lookups succeed on this branch.
* fix(tests): drop unnecessary del of conftest backfill loop vars
* fix(router): harden streaming fallback wrapper for bridge iterators
- FallbackResponsesStreamWrapper now uses getattr fallbacks when copying
attributes from the source iterator. The bridge path
(LiteLLMCompletionStreamingIterator used by Anthropic/Bedrock/Vertex)
does not call super().__init__ and is missing response, logging_obj
(it uses litellm_logging_obj), responses_api_provider_config,
start_time, request_data, call_type, and _hidden_params. Previously,
wrapper construction raised AttributeError for any streaming fallback
on the bridge path.
- _aresponses_with_streaming_fallbacks now deep-copies the
litellm_metadata (and metadata) dicts into fallback_kwargs. The
primary attempt mutates this dict in place via
_update_kwargs_with_deployment, so a shallow copy of kwargs was
leaking primary-deployment fields (deployment, model_info, api_base)
into the mid-stream fallback request.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(router): use safe_deep_copy for fallback metadata snapshot
The ban_copy_deepcopy_kwargs CI check rejects copy.deepcopy() on any
variable whose name contains 'kwargs' (incl. fallback_kwargs). Swap
the two copy.deepcopy(fallback_kwargs[...]) calls for safe_deep_copy,
which handles non-picklable values (OTEL spans, etc.) by per-key
deepcopy with fallback to the original reference.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* test(ci): skip chronically flaky build_and_test integration tests
Both tests have been failing on every recent run of build_and_test
against this PR's HEAD (1686967, 1688402, 1689993, 1690877), and the
same two tests also fail intermittently on unrelated commits and other
branches, independent of any code change in this PR (which only touches
router fallback wrappers, the Anthropic Responses bridge, and unrelated
UI/cost-map files).
- tests.test_spend_logs.test_spend_logs: /spend/logs?request_id=...
returns 500 even after a 20s wait for the spend log to be written.
Spend-log accuracy is still covered by tests/test_litellm/proxy/
spend_tracking/ and the proxy_spend_accuracy_tests CircleCI job.
- tests.test_team_members.test_add_multiple_members: /team/info?team_id=
…
What this fixes
litellm.responses()/litellm.aresponses()silently drops thetimeoutparameter on the completion transformation path (used by Anthropic, Vertex AI Claude, Bedrock, and any provider without a nativeBaseResponsesAPIConfig). The timeout works correctly on the native Responses API path (OpenAI, Azure).Net effect:
Router(timeout=N)is a no-op for Anthropic models — calls fall back to the Anthropic SDK default (~600s).Filed as #28132.
Root cause
timeoutis declared as a named parameter ofresponses()— so it's removed from**kwargsby Python. The completion transformation path passed**kwargsto its handler, silently dropping the consumed value:The native path already forwards it explicitly (
timeout=timeout or request_timeouton line 1169). This PR mirrors that pattern.Same class of bug as #22544, which fixed
metadatasilently dropping on the same path.The fix
One line in
litellm/responses/main.py:The downstream handler already accepts
**kwargsand forwards them tolitellm.completion()/litellm.acompletion(), which respectstimeoutnatively.Verification
1. Unit test (added in
tests/llm_responses_api_testing/test_anthropic_responses_api.py)test_aresponses_forwards_timeout_to_acompletionmockslitellm.acompletionand asserts thattimeout=42passed tolitellm.aresponses()reaches the underlyingacompletioncall.Verified red→green:
AssertionError: timeout was not forwarded to acompletion (got None)2. End-to-end repro with a slow HTTP server
A standalone repro spins up a local HTTP server that accepts the connection then sleeps 30s before responding. We call
litellm.aresponses(model="anthropic/...", timeout=3)pointed at it.Repro script (uses no real API key —
api_baseis overridden to local slow server):Click to expand
Other named params at risk (same class)
While this PR targets
timeout, the same wrong-pattern affects other parameters declared as named inresponses()but not inresponse_api_optional_params. Easy follow-up —extra_queryis the most notable.timeoutextra_querymetadataAffected providers
Any provider routed via completion transformation (i.e. no native
BaseResponsesAPIConfig):anthropic/claude-*)vertex_ai/claude-*)bedrock/anthropic.*)Related
metadatafix (same bug class, same fix shape)litellm_settings.request_timeout#25591 — separatetimeoutfix path (litellm_settings-derived timeouts)