[litellm-agent] Staging → litellm_internal_staging (5/22/2026)#28542
oss-pr-review-agent-shin[bot] wants to merge 10 commits into
…#27700) Squash-merged by litellm-agent from TorvaldUtne's PR.
Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai>
* feat(model_prices): add gemini-3.1-flash-lite pricing with standard/batch/flex/priority tiers
* fix pricing
* add service tier
---------
Co-authored-by: shin-berri <shin-laptop@berri.ai>
…dge (#28201)
* fix(anthropic): accept dict-shape reasoning_effort from Responses bridge

Issue #28196: the Responses->Chat parser (transformation.py:184-200) keeps the full dict as reasoning_effort when summary is set; that branch was added in #25359. But the Anthropic transformation here still guarded on isinstance(value, str), silently dropping the param. Result: callers using the standard Reasoning(effort, summary) OpenAI-shaped object on Anthropic lose thinking entirely (0 reasoning_tokens, no thinking_blocks).

Coerce dict -> string before mapping. Same shape tolerance that gpt_5_transformation._normalize_reasoning_effort_for_chat_completion already implements. summary is irrelevant for Anthropic's thinking_blocks.

Adds two regression tests: one parametrized over string + dict shapes (with and without summary), one covering unparseable dict inputs (drops silently, no crash).

* test(anthropic): add non-adaptive model coverage for dict-shape reasoning_effort

Per Greptile feedback on PR #28198: the original regression test only exercised the adaptive (4.6+) path. Add a parametrized test for the non-adaptive branch (claude-sonnet-4-5) verifying that dict-shape reasoning_effort still maps to thinking.type='enabled' + budget_tokens, and that output_config is NOT set on pre-4.6 models.

* test(anthropic): convert unparseable-dict test to @pytest.mark.parametrize

Per @greptile-apps inline review on PR #28201: matches the parametrize style of the two adjacent dict-shape tests and produces clearer failure messages (test ID per case instead of one collapsing for-loop).
…28280) Squash-merged by litellm-agent from ro31337's PR.
…cks (#28215) Squash-merged by litellm-agent from cwang-otto's PR.
…ng fallback (#28318) Squash-merged by litellm-agent from cwang-otto's PR.
…PI requests (#28431) Squash-merged by litellm-agent from cwang-otto's PR.
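The dict-to-string coercion described in the #28201 commit message above can be illustrated with a minimal sketch. This is not the code from the PR: `coerce_reasoning_effort` is a hypothetical helper name, and the only behaviour assumed is what the commit message states (string passes through, a dict yields its `effort` string, unparseable input is dropped silently).

```python
from typing import Any, Optional


def coerce_reasoning_effort(value: Any) -> Optional[str]:
    """Hypothetical helper (not the PR's code): normalise reasoning_effort.

    Accepts either a plain string ("high") or the Responses-style dict
    shape ({"effort": "high", "summary": "auto"}) and returns the effort
    string. Anything else returns None, so the caller can drop the param
    without raising.
    """
    if isinstance(value, str):
        return value
    if isinstance(value, dict):
        effort = value.get("effort")
        if isinstance(effort, str):
            return effort
    return None


# Both shapes resolve to the same effort string; `summary` is ignored.
print(coerce_reasoning_effort("high"))
print(coerce_reasoning_effort({"effort": "high", "summary": "auto"}))
```

Returning None for unparseable input mirrors the "drops silently, no crash" behaviour that the regression tests in the commit message are said to cover.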
@greptile please review
Greptile Summary
This automated staging PR bundles five independent fixes: mid-stream fallback support for the Responses API streaming path in
Confidence Score: 4/5
Safe to merge after a second look at the two areas noted; all other changes are well-contained with good test coverage. The Responses API streaming fallback machinery in
| Filename | Overview |
|---|---|
| litellm/llms/anthropic/chat/transformation.py | Relaxes the reasoning_effort guard from isinstance(value, str) to also accept dict shape {"effort": "...", "summary": "..."} produced by the Responses→Chat bridge, coercing to the effort string before mapping. Fix is narrow and well-tested. |
| litellm/llms/openai/responses/transformation.py | Adds remove_cache_control_flag_from_input_and_tools to strip Anthropic-only cache_control markers before sending to OpenAI's Responses API. Correct fix but filter_value_from_dict mutates the caller's input in-place, which is a latent issue for retry scenarios. |
| litellm/router.py | Adds mid-stream fallback handling for the Responses API path (_aresponses_streaming_iterator, _aresponses_with_streaming_fallbacks, and three static helpers). Large addition with comprehensive tests; FallbackResponsesStreamWrapper bypasses super().__init__() creating a manual attribute surface that could drift from the base class. |
| litellm/utils.py | Two-part fix for custom pricing regression (#28336): strips litellm_provider: None from existing_model in register_model, and treats None as a wildcard match in _check_provider_match. Both changes are safe and well-tested. |
| model_prices_and_context_window.json | Adds stable (GA) gemini-3.1-flash-lite entries for bare, gemini/, vertex_ai/, and openrouter/google/ prefixes. Pricing is consistent with the existing preview variant. |
| ui/litellm-dashboard/src/components/mcp_tools/ToolTestPanel.tsx | Trims leading/trailing whitespace from string inputs before type-conversion and submission. Small, targeted change with no behavioural side-effects for non-string types. |
| tests/router_unit_tests/test_router_aresponses_streaming_fallback.py | New unit test file covering all four Responses-API streaming fallback helpers. Tests are mock-only with no real network calls. Good coverage of partial-usage combining, continuation input, and passthrough paths. |
| tests/test_litellm/test_router.py | Adds six new aresponses streaming fallback tests plus reformats an existing assertion block. All tests use mocks; no real network calls. Good regression coverage for metadata key routing, pre/post-chunk paths, and usage combining. |
Reviews (1): Last reviewed commit: "Treat None litellm_provider as wildcard ..."

        return final_request_params

    def remove_cache_control_flag_from_input_and_tools(
        self,
        model: str,  # allows overrides to selectively run this
        input: Union[str, ResponseInputParam],
        tools: Optional[List[ALL_RESPONSES_API_TOOL_PARAMS]] = None,
    ) -> Tuple[
        Union[str, ResponseInputParam],
        Optional[List[ALL_RESPONSES_API_TOOL_PARAMS]],
    ]:
        """Sibling of `remove_cache_control_flag_from_messages_and_tools` on
        the chat path. Strips Anthropic-only `cache_control` markers from
        Responses API input content blocks and tools.

        `filter_value_from_dict` mutates each dict in place, so the same
        objects are returned.
        """
        from litellm.litellm_core_utils.prompt_templates.common_utils import (
            filter_value_from_dict,
        )

        if isinstance(input, list):
            for item in input:
                if isinstance(item, dict):
                    filter_value_from_dict(cast(dict, item), "cache_control")

        if tools is not None:
            for tool in tools:
                if isinstance(tool, dict):
                    filter_value_from_dict(cast(dict, tool), "cache_control")

        return input, tools

    def _validate_input_param(
        self, input: Union[str, ResponseInputParam]

In-place mutation of caller's input
filter_value_from_dict mutates each dict in-place, so the original input list/items passed by the caller are permanently modified before the request is even sent. If a higher-level caller holds a reference to the same input data and retries with a different provider (e.g. first tries OpenAI, falls back to Anthropic), the Anthropic provider will receive input that has already had its cache_control fields stripped. The docstring acknowledges the mutation but does not warn callers about this reuse hazard.
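One way to avoid the reuse hazard described above is to strip the markers from a copy rather than from the caller's objects. The sketch below is illustrative only: `strip_key` is a hypothetical stand-in for an in-place filter (it is not litellm's `filter_value_from_dict`), and `strip_cache_control_copy` is an assumed helper name, not something from the PR.

```python
import copy
from typing import Any, List


def strip_key(obj: Any, key: str) -> None:
    """Hypothetical in-place filter: recursively delete `key` from nested dicts and lists."""
    if isinstance(obj, dict):
        obj.pop(key, None)
        for value in obj.values():
            strip_key(value, key)
    elif isinstance(obj, list):
        for value in obj:
            strip_key(value, key)


def strip_cache_control_copy(items: List[Any]) -> List[Any]:
    """Return a cleaned deep copy, leaving the caller's input untouched."""
    cleaned = copy.deepcopy(items)
    strip_key(cleaned, "cache_control")
    return cleaned


original = [
    {
        "role": "user",
        "content": [
            {
                "type": "input_text",
                "text": "hi",
                "cache_control": {"type": "ephemeral"},
            }
        ],
    }
]
cleaned = strip_cache_control_copy(original)

# The copy has no marker; the original still carries it for a later
# fallback to a provider that understands cache_control.
assert "cache_control" not in cleaned[0]["content"][0]
assert "cache_control" in original[0]["content"][0]
```

The trade-off is an extra deep copy per request, which may matter for very large inputs; whether that cost is acceptable is a judgment for the maintainers.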
                            "must read as a seamless continuation."
                        ),
                    }
                ],
            },
            {
                "type": "message",
                "role": "assistant",
                "content": [{"type": "output_text", "text": generated_content}],
            },
        ]
        return cast("ResponseInputParam", base + continuation)

    async def _aresponses_streaming_iterator(
        self,
        response: "BaseResponsesAPIStreamingIterator",
        initial_kwargs: Dict[str, Any],
    ) -> "BaseResponsesAPIStreamingIterator":
        """
        Wrap a Responses-API streaming iterator so MidStreamFallbackError
        triggers the Router's fallback chain (parity with
        _acompletion_streaming_iterator for the chat-completions path).

        The Responses-API streaming path goes through
        _ageneric_api_call_with_fallbacks rather than _acompletion, so the
        returned iterator is never wrapped by the chat completions
        fallback handler. Without this wrapper, MidStreamFallbackError
        raised mid-stream from the underlying CustomStreamWrapper (used by
        LiteLLMCompletionStreamingIterator when the Responses API is
        served via the completion bridge) propagates unhandled and the
        configured cross-provider fallback never fires.

        Full parity with the chat-completions path:
        - Pre-first-chunk: retry with the original input unchanged.
        - Partial content: inject a developer instruction + prior
          assistant message carrying the generated text so the fallback
          model continues rather than restarts.
        - Usage combining: merge partial-stream usage onto the fallback's
          response.completed event so accounting reflects both attempts.
        - Stream cleanup: shielded aclose() on both source and fallback
          iterators on terminate.
        """
        from litellm.exceptions import MidStreamFallbackError
        from litellm.responses.streaming_iterator import (
            BaseResponsesAPIStreamingIterator,
        )

        source_iterator = response

        class FallbackResponsesStreamWrapper(BaseResponsesAPIStreamingIterator):
Manual attribute init in FallbackResponsesStreamWrapper may drift from base class
FallbackResponsesStreamWrapper intentionally bypasses super().__init__() and manually initialises every attribute BaseResponsesAPIStreamingIterator.__init__ would set (e.g. _failure_handled, _completed_response_cached, _persist_completed_response_before_logging, etc.). Any new attribute added to the base class in the future will silently be absent from the wrapper, which can lead to AttributeError in methods like _check_max_streaming_duration or _handle_failure that the proxy layer may call. A comment or assertion guarding the set of required attributes would help prevent future drift.
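The kind of guard suggested above could take the form of a unit-test check that compares the attributes set by the base initialiser with those set by the wrapper. The sketch below uses simplified, hypothetical classes (`BaseIterator`, `FallbackWrapper`) rather than the real litellm classes, whose constructors are not shown in this review; only the attribute names are borrowed from the comment.

```python
from typing import Any, Set


class BaseIterator:
    """Hypothetical stand-in for the base streaming iterator."""

    def __init__(self, response: Any) -> None:
        self.response = response
        self._failure_handled = False
        self._completed_response_cached = None


class FallbackWrapper(BaseIterator):
    """Hypothetical wrapper that deliberately skips super().__init__()."""

    def __init__(self, source: BaseIterator) -> None:
        # Manual attribute setup; one base attribute is forgotten on purpose
        # to show what the guard catches.
        self.response = source.response
        self._failure_handled = False


def missing_base_attributes(base: object, wrapper: object) -> Set[str]:
    """Return instance attributes present on `base` but absent on `wrapper`."""
    return set(vars(base)) - set(vars(wrapper))


base = BaseIterator(response="stream")
wrapper = FallbackWrapper(base)

# In a test suite this would be `assert not missing_base_attributes(...)`,
# which fails as soon as the base class gains an attribute the wrapper lacks.
print(missing_base_attributes(base, wrapper))
```

This only detects attributes assigned in `__init__`; attributes the base class sets lazily in other methods would need a separate check.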
Closing this in favour of #28582
Automated staging PR created by litellm-agent.
This branch collects PRs approved by the agent on 5/22/2026.