[litellm-agent] Staging → litellm_internal_staging (5/22/2026) by oss-pr-review-agent-shin[bot] · Pull Request #28542 · BerriAI/litellm

oss-pr-review-agent-shin · 2026-05-22T03:06:03Z

Automated staging PR created by litellm-agent.

This branch collects PRs approved by the agent on 5/22/2026.

⚠️ Human review required before CI. Convert from draft to ready when you've reviewed the diff.


@greptile-apps

…dge (#28201)

* fix(anthropic): accept dict-shape reasoning_effort from Responses bridge

  Issue #28196: the Responses->Chat parser (transformation.py:184-200) keeps the full dict as reasoning_effort when summary is set; that branch was added in #25359. The Anthropic transformation, however, still guarded on isinstance(value, str), silently dropping the param. As a result, callers using the standard OpenAI-shaped Reasoning(effort, summary) object on Anthropic lose thinking entirely (0 reasoning_tokens, no thinking_blocks).

  Coerce dict -> string before mapping. This is the same shape tolerance that gpt_5_transformation._normalize_reasoning_effort_for_chat_completion already implements; summary is irrelevant for Anthropic's thinking_blocks.

  Adds two regression tests: one parametrized over string and dict shapes (with and without summary), and one covering unparseable dict inputs (dropped silently, no crash).

* test(anthropic): add non-adaptive model coverage for dict-shape reasoning_effort

  Per Greptile feedback on PR #28198: the original regression test only exercised the adaptive (4.6+) path. Add a parametrized test for the non-adaptive branch (claude-sonnet-4-5) verifying that dict-shape reasoning_effort still maps to thinking.type='enabled' + budget_tokens, and that output_config is NOT set on pre-4.6 models.

* test(anthropic): convert unparseable-dict test to @pytest.mark.parametrize

  Per @greptile-apps inline review on PR #28201. This matches the parametrize style of the two adjacent dict-shape tests and produces clearer failure messages (one test ID per case instead of a single collapsing for-loop).
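The dict -> string coercion described in the first commit can be sketched as a small normalizer. This is a hedged illustration, not the code from #28201: `normalize_reasoning_effort` is a hypothetical name, and the real logic sits inside the Anthropic transformation's parameter mapping.

```python
from typing import Any, Optional


def normalize_reasoning_effort(value: Any) -> Optional[str]:
    """Hypothetical helper: coerce a reasoning_effort value to its effort string.

    Accepts the plain string form ("low") or the dict form produced by the
    Responses->Chat bridge ({"effort": "low", "summary": "concise"}).
    Returns None for anything it cannot parse, so the caller can drop the
    param instead of crashing.
    """
    if isinstance(value, str):
        return value
    if isinstance(value, dict):
        effort = value.get("effort")
        # `summary` is ignored: it has no counterpart in Anthropic thinking blocks.
        if isinstance(effort, str):
            return effort
    return None


print(normalize_reasoning_effort("high"))  # high
print(normalize_reasoning_effort({"effort": "low", "summary": "concise"}))  # low
print(normalize_reasoning_effort({"summary": "auto"}))  # None
```

Returning None rather than raising mirrors the commit's stated behaviour for unparseable dicts (dropped silently, no crash).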


oss-pr-review-agent-shin · 2026-05-22T03:06:05Z

@greptile please review

CLAassistant · 2026-05-22T03:06:13Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
5 out of 7 committers have signed the CLA.

✅ TorvaldUtne
✅ IshaMeera
✅ ro31337
✅ cwang-otto
✅ mubashir1osmani
❌ oss-agent-shin
❌ adityasingh2400
You have signed the CLA already but the status is still pending? Let us recheck it.

greptile-apps · 2026-05-22T03:16:21Z

Greptile Summary

This automated staging PR bundles five independent fixes: mid-stream fallback support for the Responses API streaming path in Router, Anthropic reasoning_effort dict-shape handling, cache_control stripping for OpenAI's Responses API, a custom pricing regression fix in utils.py, and new gemini-3.1-flash-lite (GA) pricing entries.

router.py: Adds _aresponses_with_streaming_fallbacks and _aresponses_streaming_iterator to give the Responses API path the same mid-stream fallback chain that chat completions already had. An inner FallbackResponsesStreamWrapper subclasses BaseResponsesAPIStreamingIterator without calling super().__init__() and manually copies all expected attributes from the source iterator.
llms/anthropic/chat/transformation.py: Widens the reasoning_effort parameter to accept {"effort": "low", "summary": "concise"} dict shapes produced by the Responses→Chat bridge, coercing them to the string form before mapping.
llms/openai/responses/transformation.py: Strips Anthropic-only cache_control markers from input content blocks and tools before sending to OpenAI, mirroring the existing chat-completions strip logic — note the strip mutates the caller's input in-place.
utils.py: Two-part fix so register_model no longer persists litellm_provider: None into model_cost, and _check_provider_match treats None as a wildcard, restoring custom pricing for provider-less deployments.
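The two-part utils.py fix can be illustrated with a simplified sketch. `build_model_cost_entry` and `provider_matches` are hypothetical stand-ins for `register_model` and `_check_provider_match`; the real functions take different arguments and do more work. The sketch only shows the two behaviours described: a `None` provider is never persisted, and a missing or `None` provider on a pricing entry matches any requested provider.

```python
from typing import Any, Dict, Optional


def build_model_cost_entry(
    existing_model: Dict[str, Any], new_values: Dict[str, Any]
) -> Dict[str, Any]:
    """Hypothetical stand-in for the register_model merge step."""
    merged = {**existing_model, **new_values}
    # Part 1: don't persist `litellm_provider: None` into the cost map.
    if merged.get("litellm_provider") is None:
        merged.pop("litellm_provider", None)
    return merged


def provider_matches(
    model_info: Dict[str, Any], requested_provider: Optional[str]
) -> bool:
    """Hypothetical stand-in for _check_provider_match."""
    entry_provider = model_info.get("litellm_provider")
    # Part 2: a missing/None provider on the pricing entry acts as a wildcard.
    if entry_provider is None:
        return True
    return entry_provider == requested_provider


entry = build_model_cost_entry(
    {"litellm_provider": None, "input_cost_per_token": 1e-06},
    {"output_cost_per_token": 2e-06},
)
print("litellm_provider" in entry)  # False
print(provider_matches(entry, "openai"))  # True
print(provider_matches({"litellm_provider": "anthropic"}, "openai"))  # False
```

Treating "key absent" and "key is None" identically means both halves of the fix agree, whichever one a given code path hits first.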

Confidence Score: 4/5

Safe to merge after a second look at the two areas noted; all other changes are well-contained with good test coverage.

The Responses API streaming fallback machinery in router.py is a substantial new code path. FallbackResponsesStreamWrapper skips the base-class constructor and clones ~15 attributes by hand — any future attribute addition to BaseResponsesAPIStreamingIterator will silently be absent from the wrapper. The in-place mutation of the caller's input list in remove_cache_control_flag_from_input_and_tools is a latent hazard for provider-fallback retry flows where the same input object is reused. Both issues are non-blocking today but worth tracking. The rest of the changes (reasoning_effort widening, pricing fix, model prices, UI trim) are narrow and thoroughly tested.

litellm/router.py (manual attribute init in FallbackResponsesStreamWrapper) and litellm/llms/openai/responses/transformation.py (in-place mutation of input in remove_cache_control_flag_from_input_and_tools).
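The fallback behaviour described for router.py (retry with the original input before the first chunk, continue from partial output afterwards, and combine usage across both attempts) can be sketched in plain Python. The helper names, the developer-instruction wording, and the choice to sum token counts are assumptions for illustration; the PR's actual helpers operate on LiteLLM's Responses API types.

```python
from typing import Any, Dict, List, Union

# Hypothetical wording; the PR's actual instruction text is not reproduced here.
CONTINUATION_INSTRUCTION = (
    "Continue the assistant message below from where it stopped, "
    "without repeating any of it."
)


def build_fallback_input(
    original_input: Union[str, List[Dict[str, Any]]], generated_content: str
) -> Union[str, List[Dict[str, Any]]]:
    """Hypothetical helper: choose the input for the fallback deployment."""
    # Pre-first-chunk failure: nothing was streamed, so retry unchanged.
    if not generated_content:
        return original_input
    base: List[Dict[str, Any]]
    if isinstance(original_input, str):
        base = [
            {
                "type": "message",
                "role": "user",
                "content": [{"type": "input_text", "text": original_input}],
            }
        ]
    else:
        base = list(original_input)  # shallow copy; caller's list is untouched
    continuation = [
        {
            "type": "message",
            "role": "developer",
            "content": [{"type": "input_text", "text": CONTINUATION_INSTRUCTION}],
        },
        {
            "type": "message",
            "role": "assistant",
            "content": [{"type": "output_text", "text": generated_content}],
        },
    ]
    return base + continuation


def combine_usage(partial: Dict[str, int], fallback: Dict[str, int]) -> Dict[str, int]:
    """Hypothetical helper: sum token counts so accounting covers both attempts."""
    keys = ("input_tokens", "output_tokens", "total_tokens")
    return {k: partial.get(k, 0) + fallback.get(k, 0) for k in keys}


retry_input = build_fallback_input("Write a haiku", "")  # "Write a haiku"
resume_input = build_fallback_input("Write a haiku", "Autumn moonlight")
print(len(resume_input))  # 3
print(
    combine_usage(
        {"input_tokens": 10, "output_tokens": 4, "total_tokens": 14},
        {"input_tokens": 12, "output_tokens": 9, "total_tokens": 21},
    )
)
# {'input_tokens': 22, 'output_tokens': 13, 'total_tokens': 35}
```

Carrying the partial text as a prior assistant message lets the fallback model continue rather than restart, which is the behaviour the docstring in the diff describes.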

Important Files Changed

Filename	Overview
litellm/llms/anthropic/chat/transformation.py	Relaxes the `reasoning_effort` guard from `isinstance(value, str)` to also accept dict shape `{"effort": "...", "summary": "..."}` produced by the Responses→Chat bridge, coercing to the effort string before mapping. Fix is narrow and well-tested.
litellm/llms/openai/responses/transformation.py	Adds `remove_cache_control_flag_from_input_and_tools` to strip Anthropic-only `cache_control` markers before sending to OpenAI's Responses API. Correct fix but `filter_value_from_dict` mutates the caller's input in-place, which is a latent issue for retry scenarios.
litellm/router.py	Adds mid-stream fallback handling for the Responses API path (`_aresponses_streaming_iterator`, `_aresponses_with_streaming_fallbacks`, and three static helpers). Large addition with comprehensive tests; `FallbackResponsesStreamWrapper` bypasses `super().__init__()` creating a manual attribute surface that could drift from the base class.
litellm/utils.py	Two-part fix for custom pricing regression (#28336): strips `litellm_provider: None` from `existing_model` in `register_model`, and treats `None` as a wildcard match in `_check_provider_match`. Both changes are safe and well-tested.
model_prices_and_context_window.json	Adds stable (GA) `gemini-3.1-flash-lite` entries for bare, `gemini/`, `vertex_ai/`, and `openrouter/google/` prefixes. Pricing is consistent with the existing preview variant.
ui/litellm-dashboard/src/components/mcp_tools/ToolTestPanel.tsx	Trims leading/trailing whitespace from string inputs before type-conversion and submission. Small, targeted change with no behavioural side-effects for non-string types.
tests/router_unit_tests/test_router_aresponses_streaming_fallback.py	New unit test file covering all four Responses-API streaming fallback helpers. Tests are mock-only with no real network calls. Good coverage of partial-usage combining, continuation input, and passthrough paths.
tests/test_litellm/test_router.py	Adds six new `aresponses` streaming fallback tests plus reformats an existing assertion block. All tests use mocks; no real network calls. Good regression coverage for metadata key routing, pre/post-chunk paths, and usage combining.

Reviews (1): Last reviewed commit: "Treat None litellm_provider as wildcard ..."

greptile-apps · 2026-05-22T03:16:26Z


        return final_request_params

+    def remove_cache_control_flag_from_input_and_tools(
+        self,
+        model: str,  # allows overrides to selectively run this
+        input: Union[str, ResponseInputParam],
+        tools: Optional[List[ALL_RESPONSES_API_TOOL_PARAMS]] = None,
+    ) -> Tuple[
+        Union[str, ResponseInputParam],
+        Optional[List[ALL_RESPONSES_API_TOOL_PARAMS]],
+    ]:
+        """Sibling of `remove_cache_control_flag_from_messages_and_tools` on
+        the chat path. Strips Anthropic-only `cache_control` markers from
+        Responses API input content blocks and tools.
+
+        `filter_value_from_dict` mutates each dict in place, so the same
+        objects are returned.
+        """
+        from litellm.litellm_core_utils.prompt_templates.common_utils import (
+            filter_value_from_dict,
+        )
+
+        if isinstance(input, list):
+            for item in input:
+                if isinstance(item, dict):
+                    filter_value_from_dict(cast(dict, item), "cache_control")
+
+        if tools is not None:
+            for tool in tools:
+                if isinstance(tool, dict):
+                    filter_value_from_dict(cast(dict, tool), "cache_control")
+
+        return input, tools
+
    def _validate_input_param(
        self, input: Union[str, ResponseInputParam]


In-place mutation of caller's input

filter_value_from_dict mutates each dict in-place, so the original input list/items passed by the caller are permanently modified before the request is even sent. If a higher-level caller holds a reference to the same input data and retries with a different provider (e.g. first tries OpenAI, falls back to Anthropic), the Anthropic provider will receive input that has already had its cache_control fields stripped. The docstring acknowledges the mutation but does not warn callers about this reuse hazard.
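One way to address the reuse hazard is to return stripped copies instead of mutating in place. The sketch below is a hypothetical alternative, not the PR's code: `strip_cache_control` and `remove_cache_control_without_mutation` are illustrative names, and the recursion assumes markers may appear at any depth. The caller's original `input` and `tools` keep their `cache_control` fields for a later Anthropic fallback.

```python
from typing import Any, List, Optional, Tuple, Union


def strip_cache_control(value: Any) -> Any:
    """Return a copy of `value` with every `cache_control` key removed.

    Dicts and lists are rebuilt recursively; any other value is shared as-is.
    """
    if isinstance(value, dict):
        return {
            k: strip_cache_control(v) for k, v in value.items() if k != "cache_control"
        }
    if isinstance(value, list):
        return [strip_cache_control(v) for v in value]
    return value


def remove_cache_control_without_mutation(
    request_input: Union[str, List[Any]], tools: Optional[List[Any]] = None
) -> Tuple[Union[str, List[Any]], Optional[List[Any]]]:
    """Hypothetical non-mutating variant of remove_cache_control_flag_from_input_and_tools."""
    new_input = (
        strip_cache_control(request_input)
        if isinstance(request_input, list)
        else request_input
    )
    new_tools = strip_cache_control(tools) if tools is not None else None
    return new_input, new_tools


original = [
    {
        "role": "user",
        "content": [
            {"type": "input_text", "text": "hi", "cache_control": {"type": "ephemeral"}}
        ],
    }
]
stripped, _ = remove_cache_control_without_mutation(original)
print("cache_control" in stripped[0]["content"][0])  # False
print("cache_control" in original[0]["content"][0])  # True
```

The trade-off is an extra allocation per request in exchange for making provider-fallback retries safe to reuse the same input object.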

greptile-apps · 2026-05-22T03:16:27Z

+                            "must read as a seamless continuation."
+                        ),
+                    }
+                ],
+            },
+            {
+                "type": "message",
+                "role": "assistant",
+                "content": [{"type": "output_text", "text": generated_content}],
+            },
+        ]
+        return cast("ResponseInputParam", base + continuation)
+
+    async def _aresponses_streaming_iterator(
+        self,
+        response: "BaseResponsesAPIStreamingIterator",
+        initial_kwargs: Dict[str, Any],
+    ) -> "BaseResponsesAPIStreamingIterator":
+        """
+        Wrap a Responses-API streaming iterator so MidStreamFallbackError
+        triggers the Router's fallback chain (parity with
+        _acompletion_streaming_iterator for the chat-completions path).
+
+        The Responses-API streaming path goes through
+        _ageneric_api_call_with_fallbacks rather than _acompletion, so the
+        returned iterator is never wrapped by the chat completions
+        fallback handler. Without this wrapper, MidStreamFallbackError
+        raised mid-stream from the underlying CustomStreamWrapper (used by
+        LiteLLMCompletionStreamingIterator when the Responses API is
+        served via the completion bridge) propagates unhandled and the
+        configured cross-provider fallback never fires.
+
+        Full parity with the chat-completions path:
+          - Pre-first-chunk: retry with the original input unchanged.
+          - Partial content: inject a developer instruction + prior
+            assistant message carrying the generated text so the fallback
+            model continues rather than restarts.
+          - Usage combining: merge partial-stream usage onto the fallback's
+            response.completed event so accounting reflects both attempts.
+          - Stream cleanup: shielded aclose() on both source and fallback
+            iterators on terminate.
+        """
+        from litellm.exceptions import MidStreamFallbackError
+        from litellm.responses.streaming_iterator import (
+            BaseResponsesAPIStreamingIterator,
+        )
+
+        source_iterator = response
+
+        class FallbackResponsesStreamWrapper(BaseResponsesAPIStreamingIterator):


Manual attribute init in FallbackResponsesStreamWrapper may drift from base class

FallbackResponsesStreamWrapper intentionally bypasses super().__init__() and manually initialises every attribute BaseResponsesAPIStreamingIterator.__init__ would set (e.g. _failure_handled, _completed_response_cached, _persist_completed_response_before_logging, etc.). Any new attribute added to the base class in the future will silently be absent from the wrapper, which can lead to AttributeError in methods like _check_max_streaming_duration or _handle_failure that the proxy layer may call. A comment or assertion guarding the set of required attributes would help prevent future drift.
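The guard suggested above could take the form of a unit test that compares the wrapper's hand-maintained attribute list against what the base constructor actually sets. The sketch uses toy classes: `BaseStreamingIterator`, `FallbackWrapper`, `CLONED_ATTRS`, `uncloned_attributes`, and `DriftedBase` are hypothetical names, not LiteLLM's.

```python
from typing import Any, Iterable, List, Set


class BaseStreamingIterator:
    """Toy stand-in for BaseResponsesAPIStreamingIterator (hypothetical)."""

    def __init__(self, chunks: List[Any]) -> None:
        self._chunks = iter(chunks)
        self._failure_handled = False
        self._completed_response_cached = None


class FallbackWrapper(BaseStreamingIterator):
    # Attributes cloned by hand, mirroring the approach of skipping
    # super().__init__(). Must stay in sync with the base class.
    CLONED_ATTRS = ("_chunks", "_failure_handled", "_completed_response_cached")

    def __init__(self, source: BaseStreamingIterator) -> None:
        for name in self.CLONED_ATTRS:
            setattr(self, name, getattr(source, name))
        self._source = source


def uncloned_attributes(source: object, cloned: Iterable[str]) -> Set[str]:
    """Instance attributes set on `source` that the wrapper does not copy."""
    return set(vars(source)) - set(cloned)


class DriftedBase(BaseStreamingIterator):
    """Simulates a future base-class change that adds a new attribute."""

    def __init__(self, chunks: List[Any]) -> None:
        super().__init__(chunks)
        self._new_flag = True


print(uncloned_attributes(BaseStreamingIterator([1, 2]), FallbackWrapper.CLONED_ATTRS))  # set()
print(uncloned_attributes(DriftedBase([1, 2]), FallbackWrapper.CLONED_ATTRS))  # {'_new_flag'}
```

Asserting that `uncloned_attributes(...)` is empty in a test turns silent drift into a test failure. An alternative is to copy `vars(source)` wholesale, which removes the hand-maintained list at the cost of also copying any state that should not be shared between the two iterators.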

Sameerlite · 2026-05-22T12:06:00Z

Closing this in favour of #28582

TorvaldUtne and others added 10 commits May 19, 2026 02:38

feat: add Xiaomi MiMo-V2.5-Pro and MiMo-V2.5 OpenRouter model entries (#27700)

43bc7d6

Squash-merged by litellm-agent from TorvaldUtne's PR.

fix(ui): trim whitespace from MCP inspector tool call inputs (#28203)

91927eb

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>

gemini-3.1-flash-lite pricing (#27933)

6f83cb2

* feat(model_prices): add gemini-3.1-flash-lite pricing with standard/batch/flex/priority tiers
* fix pricing
* add service tier

---------

Co-authored-by: shin-berri <shin-laptop@berri.ai>

fix: incorrect /v1/agents request example (#28131)

249ec01

feat: add pricing entry for openrouter/google/gemini-3.1-flash-lite (#28280)

87a55f5

Squash-merged by litellm-agent from ro31337's PR.

fix(router): wrap aresponses streaming iterator for mid-stream fallbacks (#28215)

5039e63

Squash-merged by litellm-agent from cwang-otto's PR.

fix(router): unblock staging — mypy + coverage for aresponses streaming fallback (#28318)

6ea1f57

Squash-merged by litellm-agent from cwang-otto's PR.

fix(openai-responses): strip Anthropic cache_control from Responses API requests (#28431)

f92e1b0

Squash-merged by litellm-agent from cwang-otto's PR.

Treat None litellm_provider as wildcard in _check_provider_match (#28523)

1cfe37d

Squash-merged by litellm-agent from adityasingh2400's PR.

oss-pr-review-agent-shin Bot mentioned this pull request May 22, 2026

Treat None litellm_provider as wildcard in _check_provider_match #28523

Merged

4 tasks

greptile-apps Bot reviewed May 22, 2026


Sameerlite closed this May 22, 2026

Sameerlite deleted the shin_agent_oss_staging_05_22_2026 branch May 22, 2026 12:08

oss-pr-review-agent-shin[bot] wants to merge 10 commits into litellm_internal_staging from shin_agent_oss_staging_05_22_2026

9 participants
