[litellm-agent] Staging → litellm_internal_staging (5/21/2026) by oss-pr-review-agent-shin[bot] · Pull Request #28432 · BerriAI/litellm

oss-pr-review-agent-shin · 2026-05-21T02:55:30Z

Automated staging PR created by litellm-agent.

This branch collects PRs approved by the agent on 5/21/2026.

⚠️ Human review required before CI. Convert from draft to ready when you've reviewed the diff.

…#27700) Squash-merged by litellm-agent from TorvaldUtne's PR.

Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai>

* feat(model_prices): add gemini-3.1-flash-lite pricing with standard/batch/flex/priority tiers * fix pricing * add service tier --------- Co-authored-by: shin-berri <shin-laptop@berri.ai>

@greptile-apps

…dge (#28201) * fix(anthropic): accept dict-shape reasoning_effort from Responses bridge Issue #28196 — the Responses->Chat parser (transformation.py:184-200) keeps the full dict as reasoning_effort when summary is set; that branch was added in #25359. But the Anthropic transformation here still guarded on isinstance(value, str), silently dropping the param. Result: callers using the standard Reasoning(effort, summary) OpenAI-shaped object on Anthropic lose thinking entirely (0 reasoning_tokens, no thinking_blocks). Coerce dict -> string before mapping. Same shape tolerance that gpt_5_transformation._normalize_reasoning_effort_for_chat_completion already implements. summary is irrelevant for Anthropic's thinking_blocks. Adds two regression tests: one parametrized over string + dict shapes (with and without summary), one covering unparseable dict inputs (drops silently, no crash). * test(anthropic): add non-adaptive model coverage for dict-shape reasoning_effort Per Greptile feedback on PR #28198: the original regression test only exercised the adaptive (4.6+) path. Add a parametrized test for the non-adaptive branch (claude-sonnet-4-5) verifying that dict-shape reasoning_effort still maps to thinking.type='enabled' + budget_tokens, and that output_config is NOT set on pre-4.6 models. * test(anthropic): convert unparseable-dict test to @pytest.mark.parametrize Per @greptile-apps inline review on PR #28201 — matches the parametrize style of the two adjacent dict-shape tests and produces clearer failure messages (test ID per case instead of one collapsing for-loop).

…28280) Squash-merged by litellm-agent from ro31337's PR.

…cks (#28215) Squash-merged by litellm-agent from cwang-otto's PR.

…ng fallback (#28318) Squash-merged by litellm-agent from cwang-otto's PR.

…PI requests (#28431) Squash-merged by litellm-agent from cwang-otto's PR.

oss-pr-review-agent-shin · 2026-05-21T02:55:32Z

@greptile please review

greptile-apps · 2026-05-21T03:02:44Z

Greptile Summary

This PR bundles four independent improvements collected on 5/21/2026: mid-stream fallback support for the Responses API streaming path in Router, a fix for dict-shaped reasoning_effort being silently dropped in the Anthropic transformation, cache_control stripping for the OpenAI Responses API (mirroring the existing chat-completions path), and GA pricing entries for gemini-3.1-flash-lite across all three provider prefixes.

router.py: _aresponses_with_streaming_fallbacks wraps the returned iterator with _aresponses_streaming_iterator, which catches MidStreamFallbackError, injects a continuation prompt if partial content exists, merges partial-stream token usage onto the fallback's response.completed event, and shields cleanup with anyio.CancelScope. Full parity with _acompletion_streaming_iterator.
transformation.py (Anthropic): reasoning_effort branch now extracts the effort string from a dict shape (produced by the Responses→Chat bridge when summary is set on the reasoning field) before mapping, preventing silent disabling of extended thinking.
transformation.py (OpenAI responses): Adds remove_cache_control_flag_from_input_and_tools to strip Anthropic-only cache_control markers from Responses API input and tools, eliminating HTTP 400s from OpenAI for unknown parameters.

Confidence Score: 4/5

Safe to merge; the two observations in router.py are edge-case concerns that do not affect the primary streaming fallback path.

The FallbackResponsesStreamWrapper bypasses super().__init__() and manually mirrors every attribute, creating a maintenance coupling risk if the base class evolves. Additionally, when async_function_with_fallbacks_common_utils returns a non-iterable fallback in a stream=True context, the generator yields the raw ResponsesAPIResponse object as if it were a stream event rather than yielding None as the chat-completions fallback does — this could silently produce incorrect event-stream output in that specific scenario. Both concerns are unlikely to fire in practice given the existing test coverage.

litellm/router.py — specifically FallbackResponsesStreamWrapper.__init__ attribute mirroring and the non-streaming fallback yield branch inside stream_with_fallbacks.

Important Files Changed

Filename	Overview
litellm/llms/anthropic/chat/transformation.py	Fixes `reasoning_effort` handling to accept both string and dict shapes; the `isinstance(value, str)` guard was silently dropping dict-shaped values from the Responses→Chat bridge.
litellm/llms/openai/responses/transformation.py	Adds `remove_cache_control_flag_from_input_and_tools` to strip Anthropic-only `cache_control` markers before sending to OpenAI's Responses API; mirrors the existing chat-completions path.
litellm/router.py	Adds full mid-stream fallback support for the Responses API streaming path: `_aresponses_streaming_iterator`, `_aresponses_with_streaming_fallbacks`, and helpers for partial-usage extraction and continuation-input construction; large addition with minor edge-case gap.
tests/router_unit_tests/test_router_aresponses_streaming_fallback.py	New mock-only unit-test file covering all four new Router helpers; no real network calls.
tests/test_litellm/test_router.py	Adds mock tests for the aresponses streaming fallback path; reformats one existing assertion without changing its logic.
model_prices_and_context_window.json	Adds GA pricing entries for `gemini-3.1-flash-lite` across `vertex_ai`, `gemini/`, and `openrouter/google/` prefixes, completing the missing sibling entries for the stable variant.
tests/test_litellm/llms/anthropic/chat/test_anthropic_chat_transformation.py	New parametrized tests covering string/dict-shaped `reasoning_effort` for both adaptive and non-adaptive Anthropic models, plus bad-value drop tests.
tests/test_litellm/llms/openai/responses/test_openai_responses_transformation.py	New tests for `cache_control` stripping in input content blocks and tools; covers both the mutation path and the no-op path.
ui/litellm-dashboard/src/components/mcp_tools/ToolTestPanel.tsx	Adds `.trim()` normalization to string inputs before type-conversion; change is applied consistently to all branches.
litellm/proxy/_lazy_openapi_snapshot.json	Updates the example `curl` request body for `POST /v1/agents` in the OpenAPI snapshot; documentation change only.
tests/test_litellm/test_cost_calculator.py	Adds a regression test asserting the `openrouter/google/gemini-3.1-flash-lite` pricing entry exists with correct costs.

_{Reviews (1): Last reviewed commit: "fix(openai-responses): strip Anthropic c..." | Re-trigger Greptile}

greptile-apps · 2026-05-21T03:02:47Z

+                    else:
+                        yield fallback_response


Non-streaming fallback yields full response object as stream event

When async_function_with_fallbacks_common_utils returns a non-iterable (i.e., a completed ResponsesAPIResponse rather than a streaming iterator), the code yields the entire response object directly as a stream event. Any downstream consumer that expects events with a .type field (e.g., response.created, response.completed) will receive a ResponsesAPIResponse instead and likely produce an AttributeError or silently corrupt the event stream. The equivalent chat-completions fallback path yields None in this case, which is at minimum consistently neutral. Emitting a full response object here is more likely to cause subtle failures in production.

greptile-apps · 2026-05-21T03:02:48Z

+        class FallbackResponsesStreamWrapper(BaseResponsesAPIStreamingIterator):
+            """
+            Subclasses BaseResponsesAPIStreamingIterator only for isinstance
+            compatibility (proxy + interactions code paths check the type).
+            Bypasses the parent constructor and delegates iteration to an
+            async generator.
+            """
+
+            def __init__(self, async_generator: AsyncGenerator):
+                import time
+
+                self._async_generator = async_generator
+                # Mirror every attribute BaseResponsesAPIStreamingIterator.__init__
+                # would have set. The wrapper bypasses super().__init__ (it has no
+                # httpx.Response of its own and no provider config to drive), so
+                # we copy from source_iterator where applicable and use safe
+                # defaults elsewhere. This keeps inherited methods (e.g.
+                # _check_max_streaming_duration, _handle_failure) safe to call.
+                self.response = source_iterator.response
+                self.model = source_iterator.model
+                self.logging_obj = source_iterator.logging_obj
+                self.finished = False
+                self.responses_api_provider_config = (
+                    source_iterator.responses_api_provider_config
+                )
+                self.completed_response = None
+                self.start_time = source_iterator.start_time
+                self._failure_handled = False
+                self._completed_response_cached = False
+                self._completed_response_logged = False
+                self._completed_response_cache_hit = None
+                self._persist_completed_response_before_logging = True
+                self._stream_created_time = time.time()
+                self.litellm_metadata = source_iterator.litellm_metadata
+                self.custom_llm_provider = source_iterator.custom_llm_provider
+                self.request_data = source_iterator.request_data
+                self.call_type = source_iterator.call_type
+                # Preserve hidden params so response headers (model_id,
+                # api_base, additional_headers) keep flowing.
+                self._hidden_params = dict(source_iterator._hidden_params or {})
+
+            def __aiter__(self):
+                return self
+
+            async def __anext__(self):
+                return await self._async_generator.__anext__()


FallbackResponsesStreamWrapper manually mirrors all parent attributes without calling super().__init__()

The constructor copies every attribute BaseResponsesAPIStreamingIterator.__init__ would set. Any future attribute added to the base class __init__ (e.g., a new flag or sub-object) will be silently absent from the wrapper, potentially causing AttributeError in inherited helper methods like _check_max_streaming_duration or _handle_failure that rely on those attributes. The docstring acknowledges the bypass but there is no enforced coupling. Consider at least adding a comment enumerating which version of BaseResponsesAPIStreamingIterator.__init__ this mirrors so reviewers know when to update it.

Sameerlite · 2026-05-22T12:04:12Z

https://github.com/BerriAI/litellm/pull/28542/commits

THis PR has the same commits, closing this

TorvaldUtne and others added 9 commits May 19, 2026 02:38

feat: add Xiaomi MiMo-V2.5-Pro and MiMo-V2.5 OpenRouter model entries (…

43bc7d6

…#27700) Squash-merged by litellm-agent from TorvaldUtne's PR.

fix(ui): trim whitespace from MCP inspector tool call inputs (#28203)

91927eb

Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai>

gemini-3.1-flash-lite pricing (#27933)

6f83cb2

* feat(model_prices): add gemini-3.1-flash-lite pricing with standard/batch/flex/priority tiers * fix pricing * add service tier --------- Co-authored-by: shin-berri <shin-laptop@berri.ai>

fix: incorrect /v1/agents request example (#28131)

249ec01

feat: add pricing entry for openrouter/google/gemini-3.1-flash-lite (#…

87a55f5

…28280) Squash-merged by litellm-agent from ro31337's PR.

fix(router): wrap aresponses streaming iterator for mid-stream fallba…

5039e63

…cks (#28215) Squash-merged by litellm-agent from cwang-otto's PR.

fix(router): unblock staging — mypy + coverage for aresponses streami…

6ea1f57

…ng fallback (#28318) Squash-merged by litellm-agent from cwang-otto's PR.

fix(openai-responses): strip Anthropic cache_control from Responses A…

f92e1b0

…PI requests (#28431) Squash-merged by litellm-agent from cwang-otto's PR.

oss-pr-review-agent-shin Bot mentioned this pull request May 21, 2026

fix(openai-responses): strip Anthropic cache_control from Responses API requests #28431

Merged

5 tasks

greptile-apps Bot reviewed May 21, 2026

View reviewed changes

Sameerlite closed this May 22, 2026

Sameerlite deleted the shin_agent_oss_staging_05_21_2026 branch May 22, 2026 12:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[litellm-agent] Staging → litellm_internal_staging (5/21/2026)#28432

[litellm-agent] Staging → litellm_internal_staging (5/21/2026)#28432
oss-pr-review-agent-shin[bot] wants to merge 9 commits into
litellm_internal_stagingfrom
shin_agent_oss_staging_05_21_2026

oss-pr-review-agent-shin Bot commented May 21, 2026

Uh oh!

oss-pr-review-agent-shin Bot commented May 21, 2026

Uh oh!

greptile-apps Bot commented May 21, 2026

Important Files Changed

Uh oh!

greptile-apps Bot May 21, 2026

Uh oh!

greptile-apps Bot May 21, 2026

Uh oh!

Sameerlite commented May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

Uh oh!

Conversation

oss-pr-review-agent-shin Bot commented May 21, 2026

Uh oh!

oss-pr-review-agent-shin Bot commented May 21, 2026

Uh oh!

greptile-apps Bot commented May 21, 2026

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Uh oh!

greptile-apps Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

Sameerlite commented May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants