[litellm-agent] Staging → litellm_internal_staging (5/21/2026)#28432
[litellm-agent] Staging → litellm_internal_staging (5/21/2026)#28432oss-pr-review-agent-shin[bot] wants to merge 9 commits into
Conversation
…#27700) Squash-merged by litellm-agent from TorvaldUtne's PR.
Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai>
* feat(model_prices): add gemini-3.1-flash-lite pricing with standard/batch/flex/priority tiers * fix pricing * add service tier --------- Co-authored-by: shin-berri <shin-laptop@berri.ai>
…dge (#28201) * fix(anthropic): accept dict-shape reasoning_effort from Responses bridge Issue #28196 — the Responses->Chat parser (transformation.py:184-200) keeps the full dict as reasoning_effort when summary is set; that branch was added in #25359. But the Anthropic transformation here still guarded on isinstance(value, str), silently dropping the param. Result: callers using the standard Reasoning(effort, summary) OpenAI-shaped object on Anthropic lose thinking entirely (0 reasoning_tokens, no thinking_blocks). Coerce dict -> string before mapping. Same shape tolerance that gpt_5_transformation._normalize_reasoning_effort_for_chat_completion already implements. summary is irrelevant for Anthropic's thinking_blocks. Adds two regression tests: one parametrized over string + dict shapes (with and without summary), one covering unparseable dict inputs (drops silently, no crash). * test(anthropic): add non-adaptive model coverage for dict-shape reasoning_effort Per Greptile feedback on PR #28198: the original regression test only exercised the adaptive (4.6+) path. Add a parametrized test for the non-adaptive branch (claude-sonnet-4-5) verifying that dict-shape reasoning_effort still maps to thinking.type='enabled' + budget_tokens, and that output_config is NOT set on pre-4.6 models. * test(anthropic): convert unparseable-dict test to @pytest.mark.parametrize Per @greptile-apps inline review on PR #28201 — matches the parametrize style of the two adjacent dict-shape tests and produces clearer failure messages (test ID per case instead of one collapsing for-loop).
…28280) Squash-merged by litellm-agent from ro31337's PR.
…cks (#28215) Squash-merged by litellm-agent from cwang-otto's PR.
…ng fallback (#28318) Squash-merged by litellm-agent from cwang-otto's PR.
…PI requests (#28431) Squash-merged by litellm-agent from cwang-otto's PR.
|
@greptile please review |
Greptile SummaryThis PR bundles four independent improvements collected on 5/21/2026: mid-stream fallback support for the Responses API streaming path in Router, a fix for dict-shaped
Confidence Score: 4/5Safe to merge; the two observations in The litellm/router.py — specifically
|
| Filename | Overview |
|---|---|
| litellm/llms/anthropic/chat/transformation.py | Fixes reasoning_effort handling to accept both string and dict shapes; the isinstance(value, str) guard was silently dropping dict-shaped values from the Responses→Chat bridge. |
| litellm/llms/openai/responses/transformation.py | Adds remove_cache_control_flag_from_input_and_tools to strip Anthropic-only cache_control markers before sending to OpenAI's Responses API; mirrors the existing chat-completions path. |
| litellm/router.py | Adds full mid-stream fallback support for the Responses API streaming path: _aresponses_streaming_iterator, _aresponses_with_streaming_fallbacks, and helpers for partial-usage extraction and continuation-input construction; large addition with minor edge-case gap. |
| tests/router_unit_tests/test_router_aresponses_streaming_fallback.py | New mock-only unit-test file covering all four new Router helpers; no real network calls. |
| tests/test_litellm/test_router.py | Adds mock tests for the aresponses streaming fallback path; reformats one existing assertion without changing its logic. |
| model_prices_and_context_window.json | Adds GA pricing entries for gemini-3.1-flash-lite across vertex_ai, gemini/, and openrouter/google/ prefixes, completing the missing sibling entries for the stable variant. |
| tests/test_litellm/llms/anthropic/chat/test_anthropic_chat_transformation.py | New parametrized tests covering string/dict-shaped reasoning_effort for both adaptive and non-adaptive Anthropic models, plus bad-value drop tests. |
| tests/test_litellm/llms/openai/responses/test_openai_responses_transformation.py | New tests for cache_control stripping in input content blocks and tools; covers both the mutation path and the no-op path. |
| ui/litellm-dashboard/src/components/mcp_tools/ToolTestPanel.tsx | Adds .trim() normalization to string inputs before type-conversion; change is applied consistently to all branches. |
| litellm/proxy/_lazy_openapi_snapshot.json | Updates the example curl request body for POST /v1/agents in the OpenAPI snapshot; documentation change only. |
| tests/test_litellm/test_cost_calculator.py | Adds a regression test asserting the openrouter/google/gemini-3.1-flash-lite pricing entry exists with correct costs. |
Reviews (1): Last reviewed commit: "fix(openai-responses): strip Anthropic c..." | Re-trigger Greptile
| else: | ||
| yield fallback_response |
There was a problem hiding this comment.
Non-streaming fallback yields full response object as stream event
When async_function_with_fallbacks_common_utils returns a non-iterable (i.e., a completed ResponsesAPIResponse rather than a streaming iterator), the code yields the entire response object directly as a stream event. Any downstream consumer that expects events with a .type field (e.g., response.created, response.completed) will receive a ResponsesAPIResponse instead and likely produce an AttributeError or silently corrupt the event stream. The equivalent chat-completions fallback path yields None in this case, which is at minimum consistently neutral. Emitting a full response object here is more likely to cause subtle failures in production.
| class FallbackResponsesStreamWrapper(BaseResponsesAPIStreamingIterator): | ||
| """ | ||
| Subclasses BaseResponsesAPIStreamingIterator only for isinstance | ||
| compatibility (proxy + interactions code paths check the type). | ||
| Bypasses the parent constructor and delegates iteration to an | ||
| async generator. | ||
| """ | ||
|
|
||
| def __init__(self, async_generator: AsyncGenerator): | ||
| import time | ||
|
|
||
| self._async_generator = async_generator | ||
| # Mirror every attribute BaseResponsesAPIStreamingIterator.__init__ | ||
| # would have set. The wrapper bypasses super().__init__ (it has no | ||
| # httpx.Response of its own and no provider config to drive), so | ||
| # we copy from source_iterator where applicable and use safe | ||
| # defaults elsewhere. This keeps inherited methods (e.g. | ||
| # _check_max_streaming_duration, _handle_failure) safe to call. | ||
| self.response = source_iterator.response | ||
| self.model = source_iterator.model | ||
| self.logging_obj = source_iterator.logging_obj | ||
| self.finished = False | ||
| self.responses_api_provider_config = ( | ||
| source_iterator.responses_api_provider_config | ||
| ) | ||
| self.completed_response = None | ||
| self.start_time = source_iterator.start_time | ||
| self._failure_handled = False | ||
| self._completed_response_cached = False | ||
| self._completed_response_logged = False | ||
| self._completed_response_cache_hit = None | ||
| self._persist_completed_response_before_logging = True | ||
| self._stream_created_time = time.time() | ||
| self.litellm_metadata = source_iterator.litellm_metadata | ||
| self.custom_llm_provider = source_iterator.custom_llm_provider | ||
| self.request_data = source_iterator.request_data | ||
| self.call_type = source_iterator.call_type | ||
| # Preserve hidden params so response headers (model_id, | ||
| # api_base, additional_headers) keep flowing. | ||
| self._hidden_params = dict(source_iterator._hidden_params or {}) | ||
|
|
||
| def __aiter__(self): | ||
| return self | ||
|
|
||
| async def __anext__(self): | ||
| return await self._async_generator.__anext__() |
There was a problem hiding this comment.
FallbackResponsesStreamWrapper manually mirrors all parent attributes without calling super().__init__()
The constructor copies every attribute BaseResponsesAPIStreamingIterator.__init__ would set. Any future attribute added to the base class __init__ (e.g., a new flag or sub-object) will be silently absent from the wrapper, potentially causing AttributeError in inherited helper methods like _check_max_streaming_duration or _handle_failure that rely on those attributes. The docstring acknowledges the bypass but there is no enforced coupling. Consider at least adding a comment enumerating which version of BaseResponsesAPIStreamingIterator.__init__ this mirrors so reviewers know when to update it.
|
https://github.com/BerriAI/litellm/pull/28542/commits THis PR has the same commits, closing this |
Automated staging PR created by litellm-agent.
This branch collects PRs approved by the agent on 5/21/2026.