[litellm-agent] Staging → litellm_internal_staging (5/20/2026)#28310
oss-pr-review-agent-shin[bot] wants to merge 8 commits into
…#27700) Squash-merged by litellm-agent from TorvaldUtne's PR.
Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai>
* feat(model_prices): add gemini-3.1-flash-lite pricing with standard/batch/flex/priority tiers
* fix pricing
* add service tier

---------

Co-authored-by: shin-berri <shin-laptop@berri.ai>
…dge (#28201)

* fix(anthropic): accept dict-shape reasoning_effort from Responses bridge

Issue #28196: the Responses->Chat parser (transformation.py:184-200) keeps the full dict as reasoning_effort when summary is set; that branch was added in #25359. But the Anthropic transformation here still guarded on isinstance(value, str), silently dropping the param. Result: callers using the standard Reasoning(effort, summary) OpenAI-shaped object on Anthropic lose thinking entirely (0 reasoning_tokens, no thinking_blocks).

Coerce dict -> string before mapping. Same shape tolerance that gpt_5_transformation._normalize_reasoning_effort_for_chat_completion already implements. summary is irrelevant for Anthropic's thinking_blocks.

Adds two regression tests: one parametrized over string + dict shapes (with and without summary), one covering unparseable dict inputs (drops silently, no crash).

* test(anthropic): add non-adaptive model coverage for dict-shape reasoning_effort

Per Greptile feedback on PR #28198: the original regression test only exercised the adaptive (4.6+) path. Add a parametrized test for the non-adaptive branch (claude-sonnet-4-5) verifying that dict-shape reasoning_effort still maps to thinking.type='enabled' + budget_tokens, and that output_config is NOT set on pre-4.6 models.

* test(anthropic): convert unparseable-dict test to @pytest.mark.parametrize

Per @greptile-apps inline review on PR #28201: matches the parametrize style of the two adjacent dict-shape tests and produces clearer failure messages (test ID per case instead of one collapsing for-loop).
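The dict -> string coercion described in this commit message can be illustrated with a minimal sketch. This is not the code from the PR: `normalize_reasoning_effort` is a hypothetical helper name, and the only behavior assumed is what the message states (string passes through, a dict contributes its `effort` key, `summary` is ignored, unparseable input is dropped silently).

```python
from typing import Any, Optional


def normalize_reasoning_effort(value: Any) -> Optional[str]:
    """Return the effort string from either accepted shape, else None.

    Hypothetical helper for illustration only; the real change lives in
    litellm/llms/anthropic/chat/transformation.py and may differ.
    """
    if isinstance(value, str):
        # Plain string shape, e.g. "high".
        return value
    if isinstance(value, dict):
        # Dict shape from the Responses bridge, e.g.
        # {"effort": "high", "summary": "auto"}. Only `effort` matters here.
        effort = value.get("effort")
        if isinstance(effort, str):
            return effort
    # Unparseable input: drop silently instead of raising.
    return None
```

A caller would then map the returned string exactly as it mapped a plain string before, and skip the parameter when the result is `None`.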
…28280) Squash-merged by litellm-agent from ro31337's PR.
…cks (#28215) Squash-merged by litellm-agent from cwang-otto's PR.
@greptile please review
Greptile Summary
This PR merges PR #28215, which adds mid-stream fallback support for the Responses API (
Confidence Score: 5/5
Safe to merge: the changes are well-scoped and all new code paths are covered by mock-only unit tests.
The core router change is a net-new code path that does not alter existing non-streaming or non-fallback flows. The transformation.py fix is a narrow guard relaxation with full parametrised test coverage. No existing tests were weakened, no real network calls were added, and the cleanup logic uses shielded cancellation scopes correctly.
litellm/router.py: specifically the FallbackResponsesStreamWrapper constructor and the stream_with_fallbacks finally block, which are the most complex parts of the new fallback path.
| Filename | Overview |
|---|---|
| litellm/router.py | Adds _aresponses_with_streaming_fallbacks and supporting helpers to give the Responses-API path mid-stream fallback parity with the chat-completions path. Logic is sound and well-commented; FallbackResponsesStreamWrapper._hidden_params is copied from the source iterator so post-fallback wrapper metadata reflects the source's deployment, not the fallback's. |
| litellm/llms/anthropic/chat/transformation.py | Loosens the reasoning_effort guard to also accept a dict with an effort key, fixing silent drops when the Responses→Chat bridge passes a dict-shaped reasoning field. |
| tests/router_unit_tests/test_router_aresponses_streaming_fallback.py | New unit-test file covering all four helpers added to Router. All tests use mocks — no real network calls. |
| tests/test_litellm/test_router.py | Adds extensive mock-based tests for the aresponses streaming-fallback path. The only change to an existing test is a pure reformat — assertions are unchanged. |
| model_prices_and_context_window.json | Adds stable pricing entries for gemini-3.1-flash-lite across all four provider prefixes, consistent with the existing -preview entry. |
Reviews (2): Last reviewed commit: "fix(router): unblock staging — mypy + co..." | Re-trigger Greptile
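To make the idea of a mid-stream fallback concrete, here is a minimal sketch of an async generator that switches to the next stream when the current one fails partway through. It is an illustration under assumptions, not the router's implementation: `iterate_with_fallbacks` and `StreamFactory` are hypothetical names, chunks are plain strings, already-emitted chunks are not replayed, and the shielded cancellation handling mentioned in the review is not modeled.

```python
import asyncio
from typing import AsyncIterator, Callable, List, Optional

# Hypothetical type: a zero-argument callable that opens a fresh stream.
StreamFactory = Callable[[], AsyncIterator[str]]


async def iterate_with_fallbacks(
    factories: List[StreamFactory],
) -> AsyncIterator[str]:
    """Yield chunks from the first stream; on failure, continue with the next."""
    last_error: Optional[BaseException] = None
    for factory in factories:
        stream = factory()
        try:
            async for chunk in stream:
                yield chunk
            # The stream finished cleanly, so no fallback is needed.
            return
        except Exception as exc:
            # Remember the failure and move on to the next candidate.
            last_error = exc
        finally:
            # Always release the underlying stream, whether it completed,
            # failed, or the consumer stopped iterating early.
            aclose = getattr(stream, "aclose", None)
            if aclose is not None:
                await aclose()
    raise RuntimeError("all streams failed") from last_error
```

A consumer iterates it like any other async stream, for example `async for chunk in iterate_with_fallbacks([primary, backup])`, where `primary` and `backup` are caller-supplied factories.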
@@ -1519,12 +1531,12 @@ def map_openai_params( # noqa: PLR0915
 optional_params["thinking"] = mapped_thinking
 if AnthropicConfig._is_adaptive_thinking_model(model):
     mapped_effort = REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT.get(
-        value
+        effort_value
     )
     if mapped_effort is None:
         AnthropicConfig._raise_invalid_reasoning_effort(
             model=model,
-            value=value,
+            value=effort_value,
             llm_provider=self.custom_llm_provider or "anthropic",
         )
     optional_params["output_config"] = {"effort": mapped_effort}
Hardcoded model-specific flag check: should use model_prices_and_context_window.json
_is_effort_supported_model and _is_adaptive_thinking_model (called later in this block) are hardcoded model-name lists. The team rule is that model capability flags belong in model_prices_and_context_window.json and are read via get_model_info / supports_* helpers so they work automatically when new models are added. Putting them here means every new model that supports effort-based thinking requires a LiteLLM code change rather than a config-file entry.
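The difference between a hardcoded list and a config-driven lookup can be sketched as follows. Everything here is hypothetical and for illustration only: the `MODEL_INFO` dict stands in for entries loaded from model_prices_and_context_window.json, the model names are placeholders, and the flag name `supports_adaptive_thinking` is an assumption, not a field confirmed to exist in that file.

```python
from typing import Any, Dict

# Hypothetical stand-in for entries in model_prices_and_context_window.json.
# Model names and the flag name are placeholders, not real entries.
MODEL_INFO: Dict[str, Dict[str, Any]] = {
    "example-model-a": {"supports_adaptive_thinking": True},
    "example-model-b": {"supports_adaptive_thinking": False},
}


def supports_flag(
    model: str,
    flag: str,
    model_info: Dict[str, Dict[str, Any]] = MODEL_INFO,
) -> bool:
    """Read a capability flag from config data instead of a hardcoded list.

    Unknown models and missing flags are treated as unsupported.
    """
    return bool(model_info.get(model, {}).get(flag, False))
```

With this shape, enabling the behavior for a new model is a data change (one more entry in the config) rather than a code change.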
Rule Used: What: Do not hardcode model-specific flags in the ... (source)
Codecov Report
❌ Patch coverage is
…ng fallback (#28318) Squash-merged by litellm-agent from cwang-otto's PR.
@greptile please review
Covered in #28337
Merged PRs (1)
Auto-updated by litellm-agent on each merge.