[litellm-agent] Staging → litellm_internal_staging (5/20/2026)#28310

Closed
oss-pr-review-agent-shin[bot] wants to merge 8 commits into litellm_internal_staging from shin_agent_oss_staging_05_20_2026

@oss-pr-review-agent-shin oss-pr-review-agent-shin Bot commented May 20, 2026

Merged PRs (1)

| # | Title |
|---|---|
| #28215 | fix(router): wrap aresponses streaming iterator for mid-stream fallbacks |

Auto-updated by litellm-agent on each merge.

TorvaldUtne and others added 7 commits May 19, 2026 02:38
…#27700)

Squash-merged by litellm-agent from TorvaldUtne's PR.
Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
* feat(model_prices): add gemini-3.1-flash-lite pricing with standard/batch/flex/priority tiers

* fix pricing

* add service tier

---------

Co-authored-by: shin-berri <shin-laptop@berri.ai>
…dge (#28201)

* fix(anthropic): accept dict-shape reasoning_effort from Responses bridge

Issue #28196 — the Responses->Chat parser (transformation.py:184-200) keeps the full dict as reasoning_effort when summary is set; that branch was added in #25359. But the Anthropic transformation here still guarded on isinstance(value, str), silently dropping the param. Result: callers using the standard Reasoning(effort, summary) OpenAI-shaped object on Anthropic lose thinking entirely (0 reasoning_tokens, no thinking_blocks).

Coerce dict -> string before mapping. Same shape tolerance that gpt_5_transformation._normalize_reasoning_effort_for_chat_completion already implements. summary is irrelevant for Anthropic's thinking_blocks.

Adds two regression tests: one parametrized over string + dict shapes (with and without summary), one covering unparseable dict inputs (drops silently, no crash).

* test(anthropic): add non-adaptive model coverage for dict-shape reasoning_effort

Per Greptile feedback on PR #28198: the original regression test only exercised the adaptive (4.6+) path. Add a parametrized test for the non-adaptive branch (claude-sonnet-4-5) verifying that dict-shape reasoning_effort still maps to thinking.type='enabled' + budget_tokens, and that output_config is NOT set on pre-4.6 models.

* test(anthropic): convert unparseable-dict test to @pytest.mark.parametrize

Per @greptile-apps inline review on PR #28201 — matches the parametrize style of the two adjacent dict-shape tests and produces clearer failure messages (test ID per case instead of one collapsing for-loop).
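
The dict-to-string coercion described in the first commit message above can be sketched as follows. This is a minimal illustration, not the code from the PR: the helper name `normalize_reasoning_effort` is hypothetical, and the only behavior assumed is what the commit message states (a string passes through, a dict contributes its `effort` value, `summary` is ignored, and an unparseable dict is dropped without raising).

```python
from typing import Any, Optional


def normalize_reasoning_effort(value: Any) -> Optional[str]:
    """Hypothetical helper: reduce a str or dict-shaped reasoning_effort to a string.

    Returns None when no usable effort string can be extracted, so the caller
    can drop the parameter instead of crashing.
    """
    if isinstance(value, str):
        return value
    if isinstance(value, dict):
        effort = value.get("effort")
        # `summary` is deliberately ignored; only a string `effort` is usable.
        return effort if isinstance(effort, str) else None
    return None


# String shape, dict shape with summary, and an unparseable dict.
print(normalize_reasoning_effort("high"))
print(normalize_reasoning_effort({"effort": "low", "summary": "auto"}))
print(normalize_reasoning_effort({"summary": "auto"}))
```

Returning `None` rather than raising mirrors the "drops silently, no crash" behavior that the regression tests are described as covering.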
…cks (#28215)

Squash-merged by litellm-agent from cwang-otto's PR.
@oss-pr-review-agent-shin

@greptile please review

@CLAassistant

CLAassistant commented May 20, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
5 out of 6 committers have signed the CLA.

✅ TorvaldUtne
✅ mubashir1osmani
✅ ro31337
✅ cwang-otto
✅ IshaMeera
❌ oss-agent-shin

@greptile-apps

greptile-apps Bot commented May 20, 2026

Greptile Summary

This PR merges PR #28215, which adds mid-stream fallback support for the Responses API (aresponses) streaming path in the Router, bringing it to parity with the existing chat-completions mid-stream fallback. It also fixes the Anthropic transformation to accept dict-shaped reasoning_effort (not just strings), adds stable pricing entries for gemini-3.1-flash-lite across all provider prefixes, and trims whitespace from MCP tool test panel string inputs.

  • router.py: Introduces _aresponses_with_streaming_fallbacks and four helpers that wrap the streaming iterator to catch MidStreamFallbackError, build a continuation prompt with partial assistant output, merge partial-stream usage onto the fallback's response.completed event, and clean up both streams in a shielded finally block. call_type == "aresponses" is now routed through this wrapper instead of the generic fallback handler.
  • transformation.py: Loosens the reasoning_effort guard from isinstance(value, str) to also coerce a {"effort": ..., "summary": ...} dict to its effort string before mapping — fixing silent drops when the Responses→Chat bridge forwards a dict-shaped reasoning field.
  • Model pricing / UI: gemini-3.1-flash-lite stable entries added for all four provider prefixes; ToolTestPanel.tsx trims string inputs before type-coercion.
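
The mid-stream fallback pattern summarized in the router.py bullet can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the Router's implementation: `MidStreamError`, `stream_with_fallback`, the event dictionaries, and the `tokens` field are all hypothetical stand-ins, and the shielded cleanup of both streams that the summary mentions is omitted for brevity.

```python
import asyncio
from typing import AsyncIterator, Callable, Dict, List


class MidStreamError(Exception):
    """Hypothetical stand-in for a mid-stream failure signal."""


async def stream_with_fallback(
    primary: AsyncIterator[Dict],
    make_fallback: Callable[[str], AsyncIterator[Dict]],
) -> AsyncIterator[Dict]:
    """Yield events from `primary`; on a mid-stream error, continue on a fallback."""
    partial_text: List[str] = []
    partial_tokens = 0
    try:
        async for event in primary:
            if event["type"] == "delta":
                # Remember what was already emitted so the fallback can continue it.
                partial_text.append(event["text"])
                partial_tokens += event.get("tokens", 0)
            yield event
        return  # Primary finished cleanly; no fallback needed.
    except MidStreamError:
        pass
    # Hand the partial output to the fallback as a continuation prompt.
    async for event in make_fallback("".join(partial_text)):
        if event["type"] == "completed":
            # Merge the partial stream's usage into the fallback's final event.
            event = {**event, "tokens": event.get("tokens", 0) + partial_tokens}
        yield event


async def failing_primary() -> AsyncIterator[Dict]:
    yield {"type": "delta", "text": "Hello, ", "tokens": 2}
    raise MidStreamError("connection dropped")


def fallback(partial: str) -> AsyncIterator[Dict]:
    async def gen() -> AsyncIterator[Dict]:
        yield {"type": "delta", "text": "world", "tokens": 1}
        yield {"type": "completed", "tokens": 1, "continued_from": partial}

    return gen()


async def collect() -> List[Dict]:
    return [e async for e in stream_with_fallback(failing_primary(), fallback)]


events = asyncio.run(collect())
for e in events:
    print(e)
```

In this sketch the caller sees one continuous stream: the first delta comes from the failing source, the rest from the fallback, and the final event carries the combined token count.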

Confidence Score: 5/5

Safe to merge — the changes are well-scoped and all new code paths are covered by mock-only unit tests.

The core router change is a net-new code path that does not alter existing non-streaming or non-fallback flows. The transformation.py fix is a narrow guard relaxation with full parametrised test coverage. No existing tests were weakened, no real network calls were added, and the cleanup logic uses shielded cancellation scopes correctly.

litellm/router.py — specifically the FallbackResponsesStreamWrapper constructor and the stream_with_fallbacks finally block, which are the most complex parts of the new fallback path.

Important Files Changed

| Filename | Overview |
|---|---|
| litellm/router.py | Adds _aresponses_with_streaming_fallbacks and supporting helpers to give the Responses-API path mid-stream fallback parity with the chat-completions path. Logic is sound and well-commented; FallbackResponsesStreamWrapper._hidden_params is copied from the source iterator so post-fallback wrapper metadata reflects the source's deployment, not the fallback's. |
| litellm/llms/anthropic/chat/transformation.py | Loosens the reasoning_effort guard to also accept a dict with an effort key, fixing silent drops when the Responses→Chat bridge passes a dict-shaped reasoning field. |
| tests/router_unit_tests/test_router_aresponses_streaming_fallback.py | New unit-test file covering all four helpers added to Router. All tests use mocks, with no real network calls. |
| tests/test_litellm/test_router.py | Adds extensive mock-based tests for the aresponses streaming-fallback path. The only change to an existing test is a pure reformat; assertions are unchanged. |
| model_prices_and_context_window.json | Adds stable pricing entries for gemini-3.1-flash-lite across all four provider prefixes, consistent with the existing -preview entry. |

Reviews (2): Last reviewed commit: "fix(router): unblock staging — mypy + co..." | Re-trigger Greptile

Comment on lines 1506 to 1542
@@ -1519,12 +1531,12 @@ def map_openai_params( # noqa: PLR0915
 optional_params["thinking"] = mapped_thinking
 if AnthropicConfig._is_adaptive_thinking_model(model):
     mapped_effort = REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT.get(
-        value
+        effort_value
     )
     if mapped_effort is None:
         AnthropicConfig._raise_invalid_reasoning_effort(
             model=model,
-            value=value,
+            value=effort_value,
             llm_provider=self.custom_llm_provider or "anthropic",
         )
     optional_params["output_config"] = {"effort": mapped_effort}

P2 Hardcoded model-specific flag check — should use model_prices_and_context_window.json

_is_effort_supported_model and _is_adaptive_thinking_model (called later in this block) are hardcoded model-name lists. The team rule is that model capability flags belong in model_prices_and_context_window.json and are read via get_model_info / supports_* helpers so they work automatically when new models are added. Putting them here means every new model that supports effort-based thinking requires a LiteLLM code change rather than a config-file entry.

Rule Used: What: Do not hardcode model-specific flags in the ... (source)
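
The difference between a hardcoded model list and a config-driven capability flag can be illustrated with a small sketch. Everything here is an assumption made for illustration: the model names, the `supports_adaptive_thinking` key, and the `supports_flag` helper are hypothetical and are not taken from model_prices_and_context_window.json or from LiteLLM's helpers.

```python
import json

# Hypothetical stand-in for a capability config file; keys are illustrative only.
MODEL_INFO_JSON = """
{
  "example-model-a": {"supports_adaptive_thinking": true},
  "example-model-b": {"supports_adaptive_thinking": false}
}
"""
MODEL_INFO = json.loads(MODEL_INFO_JSON)


def supports_flag(model: str, flag: str) -> bool:
    """Look up a capability flag from config; unknown models or flags read as False."""
    return bool(MODEL_INFO.get(model, {}).get(flag, False))


# Supporting a new model means adding a config entry, with no code change.
MODEL_INFO["example-model-c"] = {"supports_adaptive_thinking": True}

print(supports_flag("example-model-a", "supports_adaptive_thinking"))
print(supports_flag("example-model-b", "supports_adaptive_thinking"))
print(supports_flag("example-model-c", "supports_adaptive_thinking"))
print(supports_flag("unknown-model", "supports_adaptive_thinking"))
```

Defaulting to `False` for unknown models is one possible design choice; it keeps the lookup safe when a model has no entry yet.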

Comment thread tests/test_litellm/test_cost_calculator.py
@codecov

codecov Bot commented May 20, 2026

Codecov Report

❌ Patch coverage is 87.50000% with 16 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| litellm/router.py | 86.88% | 16 Missing ⚠️ |
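
The reported percentages can be cross-checked with a little arithmetic. The sketch below assumes patch coverage is computed as covered lines divided by total changed lines; that formula and the helper name `patch_total_lines` are assumptions, and the per-file result is only an estimate because the published percentage is rounded.

```python
def patch_total_lines(missing: int, coverage_pct: float) -> int:
    """Estimate total changed lines from the missing count and coverage percentage."""
    return round(missing / (1 - coverage_pct / 100))


overall_total = patch_total_lines(16, 87.5)   # whole patch
router_total = patch_total_lines(16, 86.88)   # litellm/router.py only (approximate)

print(overall_total, overall_total - 16)
print(router_total, router_total - 16)
```

Under that assumption, 16 missing lines at 87.5% correspond to 128 changed lines in the patch, of which roughly 122 are in litellm/router.py.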

…ng fallback (#28318)

Squash-merged by litellm-agent from cwang-otto's PR.
@oss-pr-review-agent-shin

@greptile please review

@Sameerlite

Covered in #28337

@Sameerlite Sameerlite closed this May 20, 2026
@Sameerlite Sameerlite deleted the shin_agent_oss_staging_05_20_2026 branch May 22, 2026 12:07