feat(custom_llm): allow streaming/astreaming to yield ModelResponseStream by tanmay958 · Pull Request #27580 · BerriAI/litellm

tanmay958 · 2026-05-10T10:27:54Z

Relevant issues

Custom LLM providers that speak the OpenAI wire format were forced to
translate every streaming chunk into GenericStreamingChunk (GChunk).
This was painful for two reasons:

1. GChunk has only 7 fields; a real OpenAI chunk has 14.
model, system_fingerprint, logprobs, id, delta.role,
delta.tool_calls, delta.reasoning_content — all silently dropped
with no way to preserve them.

2. The provider_specific_fields escape hatch places data at the wrong level.
streaming_handler.py handles it with:

for key, value in anthropic_response_obj["provider_specific_fields"].items():
    setattr(model_response, key, value)   # sets on TOP-LEVEL chunk

So logprobs lands at chunk.logprobs instead of
chunk.choices[0].logprobs breaking every consumer that follows
the OpenAI spec.

Linear ticket

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

I have Added testing in the tests/test_litellm/ directory, Adding at least 1 test is a hard requirement - see details
My PR passes all unit tests on make test-unit
My PR's scope is as isolated as possible, it only solves 1 specific problem
I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

50-55 passing tests: main is stable with minor issues.

45-49 passing tests: acceptable but needs attention

<= 40 passing tests: unstable; be careful with your merges and assess the risk.

Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:

Screenshots / Proof of Fix

script to run : code

Type

🆕 New Feature

Changes

CLAassistant · 2026-05-10T10:28:01Z

All committers have signed the CLA.

…eam directly

codecov · 2026-05-10T10:33:55Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

greptile-apps · 2026-05-10T10:36:20Z

Greptile Summary

This PR adds a fast-path in chunk_creator that lets custom LLM providers yield ModelResponseStream objects directly instead of having to translate every chunk into a GenericStreamingChunk, solving the 7-field limitation and the provider_specific_fields misplacement bug described in #27389.

Core change (streaming_handler.py): A new isinstance(chunk, ModelResponseStream) + _custom_providers guard returns the chunk straight through after managing finish_reason state (strip from content chunk, record in received_finish_reason, rely on finish_reason_handler() for the terminal chunk). The guard correctly matches the existing GChunk path's provider restriction.
Tests (test_streaming_handler.py, test_custom_llm.py): Five new unit tests cover the passthrough, finish_reason stripping, StopIteration on trailing chunks, and tool_calls preservation; two end-to-end tests exercise the full sync/async streaming pipeline with a mock custom provider.

Confidence Score: 5/5

The change is narrowly scoped to custom providers and the passthrough logic is well-tested; no regressions to built-in providers are expected.

The guard restricts the new path strictly to litellm._custom_providers, keeping all built-in providers unaffected. The finish_reason stripping and StopIteration flow have been verified end-to-end by both unit and integration tests. The two remaining findings are narrow edge cases that do not affect the core correctness of the implementation.

litellm/litellm_core_utils/streaming_handler.py — specifically the chunk mutation on line 1154 and the incomplete _has_content guard on lines 1136-1143.

Important Files Changed

Filename	Overview
litellm/litellm_core_utils/streaming_handler.py	Adds a new early-return branch in chunk_creator that passes ModelResponseStream chunks from custom providers straight through; correctly guards on _custom_providers, handles finish_reason stripping and StopIteration, with minor concerns around chunk mutation and incomplete _has_content guard.
tests/local_testing/test_custom_llm.py	Adds end-to-end sync/async streaming tests for the ModelResponseStream passthrough via a minimal custom provider subclass; parametrized across four finish_reason values.
tests/test_litellm/litellm_core_utils/test_streaming_handler.py	Adds five unit tests for chunk_creator passthrough behavior (content preservation, finish_reason stripping, StopIteration on trailing chunks, tool_calls not dropped); also has minor whitespace-only reformatting of two existing assertions.

_{Reviews (5): Last reviewed commit: "fix(streaming): add type ignore for fini..." | Re-trigger Greptile}

…roviders

tanmay958 · 2026-05-10T12:55:43Z

@krrish-berri-2

Sameerlite · 2026-05-11T07:01:25Z

@greptile re review

Sameerlite · 2026-05-11T07:09:13Z

@tanmay958 please make sure greptile review is 4+/5 first before review from us

tanmay958 · 2026-05-11T10:14:37Z

@greptile re-review

…ol_calls are preserved

…mStreamWrapper

tanmay958 · 2026-05-11T12:42:57Z

@greptile re-review

tanmay958 · 2026-05-11T12:48:55Z

@greptile re review

tanmay958 · 2026-05-11T12:59:30Z

@Sameerlite could you please review this now that the Greptile score has increased

tanmay958 · 2026-05-12T13:16:06Z

@Sameerlite can you please review

tanmay958 · 2026-05-12T19:19:00Z

@krrish-berri-2 can you please assign someone to review

oss-pr-review-agent-shin · 2026-05-16T02:47:23Z

🤖 litellm-agent: Auto-merge skipped — the staging branch shin_agent_oss_staging_05_16_2026 has 194 commit(s) not in your branch. Merging as-is would produce a confusing diff on the staging PR.

Please rebase your branch onto shin_agent_oss_staging_05_16_2026 and push; the agent will re-review automatically.

oss-pr-review-agent-shin · 2026-05-17T10:57:58Z

🤖 litellm-agent: Auto-merge skipped — the staging branch shin_agent_oss_staging_05_17_2026 has 251 commit(s) not in your branch. Merging as-is would produce a confusing diff on the staging PR.

Please rebase your branch onto shin_agent_oss_staging_05_17_2026 and push; the agent will re-review automatically.

MaleicAcid · 2026-06-02T12:33:05Z

Hey! Really excited about this PR — we've been hitting the exact same pain point with the custom provider streaming path. The provider_specific_fields workaround feels like patching a leaky pipe, and we've had to monkey-patch return_processed_chunk_logic internally just to get basic fields like reasoning_content through. Works, but it's fragile and feels wrong.
The ModelResponseStream fast-path approach makes way more sense — instead of squeezing an OpenAI-shaped response through a 7-field TypedDict and hoping nothing falls off, just let the provider return what it already has. Clean, natural, and no data loss.
Really hope this gets reviewed and merged soon

Sameerlite

LGTM

* fix(azure): apply api_version fallback chain to image edit URL `AzureImageEditConfig.get_complete_url` only read `api_version` from `litellm_params`. When callers configured it via `litellm.api_version` or `AZURE_API_VERSION`, the constructed URL had no `?api-version=` and Azure responded `404 Resource not found`. Apply the same fallback chain the Azure chat path already uses in `common_utils.py`: litellm_params > litellm.api_version > AZURE_API_VERSION env > litellm.AZURE_DEFAULT_API_VERSION Adds 5 unit tests pinning each layer of the chain plus a regression guard for `api_base` that already carries `?api-version=`. * feat(mcp): core sampling and elicitation flow with security hardening - Add sampling_handler.py: full MCP sampling/createMessage flow with model selection (hint-based + priority-based), auth enforcement, budget checks, route restriction gates, and tag policy pre-auth - Add elicitation_handler.py: MCP elicitation/create relay with downstream client capability detection - Wire sampling/elicitation callbacks in mcp_server_manager.py gated behind allow_sampling/allow_elicitation config flags - Add allow_sampling/allow_elicitation fields to MCPServer type - Fix session lock deadlock: skip lock for JSON-RPC response POSTs (elicitation/sampling replies) with truncated-body heuristic - Extend client.py with sampling_callback and elicitation_callback - Security: RouteChecks gate, tag-budget bypass fix, x-forwarded-for spoofing fix, Latin-1 header encoding guard - Add 4 new test modules (model access, priority selection, request builder, tool conversion) + update existing MCP tests * fix(security): run pre-call guardrails before MCP sampling acompletion Without this, an upstream MCP server with allow_sampling enabled could send prompts that bypass every guardrail (content filtering, PII redaction, prompt-injection detection) configured on /chat/completions. - Call proxy_logging_obj.pre_call_hook(call_type='acompletion') before llm_router.acompletion so guardrails fire for sampling sub-calls - Add HTTPException to the re-raise list so guardrail rejections propagate correctly instead of being swallowed as generic errors * feat(bedrock_mantle): add Responses API support (/openai/v1/responses) (#29490) * feat(bedrock_mantle): add Responses API transformation config * test(bedrock_mantle): cover trailing-slash api_base normalization * feat(bedrock_mantle): export BedrockMantleResponsesAPIConfig * feat(bedrock_mantle): register gpt-5.x Responses config (gpt-oss unchanged) * feat(bedrock_mantle): add gpt-5.5/gpt-5.4 Responses price-map entries * refactor(bedrock_mantle): exclude gpt-oss instead of allow-listing gpt-5 for Responses routing Frontier OpenAI models on Bedrock Mantle are Responses-only on /openai/v1/responses; gpt-oss is the legacy family that also speaks chat-completions. Gate by excluding gpt-oss (which keeps its chat-completions emulation) and defaulting everything else to the native Responses config, so future frontier models (gpt-6, etc.) route correctly without a code change. Verified against the live us-east-2 Mantle endpoint: gpt-oss 400s on /openai/v1/responses while gpt-5.5 400s on both standard paths. * test(bedrock_mantle): cover supports_native_websocket opt-out Closes the one uncovered line flagged by codecov on the Responses config. The assertion documents that Mantle Responses has no realtime/websocket transport, so realtime routing must not attempt a socket it cannot serve. * fix(bedrock_mantle): route file_search through emulation instead of forwarding to Mantle BedrockMantleResponsesAPIConfig inherited supports_native_file_search() -> True from OpenAIResponsesAPIConfig but never overrode it. Mantle has no OpenAI vector stores, so a forwarded file_search tool is rejected with a 400 (verified upstream: Tool type 'file_search' is not supported). Opting out, like the existing supports_native_websocket override, routes the tool through LiteLLM's file_search emulation instead. * fix(bedrock_mantle): only route openai.gpt frontier models to Responses The previous gate excluded gpt-oss and routed every other model to the native Responses config. But on Mantle only the OpenAI gpt frontier models (gpt-5.x) are served on /openai/v1/responses; gpt-oss and the non-OpenAI families (nvidia, mistral, google, zai, ...) are chat-completions only and 400 on that path. Allow-list the openai.gpt- family (excluding gpt-oss) instead, so chat-only models fall through to the chat-completions emulation. Verified against the live us-east-2 endpoint: nvidia.nemotron-nano-9b-v2 returns 400 on /openai/v1/responses and 200 on /v1/chat/completions. * feat(custom_llm): allow streaming/astreaming to yield ModelResponseStream (#27580) * fix(custom_llm): allow streaming/astreaming to yield ModelResponseStream directly * fix(streaming): enhance ModelResponseStream handling for custom LLM providers * fix(streaming): strip finish_reason from content chunks and ensure tool_calls are preserved * fix(streaming): add type ignore for finish_reason assignment in CustomStreamWrapper * fix(proxy): strip stack trace from HTTP 503 responses (CWE-209) (#28330) * fix(proxy/cwe-209): strip Python traceback from HTTP 503 error responses The /cache/ping endpoint included a full Python traceback in its 503 error response body (inside the ProxyException message), leaking internal file paths, line numbers, and call stacks to any caller. Two MCP route handlers in proxy_server.py similarly interpolated str(e) into "Internal server error" detail strings. Fix: log the traceback server-side via verbose_proxy_logger.exception() and omit it from the ProxyException payload / HTTPException detail returned to clients. Tests updated to assert no "traceback" keyword or frame paths appear in the 503 body, with a new dedicated regression test. CWE-209: Generation of Error Message Containing Sensitive Information. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(proxy/cwe-209): apply Greptile P2 fixes and add MCP exception-path tests Greptile 4/5 review identified two remaining gaps and Codecov reported 0% coverage on the two MCP handler exception branches: 1. caching_routes.py — str(e) in "Service Unhealthy ({str(e)})" could still leak Redis hostnames/IPs; replaced with static "Service Unhealthy". HTTPException is now re-raised before the generic handler so the "cache not initialized" 503 still reaches callers with its detail. Removed the redundant str(e) arg from verbose_proxy_logger.exception() (exception() already appends the traceback automatically). 2. tests — two new unit tests cover the exception paths in dynamic_mcp_route and toolset_mcp_route that were previously at 0%: - test_dynamic_mcp_route_unexpected_exception_returns_500_without_traceback - test_toolset_mcp_route_unexpected_exception_returns_500_without_traceback All 25 tests pass (9 caching + 16 MCP). CWE-209: Generation of Error Message Containing Sensitive Information. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test(caching_routes): restore precise assertion in test_cache_ping_no_cache_initialized The assertion was weakened to `"Cache not initialized" in str(data)`, which matches the raw string of the entire response dict and would pass even if the error moved to an unexpected field or changed structure. Restore a targeted check on the parsed response: assert the exact string in the correct field `data["detail"]`, matching FastAPI's HTTPException serialisation format {"detail": "<message>"}. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test(caching_routes): restore precise assertion and add CWE-209 no-cache path test The assertion in test_cache_ping_no_cache_initialized was weakened to `"Cache not initialized" in str(data)`, which matched against the raw string representation of the entire response dict. This would pass silently even if the error message moved to an unexpected field or the structure changed. Restore a targeted assertion on the parsed field: assert data["detail"] == "Cache not initialized. litellm.cache is None" matching FastAPI's HTTPException serialisation format exactly. Add test_cache_ping_no_cache_does_not_expose_internals to show the code path is still working correctly after the CWE-209 fix: verifies that the HTTPException is re-raised as-is (no traceback, no source paths), and asserts the complete response structure is exactly {"detail": "Cache not initialized. litellm.cache is None"}. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(caching_routes): restore ProxyException envelope for null-cache 503 The except HTTPException: raise guard (added in the CWE-209 fix) caused the null-cache HTTPException to escape as FastAPI's {"detail": "..."} shape instead of the {"error": {...}} ProxyException envelope that callers expect. Move the null-cache guard before the try block and raise ProxyException directly so the response structure is consistent with all other /cache/ping 503s, and the except HTTPException: raise guard is only reachable by unexpected downstream HTTPExceptions. Update the two no-cache tests to assert the correct ProxyException envelope. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> * Update utils.py (#26609) * feat(pricing): add Snowflake Cortex REST API model pricing (#26612) * feat(pricing): add Snowflake Cortex REST API model pricing ## Summary Adds pricing and context window information for 20+ Snowflake Cortex REST API models to `model_prices_and_context_window.json`. ## What's included - **7 Claude models** (sonnet-4-5, sonnet-4-6, 4-sonnet, 4-opus, haiku-4-5, 3-7-sonnet, 3-5-sonnet) — with prompt caching rates - **4 OpenAI models** (gpt-4.1, gpt-5, gpt-5-mini, gpt-5-nano) — with prompt caching rates - **5 Llama models** (3.1-8b, 3.1-70b, 3.1-405b, 3.3-70b, 4-maverick) - **1 DeepSeek model** (deepseek-r1) - **1 Mistral model** (mistral-large2) - **1 Snowflake model** (snowflake-llama-3.3-70b) - **2 Embedding models** (arctic-embed-l-v2.0, arctic-embed-m-v2.0) Each entry includes `input_cost_per_token`, `output_cost_per_token`, `cache_read_input_token_cost` (where applicable), `max_input_tokens`, `max_output_tokens`, and capability flags (`supports_function_calling`, `supports_vision`, `supports_prompt_caching`, `supports_reasoning`). ## Pricing source All prices are in USD per token, sourced from the official [Snowflake Service Consumption Table](https://www.snowflake.com/legal-files/CreditConsumptionTable.pdf) — Tables 6(b) (REST API with Prompt Caching) and 6(c) (REST API). ## Context The existing `snowflake/` provider has zero model entries in the pricing JSON, which means LiteLLM cannot track costs for Snowflake Cortex calls. This PR fills that gap. ## Related - Existing provider: `litellm/llms/snowflake/` - Cortex REST API docs: https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-rest-api * Update model_prices_and_context_window.json Fix the JSON parsing error * Update model_prices_and_context_window.json Removed the duplicate entry * fix(utils): copy extra_body before adding unknown params to prevent model config mutation (#29620) Fixes #29615. In add_provider_specific_params_to_optional_params, the line: extra_body = passed_params.pop("extra_body", None) or {} returns the original dict reference when extra_body is non-empty (truthy). Subsequent writes like extra_body[k] = passed_params[k] then mutate the shared model config object held by the router, poisoning /model/info and all subsequent requests for that deployment. The or {} short-circuit creates a new dict only when extra_body is falsy (None or {}), which is why the bug does not reproduce with extra_body: {}. Fix: wrap in dict() so we always work on a fresh shallow copy. * fix(vertex_ai): Bake tool_choice into Gemini CachedContent body to prevent silent drop (#29097) * fix(vertex_ai): bake tool_choice into Gemini CachedContent body to prevent silent drop * address greptile feedback on tool_choice cache test * adds test that uses ToolConfig(functionCallingConfig=FunctionCallingConfig(mode=ANY)) instead of a dict literal, mirroring what map_tool_choice_values actually produce * fix(gemini/veo): move image from parameters into instances[0] (#29501) * fix(gemini/veo): move image from parameters into instances[0] Veo's predictLongRunning schema puts image (and prompt) on the instances element; parameters is for aspectRatio/durationSeconds/etc. The Gemini path was leaving image in params_copy, so it ended up nested under parameters and the API silently ignored it. The Vertex path already builds the instance dict explicitly, so this just aligns the Gemini path with it. Fixes #29498 * address greptile: unconditional pop + BytesIO test - Pop `image` from params_copy unconditionally so it never reaches GeminiVideoGenerationParameters even when None, removing implicit reliance on Pydantic's extra-field-ignore. - Add test_transform_video_create_request_image_filelike_goes_to_instance covering the BytesIO path (_convert_image_to_gemini_format) — round-trips the base64 to confirm encoding. - Add test_transform_video_create_request_image_none_is_dropped covering the new None branch. * fix(huggingface): handle special token text in embedding usage (#29660) * fix(guardrails): recompile ToolPermissionGuardrail rules on update_in_memory_litellm_params (#29655) * fix(guardrails): recompile ToolPermissionGuardrail rules on update_in_memory_litellm_params ToolPermissionGuardrail builds self.rules and the compiled target/pattern maps only in __init__. The base update_in_memory_litellm_params re-sets raw attributes via setattr but never rebuilds those maps, so a guardrail updated in place (PUT /guardrails, or the immediate in-memory sync) keeps enforcing the construction-time rules until it is reinitialized (PATCH path, periodic DB poll, or restart). Extract the compile step into _load_rules and override update_in_memory_litellm_params to rebuild from it (dict- and model-safe), re-normalizing default_action / on_disallowed_action. Mirrors the existing PresidioGuardrail override of the same method. Adds regression tests. Fixes #29592. * fix(guardrails): handle dict params in ToolPermissionGuardrail in-memory update Delegate to super() only for LitellmParams input (the base setattr loop is model-only); apply the raw-dict case inline. Fixes the mypy arg-type error and makes the recompile work when the proxy passes the raw DB dict. * fix(guardrails): preserve tool-permission rules on a partial in-memory update A partial update (e.g. a LitellmParams whose rules field is None) ran through the generic setattr, which set self.rules to None, and the recompile was skipped, leaving the guardrail with no rules. Snapshot the previous rules and restore them when the update carries no rules; an explicit empty list still clears them. Adds a regression test for the rules-absent case. Addresses the Greptile review note on #29655. * fix(bedrock): stop base_model label from stripping tools/tool_choice (#29621) * fix(bedrock): stop base_model label from stripping tools/tool_choice A Router/proxy Bedrock deployment whose model_info.base_model is a friendly label (e.g. claude-haiku-4-5) silently lost tools/tool_choice: the outgoing Converse request was built without toolConfig, so the model behaved as if no tools were provided. Worked in v1.84.0, regressed in v1.85.0, and with drop_params=true it failed silently. Two changes compound into the bug. completion() passed model_info.base_model as the model argument to get_optional_params, so the real Bedrock model id never reached supported-param resolution; and get_supported_openai_params resolved the provider config's params from base_model or model, letting the label fully replace the real model. For Bedrock the label resolves to no tool support, so tools/tool_choice were dropped before transformation. completion() now keeps model as the real deployment model and threads the resolved base_model (kwarg or model_info) through separately, and get_supported_openai_params treats base_model as additive: it returns the union of the params supported by model and by base_model. A hint can only add capabilities, never strip ones the real model already exposes, which also preserves the original base_model behavior from #27717 and Azure's base_model driven model-type detection. Fixes #29618 * test(main): make base_model param test robust to new parametrize cases Restore an explicit per-case expected_model_param literal instead of hardcoding the gemini id, so a future case with a different model can't produce a misleading assertion failure. * fix(fireworks_ai): pass response_format json_schema through unchanged (#29606) FireworksAIConfig.map_openai_params was rewriting the OpenAI strict `{type: json_schema, json_schema: {name, strict, schema}}` shape into `{type: json_object, schema: ...}` before sending to Fireworks, dropping `strict` and `name` and changing the `type`. Per Fireworks' docs json_object means "force any valid JSON output (no specific schema)", so the schema constraint was effectively dropped and grammar-guided decoding never ran; model output silently violated the schema. The rewrite landed in #7085 (Dec 2024) when Fireworks did not yet accept native json_schema. Fireworks accepts the OpenAI strict shape natively now, so the rewrite has become a regression. Removes the rewrite. Passes response_format through unchanged. Updates the existing test_map_response_format to assert pass-through. Adds focused regression tests in tests/test_litellm/ covering preservation of type, strict, name, and schema body, plus that json_object alone still works. * fix(types): import Required from typing_extensions in gemini types * style: reformat sampling_handler.py for py312 black compat * refactor(mcp-sampling): extract helpers to fix PLR0915 too-many-statements in handle_sampling_create_message * fix(proxy-server): add explicit ProxyLogging type annotation to proxy_logging_obj to fix mypy inference * fix(mcp-sampling): suppress mypy assignment error on ImportError fallback for proxy_logging_obj * fix(test): use .value when comparing LlmProviders enum against string in test_default_api_base * fix(test): iterate LlmProviders enum in test_default_api_base to avoid str pollution from custom provider registration litellm.provider_list is a mutable global initialized to list(LlmProviders) but custom_llm_setup() appends plain provider strings to it. When a test_custom_llm.py test runs first in the same xdist worker, provider_list contains a str and calling .value on it raises AttributeError. Iterate the immutable LlmProviders enum instead, which is deterministic and what the check intends. * fix(mcp): depth-aware JSON-RPC response detection and neutral speed-priority fallback Replace the flat substring check in the truncated-body routing path with a top-level-key scan so a JSON-RPC response whose result payload nests a "method" field is still detected as a response and skips the session lock, removing a deadlock against the in-flight tool call awaiting it. Drop the inverse max_output_tokens speed proxy when no model exposes output_tokens_per_second; context-window size does not track latency, so a neutral score avoids biasing speedPriority toward the smallest-context model. * fix(guardrails): make ToolPermission rule reload atomic on invalid regex _load_rules appended each rule to self.rules before compiling its regex, so an invalid pattern raised mid-loop after the bad rule was already live but without a _compiled_rule_targets entry. _matches_regex reads a missing compiled target as a None pattern and returns True, turning the bad rule into a match-all that silently applies its decision to every tool. Via update_in_memory_litellm_params (PUT /guardrails) this corrupted the live guardrail. Build the parsed rules and compiled maps into locals and swap them in only after every regex compiles, and restore the previous ruleset if a live update is rejected, so an invalid regex now fails the update without leaving the guardrail enforcing a broken policy. * test(mcp): cover sampling conversion, model resolution, and elicitation relay paths The MCP sampling and elicitation handlers shipped with partial test coverage, leaving the response-to-MCP conversion, the model resolution fallback chain, completion-kwargs assembly, guardrail routing, and the entire elicitation relay untested. That pulled the PR's diff (patch) coverage below the codecov threshold even though overall project coverage rose. Add focused unit tests for _convert_openai_response_to_mcp_result, _convert_mcp_tools_to_openai, _convert_mcp_tool_choice_to_openai, image and audio content conversion, the hint-matching and fallback branches of _resolve_model_from_preferences, _build_completion_kwargs, the router and guardrail-rejection paths of _run_guardrails_and_call_llm, the handle_sampling_create_message success and error-propagation flows, the marker-hoisting fallback for tool content on unexpected roles, and the elicitation form/url/generic relay together with its decline paths --------- Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: lengkejun <lengkejun@xd.com> Co-authored-by: Yug <yugborana000@gmail.com> Co-authored-by: Kent <72616338+kingdoooo@users.noreply.github.com> Co-authored-by: tanmay958 <53569547+tanmay958@users.noreply.github.com> Co-authored-by: DrishnaTrivedi <142084770+DrishnaTrivedi@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Navnit Shukla <Navnit.shukla25@gmail.com> Co-authored-by: PRABHU KIRAN VANDRANKI <72809214+VANDRANKI@users.noreply.github.com> Co-authored-by: Adrian Lopez <109683617+adriangomez24@users.noreply.github.com> Co-authored-by: hcl <chenglunhu@gmail.com> Co-authored-by: JooHo Lee <96564470+BWAAEEEK@users.noreply.github.com> Co-authored-by: Dinesh Girbide <85330597+Dinesh-Girbide@users.noreply.github.com> Co-authored-by: cloudwiz <22098246+andrey-dubnik@users.noreply.github.com> Co-authored-by: Ahmad Khan <ahmadkhan2508@gmail.com> Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>

fix(custom_llm): allow streaming/astreaming to yield ModelResponseStr…

edf75d6

…eam directly

tanmay958 force-pushed the fix/custom-llm-model-response-stream-passthrough branch from 5125f54 to edf75d6 Compare May 10, 2026 10:29

greptile-apps Bot reviewed May 10, 2026

View reviewed changes

Comment thread litellm/litellm_core_utils/streaming_handler.py Outdated

Comment thread litellm/litellm_core_utils/streaming_handler.py Outdated

fix(streaming): enhance ModelResponseStream handling for custom LLM p…

1346001

…roviders

greptile-apps Bot reviewed May 11, 2026

View reviewed changes

Comment thread litellm/litellm_core_utils/streaming_handler.py

tanmay958 added 2 commits May 11, 2026 17:58

fix(streaming): strip finish_reason from content chunks and ensure to…

4054c35

…ol_calls are preserved

fix(streaming): add type ignore for finish_reason assignment in Custo…

b4fbb63

…mStreamWrapper

tanmay958 changed the title ~~fix(custom_llm): allow streaming/astreaming to yield ModelResponseStream~~ feat(custom_llm): allow streaming/astreaming to yield ModelResponseStream May 12, 2026

Sameerlite approved these changes Jun 4, 2026

View reviewed changes

Sameerlite changed the base branch from litellm_internal_staging to litellm_oss_staging_040626 June 4, 2026 11:45

Sameerlite merged commit 3b13d81 into BerriAI:litellm_oss_staging_040626 Jun 4, 2026
42 checks passed

Uh oh!

Conversation

tanmay958 commented May 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Relevant issues

Linear ticket

Pre-Submission checklist

Delays in PR merge?

CI (LiteLLM team)

Screenshots / Proof of Fix

Type

Changes

Uh oh!

CLAassistant commented May 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov Bot commented May 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

greptile-apps Bot commented May 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Uh oh!

Uh oh!

Uh oh!

tanmay958 commented May 10, 2026

Uh oh!

Sameerlite commented May 11, 2026

Uh oh!

Uh oh!

Sameerlite commented May 11, 2026

Uh oh!

tanmay958 commented May 11, 2026

Uh oh!

tanmay958 commented May 11, 2026

Uh oh!

tanmay958 commented May 11, 2026

Uh oh!

tanmay958 commented May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tanmay958 commented May 12, 2026

Uh oh!

tanmay958 commented May 12, 2026

Uh oh!

oss-pr-review-agent-shin Bot commented May 16, 2026

Uh oh!

oss-pr-review-agent-shin Bot commented May 17, 2026

Uh oh!

MaleicAcid commented Jun 2, 2026

Uh oh!

Sameerlite left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

tanmay958 commented May 10, 2026 •

edited

Loading

CLAassistant commented May 10, 2026 •

edited

Loading

codecov Bot commented May 10, 2026 •

edited

Loading

greptile-apps Bot commented May 10, 2026 •

edited

Loading

tanmay958 commented May 11, 2026 •

edited

Loading