fix(anthropic): preserve all tool_calls when an OpenAI delta contains multiple by freddyhaddad · Pull Request #26636 · BerriAI/litellm

freddyhaddad · 2026-04-27T21:20:37Z

Relevant issues

No existing issue. Surfaced by a real-world Claude Code + mlx_lm.server (kimi-k2.6) deployment where parallel-subagent dispatch was failing with InputValidationError: description / prompt missing.

Pre-Submission checklist

I have Added testing in the tests/test_litellm/ directory — new file tests/test_litellm/llms/anthropic/test_anthropic_stream_multi_tool_call_split.py with 12 cases covering the static splitter, sync iteration, async iteration, and construction-time wiring
My PR passes the relevant unit tests — 266 passed across the new test file plus all tests/test_litellm/llms/anthropic/experimental_pass_through/, including the previously-existing TestAnthropicStreamWrapperToolArgs::test_sync_tool_args_not_dropped and test_async_tool_args_not_dropped regression tests
My PR's scope is as isolated as possible — single bug, single file change in litellm/, single new test file
Greptile review — not yet (I'll request after CI runs).

Type

🐛 Bug Fix

Changes

Problem

AnthropicStreamWrapper (the OpenAI→Anthropic streaming converter behind /v1/messages when use_chat_completions_url_for_anthropic_messages: true or anthropic-format passthrough) silently drops every tool_call beyond the first when an upstream OpenAI streaming chunk contains multiple complete tool_calls in a single delta.

Root cause: the helper _translate_streaming_openai_chunk_to_anthropic_content_block in transformation.py indexes choices[0].delta.tool_calls[0], and the streaming loop in AnthropicStreamWrapper.__next__ / __anext__ uses _should_start_new_content_block to decide WHEN to emit a new content_block_start. The combination only ever observes the first tool_call per chunk, so subsequent tool_calls in the same delta produce no content_block_start / input_json_delta / content_block_stop triple and are lost from the converted Anthropic event stream.
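A minimal sketch of the failure mode, using `SimpleNamespace` stand-ins rather than real LiteLLM types. `make_chunk` and `naive_translate` are hypothetical names for illustration only. The real converter does much more, but any code path that reads only `tool_calls[0]` loses the rest of the list in the same way.

```python
from types import SimpleNamespace


def make_chunk(*names):
    """Build a stand-in for an OpenAI streaming chunk (not a real LiteLLM type)."""
    tool_calls = [
        SimpleNamespace(index=i, function=SimpleNamespace(name=n, arguments="{}"))
        for i, n in enumerate(names)
    ]
    delta = SimpleNamespace(content=None, tool_calls=tool_calls)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


def naive_translate(chunk):
    """Hypothetical converter that only ever looks at tool_calls[0]."""
    first = chunk.choices[0].delta.tool_calls[0]
    return {"type": "tool_use", "name": first.function.name}


# One delta carrying two complete tool_calls, as described for mlx_lm.server.
chunk = make_chunk("fetch_a", "fetch_b")

# The converter runs once per chunk, so only one block is produced.
blocks = [naive_translate(chunk)]
print(len(chunk.choices[0].delta.tool_calls), "tool_calls in,", len(blocks), "block out")
```

The second tool_call (`fetch_b` here) never reaches the output, which mirrors the single `tool_use` block in the "Before" trace.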

Trigger

mlx_lm.server emits all parallel tool_calls together in a single final delta after their text has been fully generated (this matches its single-pass, non-incremental tool-call streaming model). Likely affects other providers with similar patterns; OpenAI itself usually emits one tool_call per delta so this bug doesn't surface there.

Real-world symptom

When a user asks Claude Code (kimi-k2.6 backend, via this LiteLLM passthrough) to dispatch parallel subagents — e.g., "fetch these URLs concurrently with two Agent tool calls" — the model emits a 2-tool-call OpenAI delta, but Claude Code's harness only sees one tool_use content block. Its conversation state diverges from the model's, the model gets confused on the next turn and emits more tool_use blocks with empty input (mode-collapse on validation failure), and the harness loops with:

InputValidationError: Agent failed due to the following issues:
The required parameter `description` is missing
The required parameter `prompt` is missing

Fix

Introduce a small dual-protocol wrapper class _MultiToolCallSplitter that sits between the upstream completion stream and the existing converter, splitting any chunk whose delta.tool_calls has length > 1 into N chunks (one tool_call each) via deep-copy. The wrapper supports both __iter__ / __next__ and __aiter__ / __anext__ and matches whichever protocol the consumer (AnthropicStreamWrapper.__next__ vs __anext__) drives — necessary because some upstream stream wrappers (e.g. CustomStreamWrapper) expose both protocols, and a single-protocol wrapping at construction time would break the unused side. Single-tool-call chunks pass through as the same instance with no copy.

The downstream converter and the rest of AnthropicStreamWrapper's state machine are untouched; they continue to assume one tool_call per chunk, which is now invariant by construction.
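The design above can be sketched roughly as follows. This is not the PR's code: `split_chunk_by_tool_calls`, `MultiToolCallSplitter`, `make_chunk`, `agen`, and `collect` are illustrative names, and the chunks are `SimpleNamespace` stand-ins rather than LiteLLM types. The sketch assumes the consumer always calls `iter()` or `aiter()` before pulling items.

```python
import asyncio
import copy
from collections import deque
from types import SimpleNamespace
from typing import Any, List


def split_chunk_by_tool_calls(chunk: Any) -> List[Any]:
    """Return [chunk] itself unless its first delta carries more than one tool_call."""
    choices = getattr(chunk, "choices", None)
    if not choices:
        return [chunk]
    delta = getattr(choices[0], "delta", None)
    tcs = getattr(delta, "tool_calls", None)
    if not tcs or len(tcs) <= 1:
        return [chunk]  # common path: same instance, no copy
    out = []
    for i in range(len(tcs)):
        sub = copy.deepcopy(chunk)  # independent copy, including its tool_calls
        sub.choices[0].delta.tool_calls = [sub.choices[0].delta.tool_calls[i]]
        out.append(sub)
    return out


class MultiToolCallSplitter:
    """Wraps a stream and serves whichever iteration protocol the consumer drives."""

    def __init__(self, stream: Any) -> None:
        self._stream = stream
        self._buffer: deque = deque()
        self._sync_it: Any = None
        self._async_it: Any = None

    def __iter__(self):
        self._sync_it = iter(self._stream)
        return self

    def __next__(self) -> Any:
        if self._buffer:
            return self._buffer.popleft()
        first, *rest = split_chunk_by_tool_calls(next(self._sync_it))
        self._buffer.extend(rest)
        return first

    def __aiter__(self):
        self._async_it = self._stream.__aiter__()
        return self

    async def __anext__(self) -> Any:
        if self._buffer:
            return self._buffer.popleft()
        first, *rest = split_chunk_by_tool_calls(await self._async_it.__anext__())
        self._buffer.extend(rest)
        return first


def make_chunk(*names: str) -> SimpleNamespace:
    """Stand-in chunk with one tool_call per name."""
    tcs = [SimpleNamespace(function=SimpleNamespace(name=n, arguments="{}")) for n in names]
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(tool_calls=tcs))])


async def agen(chunks):
    for c in chunks:
        yield c


async def collect(stream):
    return [c async for c in MultiToolCallSplitter(stream)]


sync_out = list(MultiToolCallSplitter([make_chunk("a", "b"), make_chunk("c")]))
async_out = asyncio.run(collect(agen([make_chunk("a", "b", "c")])))
```

Both `sync_out` and `async_out` contain three chunks with exactly one tool_call each, so a converter that reads `tool_calls[0]` sees every call.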

Code change

One new module-level class _MultiToolCallSplitter, one static helper AnthropicStreamWrapper._split_chunk_by_tool_calls, and one line in __init__ that wires them in. 107 lines added, 1 line removed in litellm/llms/anthropic/experimental_pass_through/adapters/streaming_iterator.py.

Screenshots / Proof of Fix

Before

Direct /v1/messages streaming test against mlx_lm.server for a prompt that produces 2 parallel tool_calls in one delta:

Total events: 8
  message_start
  content_block_start index=0 type=text
  content_block_stop index=0
  content_block_start index=1 type=tool_use name=Agent     ← only one tool_use
  content_block_delta index=1 input_json_delta partial='{"description": "...", "prompt": "..."}'
  content_block_stop index=1
  message_delta
  message_stop

After

Total events: 11
  message_start
  content_block_start index=0 type=text
  content_block_stop index=0
  content_block_start index=1 type=tool_use name=Agent     ← first tool_use
  content_block_delta index=1 input_json_delta partial='{"description": "count .py files...", ...}'
  content_block_stop index=1
  content_block_start index=2 type=tool_use name=Agent     ← second tool_use, was dropped before
  content_block_delta index=2 input_json_delta partial='{"description": "count .md files...", ...}'
  content_block_stop index=2
  message_delta
  message_stop
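The jump from 8 to 11 events follows from each tool_use block contributing one start/delta/stop triple. A small worked check, assuming a stream shaped like the two traces above (one text block with no text delta, and one `input_json_delta` per tool call); `expected_event_count` is a hypothetical helper, not part of LiteLLM:

```python
def expected_event_count(tool_calls: int, text_blocks: int = 1) -> int:
    """Event count for a stream shaped like the traces above (an assumption)."""
    framing = 3                    # message_start, message_delta, message_stop
    text_events = 2 * text_blocks  # content_block_start + content_block_stop
    tool_events = 3 * tool_calls   # start + one input_json_delta + stop per tool_use
    return framing + text_events + tool_events


before = expected_event_count(tool_calls=1)
after = expected_event_count(tool_calls=2)
print(before, after)
```

A provider that streams tool arguments incrementally would emit more than one delta per block, so the formula applies only to the single-final-delta pattern described here.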

Claude Code TUI verification (end-to-end)

Same prompt that previously looped 8x with Invalid tool parameters now succeeds:

⏺ 2 agents finished
   ├ Count .py files in mlx_lm  ·  Done
   └ Count .md files in mlx-lm  ·  Done

⏺ Results:
  - 175 .py files in mlx_lm/
  - 31 .md files in mlx-lm/

Tests

$ uv run pytest tests/test_litellm/llms/anthropic/test_anthropic_stream_multi_tool_call_split.py -v
========================== 12 passed in 0.11s ==========================

$ uv run pytest tests/test_litellm/llms/anthropic/experimental_pass_through/ -q
========================= 254 passed in 6.4s =========================

The 12 new tests cover:

_split_chunk_by_tool_calls static helper: passthrough for None / "None" sentinels, single-tool-call chunks (no copy), 2-call splits, 3-call splits, chunks with no choices, chunks with delta=None
_MultiToolCallSplitter sync iteration
_MultiToolCallSplitter async iteration
AnthropicStreamWrapper construction with a dual-protocol upstream stream (verifies both sync and async paths reach the splitter)

Backwards compatibility

Single-tool-call chunks (the OpenAI default) pass through as the same Python instance — no deep-copy cost on the common path.
Public API of AnthropicStreamWrapper unchanged. Behavior unchanged for any provider that already emits one tool_call per delta.
The previously-existing TestAnthropicStreamWrapperToolArgs::test_sync_tool_args_not_dropped and test_async_tool_args_not_dropped (added in #24134, "Anthropic adapter streaming drops tool_use input args on content block transition") continue to pass. This was verified explicitly because they use a SimpleIterator fixture that exposes both sync and async protocols, which the dual-protocol splitter handles correctly.

🤖 Generated with Claude Code

CLAassistant · 2026-04-27T21:20:48Z

All committers have signed the CLA.

codspeed-hq · 2026-04-27T21:25:48Z

Merging this PR will not alter performance

✅ 16 untouched benchmarks

Comparing freddyhaddad:fix/anthropic-stream-multi-tool-call (2e3da15) with main (3d2b8fe)

codecov · 2026-04-27T21:26:01Z

Codecov Report

❌ Patch coverage is 96.07843% with 2 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
...mental_pass_through/adapters/streaming_iterator.py	96.07%	2 Missing ⚠️


greptile-apps · 2026-04-27T21:26:16Z

Greptile Summary

This PR fixes a real-world bug in AnthropicStreamWrapper where parallel tool_calls emitted in a single OpenAI streaming delta were silently dropped after the first. The fix introduces _MultiToolCallSplitter, a thin wrapper that splits any multi-tool-call chunk into one chunk per tool_call before it reaches the downstream converter. The approach is minimal, backward-compatible (single-tool-call chunks pass through as the same instance), and the previous review-thread concerns (object aliasing in sub-chunks, missing end-to-end tests) appear to have been addressed in the current revision.

Confidence Score: 5/5

Safe to merge — targeted, minimal change with correct logic and solid test coverage including end-to-end event-sequence assertions.

No P0/P1 issues found. The previous thread's concerns (sub-chunk object aliasing, missing end-to-end tests) have been addressed in the current revision. Single-tool-call common path is zero-copy, multi-tool-call splitting logic is correct, and both sync/async protocols are handled cleanly.

No files require special attention.

Important Files Changed

Filename	Overview
litellm/llms/anthropic/experimental_pass_through/adapters/streaming_iterator.py	Adds `_MultiToolCallSplitter` class and `_split_chunk_by_tool_calls` static helper; wires splitter into `AnthropicStreamWrapper.__init__`; switches `__next__`/`__anext__` loops from `self.completion_stream` to `self._completion_stream_splitter`. Logic is correct: single-tool-call chunks pass through as the same instance, multi-call chunks are deep-copied per slot, both sync and async protocols are supported via a shared deque buffer.
tests/test_litellm/llms/anthropic/test_anthropic_stream_multi_tool_call_split.py	New test file covering the static splitter helper (passthrough sentinels, single/2-call/3-call splits, edge cases), `_MultiToolCallSplitter` sync and async iteration, and two full end-to-end `AnthropicStreamWrapper` tests that assert on the emitted `content_block_start`/`content_block_delta` event sequence. No real network calls; all mocks or LiteLLM-internal types.

Sequence Diagram

sequenceDiagram
    participant Upstream as Upstream OpenAI Stream
    participant Splitter as _MultiToolCallSplitter
    participant Converter as AnthropicStreamWrapper.__next__
    participant Client as Anthropic Client

    Note over Upstream,Splitter: Single delta with 2 parallel tool_calls
    Upstream->>Splitter: chunk{tool_calls:[TC_A, TC_B]}
    Splitter->>Splitter: _split_chunk_by_tool_calls → [sub_A, sub_B]
    Splitter->>Splitter: buffer.append(sub_B)
    Splitter-->>Converter: sub_A {tool_calls:[TC_A]}

    Converter->>Client: content_block_start index=1 (tool_use TC_A)
    Converter->>Client: content_block_delta index=1 (args_A)
    Converter->>Client: content_block_stop index=1

    Note over Splitter,Converter: Next __next__ call drains the buffer
    Splitter-->>Converter: sub_B {tool_calls:[TC_B]} (from buffer)

    Converter->>Client: content_block_start index=2 (tool_use TC_B)
    Converter->>Client: content_block_delta index=2 (args_B)
    Converter->>Client: content_block_stop index=2

Reviews (2): Last reviewed commit: "fix(anthropic): preserve all tool_calls ..."

greptile-apps · 2026-04-27T21:26:19Z

+    def __next__(self) -> Any:
+        while True:
+            if self._buffer:
+                return self._buffer.popleft()
+            chunk = next(self._sync_iter_obj)  # raises StopIteration at EOF
+            splits = AnthropicStreamWrapper._split_chunk_by_tool_calls(chunk)
+            if len(splits) <= 1:
+                return splits[0]
+            self._buffer.extend(splits[1:])
+            return splits[0]


while True loop body always returns on first iteration

The while True: loop in __next__ (and identically in __anext__) is unreachable beyond the first iteration. _split_chunk_by_tool_calls always returns a list with ≥ 1 element (it returns [chunk] for every edge case), so every code path inside the loop body hits a return. The loop will never cycle back to check self._buffer a second time from a single __next__ call — that check only matters on the next call, which Python's for machinery handles by invoking __next__ again.

Suggested change

-    def __next__(self) -> Any:
-        while True:
-            if self._buffer:
-                return self._buffer.popleft()
-            chunk = next(self._sync_iter_obj)  # raises StopIteration at EOF
-            splits = AnthropicStreamWrapper._split_chunk_by_tool_calls(chunk)
-            if len(splits) <= 1:
-                return splits[0]
-            self._buffer.extend(splits[1:])
-            return splits[0]
+    def __next__(self) -> Any:
+        if self._buffer:
+            return self._buffer.popleft()
+        chunk = next(self._sync_iter_obj)  # raises StopIteration at EOF
+        splits = AnthropicStreamWrapper._split_chunk_by_tool_calls(chunk)
+        if len(splits) <= 1:
+            return splits[0]
+        self._buffer.extend(splits[1:])
+        return splits[0]

greptile-apps · 2026-04-27T21:26:20Z

+        out: List[Any] = []
+        for one_tc in tcs:
+            sub = copy.deepcopy(chunk)
+            sub.choices[0].delta.tool_calls = [one_tc]
+            out.append(sub)
+        return out


Sub-chunks share references to original tool-call objects

After deep-copying the full chunk into sub, the code immediately replaces the copied tool_calls list with a list containing the original (not deep-copied) one_tc object:

    sub = copy.deepcopy(chunk)                  # deep copies tool_calls inside
    sub.choices[0].delta.tool_calls = [one_tc]  # drops copy; re-assigns original

This means each emitted sub-chunk's tool_calls[0] is the same Python object as the corresponding slot in the original chunk's tool_calls. Any downstream mutation of that object (e.g. setting function.arguments) would silently alias back to the original chunk. Consider deep-copying one_tc explicitly to make sub-chunks fully independent:

    for one_tc in tcs:
        sub = copy.deepcopy(chunk)
        sub.choices[0].delta.tool_calls = [copy.deepcopy(one_tc)]
        out.append(sub)

Alternatively, avoid the wasted deep-copy of tcs by only copying the outer structure and deep-copying each individual tool_call.
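The aliasing concern can be reproduced in isolation. The sketch below uses `SimpleNamespace` stand-ins, not LiteLLM types, and the variable names are illustrative. It contrasts re-assigning the original tool-call object with deep-copying it.

```python
import copy
from types import SimpleNamespace

# Stand-in for one tool_call inside an upstream chunk.
original_tc = SimpleNamespace(function=SimpleNamespace(name="Agent", arguments=""))
chunk = SimpleNamespace(tool_calls=[original_tc])

# Variant flagged in the review: the deep copy is made, then the original is put back.
aliased = copy.deepcopy(chunk)
aliased.tool_calls = [original_tc]

# Variant suggested in the review: the sub-chunk gets its own tool_call object.
independent = copy.deepcopy(chunk)
independent.tool_calls = [copy.deepcopy(original_tc)]

# Mutating the independent copy leaves the upstream object alone...
independent.tool_calls[0].function.arguments = '{"prompt": "x"}'
untouched = original_tc.function.arguments

# ...while mutating the aliased one writes through to it.
aliased.tool_calls[0].function.arguments = '{"prompt": "y"}'
leaked = original_tc.function.arguments

print(repr(untouched), repr(leaked))
```

Whether any downstream code actually mutates tool-call objects is not established here; the copy simply removes the possibility.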

greptile-apps · 2026-04-27T21:26:21Z

+"""
+Tests for AnthropicStreamWrapper's multi-tool-call splitting.
+
+Some upstream providers (e.g. mlx_lm.server) emit multiple complete
+tool_calls inside a SINGLE OpenAI streaming delta when the model produces
+parallel tool calls. Anthropic's streaming format requires one
+``content_block`` per ``tool_use`` and the downstream converter in
+``AnthropicStreamWrapper`` indexes ``tool_calls[0]`` — so without splitting,
+all but the first tool_call are silently dropped from the converted
+``/v1/messages`` stream.
+
+These tests verify that ``AnthropicStreamWrapper`` splits such chunks into
+one tool_call per chunk before the converter sees them.
+"""
+
+import asyncio
+import os
+import sys
+from typing import List
+from unittest.mock import MagicMock


No end-to-end test for the full Anthropic SSE event sequence

The new tests thoroughly validate _split_chunk_by_tool_calls and _MultiToolCallSplitter in isolation, and verify that wrapper.completion_stream expands correctly. However, none of them exercise the complete AnthropicStreamWrapper.__next__ / __anext__ state machine with a multi-tool-call input to confirm that the final emitted Anthropic event sequence actually contains paired content_block_start / content_block_delta / content_block_stop triples for each parallel tool call. A regression in _should_start_new_content_block or the content_block_index accounting would go undetected by the current suite. An integration-style test (similar to the existing TestAnthropicStreamWrapperToolArgs tests) that consumes list(wrapper) and asserts on the type and index of each emitted event would close this gap.
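The shape of such an integration-style assertion can be sketched with toy components. Everything below is hypothetical: `split_multi_tool_call_chunks` and `to_anthropic_events` are simplified stand-ins for the splitter and converter, chunks are plain dicts, and there is no text block, so tool_use indices start at 0 rather than 1.

```python
import copy
from typing import Any, Dict, Iterable, Iterator, List

Chunk = Dict[str, Any]


def split_multi_tool_call_chunks(stream: Iterable[Chunk]) -> Iterator[Chunk]:
    """Toy splitter: yield one chunk per tool_call."""
    for chunk in stream:
        tool_calls = chunk.get("tool_calls") or []
        if len(tool_calls) <= 1:
            yield chunk
            continue
        for tc in tool_calls:
            yield {**chunk, "tool_calls": [copy.deepcopy(tc)]}


def to_anthropic_events(chunks: Iterable[Chunk]) -> List[Dict[str, Any]]:
    """Toy converter that reads only tool_calls[0] from each chunk."""
    events: List[Dict[str, Any]] = [{"type": "message_start"}]
    index = -1
    for chunk in chunks:
        tool_calls = chunk.get("tool_calls") or []
        if not tool_calls:
            continue
        tc = tool_calls[0]
        index += 1
        events.append({"type": "content_block_start", "index": index, "name": tc["name"]})
        events.append({"type": "content_block_delta", "index": index, "partial_json": tc["arguments"]})
        events.append({"type": "content_block_stop", "index": index})
    events.append({"type": "message_stop"})
    return events


def test_each_parallel_tool_call_gets_its_own_block() -> None:
    upstream = [{"tool_calls": [
        {"name": "Agent", "arguments": '{"description": "count .py files"}'},
        {"name": "Agent", "arguments": '{"description": "count .md files"}'},
    ]}]
    events = to_anthropic_events(split_multi_tool_call_chunks(upstream))
    # Assert on the type and index of every block-level event, in order.
    blocks = [(e["type"], e["index"]) for e in events if "index" in e]
    assert blocks == [
        ("content_block_start", 0), ("content_block_delta", 0), ("content_block_stop", 0),
        ("content_block_start", 1), ("content_block_delta", 1), ("content_block_stop", 1),
    ]
    # Each block must carry its own arguments, not the first call's.
    deltas = [e["partial_json"] for e in events if e["type"] == "content_block_delta"]
    assert deltas == ['{"description": "count .py files"}', '{"description": "count .md files"}']


test_each_parallel_tool_call_gets_its_own_block()
```

Asserting on the full ordered sequence, rather than only on counts, is what would catch a regression in block-index accounting.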

… multiple

The Anthropic /v1/messages streaming adapter (AnthropicStreamWrapper) silently drops every tool_call beyond the first when an upstream OpenAI streaming chunk contains multiple complete tool_calls in a single delta. The downstream converter (_translate_streaming_openai_chunk_to_anthropic_content_block) indexes tool_calls[0], so without splitting at the wrapper level only one content_block_start / input_json_delta / content_block_stop triple is emitted regardless of how many parallel tool_calls the model produced.

The provider that triggered this is mlx_lm.server, which emits all parallel tool_calls in one final delta after their text has been fully generated.

Real-world impact: Claude Code (which speaks /v1/messages) sees only the first parallel subagent dispatch, then loops on "InputValidationError: required parameter description / prompt missing" because the model's follow-up tool_use blocks arrive without their content.

Fix: a small _MultiToolCallSplitter wraps the upstream stream inside __init__. It supports both __iter__/__next__ and __aiter__/__anext__ transparently so AnthropicStreamWrapper.__next__ and __anext__ each see the matching protocol. When a chunk's delta has more than one tool_call, the splitter deep-copies the chunk N times (one tool_call each) and buffers the rest. Single-tool-call chunks pass through unchanged (returns the same instance, no copy).

Verified end-to-end against mlx_lm.server kimi-k2.6 + Claude Code's parallel-Agent-tool dispatch: before patch, /v1/messages stream emitted 8 events with 1 content_block_start for tool_use; after patch, 11 events with 2 content_block_starts (index 1 + index 2), each carrying its own input_json_delta. Claude Code TUI confirms both subagents now spawn, run, and return cleanly.

Test added: tests/test_litellm/llms/anthropic/test_anthropic_stream_multi_tool_call_split.py covers the static splitter, sync iteration, async iteration, and construction-time wiring through AnthropicStreamWrapper.
🤖 Generated with [Claude Code](https://claude.com/claude-code)

freddyhaddad · 2026-04-27T21:56:59Z

Hi maintainers — pushed an update addressing the bot review:

lint (mypy) ✅: fixed by typing the lazy iterator slots as Any instead of Optional[Any] (Python's iteration protocol guarantees __iter__ runs before __next__)
Greptile #1 (while True unreachable) ✅: removed; the loop only ever ran once
Greptile #2 (sub-chunk references) ✅: added copy.deepcopy(one_tc) so each split chunk has fully independent state
Greptile #3 (no end-to-end test) ✅: added TestAnthropicStreamWrapperEndToEnd (sync + async) that drives the full AnthropicStreamWrapper and asserts two content_block_start of type tool_use are emitted for a multi-tool-call delta, each with its input_json_delta
CodeQL (attribute shadow) ✅: moved the splitter onto a separate self._completion_stream_splitter attribute instead of overwriting self.completion_stream from the superclass
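The attribute-shadowing point in the last item can be illustrated with a small sketch. `BaseStreamWrapper` and `SplittingStreamWrapper` are hypothetical classes, and plain `iter()` stands in for the splitter; only the attribute names `completion_stream` and `_completion_stream_splitter` come from this thread.

```python
from typing import Any, Iterable, Iterator


class BaseStreamWrapper:
    """Hypothetical superclass that owns `completion_stream`."""

    def __init__(self, completion_stream: Iterable[Any]) -> None:
        self.completion_stream = completion_stream


class SplittingStreamWrapper(BaseStreamWrapper):
    """Keeps the wrapped iterator on its own attribute instead of overwriting the inherited one."""

    def __init__(self, completion_stream: Iterable[Any]) -> None:
        super().__init__(completion_stream)
        # iter() is a placeholder for wrapping the stream in a splitter.
        self._completion_stream_splitter: Iterator[Any] = iter(completion_stream)

    def __iter__(self) -> "SplittingStreamWrapper":
        return self

    def __next__(self) -> Any:
        # Iteration reads from the private attribute only.
        return next(self._completion_stream_splitter)


source = ["chunk-1", "chunk-2"]
wrapper = SplittingStreamWrapper(source)
consumed = list(wrapper)
```

With this layout, any code that inspects `wrapper.completion_stream` still sees the object the caller passed in.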

One open issue I can't fix from this side: the Verify PR source branch workflow rejects external contributions and points users at a branch named litellm_oss_branch, but that branch doesn't appear to exist in this repo. The closest matches are date-stamped litellm-oss-staging-* branches (most recent: litellm-oss-staging-04-25-2026), but I don't want to guess at the right target. Could you let me know which branch I should retarget to, or is the contribution-via-fork-to-main path expected to fail this gate at this stage of review?

(Or if it's easier, the workflow text at .github/workflows/guard-main-branch.yml references a branch that doesn't exist — happy to open a tiny follow-up PR fixing that pointer if you let me know the canonical name.)


freddyhaddad force-pushed the fix/anthropic-stream-multi-tool-call branch 2 times, most recently from 85fda80 to 6c34140 Compare April 27, 2026 21:22

greptile-apps Bot reviewed Apr 27, 2026


github-advanced-security AI found potential problems Apr 27, 2026


Comment thread litellm/llms/anthropic/experimental_pass_through/adapters/streaming_iterator.py Fixed

freddyhaddad force-pushed the fix/anthropic-stream-multi-tool-call branch from 6c34140 to 2e3da15 Compare April 27, 2026 21:55

Bojun-Vvibe added a commit to Bojun-Vvibe/oss-contributions that referenced this pull request Apr 27, 2026

review: openai/codex#19882 merge-after-nits, BerriAI/litellm#26648 merge-after-nits, BerriAI/litellm#26636 merge-after-nits

a1730c1

freddyhaddad wants to merge 1 commit into BerriAI:main from freddyhaddad:fix/anthropic-stream-multi-tool-call
