Bug Description
Parallel tool calls fail when using the Responses API (/v1/responses) with Gemini 3 models (e.g. gemini-3-pro-preview) in streaming mode. The second and subsequent tool results raise:
litellm.APIConnectionError: Missing corresponding tool call for tool response message.
Steps to Reproduce
- Send a request to
/v1/responses with stream: true, a Gemini 3 model, and tools defined
- The model returns 2+ parallel tool calls (e.g.
tool_a and tool_b)
- Execute the tools and send results back as
function_call_output input items with the call_id values from the response
Expected Behavior
All tool results are matched to their corresponding tool calls and the conversation continues.
Actual Behavior
litellm.APIConnectionError: Missing corresponding tool call for tool response message.
Received - message={'role': 'tool', 'tool_call_id': 'call_AAA', 'content': '...'},
last_message_with_tool_calls={'role': 'assistant', 'tool_calls': [{'id': 'call_BBB', ...}]}
The first tool result (call_AAA) can't find its tool call because last_message_with_tool_calls only contains the second tool call (call_BBB).
Root Cause
The Responses API transformation layer (litellm/responses/litellm_completion_transformation/transformation.py) creates one separate assistant message per function_call input item, each containing a single tool call:
assistant msg 1: tool_calls = [{id: "call_AAA", name: "tool_a"}]
assistant msg 2: tool_calls = [{id: "call_BBB", name: "tool_b"}]
tool msg 1: tool_call_id = "call_AAA"
tool msg 2: tool_call_id = "call_BBB"
The Gemini format converter in litellm/llms/vertex_ai/gemini/transformation.py merges consecutive assistant messages into a single model turn, but on this line:
last_message_with_tool_calls = assistant_msg
...it overwrites on each iteration, so only the last assistant message's tool_calls survive. When convert_to_gemini_tool_call_result then tries to match tool results via exact ID comparison, the first tool result can't find its ID in the last assistant message.
Suggested Fix
Accumulate tool_calls from all consecutive assistant messages instead of overwriting:
_tool_calls = assistant_msg.get("tool_calls") or []
if _tool_calls:
if last_message_with_tool_calls is None:
last_message_with_tool_calls = {"tool_calls": list(_tool_calls)}
else:
last_message_with_tool_calls["tool_calls"].extend(_tool_calls)
Environment
- LiteLLM version: 1.82.0 (also reproduced on 1.80.x)
- Model:
gemini/gemini-3-pro-preview
- API endpoint:
/v1/responses with stream: true
- Using tool calling with 2+ parallel function calls
Bug Description
Parallel tool calls fail when using the Responses API (
/v1/responses) with Gemini 3 models (e.g.gemini-3-pro-preview) in streaming mode. The second and subsequent tool results raise:Steps to Reproduce
/v1/responseswithstream: true, a Gemini 3 model, and tools definedtool_aandtool_b)function_call_outputinput items with thecall_idvalues from the responseExpected Behavior
All tool results are matched to their corresponding tool calls and the conversation continues.
Actual Behavior
The first tool result (
call_AAA) can't find its tool call becauselast_message_with_tool_callsonly contains the second tool call (call_BBB).Root Cause
The Responses API transformation layer (
litellm/responses/litellm_completion_transformation/transformation.py) creates one separate assistant message perfunction_callinput item, each containing a single tool call:The Gemini format converter in
litellm/llms/vertex_ai/gemini/transformation.pymerges consecutive assistant messages into a singlemodelturn, but on this line:...it overwrites on each iteration, so only the last assistant message's tool_calls survive. When
convert_to_gemini_tool_call_resultthen tries to match tool results via exact ID comparison, the first tool result can't find its ID in the last assistant message.Suggested Fix
Accumulate
tool_callsfrom all consecutive assistant messages instead of overwriting:Environment
gemini/gemini-3-pro-preview/v1/responseswithstream: true