fix(core): enforce adjacent tool results#4622
Conversation
| !('tool_call_id' in nextMessage) || | ||
| !nextMessage.tool_call_id | ||
| ) { | ||
| break; |
There was a problem hiding this comment.
[Critical] The break here triggers on any non-tool message, including consecutive assistant messages. cleanOrphanedToolCalls runs at line 385 before mergeConsecutiveAssistantMessages at line 387, so consecutive model turns in Gemini history (common after compaction, rewind, or partial-push recovery — see geminiChat.ts:793-799) produce consecutive assistant messages that break this scan immediately.
Concrete trace:
[assistant(tool_calls=[A])]
[assistant(content="...")] ← role !== 'tool' → break. A gets zero adjacent responses.
[tool(A)] ← silently dropped
The same break also fires when splitToolMedia is enabled (default for mimo provider): the converter's own media user message lands between tool messages from different content items, breaking the scan and dropping valid parallel tool calls.
This is a regression from the old global-set matching. Fix options:
- Swap the pipeline order: run
mergeConsecutiveAssistantMessagesbeforecleanOrphanedToolCalls. - Skip (don't break on) non-tool-call assistant messages and converter-generated media messages in the inner scan.
— qwen3.7-max via Qwen Code /review
| const nextMessage = messages[nextIndex]; | ||
| if ( | ||
| nextMessage.role !== 'tool' || | ||
| !('tool_call_id' in nextMessage) || |
There was a problem hiding this comment.
[Suggestion] !nextMessage.tool_call_id is truthy for empty string ''. createToolMessage (lines 776/784) sets tool_call_id: response.id || '', so when response.id is falsy, the tool message gets an empty-string ID. This causes the scan to break here, dropping all subsequent valid tool responses in the contiguous block.
Consider separating the missing-property check from the empty-string check:
| !('tool_call_id' in nextMessage) || | |
| if (nextMessage.role !== 'tool' || !('tool_call_id' in nextMessage)) { | |
| break; | |
| } | |
| if (!nextMessage.tool_call_id) { | |
| continue; | |
| } |
— qwen3.7-max via Qwen Code /review
| @@ -1394,44 +1394,61 @@ function cleanOrphanedToolCalls( | |||
| messages: OpenAI.Chat.ChatCompletionMessageParam[], | |||
There was a problem hiding this comment.
[Suggestion] The function drops tool calls and responses silently — no logging at all. The file already uses createDebugLogger('CONVERTER') elsewhere. Consider adding a debug log when calls or responses are dropped, to aid production debugging of orphan-related issues:
if (droppedIds.size > 0) {
debugLog('Dropped non-adjacent tool calls: %o', [...droppedIds]);
}— qwen3.7-max via Qwen Code /review
| } | ||
|
|
||
| if (toolCallIds.has(nextMessage.tool_call_id)) { | ||
| adjacentToolResponseIds.add(nextMessage.tool_call_id); |
There was a problem hiding this comment.
[Suggestion] The loop intentionally continues past tool messages whose tool_call_id doesn't match the current assistant's calls (no else { break; }). This is correct — it tolerates orphaned tool responses from other calls within a contiguous block. However, this behavior is undocumented. A future maintainer who reads this loop and adds else { break; } would silently break parallel tool calls in histories where compaction left orphaned responses in the middle of a block.
Consider adding a comment:
// Non-matching tool responses are skipped (they may belong to other
// assistants' calls); only non-tool messages terminate the scan.— qwen3.7-max via Qwen Code /review
| }); | ||
| }); | ||
|
|
||
| it('should drop tool responses that are not adjacent to their assistant tool call', () => { |
There was a problem hiding this comment.
[Suggestion] The test covers partial trimming well, but several branches of the new code are untested:
- Content-fallback branch: assistant with text content + all-orphaned tool_calls → kept as text-only (lines 1459-1470 in converter.ts)
- Back-to-back assistant messages: consecutive
modelcontents with tool_calls — directly related to the Critical adjacency-break bug - splitToolMedia interleaving: parallel tool responses across multiple content items with media
Adding test cases for these would help catch the Critical bug and lock in the intended behavior.
— qwen3.7-max via Qwen Code /review
ae1ef9d to
8f5d58a
Compare
| } | ||
| } else if ( | ||
| typeof message.content === 'string' && | ||
| message.content.trim() |
There was a problem hiding this comment.
[Suggestion] Test gap: the content-fallback branch silently drops assistant messages that have only tool_calls (no text content, or content: '' from reasoning-only thinking-model turns) when all tool_calls are orphaned. The existing 5 new tests cover partial trimming (test 1), content-fallback with text (test 2), empty-id responses (test 3), merged consecutive assistants (test 4), and split media adjacency (test 5). None covers this silent-drop scenario.
Consider adding a test that constructs a model turn with only functionCall parts (no text), where none of the calls have adjacent responses. Assert the assistant message is absent from the output and no tool messages are emitted.
it('should drop assistant with only tool_calls when all are orphaned', () => {
const request: GenerateContentParameters = {
model: 'models/test',
contents: [
{
role: 'model',
parts: [
{ functionCall: { id: 'call_orphan', name: 'read_file', args: {} } },
],
},
{ role: 'user', parts: [{ text: 'break adjacency' }] },
{ role: 'user', parts: [{ text: 'continue' }] },
],
};
const messages = converter.convertGeminiRequestToOpenAI(request, requestContext);
expect(messages.some((m) => m.role === 'assistant')).toBe(false);
expect(messages.some((m) => m.role === 'tool')).toBe(false);
});— qwen3.7-max via Qwen Code /review
| const adjacentToolResponseIdsByAssistant = new Map<number, Set<string>>(); | ||
| const validToolResponseIndexesByAssistant = new Map<number, number[]>(); | ||
| const splitMediaIndexesByAssistant = new Map<number, number[]>(); | ||
| const validToolResponseIndexes = new Set<number>(); |
There was a problem hiding this comment.
[Suggestion] validToolResponseIndexes (Set) and validToolResponseIndexesByAssistant (Map) appear to be unnecessary. Any tool response added to validToolResponseIndexes during pass 1 was matched against some assistant's toolCallIds, so that assistant has validToolCalls.length > 0 and emits the response inline via emittedWithAssistant. Since assistants always precede their responses in the array, the assistant is processed first in the main loop — the response's index is already in emittedWithAssistant when the loop reaches it.
The if (validToolResponseIndexes.has(index)) true-branch at line 1538 is therefore unreachable. The else branch (drop with debug log) IS reachable for orphaned responses.
Suggested simplification: Remove both data structures. In the main loop's tool-response branch, just drop all tool responses that reach it (valid ones were already emitted with their assistant):
| const validToolResponseIndexes = new Set<number>(); | |
| } else if (message.role === 'tool' && 'tool_call_id' in message) { | |
| debugLogger.debug( | |
| `cleanOrphanedToolCalls: dropping orphaned tool response ${message.tool_call_id || '<empty>'}`, | |
| ); |
— qwen3.7-max via Qwen Code /review
| continue; | ||
| } | ||
|
|
||
| if (nextMessage.role === 'assistant' && !hasToolCalls(nextMessage)) { |
There was a problem hiding this comment.
[Suggestion] The continue past text-only assistants assumes consecutive assistants have already been merged by mergeConsecutiveAssistantMessages. Without that precondition, the scan would walk past an intermediate text-only assistant and attribute tool responses on the other side to the wrong declaring assistant.
Currently safe because merge runs first at line 384. But there's no assertion, JSDoc, or comment documenting this dependency. If this function is ever called from a new code path that doesn't pre-merge (e.g., a non-Gemini converter, a daemon replay path), the scan silently misattributes responses.
Consider adding a JSDoc @precondition note:
| if (nextMessage.role === 'assistant' && !hasToolCalls(nextMessage)) { | |
| /** | |
| * Clean up orphaned tool calls from message history to prevent OpenAI API errors. | |
| * | |
| * @precondition Consecutive assistant messages must be merged first | |
| * (via mergeConsecutiveAssistantMessages). Without this, the adjacency | |
| * scan may attribute tool responses to the wrong declaring assistant. | |
| */ |
— qwen3.7-max via Qwen Code /review
8f5d58a to
eddd7a2
Compare
|
Updated the branch to address the latest review callouts. Changes:
Validation:
|
Verification Report — PR #4622Commit: Test Results
Test File Breakdown
Verdict✅ All 5 checks from the PR test plan pass. No regressions detected. Safe to merge. |
| @@ -1390,152 +1390,164 @@ function mapGeminiFinishReasonToOpenAI( | |||
| /** | |||
| * Clean up orphaned tool calls from message history to prevent OpenAI API errors. | |||
| */ | |||
There was a problem hiding this comment.
[Suggestion] The JSDoc /** Clean up orphaned tool calls from message history to prevent OpenAI API errors. */ was originally above cleanOrphanedToolCalls, but when hasToolCalls() and isSplitToolMediaMessage() were extracted, the JSDoc got stranded above hasToolCalls. Now hasToolCalls carries documentation describing a completely different function, while cleanOrphanedToolCalls (line 1424) only has a terse line comment.
| */ | |
| /** Type guard: is this an assistant message with at least one tool call? */ | |
| function hasToolCalls( |
Then move the original JSDoc to above cleanOrphanedToolCalls:
/**
* Clean up orphaned tool calls from message history to prevent OpenAI API errors.
* Assumes consecutive assistant messages have already been merged.
*/
function cleanOrphanedToolCalls(— qwen3.7-max via Qwen Code /review
|
|
||
| if (isSplitToolMediaMessage(nextMessage)) { | ||
| splitMediaIndexes.push(nextIndex); | ||
| continue; |
There was a problem hiding this comment.
[Suggestion] The forward scan unconditionally collects every isSplitToolMediaMessage into splitMediaIndexes, regardless of which tool response the split media belongs to. If an orphaned tool response with images precedes a valid tool response in the same scan window, the orphaned response's images are emitted alongside the scanning assistant's valid tool calls — the model sees images from a tool it never called.
Concrete scenario with splitToolMedia: true:
[0] assistant: tool_calls [call_a]
[1] tool: call_x (orphaned — no assistant declared call_x)
[2] user: SPLIT_TOOL_MEDIA (images stripped from call_x's response)
[3] tool: call_a (valid response for call_a)
The scan from [0] records [2] in splitMediaIndexes and [3] in toolResponseIndexes. Output: assistant(call_a), tool(call_a), SPLIT_TOOL_MEDIA(images from call_x).
Consider tracking which tool response each split media follows, and only emitting media for surviving tool calls. Or break the scan on split media that follows a non-matching tool response.
— qwen3.7-max via Qwen Code /review
| ).tool_calls = finalValidToolCalls; | ||
| finalCleaned.push(cleanedMessage); | ||
| } | ||
| } else if ( |
There was a problem hiding this comment.
[Suggestion] The text-content fallback typeof message.content === 'string' && message.content.trim() doesn't check reasoning_content. An assistant with content: '' and reasoning_content: '...' (produced by reasoning models like DeepSeek/QwQ when issuing tool calls with chain-of-thought) is silently dropped — no debug log is emitted either.
This loses chain-of-thought reasoning from the conversation history. Consider broadening the check:
| } else if ( | |
| } else if ( | |
| (typeof message.content === 'string' && | |
| message.content.trim()) || | |
| (message as ExtendedChatCompletionAssistantMessageParam) | |
| .reasoning_content | |
| ) { |
Also consider adding a debug log for the implicit drop when neither condition is met:
} else {
debugLogger.debug(
`cleanOrphanedToolCalls: dropping assistant with ${message.tool_calls.length} orphaned tool call(s) and no text/reasoning content`,
);
}— qwen3.7-max via Qwen Code /review
|
One CI follow-up: the Windows test job failed in packages/cli at src/ui/components/InputPrompt.test.tsx:375 (expected onSubmit('commit this'), 0 calls). That test is outside the touched converter path; the focused core tests, typecheck, lint, format check, and build all passed locally, and ubuntu/macos CI are green. I tried to rerun the failed job, but GitHub reports rerun requires repository admin rights. |
|
Thanks for the verification pass. The latest branch includes the requested follow-ups, but GitHub still shows the older changes-requested review state. Could you re-review or clear the stale review when convenient? The only remaining red CI item appears to be the unrelated Windows InputPrompt test noted above; I do not have permission to rerun that job. |
| processContents(request.contents, messages, requestContext); | ||
|
|
||
| // Clean up orphaned tool calls and merge consecutive assistant messages | ||
| messages = mergeConsecutiveAssistantMessages(messages); |
There was a problem hiding this comment.
[Suggestion] cleanOrphanedToolCalls can re-introduce consecutive assistant messages in the output after mergeConsecutiveAssistantMessages already eliminated them. When an orphaned tool response sits between two assistant messages, dropping it creates adjacent assistants:
Input: assistant(call_a), tool(a), assistant("text"), tool(orphan), assistant(call_b), tool(b)
After merge: unchanged (no consecutive assistants)
After clean: tool(orphan) dropped → assistant(call_a), tool(a), assistant("text"), assistant(call_b), tool(b)
→ 3 consecutive assistant messages
This is reachable whenever history compaction or rewind leaves an orphaned tool response between model turns. Some providers reject consecutive assistant messages, and downstream code that assumes no consecutive assistants after conversion will be surprised.
| messages = mergeConsecutiveAssistantMessages(messages); | |
| messages = mergeConsecutiveAssistantMessages(messages); | |
| if (options.cleanOrphanToolCalls) { | |
| messages = cleanOrphanedToolCalls(messages); | |
| messages = mergeConsecutiveAssistantMessages(messages); | |
| } |
— qwen3.7-max via Qwen Code /review
|
Pushed 3a96af9 with the review follow-up. It now re-merges assistant turns after orphan cleanup, keeps split media only when it follows a surviving adjacent tool response, preserves reasoning-only assistant content when orphaned tool calls are removed, and moves the cleanup docs back onto the right helper. I also closed duplicate #4638 so #4622 remains the single review thread for #4619. Validation: npm run test --workspace=packages/core -- converter.test.ts; npm run typecheck --workspace=packages/core; npx eslint on converter.ts and converter.test.ts; npm run build --workspace=packages/core; git diff --check. |
BZ-D
left a comment
There was a problem hiding this comment.
LGTM. 这个 PR 改动量大但逻辑正确——从全局 ID 匹配改为严格邻接检测,scan-forward 算法处理了各种边界情况(orphaned call、split media、reasoning content、merge 后的 consecutive assistant),测试覆盖非常充分。
Fixes #4619.
Summary
cleanOrphanedToolCalls()keep only tool responses from the contiguous tool block after the assistant message that declared the tool calltool_callswhose results were separated by another user/assistant turncall_bexists globally but is no longer adjacent to its assistant turnTests
npm run test --workspace=packages/core -- converter.test.tsnpm run typecheck --workspace=packages/corenpx eslint packages/core/src/core/openaiContentGenerator/converter.ts packages/core/src/core/openaiContentGenerator/converter.test.ts --max-warnings 0npm run build --workspace=packages/coregit diff --check