Skip to content

fix(core): enforce adjacent tool results#4622

Merged
wenshao merged 2 commits into
QwenLM:mainfrom
he-yufeng:fix/tool-result-adjacency-cleanup
May 31, 2026
Merged

fix(core): enforce adjacent tool results#4622
wenshao merged 2 commits into
QwenLM:mainfrom
he-yufeng:fix/tool-result-adjacency-cleanup

Conversation

@he-yufeng

Copy link
Copy Markdown
Contributor

Fixes #4619.

Summary

  • make cleanOrphanedToolCalls() keep only tool responses from the contiguous tool block after the assistant message that declared the tool call
  • remove assistant tool_calls whose results were separated by another user/assistant turn
  • add a regression test for a history-repair sequence where call_b exists globally but is no longer adjacent to its assistant turn

Tests

  • npm run test --workspace=packages/core -- converter.test.ts
  • npm run typecheck --workspace=packages/core
  • npx eslint packages/core/src/core/openaiContentGenerator/converter.ts packages/core/src/core/openaiContentGenerator/converter.test.ts --max-warnings 0
  • npm run build --workspace=packages/core
  • git diff --check

!('tool_call_id' in nextMessage) ||
!nextMessage.tool_call_id
) {
break;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] The break here triggers on any non-tool message, including consecutive assistant messages. cleanOrphanedToolCalls runs at line 385 before mergeConsecutiveAssistantMessages at line 387, so consecutive model turns in Gemini history (common after compaction, rewind, or partial-push recovery — see geminiChat.ts:793-799) produce consecutive assistant messages that break this scan immediately.

Concrete trace:

[assistant(tool_calls=[A])]
[assistant(content="...")]   ← role !== 'tool' → break. A gets zero adjacent responses.
[tool(A)]                    ← silently dropped

The same break also fires when splitToolMedia is enabled (default for mimo provider): the converter's own media user message lands between tool messages from different content items, breaking the scan and dropping valid parallel tool calls.

This is a regression from the old global-set matching. Fix options:

  1. Swap the pipeline order: run mergeConsecutiveAssistantMessages before cleanOrphanedToolCalls.
  2. Skip (don't break on) non-tool-call assistant messages and converter-generated media messages in the inner scan.

— qwen3.7-max via Qwen Code /review

const nextMessage = messages[nextIndex];
if (
nextMessage.role !== 'tool' ||
!('tool_call_id' in nextMessage) ||

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] !nextMessage.tool_call_id is truthy for empty string ''. createToolMessage (lines 776/784) sets tool_call_id: response.id || '', so when response.id is falsy, the tool message gets an empty-string ID. This causes the scan to break here, dropping all subsequent valid tool responses in the contiguous block.

Consider separating the missing-property check from the empty-string check:

Suggested change
!('tool_call_id' in nextMessage) ||
if (nextMessage.role !== 'tool' || !('tool_call_id' in nextMessage)) {
break;
}
if (!nextMessage.tool_call_id) {
continue;
}

— qwen3.7-max via Qwen Code /review

@@ -1394,44 +1394,61 @@ function cleanOrphanedToolCalls(
messages: OpenAI.Chat.ChatCompletionMessageParam[],

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The function drops tool calls and responses silently — no logging at all. The file already uses createDebugLogger('CONVERTER') elsewhere. Consider adding a debug log when calls or responses are dropped, to aid production debugging of orphan-related issues:

if (droppedIds.size > 0) {
  debugLog('Dropped non-adjacent tool calls: %o', [...droppedIds]);
}

— qwen3.7-max via Qwen Code /review

}

if (toolCallIds.has(nextMessage.tool_call_id)) {
adjacentToolResponseIds.add(nextMessage.tool_call_id);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The loop intentionally continues past tool messages whose tool_call_id doesn't match the current assistant's calls (no else { break; }). This is correct — it tolerates orphaned tool responses from other calls within a contiguous block. However, this behavior is undocumented. A future maintainer who reads this loop and adds else { break; } would silently break parallel tool calls in histories where compaction left orphaned responses in the middle of a block.

Consider adding a comment:

// Non-matching tool responses are skipped (they may belong to other
// assistants' calls); only non-tool messages terminate the scan.

— qwen3.7-max via Qwen Code /review

});
});

it('should drop tool responses that are not adjacent to their assistant tool call', () => {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The test covers partial trimming well, but several branches of the new code are untested:

  1. Content-fallback branch: assistant with text content + all-orphaned tool_calls → kept as text-only (lines 1459-1470 in converter.ts)
  2. Back-to-back assistant messages: consecutive model contents with tool_calls — directly related to the Critical adjacency-break bug
  3. splitToolMedia interleaving: parallel tool responses across multiple content items with media

Adding test cases for these would help catch the Critical bug and lock in the intended behavior.

— qwen3.7-max via Qwen Code /review

}
} else if (
typeof message.content === 'string' &&
message.content.trim()

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Test gap: the content-fallback branch silently drops assistant messages that have only tool_calls (no text content, or content: '' from reasoning-only thinking-model turns) when all tool_calls are orphaned. The existing 5 new tests cover partial trimming (test 1), content-fallback with text (test 2), empty-id responses (test 3), merged consecutive assistants (test 4), and split media adjacency (test 5). None covers this silent-drop scenario.

Consider adding a test that constructs a model turn with only functionCall parts (no text), where none of the calls have adjacent responses. Assert the assistant message is absent from the output and no tool messages are emitted.

it('should drop assistant with only tool_calls when all are orphaned', () => {
  const request: GenerateContentParameters = {
    model: 'models/test',
    contents: [
      {
        role: 'model',
        parts: [
          { functionCall: { id: 'call_orphan', name: 'read_file', args: {} } },
        ],
      },
      { role: 'user', parts: [{ text: 'break adjacency' }] },
      { role: 'user', parts: [{ text: 'continue' }] },
    ],
  };
  const messages = converter.convertGeminiRequestToOpenAI(request, requestContext);
  expect(messages.some((m) => m.role === 'assistant')).toBe(false);
  expect(messages.some((m) => m.role === 'tool')).toBe(false);
});

— qwen3.7-max via Qwen Code /review

const adjacentToolResponseIdsByAssistant = new Map<number, Set<string>>();
const validToolResponseIndexesByAssistant = new Map<number, number[]>();
const splitMediaIndexesByAssistant = new Map<number, number[]>();
const validToolResponseIndexes = new Set<number>();

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] validToolResponseIndexes (Set) and validToolResponseIndexesByAssistant (Map) appear to be unnecessary. Any tool response added to validToolResponseIndexes during pass 1 was matched against some assistant's toolCallIds, so that assistant has validToolCalls.length > 0 and emits the response inline via emittedWithAssistant. Since assistants always precede their responses in the array, the assistant is processed first in the main loop — the response's index is already in emittedWithAssistant when the loop reaches it.

The if (validToolResponseIndexes.has(index)) true-branch at line 1538 is therefore unreachable. The else branch (drop with debug log) IS reachable for orphaned responses.

Suggested simplification: Remove both data structures. In the main loop's tool-response branch, just drop all tool responses that reach it (valid ones were already emitted with their assistant):

Suggested change
const validToolResponseIndexes = new Set<number>();
} else if (message.role === 'tool' && 'tool_call_id' in message) {
debugLogger.debug(
`cleanOrphanedToolCalls: dropping orphaned tool response ${message.tool_call_id || '<empty>'}`,
);

— qwen3.7-max via Qwen Code /review

continue;
}

if (nextMessage.role === 'assistant' && !hasToolCalls(nextMessage)) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The continue past text-only assistants assumes consecutive assistants have already been merged by mergeConsecutiveAssistantMessages. Without that precondition, the scan would walk past an intermediate text-only assistant and attribute tool responses on the other side to the wrong declaring assistant.

Currently safe because merge runs first at line 384. But there's no assertion, JSDoc, or comment documenting this dependency. If this function is ever called from a new code path that doesn't pre-merge (e.g., a non-Gemini converter, a daemon replay path), the scan silently misattributes responses.

Consider adding a JSDoc @precondition note:

Suggested change
if (nextMessage.role === 'assistant' && !hasToolCalls(nextMessage)) {
/**
* Clean up orphaned tool calls from message history to prevent OpenAI API errors.
*
* @precondition Consecutive assistant messages must be merged first
* (via mergeConsecutiveAssistantMessages). Without this, the adjacency
* scan may attribute tool responses to the wrong declaring assistant.
*/

— qwen3.7-max via Qwen Code /review

@he-yufeng he-yufeng force-pushed the fix/tool-result-adjacency-cleanup branch from 8f5d58a to eddd7a2 Compare May 30, 2026 07:06
@he-yufeng

Copy link
Copy Markdown
Contributor Author

Updated the branch to address the latest review callouts.

Changes:

  • added coverage for the all-orphaned, tool-call-only assistant case
  • removed the unreachable valid-tool-response fallback branch so orphaned tool responses are consistently logged/dropped once they reach the cleanup pass
  • documented the cleanup precondition around pre-merged consecutive assistant turns

Validation:

  • npm run test --workspace=packages/core -- converter.test.ts
  • npm run typecheck --workspace=packages/core
  • npx eslint packages/core/src/core/openaiContentGenerator/converter.ts packages/core/src/core/openaiContentGenerator/converter.test.ts --max-warnings 0
  • npx prettier --check packages/core/src/core/openaiContentGenerator/converter.ts packages/core/src/core/openaiContentGenerator/converter.test.ts
  • npm run build --workspace=packages/core
  • git diff --check

@wenshao

wenshao commented May 30, 2026

Copy link
Copy Markdown
Collaborator

Verification Report — PR #4622

Commit: eddd7a26d (fix(core): enforce adjacent tool results)
Base: main
Tester: wenshao
Date: 2026-05-30


Test Results

Check Result Details
Unit tests (converter.test.ts) PASS 6 files, 208 tests passed
TypeScript (tsc --noEmit) PASS 0 errors
ESLint (--max-warnings 0) PASS 0 warnings, 0 errors
Build (packages/core) PASS Successfully copied files
git diff --check PASS No whitespace errors

Test File Breakdown

File Tests Time
openaiContentGenerator/converter.test.ts 108 24ms
anthropicContentGenerator/converter.test.ts 50 11ms
claude-converter.test.ts 19 397ms
schemaConverter.test.ts 12 4ms
toml-to-markdown-converter.test.ts 10 5ms
gemini-converter.test.ts 9 4ms

Verdict

All 5 checks from the PR test plan pass. No regressions detected. Safe to merge.

@@ -1390,152 +1390,164 @@ function mapGeminiFinishReasonToOpenAI(
/**
* Clean up orphaned tool calls from message history to prevent OpenAI API errors.
*/

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The JSDoc /** Clean up orphaned tool calls from message history to prevent OpenAI API errors. */ was originally above cleanOrphanedToolCalls, but when hasToolCalls() and isSplitToolMediaMessage() were extracted, the JSDoc got stranded above hasToolCalls. Now hasToolCalls carries documentation describing a completely different function, while cleanOrphanedToolCalls (line 1424) only has a terse line comment.

Suggested change
*/
/** Type guard: is this an assistant message with at least one tool call? */
function hasToolCalls(

Then move the original JSDoc to above cleanOrphanedToolCalls:

/**
 * Clean up orphaned tool calls from message history to prevent OpenAI API errors.
 * Assumes consecutive assistant messages have already been merged.
 */
function cleanOrphanedToolCalls(

— qwen3.7-max via Qwen Code /review


if (isSplitToolMediaMessage(nextMessage)) {
splitMediaIndexes.push(nextIndex);
continue;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The forward scan unconditionally collects every isSplitToolMediaMessage into splitMediaIndexes, regardless of which tool response the split media belongs to. If an orphaned tool response with images precedes a valid tool response in the same scan window, the orphaned response's images are emitted alongside the scanning assistant's valid tool calls — the model sees images from a tool it never called.

Concrete scenario with splitToolMedia: true:

[0] assistant: tool_calls [call_a]
[1] tool: call_x          (orphaned — no assistant declared call_x)
[2] user: SPLIT_TOOL_MEDIA (images stripped from call_x's response)
[3] tool: call_a           (valid response for call_a)

The scan from [0] records [2] in splitMediaIndexes and [3] in toolResponseIndexes. Output: assistant(call_a), tool(call_a), SPLIT_TOOL_MEDIA(images from call_x).

Consider tracking which tool response each split media follows, and only emitting media for surviving tool calls. Or break the scan on split media that follows a non-matching tool response.

— qwen3.7-max via Qwen Code /review

).tool_calls = finalValidToolCalls;
finalCleaned.push(cleanedMessage);
}
} else if (

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The text-content fallback typeof message.content === 'string' && message.content.trim() doesn't check reasoning_content. An assistant with content: '' and reasoning_content: '...' (produced by reasoning models like DeepSeek/QwQ when issuing tool calls with chain-of-thought) is silently dropped — no debug log is emitted either.

This loses chain-of-thought reasoning from the conversation history. Consider broadening the check:

Suggested change
} else if (
} else if (
(typeof message.content === 'string' &&
message.content.trim()) ||
(message as ExtendedChatCompletionAssistantMessageParam)
.reasoning_content
) {

Also consider adding a debug log for the implicit drop when neither condition is met:

} else {
  debugLogger.debug(
    `cleanOrphanedToolCalls: dropping assistant with ${message.tool_calls.length} orphaned tool call(s) and no text/reasoning content`,
  );
}

— qwen3.7-max via Qwen Code /review

@he-yufeng

Copy link
Copy Markdown
Contributor Author

One CI follow-up: the Windows test job failed in packages/cli at src/ui/components/InputPrompt.test.tsx:375 (expected onSubmit('commit this'), 0 calls). That test is outside the touched converter path; the focused core tests, typecheck, lint, format check, and build all passed locally, and ubuntu/macos CI are green. I tried to rerun the failed job, but GitHub reports rerun requires repository admin rights.

@he-yufeng

Copy link
Copy Markdown
Contributor Author

Thanks for the verification pass. The latest branch includes the requested follow-ups, but GitHub still shows the older changes-requested review state. Could you re-review or clear the stale review when convenient? The only remaining red CI item appears to be the unrelated Windows InputPrompt test noted above; I do not have permission to rerun that job.

processContents(request.contents, messages, requestContext);

// Clean up orphaned tool calls and merge consecutive assistant messages
messages = mergeConsecutiveAssistantMessages(messages);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] cleanOrphanedToolCalls can re-introduce consecutive assistant messages in the output after mergeConsecutiveAssistantMessages already eliminated them. When an orphaned tool response sits between two assistant messages, dropping it creates adjacent assistants:

Input:  assistant(call_a), tool(a), assistant("text"), tool(orphan), assistant(call_b), tool(b)
After merge: unchanged (no consecutive assistants)
After clean: tool(orphan) dropped → assistant(call_a), tool(a), assistant("text"), assistant(call_b), tool(b)
→ 3 consecutive assistant messages

This is reachable whenever history compaction or rewind leaves an orphaned tool response between model turns. Some providers reject consecutive assistant messages, and downstream code that assumes no consecutive assistants after conversion will be surprised.

Suggested change
messages = mergeConsecutiveAssistantMessages(messages);
messages = mergeConsecutiveAssistantMessages(messages);
if (options.cleanOrphanToolCalls) {
messages = cleanOrphanedToolCalls(messages);
messages = mergeConsecutiveAssistantMessages(messages);
}

— qwen3.7-max via Qwen Code /review

@he-yufeng

Copy link
Copy Markdown
Contributor Author

Pushed 3a96af9 with the review follow-up. It now re-merges assistant turns after orphan cleanup, keeps split media only when it follows a surviving adjacent tool response, preserves reasoning-only assistant content when orphaned tool calls are removed, and moves the cleanup docs back onto the right helper. I also closed duplicate #4638 so #4622 remains the single review thread for #4619. Validation: npm run test --workspace=packages/core -- converter.test.ts; npm run typecheck --workspace=packages/core; npx eslint on converter.ts and converter.test.ts; npm run build --workspace=packages/core; git diff --check.

@BZ-D BZ-D left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. 这个 PR 改动量大但逻辑正确——从全局 ID 匹配改为严格邻接检测,scan-forward 算法处理了各种边界情况(orphaned call、split media、reasoning content、merge 后的 consecutive assistant),测试覆盖非常充分。

@wenshao wenshao merged commit 9dafd60 into QwenLM:main May 31, 2026
10 checks passed
xaelistic pushed a commit to xaelistic/qwen-code that referenced this pull request Jun 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix(core): validate tool_result adjacency in cleanOrphanedToolCalls to prevent Anthropic API errors

3 participants