MCP tools that return images cause 400 Invalid 'messages' against strict OpenAI-compatible servers (LM Studio) #3616

@mohitsoni48

Summary

When an MCP tool returns an image (e.g. playwright/browser_take_screenshot, filesystem/read_media_file, computer-use/screenshot, android/take_android_screenshot), qwen-code emits an OpenAI role: "tool" message whose content is an array containing an image_url part. The OpenAI Chat Completions spec does not permit image / audio / video / file parts in tool-role messages: content must be a string or an array of text parts only.

Permissive providers silently accept this, but strict OpenAI-compatible servers reject the request:

API Error: 400 "Invalid 'messages' in payload. Please check the structure of your 'messages' and try again."

This breaks every vision workflow against LM Studio (and likely other strict compat layers) when images flow through MCP tool returns rather than the @path user-message ingestion path.
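To make the shape difference concrete, here is a minimal sketch of the two tool messages. The type names and the isStrictToolMessage helper are hypothetical and written only for this illustration; they are not taken from qwen-code or the openai package, and the tool_call_id and image data are placeholders.

```typescript
// Hypothetical local types that mirror the message shapes described above.
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image_url"; image_url: { url: string } };
type Part = TextPart | ImagePart;

interface ToolMessage {
  role: "tool";
  tool_call_id: string;
  content: string | Part[];
}

// True when the tool message carries only a string or an array of text parts.
function isStrictToolMessage(msg: ToolMessage): boolean {
  if (typeof msg.content === "string") return true;
  return msg.content.every((part) => part.type === "text");
}

// Shape currently emitted for an image-returning tool (rejected by strict servers).
const rejected: ToolMessage = {
  role: "tool",
  tool_call_id: "call_1", // placeholder id
  content: [
    { type: "text", text: "Screenshot captured." },
    { type: "image_url", image_url: { url: "data:image/png;base64,AAAA" } },
  ],
};

// Text-only shape that a strict server is expected to accept.
const accepted: ToolMessage = {
  role: "tool",
  tool_call_id: "call_1",
  content: [{ type: "text", text: "Screenshot captured." }],
};

console.log(isStrictToolMessage(rejected), isStrictToolMessage(accepted));
```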

Reproduction

  • qwen-code: 0.15.2
  • Backend: LM Studio localhost:1234/v1 (OpenAI-compat)
  • Model: any vision-capable model loaded in LM Studio (verified with Qwen3.6-35B-A3B, also qwen3-vl-8b)
  • Steps:
    1. Configure an MCP server that returns images (e.g. @playwright/mcp browser, or @modelcontextprotocol/server-filesystem with read_media_file).
    2. From qwen, invoke a tool that returns an image, e.g. browser_take_screenshot after browser_navigate, or read_media_file on a JPEG.
    3. The next model turn fails with the 400 above.

Direct verification that the model and server are fine: sending the identical image as image_url inside a role: "user" message via curl succeeds. So this is purely a payload-shape bug in qwen-code's OpenAI converter, not a model or server issue.
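The verification request can be sketched as a request body like the one below. This is an illustration only: buildVerificationRequest is a hypothetical helper, the prompt text is made up, and the model name and data URL are placeholders to replace with whatever is loaded in LM Studio.

```typescript
// Hypothetical helper: builds a chat request that carries the image in a
// user message, which is the shape the report says LM Studio accepts.
function buildVerificationRequest(model: string, imageDataUrl: string) {
  return {
    model,
    messages: [
      {
        role: "user" as const,
        content: [
          { type: "text" as const, text: "Describe this image." },
          { type: "image_url" as const, image_url: { url: imageDataUrl } },
        ],
      },
    ],
  };
}

// Placeholder values; substitute a real model id and a real base64 image.
const body = buildVerificationRequest("qwen3-vl-8b", "data:image/jpeg;base64,AAAA");
console.log(JSON.stringify(body, null, 2));
```

Posting this JSON to the chat completions route under the server's /v1 base URL (assumed here to follow the usual OpenAI layout) is the request that succeeds, while the same image inside a role: "tool" message triggers the 400.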

Root cause

In packages/core/src/core/openaiContentGenerator/converter.ts, createToolMessage() (around line 529) builds a tool message whose content array can include image_url / input_audio / video_url / file parts, then casts the result to ChatCompletionContentPartText[] through an "as unknown as" double cast. The existing comment ("some OpenAI-compatible APIs support richer content in tool messages") explicitly flags that this relies on provider permissiveness. Strict providers reject it.

Proposed fix

At the call site (processContent, around line 437), after building the tool message, split any non-text media content into a follow-up role: "user" message. The tool message keeps only its text payload (spec-compliant); the model still receives the media in the immediately-following user turn.

This matches how Anthropic's converter handles tool-call image returns (text in tool_result, image in user content).
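A minimal sketch of the split described above, assuming simplified local types. splitToolMessage, the type names, and the placeholder string are hypothetical; this is not the attached patch and not the actual converter.ts code.

```typescript
// Hypothetical simplified part types; the real converter uses the openai package's types.
type TextPart = { type: "text"; text: string };
type MediaPart =
  | { type: "image_url"; image_url: { url: string } }
  | { type: "input_audio"; input_audio: { data: string; format: string } }
  | { type: "video_url"; video_url: { url: string } }
  | { type: "file"; file: { file_data?: string; filename?: string } };
type Part = TextPart | MediaPart;

type ToolMessage = { role: "tool"; tool_call_id: string; content: string | Part[] };
type StrictToolMessage = { role: "tool"; tool_call_id: string; content: string | TextPart[] };
type UserMessage = { role: "user"; content: MediaPart[] };

// Splits one tool message into a text-only tool message plus, when media is
// present, a follow-up user message that carries the media parts.
function splitToolMessage(msg: ToolMessage): Array<StrictToolMessage | UserMessage> {
  if (typeof msg.content === "string") {
    return [{ role: "tool", tool_call_id: msg.tool_call_id, content: msg.content }];
  }
  const text: TextPart[] = [];
  const media: MediaPart[] = [];
  for (const part of msg.content) {
    if (part.type === "text") text.push(part);
    else media.push(part);
  }
  if (media.length === 0) {
    return [{ role: "tool", tool_call_id: msg.tool_call_id, content: text }];
  }
  const toolMsg: StrictToolMessage = {
    role: "tool",
    tool_call_id: msg.tool_call_id,
    // Assumption: when the tool returned only media, leave a short text
    // placeholder so the tool message is not empty.
    content: text.length > 0 ? text : "[media attached in the following user message]",
  };
  return [toolMsg, { role: "user", content: media }];
}

const split = splitToolMessage({
  role: "tool",
  tool_call_id: "call_1", // placeholder id
  content: [
    { type: "text", text: "Screenshot captured." },
    { type: "image_url", image_url: { url: "data:image/png;base64,AAAA" } },
  ],
});
console.log(split.map((m) => m.role)); // tool message first, then the user message
```

One design point to check in a real patch: when a single assistant turn issues several tool calls, the follow-up user messages probably need to be emitted after the last tool message rather than between tool messages, since servers may expect the tool replies to be contiguous.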

A patch is attached / linked from PR #.

Workarounds

  • Use @path to attach images via user message instead of MCP tool returns (only works for files inside the workspace).
  • Switch from LM Studio to a permissive OpenAI-compat backend.

Both are unsatisfying; the correct fix is the converter change.
