Skip to content

fix(openai-responses): strip Anthropic cache_control from Responses API requests#28431

Merged
oss-pr-review-agent-shin[bot] merged 2 commits into
BerriAI:shin_agent_oss_staging_05_21_2026from
cwang-otto:fix/openai-responses-strip-cache-control-staging
May 21, 2026
Merged

fix(openai-responses): strip Anthropic cache_control from Responses API requests#28431
oss-pr-review-agent-shin[bot] merged 2 commits into
BerriAI:shin_agent_oss_staging_05_21_2026from
cwang-otto:fix/openai-responses-strip-cache-control-staging

Conversation

@cwang-otto

@cwang-otto cwang-otto commented May 21, 2026

Copy link
Copy Markdown
Contributor

Title

fix(openai-responses): strip Anthropic cache_control from Responses API requests

Relevant issues

OpenAI's Responses API rejects unknown fields on input content blocks with HTTP 400:

BadRequestError: Unknown parameter: 'input[0].content[0].cache_control'.

This bites callers who layer the Anthropic cache-control hook above a fallback chain whose primary route is the native OpenAI Responses API. The chat completions path already strips cache_control via remove_cache_control_flag_from_messages_and_tools (litellm/llms/openai/chat/gpt_transformation.py); the Responses path didn't.

Pre-Submission checklist

  • Made sure tests pass
  • Added new tests (3 new cases — input strip, tool strip, no-cache passthrough)
  • No regressions in openai/responses (103), chatgpt/responses (14), azure/response (22) — 139 passed
  • Mypy clean on touched file
  • Live-verified end-to-end against real OpenAI Responses API

Type

🐛 Bug Fix

Changes

litellm/llms/openai/responses/transformation.py

  • transform_responses_api_request now calls remove_cache_control_flag_from_input_and_tools after _validate_input_param.
  • New helper is the sibling of the chat path's remove_cache_control_flag_from_messages_and_tools: same signature shape (model, input, tools), same recursive primitive (filter_value_from_dict), strips from both input content blocks and tools for symmetry.
  • Returns Tuple[Union[str, ResponseInputParam], Optional[List[ALL_RESPONSES_API_TOOL_PARAMS]]] — precise types so callers stay mypy-strict.
  • model: str parameter is unused today but mirrors the chat helper, leaving room for subclasses to selectively skip stripping (same hook the chat path comment calls out).

tests/test_litellm/llms/openai/responses/test_openai_responses_transformation.py

  • test_transform_strips_cache_control_from_input_content_blocks — the failing case (matches the live OpenAI 400 wording).
  • test_transform_strips_cache_control_from_tools — symmetry with chat path.
  • test_transform_preserves_input_without_cache_control — no-op regression guard.

Proof

Unit tests + mypy

litellm_pr_proof

Live verification (real OpenAI Responses API)

Ran a script that calls litellm.aresponses(model="gpt-4o-mini", input=…, tools=…) with cache_control: {"type": "ephemeral"} injected on both an input content block and a tool definition, then performs the same call directly through the raw openai SDK as a negative control:

[1] Calling litellm.aresponses() with cache_control on input + tools…
    OK — response text: 'pong'

[2] Calling raw OpenAI SDK with cache_control on input (no strip)…
    OK — OpenAI rejected as expected: BadRequestError
    error: Error code: 400 - {'error': {'message': "Unknown parameter: 'input[0].content[0].cache_control'.", 'type': 'invalid_request_error', 'param': 'input[0].content[0].cache_control', 'code': 'unknown_param…

=== Summary ===
  litellm strip works (200 OK):           True
  raw openai rejects cache_control (400): True

Confirms (a) the bug is reproducible against the live API and (b) this PR's strip lets the same payload through unchanged from the caller's perspective.

OpenAI's Responses API rejects unknown fields on input content blocks with HTTP 400 ("Unknown parameter: 'input[0].content[0].cache_control'").

Chat Completions already strips Anthropic-only `cache_control` markers via `remove_cache_control_flag_from_messages_and_tools`. Mirror that behavior in the native OpenAI Responses path so cross-provider callers (e.g. AnthropicCacheControlHook upstream of a fallback chain) don't trip a 400 when the primary route is OpenAI Responses.

Strips from both input content blocks and tools for symmetry with the chat path.
@greptile-apps

greptile-apps Bot commented May 21, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR fixes a BadRequestError: Unknown parameter: 'input[0].content[0].cache_control' that OpenAI's Responses API throws when Anthropic cache_control markers are present in request payloads. The fix mirrors the existing Chat Completions path by adding a _strip_cache_control_flag helper that recursively removes cache_control keys from input content blocks and tools using the shared filter_value_from_dict utility.

  • Adds _strip_cache_control_flag to OpenAIResponsesAPIConfig.transform_responses_api_request, called immediately after _validate_input_param so stripping applies to already-validated (dict-form) input.
  • Three new unit tests cover the stripping of input content blocks, stripping of tools, and a no-op passthrough for clean inputs; no existing tests were modified.

Confidence Score: 4/5

Safe to merge — the change is a targeted, in-place strip of Anthropic-only cache_control keys using the same recursive utility the chat path already relies on, with no modifications to existing tests.

The core logic is correct: filter_value_from_dict is called on each top-level message dict and each tool dict, and its recursive descent into nested lists/dicts correctly reaches content blocks where cache_control lives. The mutation-in-place pattern matches the existing chat path. The only gap is a loose tuple return annotation on _strip_cache_control_flag.

No files require special attention; both changed files are self-contained and straightforward.

Important Files Changed

Filename Overview
litellm/llms/openai/responses/transformation.py Adds _strip_cache_control_flag static method and wires it into transform_responses_api_request; logic is correct and symmetric with the chat path, minor return-type annotation could be tightened.
tests/test_litellm/llms/openai/responses/test_openai_responses_transformation.py Three new mock-only tests covering the exact error scenario, tool symmetry, and a no-op passthrough; no existing tests modified.

Reviews (1): Last reviewed commit: "fix(openai): strip cache_control from Re..." | Re-trigger Greptile

def _strip_cache_control_flag(
input: Union[str, ResponseInputParam],
response_api_optional_request_params: Dict,
) -> tuple:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The return annotation tuple is untyped — mypy will infer tuple[Any, ...] for the return, losing the precise types. Using Tuple[Union[str, ResponseInputParam], Dict] makes the contract explicit and keeps mypy checks tight on callers.

Suggested change
) -> tuple:
) -> Tuple[Union[str, ResponseInputParam], Dict]:

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 4f6d962 — return annotation is now Tuple[Union[str, ResponseInputParam], Optional[List[ALL_RESPONSES_API_TOOL_PARAMS]]]. Also took the opportunity to mirror the chat-path API shape: renamed to public remove_cache_control_flag_from_input_and_tools (sibling of remove_cache_control_flag_from_messages_and_tools), instance method, separate tools arg, and added the model: str hook param for subclass selective-skip.

Per review feedback:

- Rename to public `remove_cache_control_flag_from_input_and_tools` (sibling of chat path's `remove_cache_control_flag_from_messages_and_tools`)

- Instance method, takes `tools` as explicit arg with `ALL_RESPONSES_API_TOOL_PARAMS` typing, and `model: str` hook for subclass selective-skip — parallel signature to chat helper

- Tighten return annotation to `Tuple[Union[str, ResponseInputParam], Optional[List[ALL_RESPONSES_API_TOOL_PARAMS]]]` (addresses Greptile P2 on bare `tuple`)
@oss-pr-review-agent-shin oss-pr-review-agent-shin Bot changed the base branch from shin_agent_oss_staging_05_20_2026 to shin_agent_oss_staging_05_21_2026 May 21, 2026 02:55
@oss-pr-review-agent-shin oss-pr-review-agent-shin Bot merged commit f92e1b0 into BerriAI:shin_agent_oss_staging_05_21_2026 May 21, 2026
2 checks passed
@oss-pr-review-agent-shin

Copy link
Copy Markdown
Contributor

🤖 litellm-agent: Squash-merged into staging branch shin_agent_oss_staging_05_21_2026. Staging PR: #28432


Triage Summary
Adds a _strip_cache_control_flag helper to OpenAIResponsesAPIConfig.transform_responses_api_request that recursively removes cache_control keys from input content blocks and tools before sending requests to OpenAI's Responses API. OpenAI rejects these Anthropic-only fields with HTTP 400; the Chat Completions path already strips them via remove_cache_control_flag_from_messages_and_tools, and this PR mirrors that behavior for the Responses path. Three new unit tests cover the input strip, tool strip, and no-op passthrough cases.

Merge Confidence: 5/5 ✅ READY
Ready to ship.

All checks green. Greptile 4/5, no blocking pattern findings, no CircleCI runs (OSS-typical).

@cwang-otto

Copy link
Copy Markdown
Contributor Author

Update on this PR:

  • Addressed Greptile P2: tightened return annotation to Tuple[Union[str, ResponseInputParam], Optional[List[ALL_RESPONSES_API_TOOL_PARAMS]]] in 4f6d962.
  • Refactor for symmetry: renamed helper to public remove_cache_control_flag_from_input_and_tools (sibling of chat path's remove_cache_control_flag_from_messages_and_tools), instance method, separate tools arg, added unused model: str param mirroring the chat helper's subclass-skip hook.
  • Live-verified end-to-end against real OpenAI Responses API (proof in updated PR description):
    • litellm.aresponses() with cache_control on input + tools → 200 OK
    • Raw openai SDK with the same payload → 400 Unknown parameter: 'input[0].content[0].cache_control' (negative control)

139 tests pass, mypy clean on touched file.

cwang-otto added a commit to cwang-otto/litellm that referenced this pull request May 21, 2026
…PI requests (BerriAI#28431)

Squash-merged by litellm-agent from cwang-otto's PR.
Sameerlite pushed a commit that referenced this pull request May 22, 2026
…PI requests (#28431)

Squash-merged by litellm-agent from cwang-otto's PR.
mateo-berri pushed a commit that referenced this pull request May 22, 2026
* fix(anthropic): handle empty streaming tool calls (#28549)

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>

* [Feature][Bug Fix] Decouple Azure OpenAI Deployment ID from model name via base_model to fix gpt5 model routing (#28490)

* feat(azure): decouple deployment ID from model name via base_model

Azure OpenAI deployments have arbitrary names (deployment IDs) that may
not match the underlying model. Previously, model-type detection
(o-series, gpt-5, etc.) relied on substring matching against the
deployment name, causing misrouted configs and rejected params when
deployment names were non-standard (e.g. 'my-deployment-id' for gpt-5.2).

This change extends the existing base_model field to drive model-type
detection, config selection, supported param resolution, and param
mapping throughout the Azure call path:

- _get_azure_config() uses base_model for is_o_series/is_gpt_5 checks
- get_provider_chat_config() threads base_model for Azure
- get_supported_openai_params() accepts and uses base_model
- get_optional_params() accepts base_model and passes it to all Azure
  config method calls (get_supported_openai_params, map_openai_params)
- azure.py completion handler uses base_model for GPT-5 detection
- Config internal methods (e.g. is_model_gpt_5_2_model) now receive
  base_model so features like logprobs are correctly enabled

Fully backward compatible - when base_model is unset, behavior is
identical. Existing o_series/ and gpt5_series/ prefix workarounds
continue to work.

Usage in proxy config:
  model_list:
    - model_name: my-gpt5
      litellm_params:
        model: azure/my-deployment-id
      model_info:
        base_model: azure/gpt-5.2

Fixes: non-standard deployment names like 'prefix-gpt-5.2' rejecting
logprobs/top_logprobs despite the underlying model supporting them.

* Addressing Greptile comments.

* gemini-3.1-flash-lite pricing (#27933)

* feat(model_prices): add gemini-3.1-flash-lite pricing with standard/batch/flex/priority tiers

* fix pricing

* add service tier

---------

Co-authored-by: shin-berri <shin-laptop@berri.ai>

* fix(openai-responses): strip Anthropic cache_control from Responses API requests (#28431)

Squash-merged by litellm-agent from cwang-otto's PR.

* Treat None litellm_provider as wildcard in _check_provider_match (#28523)

Squash-merged by litellm-agent from adityasingh2400's PR.

* fix greptile

* fix: use _azure_detection_model in default Azure branch of get_supported_openai_params

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(openai-responses): strip cache_control on compact endpoint as well

Co-authored-by: Yassin Kortam <yassin@berri.ai>

---------

Co-authored-by: Felipe Garé <90070734+FelipeRodriguesGare@users.noreply.github.com>
Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: withomasmicrosoft <withomas@microsoft.com>
Co-authored-by: mubashir1osmani <mubashir.osmani777@gmail.com>
Co-authored-by: cwang-otto <chengxuan.wang@ottotheagent.com>
Co-authored-by: Aditya Singh <60082699+adityasingh2400@users.noreply.github.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant