fix(openai-responses): strip Anthropic cache_control from Responses API requests#28431
Conversation
OpenAI's Responses API rejects unknown fields on input content blocks with HTTP 400 ("Unknown parameter: 'input[0].content[0].cache_control'").
Chat Completions already strips Anthropic-only `cache_control` markers via `remove_cache_control_flag_from_messages_and_tools`. Mirror that behavior in the native OpenAI Responses path so cross-provider callers (e.g. AnthropicCacheControlHook upstream of a fallback chain) don't trip a 400 when the primary route is OpenAI Responses.
Strips from both input content blocks and tools for symmetry with the chat path.
Greptile SummaryThis PR fixes a
Confidence Score: 4/5Safe to merge — the change is a targeted, in-place strip of Anthropic-only cache_control keys using the same recursive utility the chat path already relies on, with no modifications to existing tests. The core logic is correct: filter_value_from_dict is called on each top-level message dict and each tool dict, and its recursive descent into nested lists/dicts correctly reaches content blocks where cache_control lives. The mutation-in-place pattern matches the existing chat path. The only gap is a loose tuple return annotation on _strip_cache_control_flag. No files require special attention; both changed files are self-contained and straightforward.
|
| Filename | Overview |
|---|---|
| litellm/llms/openai/responses/transformation.py | Adds _strip_cache_control_flag static method and wires it into transform_responses_api_request; logic is correct and symmetric with the chat path, minor return-type annotation could be tightened. |
| tests/test_litellm/llms/openai/responses/test_openai_responses_transformation.py | Three new mock-only tests covering the exact error scenario, tool symmetry, and a no-op passthrough; no existing tests modified. |
Reviews (1): Last reviewed commit: "fix(openai): strip cache_control from Re..." | Re-trigger Greptile
| def _strip_cache_control_flag( | ||
| input: Union[str, ResponseInputParam], | ||
| response_api_optional_request_params: Dict, | ||
| ) -> tuple: |
There was a problem hiding this comment.
The return annotation
tuple is untyped — mypy will infer tuple[Any, ...] for the return, losing the precise types. Using Tuple[Union[str, ResponseInputParam], Dict] makes the contract explicit and keeps mypy checks tight on callers.
| ) -> tuple: | |
| ) -> Tuple[Union[str, ResponseInputParam], Dict]: |
There was a problem hiding this comment.
Fixed in 4f6d962 — return annotation is now Tuple[Union[str, ResponseInputParam], Optional[List[ALL_RESPONSES_API_TOOL_PARAMS]]]. Also took the opportunity to mirror the chat-path API shape: renamed to public remove_cache_control_flag_from_input_and_tools (sibling of remove_cache_control_flag_from_messages_and_tools), instance method, separate tools arg, and added the model: str hook param for subclass selective-skip.
Per review feedback: - Rename to public `remove_cache_control_flag_from_input_and_tools` (sibling of chat path's `remove_cache_control_flag_from_messages_and_tools`) - Instance method, takes `tools` as explicit arg with `ALL_RESPONSES_API_TOOL_PARAMS` typing, and `model: str` hook for subclass selective-skip — parallel signature to chat helper - Tighten return annotation to `Tuple[Union[str, ResponseInputParam], Optional[List[ALL_RESPONSES_API_TOOL_PARAMS]]]` (addresses Greptile P2 on bare `tuple`)
f92e1b0
into
BerriAI:shin_agent_oss_staging_05_21_2026
|
🤖 litellm-agent: Squash-merged into staging branch Triage Summary Merge Confidence: 5/5 ✅ READY All checks green. Greptile 4/5, no blocking pattern findings, no CircleCI runs (OSS-typical). |
|
Update on this PR:
139 tests pass, mypy clean on touched file. |
…PI requests (BerriAI#28431) Squash-merged by litellm-agent from cwang-otto's PR.
…PI requests (#28431) Squash-merged by litellm-agent from cwang-otto's PR.
* fix(anthropic): handle empty streaming tool calls (#28549) Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> * [Feature][Bug Fix] Decouple Azure OpenAI Deployment ID from model name via base_model to fix gpt5 model routing (#28490) * feat(azure): decouple deployment ID from model name via base_model Azure OpenAI deployments have arbitrary names (deployment IDs) that may not match the underlying model. Previously, model-type detection (o-series, gpt-5, etc.) relied on substring matching against the deployment name, causing misrouted configs and rejected params when deployment names were non-standard (e.g. 'my-deployment-id' for gpt-5.2). This change extends the existing base_model field to drive model-type detection, config selection, supported param resolution, and param mapping throughout the Azure call path: - _get_azure_config() uses base_model for is_o_series/is_gpt_5 checks - get_provider_chat_config() threads base_model for Azure - get_supported_openai_params() accepts and uses base_model - get_optional_params() accepts base_model and passes it to all Azure config method calls (get_supported_openai_params, map_openai_params) - azure.py completion handler uses base_model for GPT-5 detection - Config internal methods (e.g. is_model_gpt_5_2_model) now receive base_model so features like logprobs are correctly enabled Fully backward compatible - when base_model is unset, behavior is identical. Existing o_series/ and gpt5_series/ prefix workarounds continue to work. Usage in proxy config: model_list: - model_name: my-gpt5 litellm_params: model: azure/my-deployment-id model_info: base_model: azure/gpt-5.2 Fixes: non-standard deployment names like 'prefix-gpt-5.2' rejecting logprobs/top_logprobs despite the underlying model supporting them. * Addressing Greptile comments. * gemini-3.1-flash-lite pricing (#27933) * feat(model_prices): add gemini-3.1-flash-lite pricing with standard/batch/flex/priority tiers * fix pricing * add service tier --------- Co-authored-by: shin-berri <shin-laptop@berri.ai> * fix(openai-responses): strip Anthropic cache_control from Responses API requests (#28431) Squash-merged by litellm-agent from cwang-otto's PR. * Treat None litellm_provider as wildcard in _check_provider_match (#28523) Squash-merged by litellm-agent from adityasingh2400's PR. * fix greptile * fix: use _azure_detection_model in default Azure branch of get_supported_openai_params Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(openai-responses): strip cache_control on compact endpoint as well Co-authored-by: Yassin Kortam <yassin@berri.ai> --------- Co-authored-by: Felipe Garé <90070734+FelipeRodriguesGare@users.noreply.github.com> Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: withomasmicrosoft <withomas@microsoft.com> Co-authored-by: mubashir1osmani <mubashir.osmani777@gmail.com> Co-authored-by: cwang-otto <chengxuan.wang@ottotheagent.com> Co-authored-by: Aditya Singh <60082699+adityasingh2400@users.noreply.github.com> Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Yassin Kortam <yassin@berri.ai>
Title
fix(openai-responses): strip Anthropic cache_control from Responses API requestsRelevant issues
OpenAI's Responses API rejects unknown fields on input content blocks with HTTP 400:
This bites callers who layer the Anthropic cache-control hook above a fallback chain whose primary route is the native OpenAI Responses API. The chat completions path already strips
cache_controlviaremove_cache_control_flag_from_messages_and_tools(litellm/llms/openai/chat/gpt_transformation.py); the Responses path didn't.Pre-Submission checklist
Type
🐛 Bug Fix
Changes
litellm/llms/openai/responses/transformation.pytransform_responses_api_requestnow callsremove_cache_control_flag_from_input_and_toolsafter_validate_input_param.remove_cache_control_flag_from_messages_and_tools: same signature shape (model,input,tools), same recursive primitive (filter_value_from_dict), strips from bothinputcontent blocks andtoolsfor symmetry.Tuple[Union[str, ResponseInputParam], Optional[List[ALL_RESPONSES_API_TOOL_PARAMS]]]— precise types so callers stay mypy-strict.model: strparameter is unused today but mirrors the chat helper, leaving room for subclasses to selectively skip stripping (same hook the chat path comment calls out).tests/test_litellm/llms/openai/responses/test_openai_responses_transformation.pytest_transform_strips_cache_control_from_input_content_blocks— the failing case (matches the live OpenAI 400 wording).test_transform_strips_cache_control_from_tools— symmetry with chat path.test_transform_preserves_input_without_cache_control— no-op regression guard.Proof
Unit tests + mypy
Live verification (real OpenAI Responses API)
Ran a script that calls
litellm.aresponses(model="gpt-4o-mini", input=…, tools=…)withcache_control: {"type": "ephemeral"}injected on both an input content block and a tool definition, then performs the same call directly through the rawopenaiSDK as a negative control:Confirms (a) the bug is reproducible against the live API and (b) this PR's strip lets the same payload through unchanged from the caller's perspective.