fix(bedrock): stop base_model label from stripping tools/tool_choice #29621
A Router/proxy Bedrock deployment whose model_info.base_model is a friendly label (e.g. claude-haiku-4-5) silently lost tools/tool_choice: the outgoing Converse request was built without toolConfig, so the model behaved as if no tools were provided. Worked in v1.84.0, regressed in v1.85.0, and with drop_params=true it failed silently.

Two changes compound into the bug. completion() passed model_info.base_model as the model argument to get_optional_params, so the real Bedrock model id never reached supported-param resolution; and get_supported_openai_params resolved the provider config's params from base_model or model, letting the label fully replace the real model. For Bedrock the label resolves to no tool support, so tools/tool_choice were dropped before transformation.

completion() now keeps model as the real deployment model and threads the resolved base_model (kwarg or model_info) through separately, and get_supported_openai_params treats base_model as additive: it returns the union of the params supported by model and by base_model. A hint can only add capabilities, never strip ones the real model already exposes, which also preserves the original base_model behavior from BerriAI#27717 and Azure's base_model driven model-type detection.

Fixes BerriAI#29618
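The difference between the regressed resolution and the additive one can be sketched in a few lines. This is only an illustration of the idea, not LiteLLM's code: `SUPPORTED`, `lookup`, `resolve_replacing`, `resolve_additive`, and the placeholder model id `real-bedrock-model-id` are all hypothetical, and the parameter lists are made-up stand-ins for what a provider config would return.

```python
from typing import Dict, List, Optional

# Hypothetical placeholders: a real deployment model id and a friendly label.
REAL_MODEL = "real-bedrock-model-id"
LABEL = "claude-haiku-4-5"

# Made-up capability table standing in for a provider config. The real model id
# is recognized as tool-capable; the friendly label is not.
SUPPORTED: Dict[str, List[str]] = {
    REAL_MODEL: ["max_tokens", "temperature", "tools", "tool_choice"],
    LABEL: ["max_tokens", "temperature"],
}


def lookup(model: str) -> List[str]:
    """Return a copy of the params supported by `model` (empty if unknown)."""
    return list(SUPPORTED.get(model, []))


def resolve_replacing(model: str, base_model: Optional[str] = None) -> List[str]:
    """Regressed behavior: the label fully replaces the real model."""
    return lookup(base_model or model)


def resolve_additive(model: str, base_model: Optional[str] = None) -> List[str]:
    """Additive behavior: start from the real model, then add what the hint adds."""
    params = lookup(model)
    if base_model and base_model != model:
        params += [p for p in lookup(base_model) if p not in params]
    return params


if __name__ == "__main__":
    print(resolve_replacing(REAL_MODEL, LABEL))  # tools/tool_choice are missing
    print(resolve_additive(REAL_MODEL, LABEL))   # tools/tool_choice are kept
```

Under these assumptions, the replacing variant loses `tools` and `tool_choice` as soon as the label is supplied, while the additive variant can never return fewer params than the real model supports on its own.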
Codecov Report
✅ All modified and coverable lines are covered by tests.
Greptile Summary
This PR fixes a regression introduced in v1.85.0 where Bedrock deployments with a friendly
Confidence Score: 5/5
Safe to merge; the fix is narrowly scoped to how base_model is resolved and applied in get_optional_params, all changed paths have explicit regression tests, and the additive-union design prevents any existing capability from being stripped.
The two-line change in main.py is mechanical and the union logic in get_supported_openai_params is straightforward. Tests cover the Bedrock regression, gemini reasoning-effort preservation, Azure detection, and edge cases. One small gap exists where spreading None could throw if a provider config returns None for an unrecognized model id, but this does not affect the intended code paths.
litellm/litellm_core_utils/get_supported_openai_params.py: the union spread lacks a None guard for the unlikely case where get_supported_openai_params returns None for an unrecognized model.
| Filename | Overview |
|---|---|
| litellm/litellm_core_utils/get_supported_openai_params.py | Core fix: base_model is now additive (union of both param sets) instead of a full replacement; one defensive null-check gap when either get_supported_openai_params call returns None. |
| litellm/main.py | Resolves base_model from kwarg or model_info, stops overwriting model with the base_model label when calling get_optional_params, threads base_model as a separate additive hint instead. |
| tests/test_litellm/litellm_core_utils/test_get_supported_openai_params.py | New test file with thorough coverage: Bedrock regression, additive direction, Azure detection preserved, base_model==model edge case, and the gemini capability-adding scenario. |
| tests/test_litellm/test_main.py | Updated parametrized test to assert both model stays the real deployment id and base_model is separately threaded through; adds expected_base_model_param per case. |
| tests/test_litellm/test_utils.py | Adds TestBedrockBaseModelLabelKeepsTools class with two integration-style tests confirming tools survive drop_params with the real model id, and confirming the label alone correctly drops them. |
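The None-guard gap mentioned in the review can be illustrated with a small sketch. It assumes, as the review describes, that a lookup may return `None` for an unrecognized model; `union_params` is a hypothetical helper written for this example and is not the function in the PR.

```python
from typing import List, Optional


def union_params(
    model_params: Optional[List[str]],
    base_params: Optional[List[str]],
) -> Optional[List[str]]:
    """Order-preserving union that tolerates None on either side.

    Returns None only when both inputs are None, so callers can still
    distinguish "nothing known" from "known, but empty".
    """
    if model_params is None and base_params is None:
        return None
    merged = list(model_params or [])
    seen = set(merged)
    for param in base_params or []:
        if param not in seen:
            merged.append(param)
            seen.add(param)
    return merged


if __name__ == "__main__":
    # Unpacking None directly raises TypeError, which is the gap being guarded.
    try:
        _ = [*["tools"], *None]  # type: ignore[misc]
    except TypeError as exc:
        print("unguarded spread fails:", exc)

    print(union_params(["tools"], None))
    print(union_params(None, ["reasoning_effort"]))
```

Treating `None` as an empty contribution keeps the additive guarantee intact: an unrecognized hint simply adds nothing.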
Reviews (2): Last reviewed commit: "test(main): make base_model param test r..."
Restore an explicit per-case expected_model_param literal instead of hardcoding the gemini id, so a future case with a different model can't produce a misleading assertion failure.
57e777d into BerriAI:litellm_oss_staging_040626
* fix(azure): apply api_version fallback chain to image edit URL
`AzureImageEditConfig.get_complete_url` only read `api_version` from
`litellm_params`. When callers configured it via `litellm.api_version`
or `AZURE_API_VERSION`, the constructed URL had no `?api-version=` and
Azure responded `404 Resource not found`.
Apply the same fallback chain the Azure chat path already uses in
`common_utils.py`:
litellm_params > litellm.api_version > AZURE_API_VERSION env >
litellm.AZURE_DEFAULT_API_VERSION
Adds 5 unit tests pinning each layer of the chain plus a regression
guard for `api_base` that already carries `?api-version=`.
* feat(mcp): core sampling and elicitation flow with security hardening
- Add sampling_handler.py: full MCP sampling/createMessage flow with
model selection (hint-based + priority-based), auth enforcement,
budget checks, route restriction gates, and tag policy pre-auth
- Add elicitation_handler.py: MCP elicitation/create relay with
downstream client capability detection
- Wire sampling/elicitation callbacks in mcp_server_manager.py
gated behind allow_sampling/allow_elicitation config flags
- Add allow_sampling/allow_elicitation fields to MCPServer type
- Fix session lock deadlock: skip lock for JSON-RPC response POSTs
(elicitation/sampling replies) with truncated-body heuristic
- Extend client.py with sampling_callback and elicitation_callback
- Security: RouteChecks gate, tag-budget bypass fix, x-forwarded-for
spoofing fix, Latin-1 header encoding guard
- Add 4 new test modules (model access, priority selection, request
builder, tool conversion) + update existing MCP tests
* fix(security): run pre-call guardrails before MCP sampling acompletion
Without this, an upstream MCP server with allow_sampling enabled could
send prompts that bypass every guardrail (content filtering, PII
redaction, prompt-injection detection) configured on /chat/completions.
- Call proxy_logging_obj.pre_call_hook(call_type='acompletion') before
llm_router.acompletion so guardrails fire for sampling sub-calls
- Add HTTPException to the re-raise list so guardrail rejections
propagate correctly instead of being swallowed as generic errors
* feat(bedrock_mantle): add Responses API support (/openai/v1/responses) (#29490)
* feat(bedrock_mantle): add Responses API transformation config
* test(bedrock_mantle): cover trailing-slash api_base normalization
* feat(bedrock_mantle): export BedrockMantleResponsesAPIConfig
* feat(bedrock_mantle): register gpt-5.x Responses config (gpt-oss unchanged)
* feat(bedrock_mantle): add gpt-5.5/gpt-5.4 Responses price-map entries
* refactor(bedrock_mantle): exclude gpt-oss instead of allow-listing gpt-5 for Responses routing
Frontier OpenAI models on Bedrock Mantle are Responses-only on /openai/v1/responses;
gpt-oss is the legacy family that also speaks chat-completions. Gate by excluding
gpt-oss (which keeps its chat-completions emulation) and defaulting everything else
to the native Responses config, so future frontier models (gpt-6, etc.) route
correctly without a code change. Verified against the live us-east-2 Mantle endpoint:
gpt-oss 400s on /openai/v1/responses while gpt-5.5 400s on both standard paths.
* test(bedrock_mantle): cover supports_native_websocket opt-out
Closes the one uncovered line flagged by codecov on the Responses config.
The assertion documents that Mantle Responses has no realtime/websocket
transport, so realtime routing must not attempt a socket it cannot serve.
* fix(bedrock_mantle): route file_search through emulation instead of forwarding to Mantle
BedrockMantleResponsesAPIConfig inherited supports_native_file_search()
-> True from OpenAIResponsesAPIConfig but never overrode it. Mantle has no
OpenAI vector stores, so a forwarded file_search tool is rejected with a
400 (verified upstream: Tool type 'file_search' is not supported). Opting
out, like the existing supports_native_websocket override, routes the tool
through LiteLLM's file_search emulation instead.
* fix(bedrock_mantle): only route openai.gpt frontier models to Responses
The previous gate excluded gpt-oss and routed every other model to the
native Responses config. But on Mantle only the OpenAI gpt frontier models
(gpt-5.x) are served on /openai/v1/responses; gpt-oss and the non-OpenAI
families (nvidia, mistral, google, zai, ...) are chat-completions only and
400 on that path. Allow-list the openai.gpt- family (excluding gpt-oss)
instead, so chat-only models fall through to the chat-completions emulation.
Verified against the live us-east-2 endpoint: nvidia.nemotron-nano-9b-v2
returns 400 on /openai/v1/responses and 200 on /v1/chat/completions.
* feat(custom_llm): allow streaming/astreaming to yield ModelResponseStream (#27580)
* fix(custom_llm): allow streaming/astreaming to yield ModelResponseStream directly
* fix(streaming): enhance ModelResponseStream handling for custom LLM providers
* fix(streaming): strip finish_reason from content chunks and ensure tool_calls are preserved
* fix(streaming): add type ignore for finish_reason assignment in CustomStreamWrapper
* fix(proxy): strip stack trace from HTTP 503 responses (CWE-209) (#28330)
* fix(proxy/cwe-209): strip Python traceback from HTTP 503 error responses
The /cache/ping endpoint included a full Python traceback in its 503 error
response body (inside the ProxyException message), leaking internal file
paths, line numbers, and call stacks to any caller. Two MCP route handlers
in proxy_server.py similarly interpolated str(e) into "Internal server
error" detail strings.
Fix: log the traceback server-side via verbose_proxy_logger.exception()
and omit it from the ProxyException payload / HTTPException detail returned
to clients. Tests updated to assert no "traceback" keyword or frame paths
appear in the 503 body, with a new dedicated regression test.
CWE-209: Generation of Error Message Containing Sensitive Information.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(proxy/cwe-209): apply Greptile P2 fixes and add MCP exception-path tests
Greptile 4/5 review identified two remaining gaps and Codecov reported
0% coverage on the two MCP handler exception branches:
1. caching_routes.py — str(e) in "Service Unhealthy ({str(e)})" could
still leak Redis hostnames/IPs; replaced with static "Service Unhealthy".
HTTPException is now re-raised before the generic handler so the
"cache not initialized" 503 still reaches callers with its detail.
Removed the redundant str(e) arg from verbose_proxy_logger.exception()
(exception() already appends the traceback automatically).
2. tests — two new unit tests cover the exception paths in
dynamic_mcp_route and toolset_mcp_route that were previously at 0%:
- test_dynamic_mcp_route_unexpected_exception_returns_500_without_traceback
- test_toolset_mcp_route_unexpected_exception_returns_500_without_traceback
All 25 tests pass (9 caching + 16 MCP).
CWE-209: Generation of Error Message Containing Sensitive Information.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* test(caching_routes): restore precise assertion in test_cache_ping_no_cache_initialized
The assertion was weakened to `"Cache not initialized" in str(data)`, which
matches the raw string of the entire response dict and would pass even if the
error moved to an unexpected field or changed structure.
Restore a targeted check on the parsed response: assert the exact string in
the correct field `data["detail"]`, matching FastAPI's HTTPException
serialisation format {"detail": "<message>"}.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* test(caching_routes): restore precise assertion and add CWE-209 no-cache path test
The assertion in test_cache_ping_no_cache_initialized was weakened to
`"Cache not initialized" in str(data)`, which matched against the raw string
representation of the entire response dict. This would pass silently even if
the error message moved to an unexpected field or the structure changed.
Restore a targeted assertion on the parsed field:
assert data["detail"] == "Cache not initialized. litellm.cache is None"
matching FastAPI's HTTPException serialisation format exactly.
Add test_cache_ping_no_cache_does_not_expose_internals to show the code path
is still working correctly after the CWE-209 fix: verifies that the HTTPException
is re-raised as-is (no traceback, no source paths), and asserts the complete
response structure is exactly {"detail": "Cache not initialized. litellm.cache is None"}.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(caching_routes): restore ProxyException envelope for null-cache 503
The except HTTPException: raise guard (added in the CWE-209 fix) caused
the null-cache HTTPException to escape as FastAPI's {"detail": "..."} shape
instead of the {"error": {...}} ProxyException envelope that callers expect.
Move the null-cache guard before the try block and raise ProxyException
directly so the response structure is consistent with all other /cache/ping
503s, and the except HTTPException: raise guard is only reachable by
unexpected downstream HTTPExceptions.
Update the two no-cache tests to assert the correct ProxyException envelope.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* Update utils.py (#26609)
* feat(pricing): add Snowflake Cortex REST API model pricing (#26612)
* feat(pricing): add Snowflake Cortex REST API model pricing
## Summary
Adds pricing and context window information for 20+ Snowflake Cortex REST API models to `model_prices_and_context_window.json`.
## What's included
- **7 Claude models** (sonnet-4-5, sonnet-4-6, 4-sonnet, 4-opus, haiku-4-5, 3-7-sonnet, 3-5-sonnet) — with prompt caching rates
- **4 OpenAI models** (gpt-4.1, gpt-5, gpt-5-mini, gpt-5-nano) — with prompt caching rates
- **5 Llama models** (3.1-8b, 3.1-70b, 3.1-405b, 3.3-70b, 4-maverick)
- **1 DeepSeek model** (deepseek-r1)
- **1 Mistral model** (mistral-large2)
- **1 Snowflake model** (snowflake-llama-3.3-70b)
- **2 Embedding models** (arctic-embed-l-v2.0, arctic-embed-m-v2.0)
Each entry includes `input_cost_per_token`, `output_cost_per_token`, `cache_read_input_token_cost` (where applicable), `max_input_tokens`, `max_output_tokens`, and capability flags (`supports_function_calling`, `supports_vision`, `supports_prompt_caching`, `supports_reasoning`).
## Pricing source
All prices are in USD per token, sourced from the official [Snowflake Service Consumption Table](https://www.snowflake.com/legal-files/CreditConsumptionTable.pdf) — Tables 6(b) (REST API with Prompt Caching) and 6(c) (REST API).
## Context
The existing `snowflake/` provider has zero model entries in the pricing JSON, which means LiteLLM cannot track costs for Snowflake Cortex calls. This PR fills that gap.
## Related
- Existing provider: `litellm/llms/snowflake/`
- Cortex REST API docs: https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-rest-api
* Update model_prices_and_context_window.json
Fix the JSON parsing error
* Update model_prices_and_context_window.json
Removed the duplicate entry
* fix(utils): copy extra_body before adding unknown params to prevent model config mutation (#29620)
Fixes #29615. In add_provider_specific_params_to_optional_params, the line:
extra_body = passed_params.pop("extra_body", None) or {}
returns the original dict reference when extra_body is non-empty (truthy).
Subsequent writes like extra_body[k] = passed_params[k] then mutate the
shared model config object held by the router, poisoning /model/info and
all subsequent requests for that deployment.
The or {} short-circuit creates a new dict only when extra_body is falsy
(None or {}), which is why the bug does not reproduce with extra_body: {}.
Fix: wrap in dict() so we always work on a fresh shallow copy.
* fix(vertex_ai): Bake tool_choice into Gemini CachedContent body to prevent silent drop (#29097)
* fix(vertex_ai): bake tool_choice into Gemini CachedContent body to prevent silent drop
* address greptile feedback on tool_choice cache test
* adds test that uses ToolConfig(functionCallingConfig=FunctionCallingConfig(mode=ANY)) instead of a dict literal, mirroring what map_tool_choice_values actually produces
* fix(gemini/veo): move image from parameters into instances[0] (#29501)
* fix(gemini/veo): move image from parameters into instances[0]
Veo's predictLongRunning schema puts image (and prompt) on the
instances element; parameters is for aspectRatio/durationSeconds/etc.
The Gemini path was leaving image in params_copy, so it ended up
nested under parameters and the API silently ignored it.
The Vertex path already builds the instance dict explicitly, so this
just aligns the Gemini path with it.
Fixes #29498
* address greptile: unconditional pop + BytesIO test
- Pop `image` from params_copy unconditionally so it never reaches
GeminiVideoGenerationParameters even when None, removing implicit
reliance on Pydantic's extra-field-ignore.
- Add test_transform_video_create_request_image_filelike_goes_to_instance
covering the BytesIO path (_convert_image_to_gemini_format) — round-trips
the base64 to confirm encoding.
- Add test_transform_video_create_request_image_none_is_dropped covering
the new None branch.
* fix(huggingface): handle special token text in embedding usage (#29660)
* fix(guardrails): recompile ToolPermissionGuardrail rules on update_in_memory_litellm_params (#29655)
* fix(guardrails): recompile ToolPermissionGuardrail rules on update_in_memory_litellm_params
ToolPermissionGuardrail builds self.rules and the compiled target/pattern
maps only in __init__. The base update_in_memory_litellm_params re-sets raw
attributes via setattr but never rebuilds those maps, so a guardrail updated
in place (PUT /guardrails, or the immediate in-memory sync) keeps enforcing
the construction-time rules until it is reinitialized (PATCH path, periodic
DB poll, or restart).
Extract the compile step into _load_rules and override
update_in_memory_litellm_params to rebuild from it (dict- and model-safe),
re-normalizing default_action / on_disallowed_action. Mirrors the existing
PresidioGuardrail override of the same method. Adds regression tests.
Fixes #29592.
* fix(guardrails): handle dict params in ToolPermissionGuardrail in-memory update
Delegate to super() only for LitellmParams input (the base setattr loop is
model-only); apply the raw-dict case inline. Fixes the mypy arg-type error
and makes the recompile work when the proxy passes the raw DB dict.
* fix(guardrails): preserve tool-permission rules on a partial in-memory update
A partial update (e.g. a LitellmParams whose rules field is None) ran through
the generic setattr, which set self.rules to None, and the recompile was
skipped, leaving the guardrail with no rules. Snapshot the previous rules and
restore them when the update carries no rules; an explicit empty list still
clears them. Adds a regression test for the rules-absent case.
Addresses the Greptile review note on #29655.
* fix(bedrock): stop base_model label from stripping tools/tool_choice (#29621)
* fix(bedrock): stop base_model label from stripping tools/tool_choice
A Router/proxy Bedrock deployment whose model_info.base_model is a friendly
label (e.g. claude-haiku-4-5) silently lost tools/tool_choice: the outgoing
Converse request was built without toolConfig, so the model behaved as if no
tools were provided. Worked in v1.84.0, regressed in v1.85.0, and with
drop_params=true it failed silently.
Two changes compound into the bug. completion() passed model_info.base_model
as the model argument to get_optional_params, so the real Bedrock model id
never reached supported-param resolution; and get_supported_openai_params
resolved the provider config's params from base_model or model, letting the
label fully replace the real model. For Bedrock the label resolves to no tool
support, so tools/tool_choice were dropped before transformation.
completion() now keeps model as the real deployment model and threads the
resolved base_model (kwarg or model_info) through separately, and
get_supported_openai_params treats base_model as additive: it returns the
union of the params supported by model and by base_model. A hint can only add
capabilities, never strip ones the real model already exposes, which also
preserves the original base_model behavior from #27717 and Azure's base_model
driven model-type detection.
Fixes #29618
* test(main): make base_model param test robust to new parametrize cases
Restore an explicit per-case expected_model_param literal instead of
hardcoding the gemini id, so a future case with a different model can't
produce a misleading assertion failure.
* fix(fireworks_ai): pass response_format json_schema through unchanged (#29606)
FireworksAIConfig.map_openai_params was rewriting the OpenAI strict
`{type: json_schema, json_schema: {name, strict, schema}}` shape into
`{type: json_object, schema: ...}` before sending to Fireworks, dropping
`strict` and `name` and changing the `type`. Per Fireworks' docs json_object
means "force any valid JSON output (no specific schema)", so the schema
constraint was effectively dropped and grammar-guided decoding never ran;
model output silently violated the schema.
The rewrite landed in #7085 (Dec 2024) when Fireworks did not yet accept
native json_schema. Fireworks accepts the OpenAI strict shape natively now,
so the rewrite has become a regression.
Removes the rewrite. Passes response_format through unchanged. Updates the
existing test_map_response_format to assert pass-through. Adds focused
regression tests in tests/test_litellm/ covering preservation of type,
strict, name, and schema body, plus that json_object alone still works.
* fix(types): import Required from typing_extensions in gemini types
* style: reformat sampling_handler.py for py312 black compat
* refactor(mcp-sampling): extract helpers to fix PLR0915 too-many-statements in handle_sampling_create_message
* fix(proxy-server): add explicit ProxyLogging type annotation to proxy_logging_obj to fix mypy inference
* fix(mcp-sampling): suppress mypy assignment error on ImportError fallback for proxy_logging_obj
* fix(test): use .value when comparing LlmProviders enum against string in test_default_api_base
* fix(test): iterate LlmProviders enum in test_default_api_base to avoid str pollution from custom provider registration
litellm.provider_list is a mutable global initialized to list(LlmProviders) but custom_llm_setup() appends plain provider strings to it. When a test_custom_llm.py test runs first in the same xdist worker, provider_list contains a str and calling .value on it raises AttributeError. Iterate the immutable LlmProviders enum instead, which is deterministic and what the check intends.
* fix(mcp): depth-aware JSON-RPC response detection and neutral speed-priority fallback
Replace the flat substring check in the truncated-body routing path with a
top-level-key scan so a JSON-RPC response whose result payload nests a
"method" field is still detected as a response and skips the session lock,
removing a deadlock against the in-flight tool call awaiting it.
Drop the inverse max_output_tokens speed proxy when no model exposes
output_tokens_per_second; context-window size does not track latency, so a
neutral score avoids biasing speedPriority toward the smallest-context model.
* fix(guardrails): make ToolPermission rule reload atomic on invalid regex
_load_rules appended each rule to self.rules before compiling its regex, so an
invalid pattern raised mid-loop after the bad rule was already live but without
a _compiled_rule_targets entry. _matches_regex reads a missing compiled target
as a None pattern and returns True, turning the bad rule into a match-all that
silently applies its decision to every tool. Via update_in_memory_litellm_params
(PUT /guardrails) this corrupted the live guardrail.
Build the parsed rules and compiled maps into locals and swap them in only after
every regex compiles, and restore the previous ruleset if a live update is
rejected, so an invalid regex now fails the update without leaving the guardrail
enforcing a broken policy.
* test(mcp): cover sampling conversion, model resolution, and elicitation relay paths
The MCP sampling and elicitation handlers shipped with partial test
coverage, leaving the response-to-MCP conversion, the model resolution
fallback chain, completion-kwargs assembly, guardrail routing, and the
entire elicitation relay untested. That pulled the PR's diff (patch)
coverage below the codecov threshold even though overall project
coverage rose.
Add focused unit tests for _convert_openai_response_to_mcp_result,
_convert_mcp_tools_to_openai, _convert_mcp_tool_choice_to_openai, image
and audio content conversion, the hint-matching and fallback branches of
_resolve_model_from_preferences, _build_completion_kwargs, the router and
guardrail-rejection paths of _run_guardrails_and_call_llm, the
handle_sampling_create_message success and error-propagation flows, the
marker-hoisting fallback for tool content on unexpected roles, and the
elicitation form/url/generic relay together with its decline paths
---------
Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: lengkejun <lengkejun@xd.com>
Co-authored-by: Yug <yugborana000@gmail.com>
Co-authored-by: Kent <72616338+kingdoooo@users.noreply.github.com>
Co-authored-by: tanmay958 <53569547+tanmay958@users.noreply.github.com>
Co-authored-by: DrishnaTrivedi <142084770+DrishnaTrivedi@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Navnit Shukla <Navnit.shukla25@gmail.com>
Co-authored-by: PRABHU KIRAN VANDRANKI <72809214+VANDRANKI@users.noreply.github.com>
Co-authored-by: Adrian Lopez <109683617+adriangomez24@users.noreply.github.com>
Co-authored-by: hcl <chenglunhu@gmail.com>
Co-authored-by: JooHo Lee <96564470+BWAAEEEK@users.noreply.github.com>
Co-authored-by: Dinesh Girbide <85330597+Dinesh-Girbide@users.noreply.github.com>
Co-authored-by: cloudwiz <22098246+andrey-dubnik@users.noreply.github.com>
Co-authored-by: Ahmad Khan <ahmadkhan2508@gmail.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
@Sameerlite do you know when this change is scheduled for release? We can't upgrade without it in place, as Bedrock no longer works in LiteLLM.

v1.89.0-rc.1 has the change
Relevant issues
Fixes #29618
A Router/proxy Bedrock deployment whose `model_info.base_model` is a friendly label (e.g. `claude-haiku-4-5`) silently lost `tools` and `tool_choice`. The outgoing Bedrock Converse request was built without `toolConfig`, so the model behaved as if no tools were provided. This worked in v1.84.0 and regressed in v1.85.0; with `drop_params: true` it failed silently rather than surfacing an error.
Linear ticket
Root cause
Two changes compound into the bug.
`completion()` passed `model_info.base_model` as the `model` argument to `get_optional_params` (introduced in #27720), so the real Bedrock model id never reached supported-param resolution. Separately, `get_supported_openai_params` resolved the provider config's params from `base_model or model` (#28582), letting the label fully replace the real model. For Bedrock the label resolves to no tool support (Bedrock gates tools on recognizing the Converse model id), so `tools`/`tool_choice` were stripped before transformation and never reached `toolConfig`. Azure was unaffected because its deployment names are opaque, so `base_model` is the only signal it has.
Fix
`completion()` now keeps `model` as the real deployment model and threads the resolved `base_model` (the kwarg, falling back to `model_info`) through separately. `get_supported_openai_params` treats `base_model` as additive: it returns the union of the params supported by `model` and by `base_model`. A hint can only add capabilities, never strip ones the real model already exposes. That keeps the original `base_model` behavior from #27717 (a registered base_model adding `reasoning_effort`/`thinking` for an unregistered model) and Azure's `base_model` driven model-type detection working, while restoring Bedrock tool support.
Pre-Submission checklist
- `make test-unit`
- `@greptileai` and received a Confidence Score of at least 4/5 before requesting a maintainer review
CI (LiteLLM team)
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Screenshots / Proof of Fix
Verified against a live proxy hitting real Bedrock (real, billable calls)
The fix: a friendly `base_model` label no longer strips tools
This is the configuration from the issue, a Bedrock deployment whose `model_info.base_model` is the label `claude-haiku-4-5`, with `drop_params: true`
Response on this branch:
{ "model": "claude-haiku-4-5", "finish_reason": "tool_calls", "tool_calls": [ { "index": 0, "function": { "arguments": "{\"city\": \"Copenhagen\"}", "name": "get_weather" }, "id": "tooluse_bBi2LpRSJZixywecGRb5MV", "type": "function" } ], "usage": { "completion_tokens": 54, "prompt_tokens": 564, "total_tokens": 618 } }
The model selects `get_weather` with `{"city": "Copenhagen"}`, so `tools`/`tool_choice` reach Bedrock Converse. On the pre-fix code the same request drops both params before transformation, so no `toolConfig` is sent and the model cannot call the tool.
No regression for the common pattern (no `base_model`)
The same model registered the usual way, without `model_info.base_model`, which was never affected but is worth confirming stays correct
Same curl, response on this branch:
{ "model": "claude-haiku-4-5", "finish_reason": "tool_calls", "tool_calls": [ { "index": 0, "function": { "arguments": "{\"city\": \"Copenhagen\"}", "name": "get_weather" }, "id": "tooluse_OGbWS86RtufrlCBmBiZhC3", "type": "function" } ], "usage": { "completion_tokens": 54, "prompt_tokens": 564, "total_tokens": 618 } }
Type
Bug Fix
Changes
- `litellm/main.py`: in `completion()`, resolve `base_model` from the kwarg or `model_info`, and stop overwriting the `model` passed to `get_optional_params` with the `base_model` label so the real model reaches capability resolution
- `litellm/litellm_core_utils/get_supported_openai_params.py`: make `base_model` additive for the provider-config path by returning the union of the params supported by `model` and by `base_model`, instead of letting `base_model` replace `model`
- Tests: a `get_supported_openai_params` suite covering the union in both directions (Bedrock label that would strip `tools`, gemini base_model that adds `reasoning_effort`, Azure detection preserved, no-base_model and base_model==model unchanged); a `get_optional_params` regression asserting `tools`/`tool_choice` survive `drop_params` with the label; and an update to `test_completion_optional_params_base_model` to assert per case that `model` stays the real deployment model while `base_model` is threaded through as the additive hint
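The two behaviors the changes above rely on, resolving `base_model` from the kwarg with a fallback to `model_info`, and `drop_params` turning an unsupported parameter into a silent drop rather than an error, can be sketched as follows. Both `resolve_base_model` and `apply_drop_params` are hypothetical helpers written for this illustration; they are not LiteLLM functions, and the error type is an arbitrary choice.

```python
from typing import Any, Dict, List, Optional


def resolve_base_model(
    kwargs: Dict[str, Any],
    model_info: Optional[Dict[str, Any]],
) -> Optional[str]:
    """Prefer an explicit base_model kwarg, then fall back to model_info."""
    return kwargs.get("base_model") or (model_info or {}).get("base_model")


def apply_drop_params(
    requested: Dict[str, Any],
    supported: List[str],
    drop_params: bool,
) -> Dict[str, Any]:
    """Keep supported params; drop the rest silently or raise, per drop_params."""
    unsupported = [name for name in requested if name not in supported]
    if unsupported and not drop_params:
        raise ValueError(f"unsupported params: {unsupported}")
    return {name: value for name, value in requested.items() if name in supported}


if __name__ == "__main__":
    request = {"temperature": 0.1, "tools": [{"type": "function"}]}
    # If capability resolution wrongly omits "tools", drop_params=True hides it.
    print(apply_drop_params(request, ["temperature"], drop_params=True))
    print(resolve_base_model({}, {"base_model": "claude-haiku-4-5"}))
```

This is why the regression was hard to notice with `drop_params: true`: once the supported list no longer contained `tools`, the request still succeeded, just without them.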