fix(sagemaker): send native Cohere embed payload to Cohere SageMaker endpoints#28613
Conversation
SageMaker embedding only special-cased Voyage; every other endpoint received
HuggingFace TGI `{"inputs": [...]}`. AWS Marketplace Cohere containers expect
the native Cohere embed payload (`texts`, `input_type`) and reject the HF
shape with `422 EmbedReqV2.inputs is of type string but should be of type
Object`.
Add `SagemakerCohereEmbeddingConfig` that reuses Bedrock/Cohere request and
response transforms, and route SageMaker endpoint names containing `cohere`
or a Cohere embed model fragment (`embed-multilingual`, `embed-english`,
`embed-v3`, `embed-v4`) to it. Supports `input_type`, `dimensions`, and
`encoding_format`. Voyage and HuggingFace SageMaker endpoints are unchanged.
Co-authored-by: Cursor <cursoragent@cursor.com>
…nventions - Detect Cohere SageMaker endpoints with a single `"cohere" in model.lower()` check, mirroring the existing Voyage branch instead of a separate helper function and marker constant. - Drop instance caches of sub-configs; instantiate `BedrockCohereEmbeddingConfig` / `CohereEmbeddingConfig` per call to match the existing pattern in `BedrockCohereEmbeddingConfig._transform_request`. - Match `SagemakerEmbeddingConfig`'s signatures, defaults, and `Any` typing for `logging_obj`; collapse the input-normalization helper inline. - Inline `transform_embedding_response` input lookup; no behavior change. Co-authored-by: Cursor <cursoragent@cursor.com>
Greptile SummaryThis PR fixes a
Confidence Score: 5/5The change is well-scoped and additive: new SageMaker Cohere routing does not affect existing Voyage or HF paths, and the Cohere v1 refactor is a pure extract-method with no logic change for existing callers. All three code paths changed (new routing, extract-method, utils.py param restoration) are covered by targeted unit tests. The double-post_call fix is verified with an explicit assertion. The utils.py loop correctly gates on supported_params, OPENAI_EMBEDDING_PARAMS exclusion, and param not in optional_params, making unintended side-effects on other providers unlikely. The utils.py fallback loop merits a second look to confirm it should read from special_params rather than the mutated passed_params dict.
|
| Filename | Overview |
|---|---|
| litellm/llms/sagemaker/embedding/cohere_transformation.py | New config class routing Cohere SageMaker endpoints to the correct {"texts": [...]} payload format; correctly delegates to BedrockCohereEmbeddingConfig for request building and avoids double post_call logging. |
| litellm/utils.py | Adds a post-map loop to restore non-OpenAI embedding params (e.g. input_type) that are stripped by embedding_pre_process_non_default_params; fix works correctly but relies on an implicit side-effect mutation of passed_params. |
| litellm/llms/cohere/embed/v1_transformation.py | Extract-method refactor splits _transform_response into _populate_embedding_response to allow SageMaker to reuse parse logic without triggering a second post_call; correctly removes dead token-count loop in the process. |
| litellm/llms/sagemaker/embedding/transformation.py | Factory routing updated to dispatch Cohere-named endpoints to the new config; Voyage and HF paths unchanged. |
| tests/test_litellm/llms/sagemaker/test_sagemaker_embedding_voyage.py | New test coverage for Cohere request/response transform, factory routing, input_type pass-through, and no-double-post_call invariant; all mocked, no real network calls. |
| litellm/llms/sagemaker/completion/handler.py | Docstring-only change updating supported endpoint types list. |
Reviews (3): Last reviewed commit: "fix(sagemaker): avoid double post_call o..." | Re-trigger Greptile
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
Cohere input_type is advertised in get_supported_openai_params but was filtered out of non_default_params by OPENAI_EMBEDDING_PARAMS before map_openai_params ran. Merge supported params from passed_params after map (same path Greptile flagged). Handle input_type explicitly in SagemakerCohereEmbeddingConfig.map_openai_params and add an integration test through get_optional_params_embeddings. Co-authored-by: Cursor <cursoragent@cursor.com>
The post-map restore loop must skip OPENAI_EMBEDDING_PARAMS so mapped fields (e.g. dimensions -> output_dimension) are not duplicated under their OpenAI names. Align SageMaker embedding import order with sibling files and add a regression test for dimensions mapping. Co-authored-by: Cursor <cursoragent@cursor.com>
|
@greptileai check again |
Greptile review on #28613 caught that `CohereEmbeddingConfig._transform_response` calls `logging_obj.post_call` internally. The SageMaker embedding handler already calls `post_call` once before invoking the transform, so the Cohere SageMaker path fired callbacks, cost calculators, and log handlers twice per request. Extract the parsing body of `_transform_response` into `_populate_embedding_response` (pure extract-method, no behavior change for existing Cohere direct or Bedrock Cohere paths, which keep calling `_transform_response`). Have `SagemakerCohereEmbeddingConfig` call the new helper directly so it parses the response without re-logging. Add a regression test asserting `logging_obj.post_call` is not invoked by the SageMaker Cohere transform. Co-authored-by: Cursor <cursoragent@cursor.com>
|
Addressed the 4/5 Greptile concern in Bug: Why I couldn't just remove the inner Fix: Pure extract-method on
Verification:
@greptileai check again |
Relevant issues
Customer report: SageMaker + AWS Marketplace Cohere (
cohere.embed-multilingual-v3) fails via LiteLLM with422 EmbedReqV2.inputs is of type string but should be of type Object; same endpoint works with directboto3and Bedrock. Reproduced on LiteLLM1.81.15and1.84.0.Root cause:
SagemakerEmbeddingConfigonly special-cases Voyage; all other endpoints get HuggingFace TGI{"inputs": [...]}. Marketplace Cohere containers expect{"texts": [...], "input_type": "..."}.Fix: Add
SagemakerCohereEmbeddingConfig(reuses Bedrock/Cohere request + response transforms). Route when"cohere" in model.lower(), same pattern as Voyage. HF and Voyage SageMaker paths unchanged.Linear ticket
Fixes LIT-3289
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
tests/test_litellm/directory, Adding at least 1 test is a hard requirement - see detailsmake test-unit(CI on PR)@greptileaiand received a Confidence Score of at least 4/5 before requesting a maintainer reviewDelays in PR merge?
If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).
CI (LiteLLM team)
Branch creation CI run
Link: (fill after PR opened)
CI run for the last commit
Link: (fill after PR opened)
Merge / cherry-pick CI run
Links: (fill after PR opened)
Screenshots / Proof of Fix
What we validated
End-to-end
litellm.embedding()on the SageMaker provider path — the same code path the customer hits (SagemakerLLM.embedding→transform_embedding_request→boto3invoke_endpoint). Not a transform-only unit test in isolation.EmbeddingResponseMethod:
boto3.Sessionis patched soinvoke_endpointnever runs against AWS. The mock recordsBody, then either returns a minimal Cohereembeddingspayload or raisesClientErrorwith the customer's 422 text. Input:model="sagemaker/cohere-embed-multilingual-v3-prod",input=["hello"],input_type="search_query".Request body (customer failure mode)
{"inputs": ["hello"], "input_type": "search_query"}{"texts": ["hello"], "input_type": "search_query"}On staging, a mock that returns the customer's 422 reproduces
BadRequestErrorwithEmbedReqV2.inputs is of type string but should be of type Object— same message as their log.On this branch, the body is correct in both mock modes below; the 422 no longer reflects a LiteLLM formatting bug.
Mock scenarios (runtime)
A — Happy path: mock returns
{"embeddings": [[0.1, 0.2, 0.3]], "meta": {...}}. Call completes;data[0].embeddingpopulated. Confirms response transform works with the new request shape.B — Server error path: mock raises the exact 422 string from the ticket. LiteLLM still surfaces
BadRequestError, but the captured body is alreadytexts+input_type— i.e. we are not sendinginputsanymore; a real Cohere endpoint would not reject the payload for this reason.Factory routing (no boto3)
SagemakerEmbeddingConfig.get_model_config(endpoint_name)for representative names:cohere.embed-multilingual-v3SagemakerCohereEmbeddingConfigcohere-marketplace-endpointSagemakerCohereEmbeddingConfigCOHERE-EMBED-V4SagemakerCohereEmbeddingConfigvoyage-3-5-embeddingVoyageEmbeddingConfigsentence-transformers-mpnetSagemakerEmbeddingConfigembed-multilingual-v3-prodSagemakerEmbeddingConfig* Names without
coherestill use HFinputs; intentional — customer ticket used acohere.*endpoint name.Type
🐛 Bug Fix
Changes
litellm/llms/sagemaker/embedding/cohere_transformation.py— newSagemakerCohereEmbeddingConfiglitellm/llms/sagemaker/embedding/transformation.py— factory routes"cohere"endpoints to Cohere configlitellm/llms/sagemaker/completion/handler.py— docstring onlytests/test_litellm/llms/sagemaker/test_sagemaker_embedding_voyage.py— Cohere request/response + factory tests