feat(bedrock-batch): route /v1/embeddings JSONL to Titan v2 modelInput#28875
Conversation
…riAI#28598) Squash-merged by litellm-agent from Terrajlz's PR.
Squash-merged by litellm-agent from devauxbr's PR.
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
Greptile SummaryThis PR routes OpenAI
Confidence Score: 5/5Safe to merge — the change is additive, isolated to the batch JSONL transformation path, and doesn't touch the synchronous /v1/embeddings or chat completion flows. All new helpers are pure-function, well-guarded against bad input, and covered by 22 targeted unit tests. The registry-first detection approach is sound: the new No files require special attention.
|
| Filename | Overview |
|---|---|
| litellm/llms/bedrock/files/transformation.py | Adds embedding routing: _is_embedding_record, _is_titan_v2_embed_model (registry-first, substring fallback), _lookup_provider_specific_field, _coerce_embedding_input_to_string, and _map_openai_embedding_to_bedrock_params; dispatches per-record to chat or embedding transformer. |
| model_prices_and_context_window.json | Adds provider_specific_entry.bedrock_invocation_schema = "titan_v2" to the amazon.titan-embed-text-v2:0 entry; drives the registry-first detection path in _is_titan_v2_embed_model. |
| litellm/model_prices_and_context_window_backup.json | Mirror of the same provider_specific_entry change as the primary JSON; keeps the backup file in sync. |
| tests/test_litellm/llms/bedrock/files/test_bedrock_files_transformation.py | Adds TestBedrockFilesEmbeddingTransformation with 22 unit tests covering happy paths, parameter mapping, routing logic, error cases, registry-driven detection, and substring-fallback boundaries; all mocked, no real network calls. |
| tests/test_litellm/llms/bedrock/files/input_batch_embeddings.jsonl | New fixture with three representative Titan v2 embedding records (plain string, dimensions, base64 encoding_format) used by the round-trip fixture test. |
| tests/test_litellm/llms/bedrock/files/expected_bedrock_batch_embeddings.jsonl | Expected Bedrock output fixture: inputText, dimensions, and embeddingTypes mappings for the three input records; used as the assertion baseline for the round-trip test. |
Reviews (3): Last reviewed commit: "fix(bedrock-batch): drive Titan v2 detec..." | Re-trigger Greptile
Greptile SummaryThis PR adds OpenAI
Confidence Score: 4/5The change is additive and isolated to the Bedrock JSONL transformer; existing chat-completion batches are unaffected unless a record lacks a The new embedding transformer is well-tested and the dispatch logic is correct. The main concern is litellm/llms/bedrock/files/transformation.py — specifically _is_titan_v2_embed_model and _coerce_embedding_input_to_string
|
| Filename | Overview |
|---|---|
| litellm/llms/bedrock/files/transformation.py | Adds embedding routing (Titan v2 only) to the Bedrock JSONL transformer; the hardcoded _TITAN_V2_EMBED_MODEL_MARKER constant partially violates the project rule against model-specific flags in code rather than the registry |
| tests/test_litellm/llms/bedrock/files/test_bedrock_files_transformation.py | Adds 22 mocked unit tests covering happy paths, edge cases, and error paths for the new embedding transformer; all tests are properly mocked with no real network calls |
| tests/test_litellm/llms/bedrock/files/expected_bedrock_batch_embeddings.jsonl | New fixture with three expected Bedrock modelInput records covering string input, dimensions param, and binary encoding_format mapping |
| tests/test_litellm/llms/bedrock/files/input_batch_embeddings.jsonl | New input fixture with three OpenAI-format embedding batch records; exercised by the round-trip fixture test |
Comments Outside Diff (1)
-
litellm/llms/bedrock/files/transformation.py, line 49-96 (link)Hardcoded model marker violates project rule
_TITAN_V2_EMBED_MODEL_MARKER = "titan-embed-text-v2"is a model-specific flag hardcoded in the codebase. Per the project rule, model-specific flags should live inmodel_prices_and_context_window.jsonand be read viaget_model_info. The current design uses the registry only as a secondary confirmation layer, not as the authoritative source for which InvokeModel schema to use.The PR description explains that
mode == "embedding"is shared by multiple incompatible schemas (Titan G1, Titan Multimodal, Cohere, Nova Multimodal), so the registry alone can't discriminate. The cleaner fix is to add a new field to the relevant entries inmodel_prices_and_context_window.json(e.g.,"bedrock_invoke_schema": "titan_v2_embed") and look it up viaget_model_info, which would let future models opt-in without touching_TITAN_V2_EMBED_MODEL_MARKERat all. As new embedding models are added (the PR explicitly plans follow-ups for Titan G1, Cohere, etc.), every new branch would require another hardcoded marker constant if the current pattern is kept.Rule Used: What: Do not hardcode model-specific flags in the ... (source)
Reviews (2): Last reviewed commit: "fix(bedrock-batch): layer registry mode ..." | Re-trigger Greptile
|
Addressed the remaining policy concern in a490a6a2c5:
Used the existing The substring fallback is retained ONLY for ids the registry can't normalize - cross-region inference profile prefixes ( Future embedding-schema variants (Titan G1, Cohere Embed, Nova Multimodal) can be added by:
No code change to the detector required. 34/34 tests pass; integration check with |
|
Want your agent to iterate on Greptile's feedback? Try greploops. |
Changes vs main:
- BedrockFilesConfig now detects OpenAI batch JSONL lines whose `url`
is /v1/embeddings (with body-shape fallback) and routes them through
a new `_map_openai_embedding_to_bedrock_params` helper instead of the
chat-completion transformer that silently produces an invalid body.
- The embedding helper currently supports Amazon Titan Text Embeddings V2
only. Other embed models (Titan G1, Titan Multimodal, Cohere Embed,
Nova Multimodal Embeddings) raise NotImplementedError with a clear
message; each will get a dedicated branch + tests in follow-up PRs to
keep schema-specific risks isolated.
- Validation refuses pre-tokenized inputs (List[int], List[List[int]])
and multi-element string lists with explicit errors so callers emit
one JSONL line per embedding instead of relying on us to fan out.
- Titan v2 model id match tolerates "bedrock/" prefix, cross-region
inference profile prefix ("us.", "eu.", etc.), and ARN forms; the
marker boundary check rejects lookalikes like "titan-embed-text-v20".
- Tests cover happy path (fixtures), dimensions/encoding_format mapping,
body-shape fallback, single-element list unwrap, error paths
(missing input, multi-element list, unsupported model, pre-tokenized),
mixed chat+embedding batch, and the model-id boundary check.
Greptile-flagged gap in `_is_embedding_record`: when an OpenAI batch JSONL line carries an explicit `url` pointing to a non-embedding endpoint (e.g. `/v1/chat/completions`) AND its body happens to have `input` without `messages`, the body-shape fallback would mis-route that record to the embedding transformer and corrupt the modelInput. Changes vs previous commit: - `_is_embedding_record` now short-circuits to NOT-embedding whenever `url` is non-empty and not equal to `/v1/embeddings`. The body-shape fallback only runs when `url` is missing or empty. Docstring updated to spell out the precedence rules. - Two new tests cover the case: direct helper assertion that an explicit chat url plus an input-bearing body returns False, plus an end-to-end check that the resulting modelInput contains no `inputText` key. A second test asserts the same short-circuit for arbitrary non-embeddings urls (`/v1/completions`, `/v1/responses`). 28/28 tests pass (was 26/26 before this commit + 2 new).
Splits the input-shape validation out of `_map_openai_embedding_to_bedrock_params` into a new static helper `_coerce_embedding_input_to_string`. Same semantics; the goal is to make the validation testable in isolation and to give future embedding-provider branches (Titan G1, Cohere) a reusable shaping function instead of duplicating type checks. - Helper accepts `str`, single-element `list[str]`, and raises `ValueError` / `NotImplementedError` with actionable messages for None, multi-element lists, pre-tokenized inputs (`list[int]` / `list[list[int]]`), and other unsupported types. - New unit test exercises the helper directly across happy paths, None / missing input, multi-element string list, multi-element int list (caught as 'one input per JSONL record' since we can't disambiguate from 'multiple strings' without more context), pre-tokenized single-element list-of-list, single-element list of bare int, and dict input. 29/29 tests in the file still pass.
…param Greptile-flagged issues on Titan v2 model detection: 1. Hardcoded substring without registry consultation conflicts with the project convention of treating `model_prices_and_context_window.json` as the source of truth for model capability flags. Fix: `_is_titan_v2_embed_model` now layers a registry mode check on top of the existing marker boundary. When `get_model_info` resolves the id, we additionally require `mode == "embedding"` so a malformed id whose path-component matches the marker but whose registered mode is "chat" doesn't slip through. Registry silence (cross-region inference profile prefixes like `us.amazon.titan-embed-text-v2:0`, ARN forms) keeps the substring-only behavior because the registry genuinely can't normalize those ids today. The substring is still needed because `mode == "embedding"` alone doesn't distinguish Titan v2's InvokeModel schema from Cohere Embed, Nova Multimodal, or Titan G1 - all also embedding mode, all with incompatible bodies. 2. `provider` parameter on `_map_openai_embedding_to_bedrock_params` was accepted but never used. Fix: dropped from the signature and the call site. Also adds `_lookup_registry_mode` static helper (mirrors the one in the sibling batches transformer) so the registry try/except shape lives in one place instead of being inlined into the detector. 5 new tests pin the layered behavior: - registry mode=chat overrides the marker match (rejected) - registry mode=embedding + marker match (accepted) - registry silent + marker match for cross-region and ARN ids (accepted) - direct `_lookup_registry_mode` coverage across all return paths 33/33 tests in the file pass.
…pecific_entry Addresses Greptile's remaining policy concern on PR BerriAI#28875: the hardcoded `_TITAN_V2_EMBED_MODEL_MARKER` substring required a code change to register new Titan v2 variants. Now the registry is the source of truth. Changes: - model_prices_and_context_window.json + model_prices_and_context_window_backup.json Adds `provider_specific_entry.bedrock_invocation_schema = "titan_v2"` to the `amazon.titan-embed-text-v2:0` entry. Uses the existing `provider_specific_entry` escape-hatch field (already surfaced by `get_model_info` via `ModelInfo.provider_specific_entry`) rather than introducing a new top-level field that would need a corresponding change in `utils.py::get_model_info` to flow through. - BedrockFilesConfig._is_titan_v2_embed_model now reads `get_model_info(model).provider_specific_entry.bedrock_invocation_schema` first. When the registry resolves the id we trust the field; registered ids without (or with a different) schema value are rejected outright - no substring second-chance for registered ids. The substring fallback only runs when `get_model_info` raises, which covers cross-region inference profile prefixes (`us.amazon.titan-embed-text-v2:0`) and Bedrock ARN forms - neither of which the registry normalizes today. - _lookup_registry_mode helper renamed to _lookup_provider_specific_field and generalized: takes a field name, reads `provider_specific_entry[field]` defensively. Future embed-schema branches (Titan G1, Cohere, Nova Multimodal) will share this helper. - New module-level constants: _BEDROCK_INVOCATION_SCHEMA_FIELD names the registry key; _TITAN_V2_INVOCATION_SCHEMA names the value. - _map_openai_embedding_to_bedrock_params: dropped the unused `provider` parameter (separate Greptile flag). Tests updated for the new nested-field semantics. 34/34 tests pass; one end-to-end integration check confirms that with LITELLM_LOCAL_MODEL_COST_MAP=true, the registry returns the new field and all 7 representative model ids classify correctly.
a490a6a to
93d5382
Compare
Pre-existing Black violation on the shin_agent_oss_staging_05_22_2026 base branch - the lint CI check on this PR fails on `litellm/llms/base_llm/chat/transformation.py` even with zero changes from our side. Applying the formatter's preferred line-break inside `is_thinking_enabled` unblocks the lint check without touching the method's logic.
Pre-existing Black violation on the shin_agent_oss_staging_05_22_2026 base branch - same fix as in PR BerriAI#28875 to unblock the lint CI check on this PR.
Relevant issues
Reopens the use case from #15506 (closed by stale-bot, no maintainer engagement). AWS Bedrock supports batch inference for
amazon.titan-embed-text-v2:0viaCreateModelInvocationJobnatively; the gap is only on the LiteLLM side, whereBedrockFilesConfig._map_openai_to_bedrock_paramshad no embedding branch and an embedding JSONL would have silently been routed through the chat transformer.Supersedes #28865 (Greptile reviewed the initial commits but went silent on the follow-up fix; fresh PR to reset the bot's review state). The code state is identical to where #28865 ended: both Greptile-flagged issues have already been addressed in this branch.
Changes vs
litellm_oss_branchlitellm/llms/bedrock/files/transformation.pyBedrockFilesConfig._is_embedding_recordstatic helper detects whether a JSONL line is an embedding request. Strict precedence: expliciturl == \"/v1/embeddings\"-> embedding; any other non-emptyurl-> NOT embedding (trust the caller's signal); only fall back to body-shape (inputpresent ANDmessagesabsent) whenurlis missing.BedrockFilesConfig._is_titan_v2_embed_modellayers amodel_prices_and_context_window.jsonregistry check (get_model_info(...).get(\"mode\") == \"embedding\") on top of a marker-boundary substring (titan-embed-text-v2followed by:,/, or end-of-string). The registry alone isn't sufficient becausemode == \"embedding\"is shared by Cohere Embed, Nova Multimodal, and Titan G1 - all with incompatible InvokeModel schemas. When the registry resolves the id we additionally requiremode == \"embedding\"so a malformed id whose marker matches but whose registered mode is "chat" doesn't slip through. Registry silence (cross-region inference profile prefixes, Bedrock ARN forms) keeps the substring-only behavior because the registry can't normalize those ids today.BedrockFilesConfig._lookup_registry_modestatic helper isolates the defensiveget_model_infotry/except shape so it lives in one place. Returns the mode string only when the registry resolves the id AND the entry has a non-empty string mode; returnsNonefor raise, non-dict, missing-mode, empty-mode, and non-string-mode cases.BedrockFilesConfig._coerce_embedding_input_to_stringstatic helper normalizes the OpenAIinputfield into the single string Bedrock Titan v2 expects ininputText. Accepts string or single-element list; rejects None, multi-element lists, pre-tokenized inputs (List[int],List[List[int]]), and other types with actionable error messages._map_openai_embedding_to_bedrock_paramshelper builds the Bedrock InvokeModel body via the existingAmazonTitanV2Config._transform_request, mapping OpenAIdimensionsandencoding_formatthroughAmazonTitanV2Config.map_openai_paramsso this stays in sync with the synchronous/v1/embeddingspath._transform_openai_jsonl_content_to_bedrock_jsonl_contentnow dispatches per record to either the chat or the embedding transformer; the chat helper keeps its narrow contract.NotImplementedErrorwith a clear message until they get dedicated branches in follow-up PRs.tests/test_litellm/llms/bedrock/files/input_batch_embeddings.jsonlandexpected_bedrock_batch_embeddings.jsonlfixtures.TestBedrockFilesEmbeddingTransformationclass adds 22 mocked tests: happy-path round-trip; simple string input; dimensions + encoding_format mapping; body-shape fallback; single-element list unwrap; error paths (missinginput, multi-element list, unsupported model, pre-tokenized List[int] and List[List[int]]); mixed chat+embedding batch; model-id boundary check; registry-mode disagreement rejection; registry-mode confirmation; registry silence with marker match; isolated_lookup_registry_modecoverage; isolated_coerce_embedding_input_to_stringcoverage;_is_embedding_recordhelper coverage; ambiguous "both input and messages" routes to chat; explicit chat URL short-circuits to chat; other non-embedding URLs route to chat.Companion PR
The endpoint-propagation side (
BedrockBatchesConfig.transform_*_batch_responsereportingendpoint=\"/v1/embeddings\"correctly for embedding batches) lives in #28867 (already at Greptile LGTM, ready for maintainer review). Splitting keeps each PR single-concern per the template.Pre-Submission checklist
tests/test_litellm/directory, Adding at least 1 test is a hard requirementmake test-unit(33/33 intests/test_litellm/llms/bedrock/files/test_bedrock_files_transformation.py)@greptileaireview after opening this PRType
New Feature