feat(bedrock-batch): route /v1/embeddings JSONL to Titan v2 modelInput by OS-joaocastilho · Pull Request #28875 · BerriAI/litellm

OS-joaocastilho · 2026-05-26T13:15:20Z

Relevant issues

Reopens the use case from #15506 (closed by stale-bot, no maintainer engagement). AWS Bedrock supports batch inference for amazon.titan-embed-text-v2:0 via CreateModelInvocationJob natively; the gap is only on the LiteLLM side, where BedrockFilesConfig._map_openai_to_bedrock_params had no embedding branch and an embedding JSONL would have silently been routed through the chat transformer.

Supersedes #28865 (Greptile reviewed the initial commits but went silent on the follow-up fix; fresh PR to reset the bot's review state). The code state is identical to where #28865 ended: both Greptile-flagged issues have already been addressed in this branch.

Changes vs `litellm_oss_branch`

litellm/llms/bedrock/files/transformation.py
- BedrockFilesConfig._is_embedding_record static helper detects whether a JSONL line is an embedding request. Strict precedence: explicit url == \"/v1/embeddings\" -> embedding; any other non-empty url -> NOT embedding (trust the caller's signal); only fall back to body-shape (input present AND messages absent) when url is missing.
- BedrockFilesConfig._is_titan_v2_embed_model layers a model_prices_and_context_window.json registry check (get_model_info(...).get(\"mode\") == \"embedding\") on top of a marker-boundary substring (titan-embed-text-v2 followed by :, /, or end-of-string). The registry alone isn't sufficient because mode == \"embedding\" is shared by Cohere Embed, Nova Multimodal, and Titan G1 - all with incompatible InvokeModel schemas. When the registry resolves the id we additionally require mode == \"embedding\" so a malformed id whose marker matches but whose registered mode is "chat" doesn't slip through. Registry silence (cross-region inference profile prefixes, Bedrock ARN forms) keeps the substring-only behavior because the registry can't normalize those ids today.
- BedrockFilesConfig._lookup_registry_mode static helper isolates the defensive get_model_info try/except shape so it lives in one place. Returns the mode string only when the registry resolves the id AND the entry has a non-empty string mode; returns None for raise, non-dict, missing-mode, empty-mode, and non-string-mode cases.
- BedrockFilesConfig._coerce_embedding_input_to_string static helper normalizes the OpenAI input field into the single string Bedrock Titan v2 expects in inputText. Accepts string or single-element list; rejects None, multi-element lists, pre-tokenized inputs (List[int], List[List[int]]), and other types with actionable error messages.
- New _map_openai_embedding_to_bedrock_params helper builds the Bedrock InvokeModel body via the existing AmazonTitanV2Config._transform_request, mapping OpenAI dimensions and encoding_format through AmazonTitanV2Config.map_openai_params so this stays in sync with the synchronous /v1/embeddings path.
- _transform_openai_jsonl_content_to_bedrock_jsonl_content now dispatches per record to either the chat or the embedding transformer; the chat helper keeps its narrow contract.
- Other embedding models (Titan G1, Titan Multimodal, Cohere Embed, Nova Multimodal Embeddings) raise NotImplementedError with a clear message until they get dedicated branches in follow-up PRs.
tests/test_litellm/llms/bedrock/files/
- New input_batch_embeddings.jsonl and expected_bedrock_batch_embeddings.jsonl fixtures.
- TestBedrockFilesEmbeddingTransformation class adds 22 mocked tests: happy-path round-trip; simple string input; dimensions + encoding_format mapping; body-shape fallback; single-element list unwrap; error paths (missing input, multi-element list, unsupported model, pre-tokenized List[int] and List[List[int]]); mixed chat+embedding batch; model-id boundary check; registry-mode disagreement rejection; registry-mode confirmation; registry silence with marker match; isolated _lookup_registry_mode coverage; isolated _coerce_embedding_input_to_string coverage; _is_embedding_record helper coverage; ambiguous "both input and messages" routes to chat; explicit chat URL short-circuits to chat; other non-embedding URLs route to chat.

Companion PR

The endpoint-propagation side (BedrockBatchesConfig.transform_*_batch_response reporting endpoint=\"/v1/embeddings\" correctly for embedding batches) lives in #28867 (already at Greptile LGTM, ready for maintainer review). Splitting keeps each PR single-concern per the template.

Pre-Submission checklist

I have Added testing in the tests/test_litellm/ directory, Adding at least 1 test is a hard requirement
My PR passes all unit tests on make test-unit (33/33 in tests/test_litellm/llms/bedrock/files/test_bedrock_files_transformation.py)
My PR's scope is as isolated as possible, it only solves 1 specific problem (input-side transform for Titan v2 embedding batches)
I will request @greptileai review after opening this PR

Type

New Feature

…riAI#28598) Squash-merged by litellm-agent from Terrajlz's PR.

Squash-merged by litellm-agent from devauxbr's PR.

OS-joaocastilho · 2026-05-26T13:15:41Z

@greptileai

codecov · 2026-05-26T13:19:09Z

Codecov Report

❌ Patch coverage is 98.57143% with 1 line in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
litellm/llms/bedrock/files/transformation.py	98.57%	1 Missing ⚠️

📢 Thoughts on this report? Let us know!

greptile-apps · 2026-05-26T13:19:32Z

Greptile Summary

This PR routes OpenAI /v1/embeddings JSONL records through a new Titan v2 transformer in BedrockFilesConfig, enabling AWS Bedrock batch inference (CreateModelInvocationJob) for amazon.titan-embed-text-v2:0. Model detection is registry-driven first (via the new provider_specific_entry.bedrock_invocation_schema field added to model_prices_and_context_window.json) with a boundary-checked substring fallback for ID forms the registry can't normalize (cross-region prefixes, ARNs).

_is_embedding_record routes per JSONL record using url as the authoritative signal, falling back to body-shape detection (input present + messages absent) only when url is absent.
_map_openai_embedding_to_bedrock_params delegates to the existing AmazonTitanV2Config._transform_request and map_openai_params so OpenAI dimensions/encoding_format stay in sync with the synchronous path; unsupported embedding models raise NotImplementedError cleanly.
22 new unit tests cover happy paths, error cases, routing precedence, registry-mode confirmation/rejection, and substring-fallback boundaries — all mocked with no network calls.

Confidence Score: 5/5

Safe to merge — the change is additive, isolated to the batch JSONL transformation path, and doesn't touch the synchronous /v1/embeddings or chat completion flows.

All new helpers are pure-function, well-guarded against bad input, and covered by 22 targeted unit tests. The registry-first detection approach is sound: the new provider_specific_entry.bedrock_invocation_schema discriminator in the JSON is the authoritative signal, and the substring fallback is strictly last-resort for IDs the registry can't normalize. No existing code paths are modified beyond inserting the embedding branch in the per-record dispatch.

No files require special attention.

Important Files Changed

Filename	Overview
litellm/llms/bedrock/files/transformation.py	Adds embedding routing: `_is_embedding_record`, `_is_titan_v2_embed_model` (registry-first, substring fallback), `_lookup_provider_specific_field`, `_coerce_embedding_input_to_string`, and `_map_openai_embedding_to_bedrock_params`; dispatches per-record to chat or embedding transformer.
model_prices_and_context_window.json	Adds `provider_specific_entry.bedrock_invocation_schema = "titan_v2"` to the `amazon.titan-embed-text-v2:0` entry; drives the registry-first detection path in `_is_titan_v2_embed_model`.
litellm/model_prices_and_context_window_backup.json	Mirror of the same `provider_specific_entry` change as the primary JSON; keeps the backup file in sync.
tests/test_litellm/llms/bedrock/files/test_bedrock_files_transformation.py	Adds `TestBedrockFilesEmbeddingTransformation` with 22 unit tests covering happy paths, parameter mapping, routing logic, error cases, registry-driven detection, and substring-fallback boundaries; all mocked, no real network calls.
tests/test_litellm/llms/bedrock/files/input_batch_embeddings.jsonl	New fixture with three representative Titan v2 embedding records (plain string, dimensions, base64 encoding_format) used by the round-trip fixture test.
tests/test_litellm/llms/bedrock/files/expected_bedrock_batch_embeddings.jsonl	Expected Bedrock output fixture: `inputText`, `dimensions`, and `embeddingTypes` mappings for the three input records; used as the assertion baseline for the round-trip test.

_{Reviews (3): Last reviewed commit: "fix(bedrock-batch): drive Titan v2 detec..." | Re-trigger Greptile}

greptile-apps · 2026-05-26T13:20:29Z

Greptile Summary

This PR adds OpenAI /v1/embeddings batch support for Amazon Titan Text Embeddings V2 on Bedrock's CreateModelInvocationJob API, routing embedding JSONL records through AmazonTitanV2Config._transform_request instead of the chat transformer.

_is_embedding_record dispatches per-record by checking url first (authoritative), then body shape as a fallback; _is_titan_v2_embed_model combines a registry check via get_model_info with a substring marker boundary test to distinguish Titan v2 from other embedding models with incompatible InvokeModel schemas.
_coerce_embedding_input_to_string normalises the OpenAI input field to the single inputText string Titan v2 expects, rejecting multi-element lists, pre-tokenized integers, and None with actionable errors.
22 new mocked unit tests cover the round-trip fixture, all error paths, model-id boundary cases, registry-mode disagreement, and the mixed chat+embedding batch routing.

Confidence Score: 4/5

The change is additive and isolated to the Bedrock JSONL transformer; existing chat-completion batches are unaffected unless a record lacks a url field and carries input without messages.

The new embedding transformer is well-tested and the dispatch logic is correct. The main concern is _TITAN_V2_EMBED_MODEL_MARKER hardcoding "titan-embed-text-v2" directly in the source rather than in model_prices_and_context_window.json — this creates a maintenance liability when follow-up PRs add Titan G1, Cohere, and Nova Multimodal branches, each of which will need their own hardcoded marker. The empty-list input path also produces a misleading error message. Neither issue causes incorrect output in the happy path.

litellm/llms/bedrock/files/transformation.py — specifically _is_titan_v2_embed_model and _coerce_embedding_input_to_string

Important Files Changed

Filename	Overview
litellm/llms/bedrock/files/transformation.py	Adds embedding routing (Titan v2 only) to the Bedrock JSONL transformer; the hardcoded _TITAN_V2_EMBED_MODEL_MARKER constant partially violates the project rule against model-specific flags in code rather than the registry
tests/test_litellm/llms/bedrock/files/test_bedrock_files_transformation.py	Adds 22 mocked unit tests covering happy paths, edge cases, and error paths for the new embedding transformer; all tests are properly mocked with no real network calls
tests/test_litellm/llms/bedrock/files/expected_bedrock_batch_embeddings.jsonl	New fixture with three expected Bedrock modelInput records covering string input, dimensions param, and binary encoding_format mapping
tests/test_litellm/llms/bedrock/files/input_batch_embeddings.jsonl	New input fixture with three OpenAI-format embedding batch records; exercised by the round-trip fixture test

Comments Outside Diff (1)

litellm/llms/bedrock/files/transformation.py, line 49-96 (link)

Hardcoded model marker violates project rule

_TITAN_V2_EMBED_MODEL_MARKER = "titan-embed-text-v2" is a model-specific flag hardcoded in the codebase. Per the project rule, model-specific flags should live in model_prices_and_context_window.json and be read via get_model_info. The current design uses the registry only as a secondary confirmation layer, not as the authoritative source for which InvokeModel schema to use.

The PR description explains that mode == "embedding" is shared by multiple incompatible schemas (Titan G1, Titan Multimodal, Cohere, Nova Multimodal), so the registry alone can't discriminate. The cleaner fix is to add a new field to the relevant entries in model_prices_and_context_window.json (e.g., "bedrock_invoke_schema": "titan_v2_embed") and look it up via get_model_info, which would let future models opt-in without touching _TITAN_V2_EMBED_MODEL_MARKER at all. As new embedding models are added (the PR explicitly plans follow-ups for Titan G1, Cohere, etc.), every new branch would require another hardcoded marker constant if the current pattern is kept.

Rule Used: What: Do not hardcode model-specific flags in the ... (source)

_{Reviews (2): Last reviewed commit: "fix(bedrock-batch): layer registry mode ..." | Re-trigger Greptile}

OS-joaocastilho · 2026-05-26T13:42:57Z

Addressed the remaining policy concern in a490a6a2c5:

_is_titan_v2_embed_model is now registry-driven. The Titan v2 entry in model_prices_and_context_window.json (and the bundled backup) gains a provider_specific_entry.bedrock_invocation_schema = \"titan_v2\" discriminator. The detector reads this first via get_model_info and trusts it as authoritative - registered ids with a different (or absent) schema value are rejected outright.

Used the existing provider_specific_entry escape hatch rather than introducing a brand-new top-level field; that field is already surfaced by get_model_info (ModelInfo.provider_specific_entry), so this change requires no modification to utils.py.

The substring fallback is retained ONLY for ids the registry can't normalize - cross-region inference profile prefixes (us.amazon.titan-embed-text-v2:0) and Bedrock ARN forms. get_model_info raises for those today; the fallback keeps them working without misclassifying lookalikes (the marker boundary check rejects titan-embed-text-v20 etc.).

Future embedding-schema variants (Titan G1, Cohere Embed, Nova Multimodal) can be added by:

Adding their registry entry with their own bedrock_invocation_schema value
Adding a dedicated transformer branch in _map_openai_embedding_to_bedrock_params

No code change to the detector required.

34/34 tests pass; integration check with LITELLM_LOCAL_MODEL_COST_MAP=true confirms the new field flows through get_model_info and all representative model ids classify correctly.

@greptileai

greptile-apps · 2026-05-26T13:48:29Z

Want your agent to iterate on Greptile's feedback? Try greploops.

Changes vs main: - BedrockFilesConfig now detects OpenAI batch JSONL lines whose `url` is /v1/embeddings (with body-shape fallback) and routes them through a new `_map_openai_embedding_to_bedrock_params` helper instead of the chat-completion transformer that silently produces an invalid body. - The embedding helper currently supports Amazon Titan Text Embeddings V2 only. Other embed models (Titan G1, Titan Multimodal, Cohere Embed, Nova Multimodal Embeddings) raise NotImplementedError with a clear message; each will get a dedicated branch + tests in follow-up PRs to keep schema-specific risks isolated. - Validation refuses pre-tokenized inputs (List[int], List[List[int]]) and multi-element string lists with explicit errors so callers emit one JSONL line per embedding instead of relying on us to fan out. - Titan v2 model id match tolerates "bedrock/" prefix, cross-region inference profile prefix ("us.", "eu.", etc.), and ARN forms; the marker boundary check rejects lookalikes like "titan-embed-text-v20". - Tests cover happy path (fixtures), dimensions/encoding_format mapping, body-shape fallback, single-element list unwrap, error paths (missing input, multi-element list, unsupported model, pre-tokenized), mixed chat+embedding batch, and the model-id boundary check.

Greptile-flagged gap in `_is_embedding_record`: when an OpenAI batch JSONL line carries an explicit `url` pointing to a non-embedding endpoint (e.g. `/v1/chat/completions`) AND its body happens to have `input` without `messages`, the body-shape fallback would mis-route that record to the embedding transformer and corrupt the modelInput. Changes vs previous commit: - `_is_embedding_record` now short-circuits to NOT-embedding whenever `url` is non-empty and not equal to `/v1/embeddings`. The body-shape fallback only runs when `url` is missing or empty. Docstring updated to spell out the precedence rules. - Two new tests cover the case: direct helper assertion that an explicit chat url plus an input-bearing body returns False, plus an end-to-end check that the resulting modelInput contains no `inputText` key. A second test asserts the same short-circuit for arbitrary non-embeddings urls (`/v1/completions`, `/v1/responses`). 28/28 tests pass (was 26/26 before this commit + 2 new).

Splits the input-shape validation out of `_map_openai_embedding_to_bedrock_params` into a new static helper `_coerce_embedding_input_to_string`. Same semantics; the goal is to make the validation testable in isolation and to give future embedding-provider branches (Titan G1, Cohere) a reusable shaping function instead of duplicating type checks. - Helper accepts `str`, single-element `list[str]`, and raises `ValueError` / `NotImplementedError` with actionable messages for None, multi-element lists, pre-tokenized inputs (`list[int]` / `list[list[int]]`), and other unsupported types. - New unit test exercises the helper directly across happy paths, None / missing input, multi-element string list, multi-element int list (caught as 'one input per JSONL record' since we can't disambiguate from 'multiple strings' without more context), pre-tokenized single-element list-of-list, single-element list of bare int, and dict input. 29/29 tests in the file still pass.

…param Greptile-flagged issues on Titan v2 model detection: 1. Hardcoded substring without registry consultation conflicts with the project convention of treating `model_prices_and_context_window.json` as the source of truth for model capability flags. Fix: `_is_titan_v2_embed_model` now layers a registry mode check on top of the existing marker boundary. When `get_model_info` resolves the id, we additionally require `mode == "embedding"` so a malformed id whose path-component matches the marker but whose registered mode is "chat" doesn't slip through. Registry silence (cross-region inference profile prefixes like `us.amazon.titan-embed-text-v2:0`, ARN forms) keeps the substring-only behavior because the registry genuinely can't normalize those ids today. The substring is still needed because `mode == "embedding"` alone doesn't distinguish Titan v2's InvokeModel schema from Cohere Embed, Nova Multimodal, or Titan G1 - all also embedding mode, all with incompatible bodies. 2. `provider` parameter on `_map_openai_embedding_to_bedrock_params` was accepted but never used. Fix: dropped from the signature and the call site. Also adds `_lookup_registry_mode` static helper (mirrors the one in the sibling batches transformer) so the registry try/except shape lives in one place instead of being inlined into the detector. 5 new tests pin the layered behavior: - registry mode=chat overrides the marker match (rejected) - registry mode=embedding + marker match (accepted) - registry silent + marker match for cross-region and ARN ids (accepted) - direct `_lookup_registry_mode` coverage across all return paths 33/33 tests in the file pass.

…pecific_entry Addresses Greptile's remaining policy concern on PR BerriAI#28875: the hardcoded `_TITAN_V2_EMBED_MODEL_MARKER` substring required a code change to register new Titan v2 variants. Now the registry is the source of truth. Changes: - model_prices_and_context_window.json + model_prices_and_context_window_backup.json Adds `provider_specific_entry.bedrock_invocation_schema = "titan_v2"` to the `amazon.titan-embed-text-v2:0` entry. Uses the existing `provider_specific_entry` escape-hatch field (already surfaced by `get_model_info` via `ModelInfo.provider_specific_entry`) rather than introducing a new top-level field that would need a corresponding change in `utils.py::get_model_info` to flow through. - BedrockFilesConfig._is_titan_v2_embed_model now reads `get_model_info(model).provider_specific_entry.bedrock_invocation_schema` first. When the registry resolves the id we trust the field; registered ids without (or with a different) schema value are rejected outright - no substring second-chance for registered ids. The substring fallback only runs when `get_model_info` raises, which covers cross-region inference profile prefixes (`us.amazon.titan-embed-text-v2:0`) and Bedrock ARN forms - neither of which the registry normalizes today. - _lookup_registry_mode helper renamed to _lookup_provider_specific_field and generalized: takes a field name, reads `provider_specific_entry[field]` defensively. Future embed-schema branches (Titan G1, Cohere, Nova Multimodal) will share this helper. - New module-level constants: _BEDROCK_INVOCATION_SCHEMA_FIELD names the registry key; _TITAN_V2_INVOCATION_SCHEMA names the value. - _map_openai_embedding_to_bedrock_params: dropped the unused `provider` parameter (separate Greptile flag). Tests updated for the new nested-field semantics. 34/34 tests pass; one end-to-end integration check confirms that with LITELLM_LOCAL_MODEL_COST_MAP=true, the registry returns the new field and all 7 representative model ids classify correctly.

Pre-existing Black violation on the shin_agent_oss_staging_05_22_2026 base branch - the lint CI check on this PR fails on `litellm/llms/base_llm/chat/transformation.py` even with zero changes from our side. Applying the formatter's preferred line-break inside `is_thinking_enabled` unblocks the lint check without touching the method's logic.

Pre-existing Black violation on the shin_agent_oss_staging_05_22_2026 base branch - same fix as in PR BerriAI#28875 to unblock the lint CI check on this PR.

Sameerlite

LGTM

Terrajlz and others added 2 commits May 22, 2026 13:45

fix(thinking): handle None thinking param in is_thinking_enabled (Ber…

a5939c8

…riAI#28598) Squash-merged by litellm-agent from Terrajlz's PR.

feat(helm): support tpl rendering in podAnnotations (BerriAI#28609)

2c144fc

Squash-merged by litellm-agent from devauxbr's PR.

OS-joaocastilho mentioned this pull request May 26, 2026

feat(bedrock-batch): route /v1/embeddings JSONL to Titan v2 modelInput #28865

Closed

4 tasks

greptile-apps Bot reviewed May 26, 2026

View reviewed changes

Comment thread litellm/llms/bedrock/files/transformation.py

greptile-apps Bot reviewed May 26, 2026

View reviewed changes

Comment thread litellm/llms/bedrock/files/transformation.py

OS-joaocastilho added 5 commits May 26, 2026 15:05

OS-joaocastilho force-pushed the feat/bedrock-batch-titan-embed-input-transform-v3 branch from a490a6a to 93d5382 Compare May 26, 2026 14:05

OS-joaocastilho changed the base branch from litellm_oss_branch to shin_agent_oss_staging_05_22_2026 May 26, 2026 14:06

Sameerlite approved these changes Jun 1, 2026

View reviewed changes

Sameerlite changed the base branch from shin_agent_oss_staging_05_22_2026 to litellm_010626 June 1, 2026 10:43

Sameerlite merged commit fd55450 into BerriAI:litellm_010626 Jun 1, 2026
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(bedrock-batch): route /v1/embeddings JSONL to Titan v2 modelInput#28875

feat(bedrock-batch): route /v1/embeddings JSONL to Titan v2 modelInput#28875
Sameerlite merged 8 commits into
BerriAI:litellm_010626from
OS-joaocastilho:feat/bedrock-batch-titan-embed-input-transform-v3

OS-joaocastilho commented May 26, 2026

Uh oh!

OS-joaocastilho commented May 26, 2026

Uh oh!

codecov Bot commented May 26, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented May 26, 2026 •

edited

Loading

Important Files Changed

Uh oh!

Uh oh!

greptile-apps Bot commented May 26, 2026 •

edited

Loading

Important Files Changed

Comments Outside Diff (1)

Uh oh!

Uh oh!

OS-joaocastilho commented May 26, 2026

Uh oh!

greptile-apps Bot commented May 26, 2026

Uh oh!

Sameerlite left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Uh oh!

Conversation

OS-joaocastilho commented May 26, 2026

Relevant issues

Changes vs litellm_oss_branch

Companion PR

Pre-Submission checklist

Type

Uh oh!

OS-joaocastilho commented May 26, 2026

Uh oh!

codecov Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

greptile-apps Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Uh oh!

Uh oh!

greptile-apps Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Comments Outside Diff (1)

Uh oh!

Uh oh!

OS-joaocastilho commented May 26, 2026

Uh oh!

greptile-apps Bot commented May 26, 2026

Uh oh!

Sameerlite left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Changes vs `litellm_oss_branch`

codecov Bot commented May 26, 2026 •

edited

Loading

greptile-apps Bot commented May 26, 2026 •

edited

Loading

greptile-apps Bot commented May 26, 2026 •

edited

Loading