Skip to content

feat(bedrock-batch): route /v1/embeddings JSONL to Titan v2 modelInput#28875

Merged
Sameerlite merged 8 commits into
BerriAI:litellm_010626from
OS-joaocastilho:feat/bedrock-batch-titan-embed-input-transform-v3
Jun 1, 2026
Merged

feat(bedrock-batch): route /v1/embeddings JSONL to Titan v2 modelInput#28875
Sameerlite merged 8 commits into
BerriAI:litellm_010626from
OS-joaocastilho:feat/bedrock-batch-titan-embed-input-transform-v3

Conversation

@OS-joaocastilho

Copy link
Copy Markdown

Relevant issues

Reopens the use case from #15506 (closed by stale-bot, no maintainer engagement). AWS Bedrock supports batch inference for amazon.titan-embed-text-v2:0 via CreateModelInvocationJob natively; the gap is only on the LiteLLM side, where BedrockFilesConfig._map_openai_to_bedrock_params had no embedding branch and an embedding JSONL would have silently been routed through the chat transformer.

Supersedes #28865 (Greptile reviewed the initial commits but went silent on the follow-up fix; fresh PR to reset the bot's review state). The code state is identical to where #28865 ended: both Greptile-flagged issues have already been addressed in this branch.

Changes vs litellm_oss_branch

  • litellm/llms/bedrock/files/transformation.py

    • BedrockFilesConfig._is_embedding_record static helper detects whether a JSONL line is an embedding request. Strict precedence: explicit url == \"/v1/embeddings\" -> embedding; any other non-empty url -> NOT embedding (trust the caller's signal); only fall back to body-shape (input present AND messages absent) when url is missing.
    • BedrockFilesConfig._is_titan_v2_embed_model layers a model_prices_and_context_window.json registry check (get_model_info(...).get(\"mode\") == \"embedding\") on top of a marker-boundary substring (titan-embed-text-v2 followed by :, /, or end-of-string). The registry alone isn't sufficient because mode == \"embedding\" is shared by Cohere Embed, Nova Multimodal, and Titan G1 - all with incompatible InvokeModel schemas. When the registry resolves the id we additionally require mode == \"embedding\" so a malformed id whose marker matches but whose registered mode is "chat" doesn't slip through. Registry silence (cross-region inference profile prefixes, Bedrock ARN forms) keeps the substring-only behavior because the registry can't normalize those ids today.
    • BedrockFilesConfig._lookup_registry_mode static helper isolates the defensive get_model_info try/except shape so it lives in one place. Returns the mode string only when the registry resolves the id AND the entry has a non-empty string mode; returns None for raise, non-dict, missing-mode, empty-mode, and non-string-mode cases.
    • BedrockFilesConfig._coerce_embedding_input_to_string static helper normalizes the OpenAI input field into the single string Bedrock Titan v2 expects in inputText. Accepts string or single-element list; rejects None, multi-element lists, pre-tokenized inputs (List[int], List[List[int]]), and other types with actionable error messages.
    • New _map_openai_embedding_to_bedrock_params helper builds the Bedrock InvokeModel body via the existing AmazonTitanV2Config._transform_request, mapping OpenAI dimensions and encoding_format through AmazonTitanV2Config.map_openai_params so this stays in sync with the synchronous /v1/embeddings path.
    • _transform_openai_jsonl_content_to_bedrock_jsonl_content now dispatches per record to either the chat or the embedding transformer; the chat helper keeps its narrow contract.
    • Other embedding models (Titan G1, Titan Multimodal, Cohere Embed, Nova Multimodal Embeddings) raise NotImplementedError with a clear message until they get dedicated branches in follow-up PRs.
  • tests/test_litellm/llms/bedrock/files/

    • New input_batch_embeddings.jsonl and expected_bedrock_batch_embeddings.jsonl fixtures.
    • TestBedrockFilesEmbeddingTransformation class adds 22 mocked tests: happy-path round-trip; simple string input; dimensions + encoding_format mapping; body-shape fallback; single-element list unwrap; error paths (missing input, multi-element list, unsupported model, pre-tokenized List[int] and List[List[int]]); mixed chat+embedding batch; model-id boundary check; registry-mode disagreement rejection; registry-mode confirmation; registry silence with marker match; isolated _lookup_registry_mode coverage; isolated _coerce_embedding_input_to_string coverage; _is_embedding_record helper coverage; ambiguous "both input and messages" routes to chat; explicit chat URL short-circuits to chat; other non-embedding URLs route to chat.

Companion PR

The endpoint-propagation side (BedrockBatchesConfig.transform_*_batch_response reporting endpoint=\"/v1/embeddings\" correctly for embedding batches) lives in #28867 (already at Greptile LGTM, ready for maintainer review). Splitting keeps each PR single-concern per the template.

Pre-Submission checklist

  • I have Added testing in the tests/test_litellm/ directory, Adding at least 1 test is a hard requirement
  • My PR passes all unit tests on make test-unit (33/33 in tests/test_litellm/llms/bedrock/files/test_bedrock_files_transformation.py)
  • My PR's scope is as isolated as possible, it only solves 1 specific problem (input-side transform for Titan v2 embedding batches)
  • I will request @greptileai review after opening this PR

Type

New Feature

Terrajlz and others added 2 commits May 22, 2026 13:45
Squash-merged by litellm-agent from devauxbr's PR.
@OS-joaocastilho

Copy link
Copy Markdown
Author

@greptileai

@codecov

codecov Bot commented May 26, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 98.57143% with 1 line in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
litellm/llms/bedrock/files/transformation.py 98.57% 1 Missing ⚠️

📢 Thoughts on this report? Let us know!

@greptile-apps

greptile-apps Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR routes OpenAI /v1/embeddings JSONL records through a new Titan v2 transformer in BedrockFilesConfig, enabling AWS Bedrock batch inference (CreateModelInvocationJob) for amazon.titan-embed-text-v2:0. Model detection is registry-driven first (via the new provider_specific_entry.bedrock_invocation_schema field added to model_prices_and_context_window.json) with a boundary-checked substring fallback for ID forms the registry can't normalize (cross-region prefixes, ARNs).

  • _is_embedding_record routes per JSONL record using url as the authoritative signal, falling back to body-shape detection (input present + messages absent) only when url is absent.
  • _map_openai_embedding_to_bedrock_params delegates to the existing AmazonTitanV2Config._transform_request and map_openai_params so OpenAI dimensions/encoding_format stay in sync with the synchronous path; unsupported embedding models raise NotImplementedError cleanly.
  • 22 new unit tests cover happy paths, error cases, routing precedence, registry-mode confirmation/rejection, and substring-fallback boundaries — all mocked with no network calls.

Confidence Score: 5/5

Safe to merge — the change is additive, isolated to the batch JSONL transformation path, and doesn't touch the synchronous /v1/embeddings or chat completion flows.

All new helpers are pure-function, well-guarded against bad input, and covered by 22 targeted unit tests. The registry-first detection approach is sound: the new provider_specific_entry.bedrock_invocation_schema discriminator in the JSON is the authoritative signal, and the substring fallback is strictly last-resort for IDs the registry can't normalize. No existing code paths are modified beyond inserting the embedding branch in the per-record dispatch.

No files require special attention.

Important Files Changed

Filename Overview
litellm/llms/bedrock/files/transformation.py Adds embedding routing: _is_embedding_record, _is_titan_v2_embed_model (registry-first, substring fallback), _lookup_provider_specific_field, _coerce_embedding_input_to_string, and _map_openai_embedding_to_bedrock_params; dispatches per-record to chat or embedding transformer.
model_prices_and_context_window.json Adds provider_specific_entry.bedrock_invocation_schema = "titan_v2" to the amazon.titan-embed-text-v2:0 entry; drives the registry-first detection path in _is_titan_v2_embed_model.
litellm/model_prices_and_context_window_backup.json Mirror of the same provider_specific_entry change as the primary JSON; keeps the backup file in sync.
tests/test_litellm/llms/bedrock/files/test_bedrock_files_transformation.py Adds TestBedrockFilesEmbeddingTransformation with 22 unit tests covering happy paths, parameter mapping, routing logic, error cases, registry-driven detection, and substring-fallback boundaries; all mocked, no real network calls.
tests/test_litellm/llms/bedrock/files/input_batch_embeddings.jsonl New fixture with three representative Titan v2 embedding records (plain string, dimensions, base64 encoding_format) used by the round-trip fixture test.
tests/test_litellm/llms/bedrock/files/expected_bedrock_batch_embeddings.jsonl Expected Bedrock output fixture: inputText, dimensions, and embeddingTypes mappings for the three input records; used as the assertion baseline for the round-trip test.

Reviews (3): Last reviewed commit: "fix(bedrock-batch): drive Titan v2 detec..." | Re-trigger Greptile

Comment thread litellm/llms/bedrock/files/transformation.py
@greptile-apps

greptile-apps Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds OpenAI /v1/embeddings batch support for Amazon Titan Text Embeddings V2 on Bedrock's CreateModelInvocationJob API, routing embedding JSONL records through AmazonTitanV2Config._transform_request instead of the chat transformer.

  • _is_embedding_record dispatches per-record by checking url first (authoritative), then body shape as a fallback; _is_titan_v2_embed_model combines a registry check via get_model_info with a substring marker boundary test to distinguish Titan v2 from other embedding models with incompatible InvokeModel schemas.
  • _coerce_embedding_input_to_string normalises the OpenAI input field to the single inputText string Titan v2 expects, rejecting multi-element lists, pre-tokenized integers, and None with actionable errors.
  • 22 new mocked unit tests cover the round-trip fixture, all error paths, model-id boundary cases, registry-mode disagreement, and the mixed chat+embedding batch routing.

Confidence Score: 4/5

The change is additive and isolated to the Bedrock JSONL transformer; existing chat-completion batches are unaffected unless a record lacks a url field and carries input without messages.

The new embedding transformer is well-tested and the dispatch logic is correct. The main concern is _TITAN_V2_EMBED_MODEL_MARKER hardcoding "titan-embed-text-v2" directly in the source rather than in model_prices_and_context_window.json — this creates a maintenance liability when follow-up PRs add Titan G1, Cohere, and Nova Multimodal branches, each of which will need their own hardcoded marker. The empty-list input path also produces a misleading error message. Neither issue causes incorrect output in the happy path.

litellm/llms/bedrock/files/transformation.py — specifically _is_titan_v2_embed_model and _coerce_embedding_input_to_string

Important Files Changed

Filename Overview
litellm/llms/bedrock/files/transformation.py Adds embedding routing (Titan v2 only) to the Bedrock JSONL transformer; the hardcoded _TITAN_V2_EMBED_MODEL_MARKER constant partially violates the project rule against model-specific flags in code rather than the registry
tests/test_litellm/llms/bedrock/files/test_bedrock_files_transformation.py Adds 22 mocked unit tests covering happy paths, edge cases, and error paths for the new embedding transformer; all tests are properly mocked with no real network calls
tests/test_litellm/llms/bedrock/files/expected_bedrock_batch_embeddings.jsonl New fixture with three expected Bedrock modelInput records covering string input, dimensions param, and binary encoding_format mapping
tests/test_litellm/llms/bedrock/files/input_batch_embeddings.jsonl New input fixture with three OpenAI-format embedding batch records; exercised by the round-trip fixture test

Comments Outside Diff (1)

  1. litellm/llms/bedrock/files/transformation.py, line 49-96 (link)

    P2 Hardcoded model marker violates project rule

    _TITAN_V2_EMBED_MODEL_MARKER = "titan-embed-text-v2" is a model-specific flag hardcoded in the codebase. Per the project rule, model-specific flags should live in model_prices_and_context_window.json and be read via get_model_info. The current design uses the registry only as a secondary confirmation layer, not as the authoritative source for which InvokeModel schema to use.

    The PR description explains that mode == "embedding" is shared by multiple incompatible schemas (Titan G1, Titan Multimodal, Cohere, Nova Multimodal), so the registry alone can't discriminate. The cleaner fix is to add a new field to the relevant entries in model_prices_and_context_window.json (e.g., "bedrock_invoke_schema": "titan_v2_embed") and look it up via get_model_info, which would let future models opt-in without touching _TITAN_V2_EMBED_MODEL_MARKER at all. As new embedding models are added (the PR explicitly plans follow-ups for Titan G1, Cohere, etc.), every new branch would require another hardcoded marker constant if the current pattern is kept.

    Rule Used: What: Do not hardcode model-specific flags in the ... (source)

Reviews (2): Last reviewed commit: "fix(bedrock-batch): layer registry mode ..." | Re-trigger Greptile

Comment thread litellm/llms/bedrock/files/transformation.py
@OS-joaocastilho

Copy link
Copy Markdown
Author

Addressed the remaining policy concern in a490a6a2c5:

_is_titan_v2_embed_model is now registry-driven. The Titan v2 entry in model_prices_and_context_window.json (and the bundled backup) gains a provider_specific_entry.bedrock_invocation_schema = \"titan_v2\" discriminator. The detector reads this first via get_model_info and trusts it as authoritative - registered ids with a different (or absent) schema value are rejected outright.

Used the existing provider_specific_entry escape hatch rather than introducing a brand-new top-level field; that field is already surfaced by get_model_info (ModelInfo.provider_specific_entry), so this change requires no modification to utils.py.

The substring fallback is retained ONLY for ids the registry can't normalize - cross-region inference profile prefixes (us.amazon.titan-embed-text-v2:0) and Bedrock ARN forms. get_model_info raises for those today; the fallback keeps them working without misclassifying lookalikes (the marker boundary check rejects titan-embed-text-v20 etc.).

Future embedding-schema variants (Titan G1, Cohere Embed, Nova Multimodal) can be added by:

  1. Adding their registry entry with their own bedrock_invocation_schema value
  2. Adding a dedicated transformer branch in _map_openai_embedding_to_bedrock_params

No code change to the detector required.

34/34 tests pass; integration check with LITELLM_LOCAL_MODEL_COST_MAP=true confirms the new field flows through get_model_info and all representative model ids classify correctly.

@greptileai

@greptile-apps

greptile-apps Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor

Want your agent to iterate on Greptile's feedback? Try greploops.

Changes vs main:
- BedrockFilesConfig now detects OpenAI batch JSONL lines whose `url`
  is /v1/embeddings (with body-shape fallback) and routes them through
  a new `_map_openai_embedding_to_bedrock_params` helper instead of the
  chat-completion transformer that silently produces an invalid body.
- The embedding helper currently supports Amazon Titan Text Embeddings V2
  only. Other embed models (Titan G1, Titan Multimodal, Cohere Embed,
  Nova Multimodal Embeddings) raise NotImplementedError with a clear
  message; each will get a dedicated branch + tests in follow-up PRs to
  keep schema-specific risks isolated.
- Validation refuses pre-tokenized inputs (List[int], List[List[int]])
  and multi-element string lists with explicit errors so callers emit
  one JSONL line per embedding instead of relying on us to fan out.
- Titan v2 model id match tolerates "bedrock/" prefix, cross-region
  inference profile prefix ("us.", "eu.", etc.), and ARN forms; the
  marker boundary check rejects lookalikes like "titan-embed-text-v20".
- Tests cover happy path (fixtures), dimensions/encoding_format mapping,
  body-shape fallback, single-element list unwrap, error paths
  (missing input, multi-element list, unsupported model, pre-tokenized),
  mixed chat+embedding batch, and the model-id boundary check.
Greptile-flagged gap in `_is_embedding_record`: when an OpenAI batch
JSONL line carries an explicit `url` pointing to a non-embedding
endpoint (e.g. `/v1/chat/completions`) AND its body happens to have
`input` without `messages`, the body-shape fallback would mis-route
that record to the embedding transformer and corrupt the modelInput.

Changes vs previous commit:
- `_is_embedding_record` now short-circuits to NOT-embedding whenever
  `url` is non-empty and not equal to `/v1/embeddings`. The body-shape
  fallback only runs when `url` is missing or empty. Docstring updated
  to spell out the precedence rules.
- Two new tests cover the case: direct helper assertion that an
  explicit chat url plus an input-bearing body returns False, plus an
  end-to-end check that the resulting modelInput contains no `inputText`
  key. A second test asserts the same short-circuit for arbitrary
  non-embeddings urls (`/v1/completions`, `/v1/responses`).

28/28 tests pass (was 26/26 before this commit + 2 new).
Splits the input-shape validation out of
`_map_openai_embedding_to_bedrock_params` into a new static helper
`_coerce_embedding_input_to_string`. Same semantics; the goal is to make
the validation testable in isolation and to give future
embedding-provider branches (Titan G1, Cohere) a reusable shaping
function instead of duplicating type checks.

- Helper accepts `str`, single-element `list[str]`, and raises
  `ValueError` / `NotImplementedError` with actionable messages for
  None, multi-element lists, pre-tokenized inputs (`list[int]` /
  `list[list[int]]`), and other unsupported types.
- New unit test exercises the helper directly across happy paths,
  None / missing input, multi-element string list, multi-element int
  list (caught as 'one input per JSONL record' since we can't
  disambiguate from 'multiple strings' without more context),
  pre-tokenized single-element list-of-list, single-element list of
  bare int, and dict input.

29/29 tests in the file still pass.
…param

Greptile-flagged issues on Titan v2 model detection:

1. Hardcoded substring without registry consultation conflicts with the
   project convention of treating `model_prices_and_context_window.json`
   as the source of truth for model capability flags.

   Fix: `_is_titan_v2_embed_model` now layers a registry mode check on
   top of the existing marker boundary. When `get_model_info` resolves
   the id, we additionally require `mode == "embedding"` so a malformed
   id whose path-component matches the marker but whose registered mode
   is "chat" doesn't slip through. Registry silence (cross-region
   inference profile prefixes like `us.amazon.titan-embed-text-v2:0`,
   ARN forms) keeps the substring-only behavior because the registry
   genuinely can't normalize those ids today. The substring is still
   needed because `mode == "embedding"` alone doesn't distinguish
   Titan v2's InvokeModel schema from Cohere Embed, Nova Multimodal,
   or Titan G1 - all also embedding mode, all with incompatible bodies.

2. `provider` parameter on `_map_openai_embedding_to_bedrock_params`
   was accepted but never used.

   Fix: dropped from the signature and the call site.

Also adds `_lookup_registry_mode` static helper (mirrors the one in the
sibling batches transformer) so the registry try/except shape lives in
one place instead of being inlined into the detector.

5 new tests pin the layered behavior:
- registry mode=chat overrides the marker match (rejected)
- registry mode=embedding + marker match (accepted)
- registry silent + marker match for cross-region and ARN ids (accepted)
- direct `_lookup_registry_mode` coverage across all return paths

33/33 tests in the file pass.
…pecific_entry

Addresses Greptile's remaining policy concern on PR BerriAI#28875: the hardcoded
`_TITAN_V2_EMBED_MODEL_MARKER` substring required a code change to
register new Titan v2 variants. Now the registry is the source of truth.

Changes:
- model_prices_and_context_window.json + model_prices_and_context_window_backup.json
  Adds `provider_specific_entry.bedrock_invocation_schema = "titan_v2"`
  to the `amazon.titan-embed-text-v2:0` entry. Uses the existing
  `provider_specific_entry` escape-hatch field (already surfaced by
  `get_model_info` via `ModelInfo.provider_specific_entry`) rather than
  introducing a new top-level field that would need a corresponding
  change in `utils.py::get_model_info` to flow through.

- BedrockFilesConfig._is_titan_v2_embed_model now reads
  `get_model_info(model).provider_specific_entry.bedrock_invocation_schema`
  first. When the registry resolves the id we trust the field;
  registered ids without (or with a different) schema value are rejected
  outright - no substring second-chance for registered ids. The
  substring fallback only runs when `get_model_info` raises, which
  covers cross-region inference profile prefixes
  (`us.amazon.titan-embed-text-v2:0`) and Bedrock ARN forms - neither of
  which the registry normalizes today.

- _lookup_registry_mode helper renamed to _lookup_provider_specific_field
  and generalized: takes a field name, reads
  `provider_specific_entry[field]` defensively. Future embed-schema
  branches (Titan G1, Cohere, Nova Multimodal) will share this helper.

- New module-level constants: _BEDROCK_INVOCATION_SCHEMA_FIELD names the
  registry key; _TITAN_V2_INVOCATION_SCHEMA names the value.

- _map_openai_embedding_to_bedrock_params: dropped the unused `provider`
  parameter (separate Greptile flag).

Tests updated for the new nested-field semantics. 34/34 tests pass; one
end-to-end integration check confirms that with
LITELLM_LOCAL_MODEL_COST_MAP=true, the registry returns the new field
and all 7 representative model ids classify correctly.
@OS-joaocastilho OS-joaocastilho force-pushed the feat/bedrock-batch-titan-embed-input-transform-v3 branch from a490a6a to 93d5382 Compare May 26, 2026 14:05
@OS-joaocastilho OS-joaocastilho changed the base branch from litellm_oss_branch to shin_agent_oss_staging_05_22_2026 May 26, 2026 14:06
Pre-existing Black violation on the shin_agent_oss_staging_05_22_2026
base branch - the lint CI check on this PR fails on
`litellm/llms/base_llm/chat/transformation.py` even with zero changes
from our side. Applying the formatter's preferred line-break inside
`is_thinking_enabled` unblocks the lint check without touching the
method's logic.
OS-joaocastilho added a commit to OS-joaocastilho/litellm that referenced this pull request May 26, 2026
Pre-existing Black violation on the shin_agent_oss_staging_05_22_2026
base branch - same fix as in PR BerriAI#28875 to unblock the lint CI check
on this PR.

@Sameerlite Sameerlite left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Sameerlite Sameerlite changed the base branch from shin_agent_oss_staging_05_22_2026 to litellm_010626 June 1, 2026 10:43
@Sameerlite Sameerlite merged commit fd55450 into BerriAI:litellm_010626 Jun 1, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants