feat(bedrock-batch): propagate embedding endpoint through batch responses by OS-joaocastilho · Pull Request #28867 · BerriAI/litellm

OS-joaocastilho · 2026-05-26T11:14:25Z

Relevant issues

Companion to #28865 (Titan v2 embedding JSONL input transform). That PR makes embedding JSONL records produce a valid Bedrock modelInput; this PR makes the resulting LiteLLMBatch object correctly report endpoint=\"/v1/embeddings\" to OpenAI clients that introspect it.

Reopens the use case from #15506. Supersedes #28863, which was opened against main (wrong base) and accumulated force-pushes / close-reopen churn while we fixed up the base + addressed Greptile feedback. This PR is the clean, rebased version against litellm_oss_branch.

Problem

BedrockBatchesConfig.transform_create_batch_response and transform_retrieve_batch_response both hardcoded endpoint=\"/v1/chat/completions\". For an embedding batch the create path could read original_batch_request.endpoint but defaulted to chat-completions when the caller didn't set it; the retrieve path had no fallback at all because it has no original request to read from.

Changes vs `litellm_oss_branch`

litellm/llms/bedrock/batches/transformation.py
- New static helper _lookup_registry_mode reads mode for a model id from model_prices_and_context_window.json via litellm.get_model_info. Returns the mode string when the registry knows the id and the entry has a non-empty string mode, else None. Catches the failure modes: registry raises, registry returns a non-dict, registry returns a dict without (or with empty / non-string) mode.
- New static helper _infer_openai_endpoint_from_model_id returns /v1/embeddings when the registry says mode == \"embedding\". Falls back to substring (\"embed\" in model_id.lower()) for ids the registry can't resolve - this catches cross-region inference profile prefixes (e.g. us.amazon.titan-embed-text-v2:0) and Bedrock ARN forms, neither of which get_model_info handles today. Otherwise defaults to /v1/chat/completions.
- transform_create_batch_response now prefers original_batch_request.endpoint from litellm_params and falls back to the heuristic instead of hardcoded chat-completions.
- transform_retrieve_batch_response replaces the hardcoded value with the same heuristic, fed from the caller-supplied model argument or the modelId AWS returns on GetModelInvocationJob.
tests/test_litellm/llms/bedrock/batches/test_endpoint_inference.py (new)
- 11 direct tests for _infer_openai_endpoint_from_model_id: Titan v2 text, Titan multimodal, Nova multimodal embeddings, Cohere Embed, ARN form, Anthropic chat, cross-region chat profile, Nova chat, None, empty, case-insensitive.
- 4 create-batch tests: explicit override wins, model-id fallback for embed, chat default when no signal, model=None with original_request.endpoint=chat.
- 4 retrieve-batch tests: embed via AWS modelId, chat via AWS modelId, caller model arg wins over AWS modelId, default-chat when neither present.
- 5 registry-driven tests: registry classifies known embed id, registry chat mode wins over substring "embed" in name, substring fallback used when registry raises, substring fallback chat default when registry raises, no mode field falls back to substring.
- 6 isolated _lookup_registry_mode tests: returns mode when resolves, returns None on raise / non-dict / missing mode / empty mode / non-string mode.

What this does NOT change

The actual batch processing path is unchanged. AWS Bedrock's CreateModelInvocationJob is endpoint-agnostic at the wire level: only the modelId and the JSONL modelInput schema vary. So this PR is metadata fidelity, not feature unlock; feat(bedrock-batch): route /v1/embeddings JSONL to Titan v2 modelInput #28865 is the schema fix.
The output-file transform (Bedrock modelOutput -> OpenAI batch embedding response shape) is intentionally not in this PR. It will be a follow-up so the schema-specific risk stays isolated.

Pre-Submission checklist

I have Added testing in the tests/test_litellm/ directory, Adding at least 1 test is a hard requirement
My PR passes all unit tests on make test-unit (30/30 in tests/test_litellm/llms/bedrock/batches/test_endpoint_inference.py plus 9 existing tests in test_batch_metadata_sanitization.py)
My PR's scope is as isolated as possible, it only solves 1 specific problem (endpoint propagation in batch response transformers)
I will request @greptileai review after opening this PR

Type

New Feature

OS-joaocastilho · 2026-05-26T11:15:01Z

@greptileai

greptile-apps · 2026-05-26T11:16:58Z

Greptile Summary

This PR fixes a metadata fidelity bug where both transform_create_batch_response and transform_retrieve_batch_response in BedrockBatchesConfig unconditionally reported endpoint=\"/v1/chat/completions\", mislabelling embedding batch jobs for OpenAI clients that introspect the returned LiteLLMBatch object.

Adds _lookup_registry_mode (reads mode from model_prices_and_context_window.json via get_model_info) and _infer_openai_endpoint_from_model_id (registry-first, substring fallback for ARNs and cross-region prefixes, chat-default) as isolated static helpers.
Updates the create path to prefer original_batch_request.endpoint from litellm_params and fall back to the new heuristic; updates the retrieve path to infer from the caller-supplied model arg or the modelId field AWS returns on GetModelInvocationJob.
Adds 30 new unit tests covering the helpers end-to-end, with all calls mocked — no real network traffic.

Confidence Score: 4/5

The change is narrowly scoped to endpoint metadata on batch response objects and does not touch any request-processing or auth paths; existing chat batch behaviour is preserved by the chat-default fallback.

Both transform methods now correctly resolve the endpoint field for embedding and chat batches. The two minor points are: the or short-circuit in the create path silently drops an explicitly-supplied empty-string endpoint (instead of the more explicit is not None check), and the _lookup_registry_mode type hint is str rather than Optional[str], which could confuse type checkers. Neither causes wrong behaviour with realistic inputs.

No files require special attention; both changed files are confined to the Bedrock batches transformation layer.

Important Files Changed

Filename	Overview
litellm/llms/bedrock/batches/transformation.py	Adds _lookup_registry_mode and _infer_openai_endpoint_from_model_id helpers; updates both transform_create_batch_response and transform_retrieve_batch_response to infer endpoint from model id instead of hardcoding chat completions. Logic is sound; one minor precision issue with the `or` operator for endpoint resolution.
tests/test_litellm/llms/bedrock/batches/test_endpoint_inference.py	New test file with 30 unit tests covering the inference helpers, create-batch and retrieve-batch response transformers, and registry-driven classification. All tests use mocks (no real network calls). Good coverage of edge cases.

_{Reviews (1): Last reviewed commit: "refactor(bedrock-batch): extract registr..." | Re-trigger Greptile}

greptile-apps · 2026-05-26T11:20:41Z

Greptile Summary

This PR fixes a metadata fidelity gap where transform_create_batch_response and transform_retrieve_batch_response both hardcoded /v1/chat/completions as the batch endpoint, causing embedding batch jobs to be misreported to OpenAI clients that inspect the LiteLLMBatch.endpoint field.

Introduces _lookup_registry_mode (reads mode from model_prices_and_context_window.json via get_model_info) and _infer_openai_endpoint_from_model_id (registry-first, substring fallback, default-to-chat) as static helpers on BedrockBatchesConfig.
transform_create_batch_response now prefers litellm_params["original_batch_request"]["endpoint"] and falls back to the heuristic; transform_retrieve_batch_response always uses the heuristic since no original request is available at poll time.
Adds 30 new unit tests (no network calls) covering all resolution paths including registry, substring fallback, ARN forms, and cross-region inference profiles.

Confidence Score: 4/5

Safe to merge — the change is isolated to endpoint metadata on batch response objects and does not affect the AWS API call path, request payloads, or any auth logic.

The two new helpers carry minor robustness gaps: the registry lookup omits an explicit provider hint, and the mode dispatch defaults all non-embedding registry modes to chat. Neither causes a regression on any current Bedrock model.

litellm/llms/bedrock/batches/transformation.py — the _lookup_registry_mode and _infer_openai_endpoint_from_model_id helpers warrant a second look for the custom_llm_provider omission and the open-ended non-embedding mode fallback.

Important Files Changed

Filename	Overview
litellm/llms/bedrock/batches/transformation.py	Replaces hardcoded /v1/chat/completions in both batch response transformers with a two-tier resolver: registry lookup via get_model_info then substring fallback on 'embed'. Logic is sound; minor concern around missing explicit custom_llm_provider on the registry call and non-exhaustive mode handling.
tests/test_litellm/llms/bedrock/batches/test_endpoint_inference.py	New test file with 30 well-structured tests covering direct unit, integration, registry-driven, and create/retrieve end-to-end paths. No network calls; all mock-dependent tests use pytest-mock correctly.

_{Reviews (2): Last reviewed commit: "refactor(bedrock-batch): extract registr..." | Re-trigger Greptile}

OS-joaocastilho · 2026-05-26T11:24:55Z

Addressed the precision nit in ee155c0b5d: replaced the or short-circuit in transform_create_batch_response with an explicit is not None check so a caller passing endpoint=\"\" keeps that explicit signal instead of being silently overridden by the model-id heuristic. New test pins the empty-string preservation. 31/31 tests pass.

On the other Greptile note about _lookup_registry_mode's type hint - just to confirm, the return type is already declared Optional[str] (line 313 of the source). Happy to revisit if there's a specific stricter narrowing you'd prefer.

@greptileai

greptile-apps · 2026-05-26T11:25:26Z

Both points are well-addressed:

Empty-string preservation fix (ee155c0b) — the is not None guard is correct. explicit_endpoint if explicit_endpoint is not None else self._infer_openai_endpoint_from_model_id(model) properly preserves "" as an explicit caller signal while only triggering the heuristic when the key is genuinely absent from original_batch_request. The logic is sound.
_lookup_registry_mode return type — confirmed at line 313: -> Optional[str]. The original observation was already satisfied; nothing to change.

The resolution order in _infer_openai_endpoint_from_model_id (registry → "embed" substring → chat default) is correct and the docstring explains exactly why each fallback exists. Both concerns are fully resolved — LGTM.

Tip: You can customize Greptile's behavior for this repo with .greptile/rules.md and .greptile/config.json.

…nses Changes vs main: - BedrockBatchesConfig.transform_create_batch_response now prefers the endpoint set on `litellm_params["original_batch_request"]` and falls back to a model-id heuristic when none is present, instead of unconditionally defaulting to `/v1/chat/completions`. - BedrockBatchesConfig.transform_retrieve_batch_response replaces the hardcoded `/v1/chat/completions` with the same heuristic, fed from the caller-provided `model` argument or the `modelId` AWS returns on GetModelInvocationJob. This is necessary because the retrieve path has no original request to read from. - New static helper `_infer_openai_endpoint_from_model_id` returns `/v1/embeddings` when the lowercased model id contains "embed" (covers Titan v2 text, Titan multimodal, Cohere embed, Nova multimodal embeddings, ARN forms thereof), else `/v1/chat/completions`. Best-effort by design: chat models without "embed" in the name keep the same endpoint they reported before. - 19 new mocked unit tests across direct helper coverage, create-batch endpoint resolution (explicit override wins, model-id fallback, no-signal default), and retrieve-batch resolution (model arg wins, AWS modelId fallback, no-id default). 28 existing batch tests pass unchanged.

Greptile-flagged convention: model-capability flags should come from `model_prices_and_context_window.json` (via `litellm.get_model_info`), not from hardcoded substring matches. Every current Bedrock embedding model is registered there with `"mode": "embedding"`, so future models get classified automatically once their entry lands. Changes vs previous commit: - `_infer_openai_endpoint_from_model_id` now consults `get_model_info` first. If the registry returns `mode == "embedding"`, route to `/v1/embeddings`. If it returns any other mode, route to `/v1/chat/completions` (registry wins even if the model id name contains "embed"). - If `get_model_info` raises (cross-region inference profile ids like `us.amazon.titan-embed-text-v2:0`, Bedrock ARN forms, or unreleased models the registry can't normalize) or returns no mode, we fall back to the previous substring heuristic. This keeps the existing cross-region and ARN test cases passing without regression. - 5 new tests pin the layered behavior: registry classifies known embed ids; registry chat mode wins over substring "embed" in the name; substring fallback engages when registry raises (with and without embed marker); empty-info dict falls through to substring path. 33 batch tests pass (24 in this file + 9 in test_batch_metadata_sanitization).

…lper Splits `_lookup_registry_mode` out of `_infer_openai_endpoint_from_model_id` so the registry-vs-fallback split is unit-testable in isolation and so future call sites (e.g. cost calculation, billing routing) can reuse the same defensive lookup without duplicating the try/except shape. - New `_lookup_registry_mode` returns the registry's `mode` string when `get_model_info` resolves and the entry has a non-empty string mode, else `None`. Catches the three real-world failure modes: registry raises, registry returns a non-dict, registry returns a dict without (or with empty / non-string) `mode`. - `_infer_openai_endpoint_from_model_id` now reads from the helper and the dispatch is linear: registry-says-embedding -> embeddings, registry-says-anything-else -> chat, registry-silent -> substring fallback -> chat default. - 6 new isolated tests pin the helper's behavior: returns mode for known ids, returns None on each of the four failure modes (raise, non-dict, missing mode, empty string mode, non-string mode). 30/30 tests in the file still pass.

…check Greptile-flagged precision issue in transform_create_batch_response: the previous `or` short-circuit silently replaced an explicit empty-string endpoint with the model-id heuristic, dropping the caller's signal. Changes: - Replace `original_request.get("endpoint") or heuristic` with an explicit `is not None` check so only a genuinely-missing endpoint key falls through to the heuristic. - New test pins the empty-string preservation behavior: passing `endpoint=""` on `original_batch_request` results in `LiteLLMBatch.endpoint == ""` instead of `/v1/embeddings`.

codecov · 2026-05-26T14:13:00Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Pre-existing Black violation on the shin_agent_oss_staging_05_22_2026 base branch - same fix as in PR BerriAI#28875 to unblock the lint CI check on this PR.

OS-joaocastilho mentioned this pull request May 26, 2026

feat(bedrock-batch): propagate embedding endpoint through batch responses #28863

Closed

4 tasks

greptile-apps Bot reviewed May 26, 2026

View reviewed changes

Comment thread litellm/llms/bedrock/batches/transformation.py Outdated

Comment thread litellm/llms/bedrock/batches/transformation.py

greptile-apps Bot reviewed May 26, 2026

View reviewed changes

Comment thread litellm/llms/bedrock/batches/transformation.py

Comment thread litellm/llms/bedrock/batches/transformation.py

OS-joaocastilho mentioned this pull request May 26, 2026

feat(bedrock-batch): route /v1/embeddings JSONL to Titan v2 modelInput #28875

Merged

4 tasks

OS-joaocastilho added 4 commits May 26, 2026 15:05

OS-joaocastilho force-pushed the feat/bedrock-batch-embedding-endpoint-propagation-v2 branch from ee155c0 to ab6127f Compare May 26, 2026 14:06

OS-joaocastilho changed the base branch from litellm_oss_branch to shin_agent_oss_staging_05_22_2026 May 26, 2026 14:06

chore(lint): apply Black formatting to is_thinking_enabled

d245479

Pre-existing Black violation on the shin_agent_oss_staging_05_22_2026 base branch - same fix as in PR BerriAI#28875 to unblock the lint CI check on this PR.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(bedrock-batch): propagate embedding endpoint through batch responses#28867

feat(bedrock-batch): propagate embedding endpoint through batch responses#28867
OS-joaocastilho wants to merge 5 commits into
BerriAI:shin_agent_oss_staging_05_22_2026from
OS-joaocastilho:feat/bedrock-batch-embedding-endpoint-propagation-v2

OS-joaocastilho commented May 26, 2026

Uh oh!

OS-joaocastilho commented May 26, 2026

Uh oh!

greptile-apps Bot commented May 26, 2026

Important Files Changed

Uh oh!

Uh oh!

Uh oh!

greptile-apps Bot commented May 26, 2026

Important Files Changed

Uh oh!

Uh oh!

Uh oh!

OS-joaocastilho commented May 26, 2026

Uh oh!

greptile-apps Bot commented May 26, 2026

Uh oh!

codecov Bot commented May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

OS-joaocastilho commented May 26, 2026

Relevant issues

Problem

Changes vs litellm_oss_branch

What this does NOT change

Pre-Submission checklist

Type

Uh oh!

OS-joaocastilho commented May 26, 2026

Uh oh!

greptile-apps Bot commented May 26, 2026

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Uh oh!

Uh oh!

Uh oh!

greptile-apps Bot commented May 26, 2026

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Uh oh!

Uh oh!

Uh oh!

OS-joaocastilho commented May 26, 2026

Uh oh!

greptile-apps Bot commented May 26, 2026

Uh oh!

codecov Bot commented May 26, 2026

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Changes vs `litellm_oss_branch`