Skip to content

feat(bedrock-batch): propagate embedding endpoint through batch responses#28867

Open
OS-joaocastilho wants to merge 5 commits into
BerriAI:shin_agent_oss_staging_05_22_2026from
OS-joaocastilho:feat/bedrock-batch-embedding-endpoint-propagation-v2
Open

feat(bedrock-batch): propagate embedding endpoint through batch responses#28867
OS-joaocastilho wants to merge 5 commits into
BerriAI:shin_agent_oss_staging_05_22_2026from
OS-joaocastilho:feat/bedrock-batch-embedding-endpoint-propagation-v2

Conversation

@OS-joaocastilho

Copy link
Copy Markdown

Relevant issues

Companion to #28865 (Titan v2 embedding JSONL input transform). That PR makes embedding JSONL records produce a valid Bedrock modelInput; this PR makes the resulting LiteLLMBatch object correctly report endpoint=\"/v1/embeddings\" to OpenAI clients that introspect it.

Reopens the use case from #15506. Supersedes #28863, which was opened against main (wrong base) and accumulated force-pushes / close-reopen churn while we fixed up the base + addressed Greptile feedback. This PR is the clean, rebased version against litellm_oss_branch.

Problem

BedrockBatchesConfig.transform_create_batch_response and transform_retrieve_batch_response both hardcoded endpoint=\"/v1/chat/completions\". For an embedding batch the create path could read original_batch_request.endpoint but defaulted to chat-completions when the caller didn't set it; the retrieve path had no fallback at all because it has no original request to read from.

Changes vs litellm_oss_branch

  • litellm/llms/bedrock/batches/transformation.py

    • New static helper _lookup_registry_mode reads mode for a model id from model_prices_and_context_window.json via litellm.get_model_info. Returns the mode string when the registry knows the id and the entry has a non-empty string mode, else None. Catches the failure modes: registry raises, registry returns a non-dict, registry returns a dict without (or with empty / non-string) mode.
    • New static helper _infer_openai_endpoint_from_model_id returns /v1/embeddings when the registry says mode == \"embedding\". Falls back to substring (\"embed\" in model_id.lower()) for ids the registry can't resolve - this catches cross-region inference profile prefixes (e.g. us.amazon.titan-embed-text-v2:0) and Bedrock ARN forms, neither of which get_model_info handles today. Otherwise defaults to /v1/chat/completions.
    • transform_create_batch_response now prefers original_batch_request.endpoint from litellm_params and falls back to the heuristic instead of hardcoded chat-completions.
    • transform_retrieve_batch_response replaces the hardcoded value with the same heuristic, fed from the caller-supplied model argument or the modelId AWS returns on GetModelInvocationJob.
  • tests/test_litellm/llms/bedrock/batches/test_endpoint_inference.py (new)

    • 11 direct tests for _infer_openai_endpoint_from_model_id: Titan v2 text, Titan multimodal, Nova multimodal embeddings, Cohere Embed, ARN form, Anthropic chat, cross-region chat profile, Nova chat, None, empty, case-insensitive.
    • 4 create-batch tests: explicit override wins, model-id fallback for embed, chat default when no signal, model=None with original_request.endpoint=chat.
    • 4 retrieve-batch tests: embed via AWS modelId, chat via AWS modelId, caller model arg wins over AWS modelId, default-chat when neither present.
    • 5 registry-driven tests: registry classifies known embed id, registry chat mode wins over substring "embed" in name, substring fallback used when registry raises, substring fallback chat default when registry raises, no mode field falls back to substring.
    • 6 isolated _lookup_registry_mode tests: returns mode when resolves, returns None on raise / non-dict / missing mode / empty mode / non-string mode.

What this does NOT change

  • The actual batch processing path is unchanged. AWS Bedrock's CreateModelInvocationJob is endpoint-agnostic at the wire level: only the modelId and the JSONL modelInput schema vary. So this PR is metadata fidelity, not feature unlock; feat(bedrock-batch): route /v1/embeddings JSONL to Titan v2 modelInput #28865 is the schema fix.
  • The output-file transform (Bedrock modelOutput -> OpenAI batch embedding response shape) is intentionally not in this PR. It will be a follow-up so the schema-specific risk stays isolated.

Pre-Submission checklist

  • I have Added testing in the tests/test_litellm/ directory, Adding at least 1 test is a hard requirement
  • My PR passes all unit tests on make test-unit (30/30 in tests/test_litellm/llms/bedrock/batches/test_endpoint_inference.py plus 9 existing tests in test_batch_metadata_sanitization.py)
  • My PR's scope is as isolated as possible, it only solves 1 specific problem (endpoint propagation in batch response transformers)
  • I will request @greptileai review after opening this PR

Type

New Feature

@OS-joaocastilho

Copy link
Copy Markdown
Author

@greptileai

@greptile-apps

greptile-apps Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR fixes a metadata fidelity bug where both transform_create_batch_response and transform_retrieve_batch_response in BedrockBatchesConfig unconditionally reported endpoint=\"/v1/chat/completions\", mislabelling embedding batch jobs for OpenAI clients that introspect the returned LiteLLMBatch object.

  • Adds _lookup_registry_mode (reads mode from model_prices_and_context_window.json via get_model_info) and _infer_openai_endpoint_from_model_id (registry-first, substring fallback for ARNs and cross-region prefixes, chat-default) as isolated static helpers.
  • Updates the create path to prefer original_batch_request.endpoint from litellm_params and fall back to the new heuristic; updates the retrieve path to infer from the caller-supplied model arg or the modelId field AWS returns on GetModelInvocationJob.
  • Adds 30 new unit tests covering the helpers end-to-end, with all calls mocked — no real network traffic.

Confidence Score: 4/5

The change is narrowly scoped to endpoint metadata on batch response objects and does not touch any request-processing or auth paths; existing chat batch behaviour is preserved by the chat-default fallback.

Both transform methods now correctly resolve the endpoint field for embedding and chat batches. The two minor points are: the or short-circuit in the create path silently drops an explicitly-supplied empty-string endpoint (instead of the more explicit is not None check), and the _lookup_registry_mode type hint is str rather than Optional[str], which could confuse type checkers. Neither causes wrong behaviour with realistic inputs.

No files require special attention; both changed files are confined to the Bedrock batches transformation layer.

Important Files Changed

Filename Overview
litellm/llms/bedrock/batches/transformation.py Adds _lookup_registry_mode and _infer_openai_endpoint_from_model_id helpers; updates both transform_create_batch_response and transform_retrieve_batch_response to infer endpoint from model id instead of hardcoding chat completions. Logic is sound; one minor precision issue with the or operator for endpoint resolution.
tests/test_litellm/llms/bedrock/batches/test_endpoint_inference.py New test file with 30 unit tests covering the inference helpers, create-batch and retrieve-batch response transformers, and registry-driven classification. All tests use mocks (no real network calls). Good coverage of edge cases.

Reviews (1): Last reviewed commit: "refactor(bedrock-batch): extract registr..." | Re-trigger Greptile

Comment thread litellm/llms/bedrock/batches/transformation.py Outdated
Comment thread litellm/llms/bedrock/batches/transformation.py
@greptile-apps

greptile-apps Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR fixes a metadata fidelity gap where transform_create_batch_response and transform_retrieve_batch_response both hardcoded /v1/chat/completions as the batch endpoint, causing embedding batch jobs to be misreported to OpenAI clients that inspect the LiteLLMBatch.endpoint field.

  • Introduces _lookup_registry_mode (reads mode from model_prices_and_context_window.json via get_model_info) and _infer_openai_endpoint_from_model_id (registry-first, substring fallback, default-to-chat) as static helpers on BedrockBatchesConfig.
  • transform_create_batch_response now prefers litellm_params["original_batch_request"]["endpoint"] and falls back to the heuristic; transform_retrieve_batch_response always uses the heuristic since no original request is available at poll time.
  • Adds 30 new unit tests (no network calls) covering all resolution paths including registry, substring fallback, ARN forms, and cross-region inference profiles.

Confidence Score: 4/5

Safe to merge — the change is isolated to endpoint metadata on batch response objects and does not affect the AWS API call path, request payloads, or any auth logic.

The two new helpers carry minor robustness gaps: the registry lookup omits an explicit provider hint, and the mode dispatch defaults all non-embedding registry modes to chat. Neither causes a regression on any current Bedrock model.

litellm/llms/bedrock/batches/transformation.py — the _lookup_registry_mode and _infer_openai_endpoint_from_model_id helpers warrant a second look for the custom_llm_provider omission and the open-ended non-embedding mode fallback.

Important Files Changed

Filename Overview
litellm/llms/bedrock/batches/transformation.py Replaces hardcoded /v1/chat/completions in both batch response transformers with a two-tier resolver: registry lookup via get_model_info then substring fallback on 'embed'. Logic is sound; minor concern around missing explicit custom_llm_provider on the registry call and non-exhaustive mode handling.
tests/test_litellm/llms/bedrock/batches/test_endpoint_inference.py New test file with 30 well-structured tests covering direct unit, integration, registry-driven, and create/retrieve end-to-end paths. No network calls; all mock-dependent tests use pytest-mock correctly.

Reviews (2): Last reviewed commit: "refactor(bedrock-batch): extract registr..." | Re-trigger Greptile

Comment thread litellm/llms/bedrock/batches/transformation.py
Comment thread litellm/llms/bedrock/batches/transformation.py
@OS-joaocastilho

Copy link
Copy Markdown
Author

Addressed the precision nit in ee155c0b5d: replaced the or short-circuit in transform_create_batch_response with an explicit is not None check so a caller passing endpoint=\"\" keeps that explicit signal instead of being silently overridden by the model-id heuristic. New test pins the empty-string preservation. 31/31 tests pass.

On the other Greptile note about _lookup_registry_mode's type hint - just to confirm, the return type is already declared Optional[str] (line 313 of the source). Happy to revisit if there's a specific stricter narrowing you'd prefer.

@greptileai

@greptile-apps

greptile-apps Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor

Both points are well-addressed:

  1. Empty-string preservation fix (ee155c0b) — the is not None guard is correct. explicit_endpoint if explicit_endpoint is not None else self._infer_openai_endpoint_from_model_id(model) properly preserves "" as an explicit caller signal while only triggering the heuristic when the key is genuinely absent from original_batch_request. The logic is sound.

  2. _lookup_registry_mode return type — confirmed at line 313: -> Optional[str]. The original observation was already satisfied; nothing to change.

The resolution order in _infer_openai_endpoint_from_model_id (registry → "embed" substring → chat default) is correct and the docstring explains exactly why each fallback exists. Both concerns are fully resolved — LGTM.

Tip: You can customize Greptile's behavior for this repo with .greptile/rules.md and .greptile/config.json.

…nses

Changes vs main:
- BedrockBatchesConfig.transform_create_batch_response now prefers the
  endpoint set on `litellm_params["original_batch_request"]` and falls
  back to a model-id heuristic when none is present, instead of
  unconditionally defaulting to `/v1/chat/completions`.
- BedrockBatchesConfig.transform_retrieve_batch_response replaces the
  hardcoded `/v1/chat/completions` with the same heuristic, fed from the
  caller-provided `model` argument or the `modelId` AWS returns on
  GetModelInvocationJob. This is necessary because the retrieve path
  has no original request to read from.
- New static helper `_infer_openai_endpoint_from_model_id` returns
  `/v1/embeddings` when the lowercased model id contains "embed"
  (covers Titan v2 text, Titan multimodal, Cohere embed, Nova
  multimodal embeddings, ARN forms thereof), else
  `/v1/chat/completions`. Best-effort by design: chat models without
  "embed" in the name keep the same endpoint they reported before.
- 19 new mocked unit tests across direct helper coverage, create-batch
  endpoint resolution (explicit override wins, model-id fallback,
  no-signal default), and retrieve-batch resolution (model arg wins,
  AWS modelId fallback, no-id default). 28 existing batch tests pass
  unchanged.
Greptile-flagged convention: model-capability flags should come from
`model_prices_and_context_window.json` (via `litellm.get_model_info`),
not from hardcoded substring matches. Every current Bedrock embedding
model is registered there with `"mode": "embedding"`, so future models
get classified automatically once their entry lands.

Changes vs previous commit:
- `_infer_openai_endpoint_from_model_id` now consults `get_model_info`
  first. If the registry returns `mode == "embedding"`, route to
  `/v1/embeddings`. If it returns any other mode, route to
  `/v1/chat/completions` (registry wins even if the model id name
  contains "embed").
- If `get_model_info` raises (cross-region inference profile ids like
  `us.amazon.titan-embed-text-v2:0`, Bedrock ARN forms, or unreleased
  models the registry can't normalize) or returns no mode, we fall back
  to the previous substring heuristic. This keeps the existing
  cross-region and ARN test cases passing without regression.
- 5 new tests pin the layered behavior: registry classifies known embed
  ids; registry chat mode wins over substring "embed" in the name;
  substring fallback engages when registry raises (with and without
  embed marker); empty-info dict falls through to substring path.

33 batch tests pass (24 in this file + 9 in test_batch_metadata_sanitization).
…lper

Splits `_lookup_registry_mode` out of
`_infer_openai_endpoint_from_model_id` so the registry-vs-fallback split
is unit-testable in isolation and so future call sites (e.g. cost
calculation, billing routing) can reuse the same defensive lookup
without duplicating the try/except shape.

- New `_lookup_registry_mode` returns the registry's `mode` string when
  `get_model_info` resolves and the entry has a non-empty string mode,
  else `None`. Catches the three real-world failure modes: registry
  raises, registry returns a non-dict, registry returns a dict without
  (or with empty / non-string) `mode`.
- `_infer_openai_endpoint_from_model_id` now reads from the helper and
  the dispatch is linear: registry-says-embedding -> embeddings,
  registry-says-anything-else -> chat, registry-silent -> substring
  fallback -> chat default.
- 6 new isolated tests pin the helper's behavior: returns mode for
  known ids, returns None on each of the four failure modes (raise,
  non-dict, missing mode, empty string mode, non-string mode).

30/30 tests in the file still pass.
…check

Greptile-flagged precision issue in transform_create_batch_response:
the previous `or` short-circuit silently replaced an explicit
empty-string endpoint with the model-id heuristic, dropping the
caller's signal.

Changes:
- Replace `original_request.get("endpoint") or heuristic` with an
  explicit `is not None` check so only a genuinely-missing endpoint
  key falls through to the heuristic.
- New test pins the empty-string preservation behavior: passing
  `endpoint=""` on `original_batch_request` results in
  `LiteLLMBatch.endpoint == ""` instead of `/v1/embeddings`.
@OS-joaocastilho OS-joaocastilho force-pushed the feat/bedrock-batch-embedding-endpoint-propagation-v2 branch from ee155c0 to ab6127f Compare May 26, 2026 14:06
@OS-joaocastilho OS-joaocastilho changed the base branch from litellm_oss_branch to shin_agent_oss_staging_05_22_2026 May 26, 2026 14:06
@codecov

codecov Bot commented May 26, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Pre-existing Black violation on the shin_agent_oss_staging_05_22_2026
base branch - same fix as in PR BerriAI#28875 to unblock the lint CI check
on this PR.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant