feat(bedrock-batch): propagate embedding endpoint through batch responses#28867
Conversation
Greptile SummaryThis PR fixes a metadata fidelity bug where both
Confidence Score: 4/5The change is narrowly scoped to endpoint metadata on batch response objects and does not touch any request-processing or auth paths; existing chat batch behaviour is preserved by the chat-default fallback. Both transform methods now correctly resolve the endpoint field for embedding and chat batches. The two minor points are: the No files require special attention; both changed files are confined to the Bedrock batches transformation layer.
|
| Filename | Overview |
|---|---|
| litellm/llms/bedrock/batches/transformation.py | Adds _lookup_registry_mode and _infer_openai_endpoint_from_model_id helpers; updates both transform_create_batch_response and transform_retrieve_batch_response to infer endpoint from model id instead of hardcoding chat completions. Logic is sound; one minor precision issue with the or operator for endpoint resolution. |
| tests/test_litellm/llms/bedrock/batches/test_endpoint_inference.py | New test file with 30 unit tests covering the inference helpers, create-batch and retrieve-batch response transformers, and registry-driven classification. All tests use mocks (no real network calls). Good coverage of edge cases. |
Reviews (1): Last reviewed commit: "refactor(bedrock-batch): extract registr..." | Re-trigger Greptile
Greptile SummaryThis PR fixes a metadata fidelity gap where
Confidence Score: 4/5Safe to merge — the change is isolated to endpoint metadata on batch response objects and does not affect the AWS API call path, request payloads, or any auth logic. The two new helpers carry minor robustness gaps: the registry lookup omits an explicit provider hint, and the mode dispatch defaults all non-embedding registry modes to chat. Neither causes a regression on any current Bedrock model. litellm/llms/bedrock/batches/transformation.py — the _lookup_registry_mode and _infer_openai_endpoint_from_model_id helpers warrant a second look for the custom_llm_provider omission and the open-ended non-embedding mode fallback.
|
| Filename | Overview |
|---|---|
| litellm/llms/bedrock/batches/transformation.py | Replaces hardcoded /v1/chat/completions in both batch response transformers with a two-tier resolver: registry lookup via get_model_info then substring fallback on 'embed'. Logic is sound; minor concern around missing explicit custom_llm_provider on the registry call and non-exhaustive mode handling. |
| tests/test_litellm/llms/bedrock/batches/test_endpoint_inference.py | New test file with 30 well-structured tests covering direct unit, integration, registry-driven, and create/retrieve end-to-end paths. No network calls; all mock-dependent tests use pytest-mock correctly. |
Reviews (2): Last reviewed commit: "refactor(bedrock-batch): extract registr..." | Re-trigger Greptile
|
Addressed the precision nit in ee155c0b5d: replaced the On the other Greptile note about |
|
Both points are well-addressed:
The resolution order in Tip: You can customize Greptile's behavior for this repo with |
…nses Changes vs main: - BedrockBatchesConfig.transform_create_batch_response now prefers the endpoint set on `litellm_params["original_batch_request"]` and falls back to a model-id heuristic when none is present, instead of unconditionally defaulting to `/v1/chat/completions`. - BedrockBatchesConfig.transform_retrieve_batch_response replaces the hardcoded `/v1/chat/completions` with the same heuristic, fed from the caller-provided `model` argument or the `modelId` AWS returns on GetModelInvocationJob. This is necessary because the retrieve path has no original request to read from. - New static helper `_infer_openai_endpoint_from_model_id` returns `/v1/embeddings` when the lowercased model id contains "embed" (covers Titan v2 text, Titan multimodal, Cohere embed, Nova multimodal embeddings, ARN forms thereof), else `/v1/chat/completions`. Best-effort by design: chat models without "embed" in the name keep the same endpoint they reported before. - 19 new mocked unit tests across direct helper coverage, create-batch endpoint resolution (explicit override wins, model-id fallback, no-signal default), and retrieve-batch resolution (model arg wins, AWS modelId fallback, no-id default). 28 existing batch tests pass unchanged.
Greptile-flagged convention: model-capability flags should come from `model_prices_and_context_window.json` (via `litellm.get_model_info`), not from hardcoded substring matches. Every current Bedrock embedding model is registered there with `"mode": "embedding"`, so future models get classified automatically once their entry lands. Changes vs previous commit: - `_infer_openai_endpoint_from_model_id` now consults `get_model_info` first. If the registry returns `mode == "embedding"`, route to `/v1/embeddings`. If it returns any other mode, route to `/v1/chat/completions` (registry wins even if the model id name contains "embed"). - If `get_model_info` raises (cross-region inference profile ids like `us.amazon.titan-embed-text-v2:0`, Bedrock ARN forms, or unreleased models the registry can't normalize) or returns no mode, we fall back to the previous substring heuristic. This keeps the existing cross-region and ARN test cases passing without regression. - 5 new tests pin the layered behavior: registry classifies known embed ids; registry chat mode wins over substring "embed" in the name; substring fallback engages when registry raises (with and without embed marker); empty-info dict falls through to substring path. 33 batch tests pass (24 in this file + 9 in test_batch_metadata_sanitization).
…lper Splits `_lookup_registry_mode` out of `_infer_openai_endpoint_from_model_id` so the registry-vs-fallback split is unit-testable in isolation and so future call sites (e.g. cost calculation, billing routing) can reuse the same defensive lookup without duplicating the try/except shape. - New `_lookup_registry_mode` returns the registry's `mode` string when `get_model_info` resolves and the entry has a non-empty string mode, else `None`. Catches the three real-world failure modes: registry raises, registry returns a non-dict, registry returns a dict without (or with empty / non-string) `mode`. - `_infer_openai_endpoint_from_model_id` now reads from the helper and the dispatch is linear: registry-says-embedding -> embeddings, registry-says-anything-else -> chat, registry-silent -> substring fallback -> chat default. - 6 new isolated tests pin the helper's behavior: returns mode for known ids, returns None on each of the four failure modes (raise, non-dict, missing mode, empty string mode, non-string mode). 30/30 tests in the file still pass.
…check
Greptile-flagged precision issue in transform_create_batch_response:
the previous `or` short-circuit silently replaced an explicit
empty-string endpoint with the model-id heuristic, dropping the
caller's signal.
Changes:
- Replace `original_request.get("endpoint") or heuristic` with an
explicit `is not None` check so only a genuinely-missing endpoint
key falls through to the heuristic.
- New test pins the empty-string preservation behavior: passing
`endpoint=""` on `original_batch_request` results in
`LiteLLMBatch.endpoint == ""` instead of `/v1/embeddings`.
ee155c0 to
ab6127f
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
Pre-existing Black violation on the shin_agent_oss_staging_05_22_2026 base branch - same fix as in PR BerriAI#28875 to unblock the lint CI check on this PR.
Relevant issues
Companion to #28865 (Titan v2 embedding JSONL input transform). That PR makes embedding JSONL records produce a valid Bedrock modelInput; this PR makes the resulting
LiteLLMBatchobject correctly reportendpoint=\"/v1/embeddings\"to OpenAI clients that introspect it.Reopens the use case from #15506. Supersedes #28863, which was opened against
main(wrong base) and accumulated force-pushes / close-reopen churn while we fixed up the base + addressed Greptile feedback. This PR is the clean, rebased version againstlitellm_oss_branch.Problem
BedrockBatchesConfig.transform_create_batch_responseandtransform_retrieve_batch_responseboth hardcodedendpoint=\"/v1/chat/completions\". For an embedding batch the create path could readoriginal_batch_request.endpointbut defaulted to chat-completions when the caller didn't set it; the retrieve path had no fallback at all because it has no original request to read from.Changes vs
litellm_oss_branchlitellm/llms/bedrock/batches/transformation.py_lookup_registry_modereadsmodefor a model id frommodel_prices_and_context_window.jsonvialitellm.get_model_info. Returns the mode string when the registry knows the id and the entry has a non-empty string mode, elseNone. Catches the failure modes: registry raises, registry returns a non-dict, registry returns a dict without (or with empty / non-string)mode._infer_openai_endpoint_from_model_idreturns/v1/embeddingswhen the registry saysmode == \"embedding\". Falls back to substring (\"embed\" in model_id.lower()) for ids the registry can't resolve - this catches cross-region inference profile prefixes (e.g.us.amazon.titan-embed-text-v2:0) and Bedrock ARN forms, neither of whichget_model_infohandles today. Otherwise defaults to/v1/chat/completions.transform_create_batch_responsenow prefersoriginal_batch_request.endpointfromlitellm_paramsand falls back to the heuristic instead of hardcoded chat-completions.transform_retrieve_batch_responsereplaces the hardcoded value with the same heuristic, fed from the caller-suppliedmodelargument or themodelIdAWS returns on GetModelInvocationJob.tests/test_litellm/llms/bedrock/batches/test_endpoint_inference.py(new)_infer_openai_endpoint_from_model_id: Titan v2 text, Titan multimodal, Nova multimodal embeddings, Cohere Embed, ARN form, Anthropic chat, cross-region chat profile, Nova chat, None, empty, case-insensitive.modelarg wins over AWS modelId, default-chat when neither present._lookup_registry_modetests: returns mode when resolves, returns None on raise / non-dict / missing mode / empty mode / non-string mode.What this does NOT change
CreateModelInvocationJobis endpoint-agnostic at the wire level: only themodelIdand the JSONLmodelInputschema vary. So this PR is metadata fidelity, not feature unlock; feat(bedrock-batch): route /v1/embeddings JSONL to Titan v2 modelInput #28865 is the schema fix.modelOutput-> OpenAI batch embedding response shape) is intentionally not in this PR. It will be a follow-up so the schema-specific risk stays isolated.Pre-Submission checklist
tests/test_litellm/directory, Adding at least 1 test is a hard requirementmake test-unit(30/30 intests/test_litellm/llms/bedrock/batches/test_endpoint_inference.pyplus 9 existing tests intest_batch_metadata_sanitization.py)@greptileaireview after opening this PRType
New Feature