Opik user auth key metadata extractors#28397
Conversation
…ntext fixed after refactoring to extractor logic
Greptile SummaryThis PR refactors
Confidence Score: 5/5The change is isolated to a single helper function and its tests; the priority-merge logic is straightforward and verified by the new test suite. The three-level merge in No files require special attention.
|
| Filename | Overview |
|---|---|
| litellm/integrations/opik/opik_payload_builder/extractors.py | Refactored extract_opik_metadata to support three-level priority merging (auth-key defaults → request-level → requester); logic is correct and all three new tests pass. Minor docstring typo and a redundant guard remain. |
| tests/test_litellm/integrations/opik/test_opik_extractors.py | Three new unit tests covering the three-tier priority merge; no real network calls, all assertions are consistent with the updated implementation. |
Reviews (7): Last reviewed commit: "test: add unit tests for OPik metadata e..." | Re-trigger Greptile
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
Greptile SummaryThis PR extends
Confidence Score: 3/5Not safe to merge as-is — the core new behavior (filling individual missing keys from auth metadata) is not implemented, and two of the three new tests will fail. The implementation uses an all-or-nothing guard that skips auth metadata entirely when any request-level Opik data is present. The tests were written to verify per-key fill behavior, creating a direct contradiction that will surface as failing tests on CI. litellm/integrations/opik/opik_payload_builder/extractors.py — the merge logic in extract_opik_metadata needs to change from a conditional fallback to always starting with auth as the base.
|
| Filename | Overview |
|---|---|
| litellm/integrations/opik/opik_payload_builder/extractors.py | Adds user_api_key_auth_metadata fallback to extract_opik_metadata, but the if not opik_meta: guard makes it all-or-nothing; tests 1 and 2 document key-level fill behavior that the current implementation does not produce. |
| tests/litellm/integrations/opik/test_opik_extractors.py | Three new mock-only unit tests for auth-metadata fallback; tests 1 and 2 will fail against the current implementation due to the all-or-nothing guard mismatch. Missing trailing newline. |
Reviews (2): Last reviewed commit: "test: add unit tests for OPik metadata e..." | Re-trigger Greptile
…urces for improved accuracy
fixed |
|
🤖 litellm-agent: Auto-merge skipped — the staging branch Please rebase your branch onto |
|
I'm not entirely sure if the AI check is correct—why should I rebase to the “shin_agent_oss_staging_05_21_2026” branch? Is it wrong of me to want to merge on litellm_internal_staging branch? |
|
@greptileai re review |
7513a9a
into
BerriAI:litellm_oss_staging_030626
* Fix incorrect agent API request example payload structure (#29556) * fix(otel): add litellm_metadata fallback in _get_span_context and _end_proxy_span_from_kwargs (#29427) * fix(otel): add litellm_metadata fallback in _get_span_context and _end_proxy_span_from_kwargs On /v1/messages and other LITELLM_METADATA_ROUTES, the parent OTel span is stored in litellm_params['litellm_metadata'] instead of litellm_params['metadata']. When the request body contains a native 'metadata' field (e.g. Anthropic's {"user_id": "..."}), litellm_params['metadata'] gets overwritten and the parent span is lost, producing orphan root spans with a different trace_id. Add fallback checks to litellm_metadata in: - _get_span_context(): so child spans find the correct parent - _end_proxy_span_from_kwargs(): so the proxy span gets closed Fixes: #27934 * test(otel): tighten assertions per Greptile review - test_span_context_metadata_takes_priority: assert litellm_metadata span is never accessed, proving metadata takes priority - test_span_context_no_parent_when_neither_has_span: assert both ctx and detected_span are None --------- Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: Aneesh-Fiddler <aneeshfiddler@gmail.com> Co-authored-by: Sameer Kankute <sameer@berri.ai> * fix: remove premature end-user budget check from get_end_user_object (#29420) * fix(proxy): remove premature end-user budget check from get_end_user_object Problem: - `_check_end_user_budget()` was called inside `get_end_user_object()` - This caused budget checks to run BEFORE `skip_budget_checks` could be evaluated - Zero-cost models (e.g., local vLLM) were incorrectly blocked when end-users exceeded their budget, even though they should bypass budget checks Solution: - Remove `_check_end_user_budget()` calls from `get_end_user_object()` - Budget enforcement now happens exclusively in `common_checks()` where `skip_budget_checks` context is available - `get_end_user_object()` keeps `route` as optional in function parameter for backwards compatibility and future implementation. * refactor(tests): update budget enforcement tests to reflect changes in get_end_user_object - test_get_end_user_object() verifies data fetching - test_check_end_user_budget() verifies enforcement - test_budget_enforcement_blocks_over_budget_users() integrates _check_end_user_budget() - test_resolve_end_user_reraises_budget_exceeded() is now test_resolve_end_user since no budget exceeded is thrown in get_end_user_object() * Gemini /images/generate and /images/edits billing fixes + add support for size and aspect ratio params (#29534) * Fix Gemini image config mapping * Address Gemini image config review * Format Gemini image generation transform * Fix Gemini image token usage logging * Share Gemini image request helpers * Fix Gemini Imagen model routing * Fixes as per self code review * Fixes per internal code review * Stop gating Imagen imageSize forwarding * Document Gemini image size mapping source * chore: retrigger lint * Clarify Gemini candidate count precedence * Add Inception provider (#29522) * add inception as provider (chat, fim) * linting * seperate test suite for chat and fim * fix test coverage * fix: model hub custom pricing model info (#29293) * Opik user auth key metadata extractors (#28397) * fix: enhance Opik metadata extraction to include user API key auth context fixed after refactoring to extractor logic * test: add unit tests for OPik metadata extraction logic * fix: enhance extract_opik_metadata function to prioritize metadata sources for improved accuracy * fix(ci): clarified comments and edited unit tests * test: add unit tests for OPik metadata extraction with auth and requester overrides * fix(ui): replace fixed favicon.ico with current api get /get_favicon (#29532) Signed-off-by: José Luis Di Biase <josx@interorganic.com.ar> * fix(vertex/gemini): keep tool_call reference when a text-only assistant message follows (#29561) `_gemini_convert_messages_with_history` tracks `last_message_with_tool_calls` so a following tool result can be matched back to its tool call. The assignment was inside a branch guarded by `assistant_msg.get("tool_calls", []) is not None`, which is also True for a text-only assistant message (an empty list is not None). As a result, an assistant message with no tool calls that appears between a tool call and its tool result overwrote the reference, and conversion failed with: Exception: Missing corresponding tool call for tool response message. This shape is common: a model emits a short narration/assistant message after a tool call before the tool result is appended. Only update `last_message_with_tool_calls` when the assistant message actually carries tool_calls (or a function_call). Adds a regression test. Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> * Add 1-hour cache write pricing for EU/AU/JP Bedrock Anthropic models (#28572) * fix(thinking): handle None thinking param in is_thinking_enabled (#28598) Squash-merged by litellm-agent from Terrajlz's PR. * feat(helm): support tpl rendering in podAnnotations (#28609) Squash-merged by litellm-agent from devauxbr's PR. * Forward custom_llm_provider through the Responses API bridge (Fixes #28505) (#28575) * Forward custom_llm_provider through the Responses API bridge (Fixes #28505) When a Chat Completions request to a GPT-5.4+ model contains both `tools` and `reasoning_effort`, `completion()` auto-routes through `responses_api_bridge`. The bridge handler called `litellm.responses()` / `litellm.aresponses()` without forwarding the already-resolved `custom_llm_provider`, so the downstream call re-invoked `get_llm_provider()` with `custom_llm_provider=None` and stripped a second provider prefix from a `provider/provider/model` deployment string. For a deployment configured as `openai/openai/openai/gpt-5.5`, the bridge flow sent `openai/gpt-5.5` to the upstream API instead of the correct `openai/openai/gpt-5.5`. Upstream APIs that enforce model-name allow-lists rejected this as `key_model_access_denied`. Fix: pass the locally-resolved `custom_llm_provider` into both the sync `responses()` and async `aresponses()` calls so the downstream `_resolve_model_provider_for_responses` sees an explicit provider and skips the second prefix-strip. New regression test `tests/test_litellm/completion_extras/test_responses_bridge_provider_propagation.py` pins both call sites: each must forward `custom_llm_provider`. * fix(28505): set custom_llm_provider on request_data instead of as duplicate kwarg Greptile flagged that the previous patch passed custom_llm_provider as an explicit kwarg to responses()/aresponses() while request_data already carried it via the spread of sanitized_litellm_params, which would raise TypeError: got multiple values for keyword argument on every real bridge call. Switches to assigning request_data['custom_llm_provider'] before the call so the resolved provider wins over whatever sanitized_litellm_params spread in, without duplicating the kwarg. Updates the regression test to seed request_data with a sentinel custom_llm_provider so it actually exercises the overwrite path (the previous test mocked transform_request with a minimal dict and never hit the conflict). * chore: trigger shin-agent re-eval on retargeted staging base * chore: trigger shin-agent re-eval against updated Greptile state * Add 1-hour cache write pricing for EU/AU/JP Bedrock Anthropic models The 1-hour prompt-cache write tier (`cache_creation_input_token_cost_above_1hr`) was added to the us./global. variants of the Claude 4.5/4.6/4.7 family on Bedrock, but the eu./au./jp. cross-region inference profiles were left without it. AWS Bedrock pricing applies the same +10% regional premium across all geo profiles, so eu./au./jp. should carry the same 1-hour rates as us. (1.6x the 5-minute regional rate). Without these fields, cost tracking on EU/AU/JP Bedrock 1-hour-TTL prompt caching falls back to the 5-minute write rate and undercounts spend by ~60% for European, Australian, and Japanese tenants. Adds the 1-hour tier (and Sonnet 4.5's long-context >200K tier where AWS publishes one) to 14 regional Bedrock entries in both `model_prices_and_context_window.json` and the bundled `model_prices_and_context_window_backup.json`: - eu./au. Opus 4.6 ($11.00 / MTok) - eu./au. Opus 4.7 ($11.00 / MTok) - eu./au./jp. Sonnet 4.6 ($6.60 / MTok) - eu./au./jp. Sonnet 4.5 ($6.60 / MTok regular, $13.20 / MTok LC) - eu./au./jp. Haiku 4.5 ($2.20 / MTok) Also extends `tests/test_litellm/test_bedrock_anthropic_1hr_cache_pricing.py` with a `REGIONAL_EXPECTED` parametrized block covering all 13 new entries plus the existing 1.6x ratio invariant. Note: `eu.anthropic.claude-opus-4-5-20251101-v1:0` carries the wrong 5m rate today (base 6.25e-06 instead of regional 6.875e-06), which would break the 1.6x ratio check. It is intentionally left out of this PR so the scope stays "1-hour cache tier addition" — a separate follow-up should correct the EU 5m rates for Opus 4.5. --------- Co-authored-by: Terrajlz <info@jouleselectrictech.com> Co-authored-by: Bruno Devaux <devaux.br@gmail.com> Co-authored-by: Sameer Kankute <sameer@berri.ai> * Add 1-hour cache write pricing tier for Vertex AI Anthropic models (#28569) * fix(thinking): handle None thinking param in is_thinking_enabled (#28598) Squash-merged by litellm-agent from Terrajlz's PR. * feat(helm): support tpl rendering in podAnnotations (#28609) Squash-merged by litellm-agent from devauxbr's PR. * Forward custom_llm_provider through the Responses API bridge (Fixes #28505) (#28575) * Forward custom_llm_provider through the Responses API bridge (Fixes #28505) When a Chat Completions request to a GPT-5.4+ model contains both `tools` and `reasoning_effort`, `completion()` auto-routes through `responses_api_bridge`. The bridge handler called `litellm.responses()` / `litellm.aresponses()` without forwarding the already-resolved `custom_llm_provider`, so the downstream call re-invoked `get_llm_provider()` with `custom_llm_provider=None` and stripped a second provider prefix from a `provider/provider/model` deployment string. For a deployment configured as `openai/openai/openai/gpt-5.5`, the bridge flow sent `openai/gpt-5.5` to the upstream API instead of the correct `openai/openai/gpt-5.5`. Upstream APIs that enforce model-name allow-lists rejected this as `key_model_access_denied`. Fix: pass the locally-resolved `custom_llm_provider` into both the sync `responses()` and async `aresponses()` calls so the downstream `_resolve_model_provider_for_responses` sees an explicit provider and skips the second prefix-strip. New regression test `tests/test_litellm/completion_extras/test_responses_bridge_provider_propagation.py` pins both call sites: each must forward `custom_llm_provider`. * fix(28505): set custom_llm_provider on request_data instead of as duplicate kwarg Greptile flagged that the previous patch passed custom_llm_provider as an explicit kwarg to responses()/aresponses() while request_data already carried it via the spread of sanitized_litellm_params, which would raise TypeError: got multiple values for keyword argument on every real bridge call. Switches to assigning request_data['custom_llm_provider'] before the call so the resolved provider wins over whatever sanitized_litellm_params spread in, without duplicating the kwarg. Updates the regression test to seed request_data with a sentinel custom_llm_provider so it actually exercises the overwrite path (the previous test mocked transform_request with a minimal dict and never hit the conflict). * chore: trigger shin-agent re-eval on retargeted staging base * chore: trigger shin-agent re-eval against updated Greptile state * Add 1-hour cache write pricing tier for Vertex AI Anthropic models GCP Vertex AI publishes a separate 1-hour cache write column for the Claude family (1.6x the 5-minute write rate, matching the documented Bedrock ratio). LiteLLM's Vertex AI Anthropic entries only carry the 5-minute tier, so any request that uses `cache_control: {"ttl": "1h"}` on Vertex AI Claude is undercounted in cost tracking by ~60%. The runtime side already supports the 1-hour tier — `VertexAIAnthropicConfig` extends `AnthropicConfig`, populating `ephemeral_1h_input_tokens`, and `_calculate_cache_creation_cost` reads `cache_creation_input_token_cost_above_1hr`. Only the price registry was missing data. Adds the field to 19 vertex_ai/claude-* entries across both `model_prices_and_context_window.json` and the bundled `model_prices_and_context_window_backup.json`: - Haiku 4.5 ($1.25 -> $2.00 / MTok) - Sonnet 3.7 / 4 / 4.5 / 4.6 ($3.75 -> $6.00 / MTok) - Opus 4.5 / 4.6 / 4.7 ($6.25 -> $10.00 / MTok) - Opus 4 / 4.1 ($18.75 -> $30.00 / MTok) Adds `tests/test_litellm/test_vertex_anthropic_1hr_cache_pricing.py` mirroring the Bedrock equivalent — pins each (5m, 1h) pair per model and asserts the 1.6x ratio across the family. Fixes #27781. --------- Co-authored-by: Terrajlz <info@jouleselectrictech.com> Co-authored-by: Bruno Devaux <devaux.br@gmail.com> Co-authored-by: Sameer Kankute <sameer@berri.ai> * Fix Gemini multimodal function responses (#29325) Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> * address greptile review: add _transform_image_usage method and model-map supports_image_size flag - Add _transform_image_usage instance method to GoogleImageGenConfig that delegates to transform_gemini_image_usage, fixing the regression test - Replace hardcoded "2.5-flash" string check in supports_gemini_image_size with a get_model_info lookup on supports_image_size (default true) - Add supports_image_size: false to all gemini-2.5-flash model entries in model_prices_and_context_window.json so capability is controlled via the model map rather than embedded in code * fix test failures: schema validation, mypy type, model info plumbing, pricing test - Add supports_image_size to ModelInfoBase TypedDict so get_model_info surfaces it - Pass supports_image_size through _get_model_info_helper constructor call - Fix supports_gemini_image_size to use value is not False (None means unset, defaults to True) - Add supports_image_size to JSON schema in test_aaamodel_prices_and_context_window_json_is_valid - Correct gemini-3.1-flash-lite pricing assertions in test to match JSON values * Add Azure AI Kimi K2.6 metadata (#27052) * Add Azure AI Kimi K2.6 metadata * Scope Kimi metadata test cost map setup * fall back to substring check for models not in model_prices_and_context_window.json Models like gemini-2.5-flash-image-preview are not in the pricing JSON, so get_model_info raises. Fall back to "2.5-flash" not in model when the JSON has no explicit supports_image_size entry for the model. * fix(inception): don't forward global litellm.api_key to Inception FIM Match the Inception chat config: resolve only an Inception-specific key (param, litellm.inception_key, or INCEPTION_API_KEY) for the text-completion FIM path. The global litellm.api_key (often an OpenAI key) was both leaking to api.inceptionlabs.ai and taking precedence over the configured Inception key when set. * fix(auth): enforce end-user budget on custom-auth path that skips common_checks get_end_user_object() no longer raises BudgetExceededError, so custom-auth deployments with custom_auth_run_common_checks unset (which skip the centralized common_checks gate) stopped enforcing the end-user budget, letting an over-budget end user keep making requests. Re-enforce the budget in _run_post_custom_auth_checks on that path. --------- Signed-off-by: José Luis Di Biase <josx@interorganic.com.ar> Co-authored-by: Isha <72744901+IshaMeera@users.noreply.github.com> Co-authored-by: aneeshsangvikar <aneeshsangvikar@fiddler.ai> Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: Aneesh-Fiddler <aneeshfiddler@gmail.com> Co-authored-by: Suleiman Elkhoury <108065141+suleimanelkhoury@users.noreply.github.com> Co-authored-by: Dmitriy Alergant <93501479+DmitriyAlergant@users.noreply.github.com> Co-authored-by: Yanis Miraoui <yanis.miraoui19@imperial.ac.uk> Co-authored-by: Lovro Seder <vrovro@gmail.com> Co-authored-by: Thomas Mildner <12685945+Thomas-Mildner@users.noreply.github.com> Co-authored-by: José Luis Di Biase <josx@interorganic.com.ar> Co-authored-by: Lai Quang Huy <64073540+1qh@users.noreply.github.com> Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> Co-authored-by: Filippo Menghi <113345637+Cyberfilo@users.noreply.github.com> Co-authored-by: Terrajlz <info@jouleselectrictech.com> Co-authored-by: Bruno Devaux <devaux.br@gmail.com> Co-authored-by: ZHONG Ziwen <67355585+zzw-math@users.noreply.github.com> Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com> Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>


Relevant issues
Fixes #27740
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
tests/test_litellm/directory, Adding at least 1 test is a hard requirement - see detailsmake test-unit@greptileaiand received a Confidence Score of at least 4/5 before requesting a maintainer reviewDelays in PR merge?
If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).
CI (LiteLLM team)
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Screenshots / Proof of Fix
adding this config as metadata
result in this expected behaviour

Type
🆕 New Feature
🐛 Bug Fix
Changes