fix: route volcengine (Doubao) tiered-pricing models to the tiered cost handler by vineethsaivs · Pull Request #30357 · BerriAI/litellm

vineethsaivs · 2026-06-13T09:09:00Z

Relevant issues

Type

🐛 Bug Fix

Changes

Volcengine doubao models (for example volcengine/doubao-seed-2-0-pro-260215) define tiered_pricing in the model map but carry no flat input_cost_per_token/output_cost_per_token. The cost_per_token dispatcher had no volcengine branch, so these models fell through to generic_cost_per_token, which only reads flat per-token costs. The result was that every volcengine doubao request was tracked as $0.

The dashscope cost calculator already implements graduated tiered pricing and reads tiered_pricing generically from the model map; the only provider-specific part was the hardcoded provider passed to get_model_info. This change makes that provider an argument (defaulting to dashscope, so existing behavior is unchanged) and routes volcengine to the same handler, mirroring how dashscope is dispatched.

Added a regression test that asserts the graduated cost across two tiers for a doubao model; it fails on the current code (cost is $0) and passes with the fix.
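To make "graduated" concrete, here is a minimal sketch of the tier arithmetic. It assumes each tier is a dict with a `range` of `[start, end]` token bounds and a per-token rate. The function name `graduated_cost`, the tier shape, and the prices are illustrative assumptions, not values from the model map and not the handler's actual code.

```python
from typing import Dict, List

# Hypothetical tiers: boundaries and prices are made up for illustration.
EXAMPLE_TIERS: List[Dict] = [
    {"range": [0, 8_000], "input_cost_per_token": 1e-6},
    {"range": [8_000, 32_000], "input_cost_per_token": 2e-6},
]


def graduated_cost(tokens: int, tiers: List[Dict], rate_key: str) -> float:
    """Charge each slice of tokens at the rate of the tier it falls in."""
    total = 0.0
    for tier in tiers:
        start, end = tier["range"]
        # Number of tokens that land inside this tier's [start, end) window.
        in_tier = max(0, min(tokens, end) - start)
        total += in_tier * tier[rate_key]
    return total


# 10,000 prompt tokens: 8,000 at the first rate, 2,000 at the second.
prompt_cost = graduated_cost(10_000, EXAMPLE_TIERS, "input_cost_per_token")
```

With these made-up rates, the 10,000-token prompt is charged as 8,000 tokens at the first rate plus 2,000 at the second. Tokens past the last tier's end are simply not charged in this sketch; how the real handler treats that case is not covered here.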

Screenshots / Proof of Fix

This path is network-free: cost is computed from the bundled model_prices_and_context_window_backup.json, so it does not require a live volcengine key. Reproduction on the local model map:

Before (current behavior):

volcengine/doubao-seed-2-0-pro-260215, prompt_tokens=10000, completion_tokens=2000
-> prompt_cost=$0.000000, completion_cost=$0.000000  (tracked as $0)

After this change:

volcengine/doubao-seed-2-0-pro-260215, prompt_tokens=10000, completion_tokens=2000
-> prompt_cost > 0, completion_cost > 0, matching the graduated tiered_pricing in the model map

The regression test test_volcengine_tiered_pricing_graduated_cost in tests/test_litellm/test_cost_calculator.py encodes the expected graduated cost across the first two tiers and verifies it end to end through cost_per_token(..., custom_llm_provider="volcengine").

CLAassistant · 2026-06-13T09:09:06Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
6 out of 7 committers have signed the CLA.

✅ ryan-crabbe-berri
✅ Sameerlite
✅ yuneng-berri
✅ shivamrawat1
✅ mateo-berri
✅ vineethsaivs
❌ yassin-berriai

codecov · 2026-06-13T09:12:07Z

Codecov Report

✅ All modified and coverable lines are covered by tests.


greptile-apps · 2026-06-13T09:12:24Z

Greptile Summary

This PR fixes volcengine (Doubao) models being billed at $0 by adding a volcengine branch to the cost_per_token dispatcher that routes to the existing tiered-pricing handler, and parameterising that handler with a custom_llm_provider argument so it resolves model info for any provider.

litellm/cost_calculator.py: A new elif custom_llm_provider == "volcengine" branch imports and calls the shared tiered-pricing handler with custom_llm_provider="volcengine", exactly mirroring the existing dashscope branch.
litellm/llms/dashscope/cost_calculator.py: The cost_per_token function gains an optional custom_llm_provider parameter (default "dashscope") passed straight through to get_model_info; all existing callers are unaffected.
tests/test_litellm/test_cost_calculator.py: A network-free regression test (test_volcengine_tiered_pricing_graduated_cost) reads the bundled model map, constructs a cross-tier token count for doubao-seed-2-0-pro-260215, and asserts the exact graduated cost; uses monkeypatch.setattr for proper state cleanup.
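The failure mode and the fix summarised above can be sketched in isolation. Everything here is a stand-in: `MODEL_MAP`, `flat_cost`, `tiered_cost`, `dispatch`, and the model name are hypothetical, and the real dispatcher uses an `elif` branch rather than a set of provider names. The point is only that a flat-rate path yields zero when flat costs are absent, and that a provider argument with a default keeps existing callers unchanged.

```python
from typing import Dict, Set, Tuple

# Hypothetical model map: the entry has tiered pricing but no flat costs.
MODEL_MAP: Dict[Tuple[str, str], Dict] = {
    ("volcengine", "example-tiered-model"): {
        "tiered_pricing": [
            {
                "range": [0, 8_000],
                "input_cost_per_token": 1e-6,
                "output_cost_per_token": 4e-6,
            },
        ]
    },
}


def _graduated(tokens: int, tiers, key: str) -> float:
    # Sum each tier's slice of tokens multiplied by that tier's rate.
    return sum(
        max(0, min(tokens, t["range"][1]) - t["range"][0]) * t[key] for t in tiers
    )


def flat_cost(model: str, provider: str, prompt: int, completion: int):
    """Flat-rate stand-in: missing flat costs silently become 0."""
    info = MODEL_MAP[(provider, model)]
    return (
        prompt * info.get("input_cost_per_token", 0.0),
        completion * info.get("output_cost_per_token", 0.0),
    )


def tiered_cost(model: str, prompt: int, completion: int, provider: str = "dashscope"):
    """Tiered stand-in: the provider is an argument, defaulting to the old value."""
    tiers = MODEL_MAP[(provider, model)]["tiered_pricing"]
    return (
        _graduated(prompt, tiers, "input_cost_per_token"),
        _graduated(completion, tiers, "output_cost_per_token"),
    )


def dispatch(model: str, provider: str, prompt: int, completion: int, tiered: Set[str]):
    # Providers listed in `tiered` go to the tiered handler; others fall through.
    if provider in tiered:
        return tiered_cost(model, prompt, completion, provider=provider)
    return flat_cost(model, provider, prompt, completion)


before = dispatch("example-tiered-model", "volcengine", 1_000, 500, {"dashscope"})
after = dispatch(
    "example-tiered-model", "volcengine", 1_000, 500, {"dashscope", "volcengine"}
)
```

In this sketch `before` comes out as zero cost for both prompt and completion, while `after` is positive for both, which mirrors the before/after behaviour described in the PR.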

Confidence Score: 5/5

Safe to merge — the change adds a new provider branch and a backward-compatible parameter; no existing behaviour is altered.

The dispatcher change is additive: it inserts a new elif that was previously missing, so no existing provider path is touched. The shared tiered-pricing function change is fully backward-compatible (new parameter defaults to dashscope). The regression test exercises the full cost-calculation path using the bundled model map, requires no live credentials, and uses monkeypatch for clean teardown. There are no data-path mutations, no auth changes, and no schema modifications.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| litellm/cost_calculator.py | Adds a `volcengine` branch to the cost_per_token dispatcher that routes to the shared tiered-pricing handler, mirroring the existing `dashscope` branch; the lazy import avoids circular-import risk and the branch placement is consistent with all other provider branches in the file. |
| litellm/llms/dashscope/cost_calculator.py | Adds an optional `custom_llm_provider` parameter (default `"dashscope"`) to `cost_per_token` so the same tiered-pricing logic can resolve model info for any provider; the change is fully backward-compatible and the docstring is updated to reflect the broader purpose. |
| tests/test_litellm/test_cost_calculator.py | Adds `test_volcengine_tiered_pricing_graduated_cost`, a network-free regression test that reads tier data from the bundled model map, constructs a cross-tier token count, and verifies the graduated cost exactly; correctly uses `monkeypatch.setattr` (unlike many existing tests in the file that directly mutate `litellm.model_cost`). |


greptile-apps · 2026-06-13T09:12:29Z

+    elif custom_llm_provider == "volcengine":
+        # Volcengine (Doubao) models share Dashscope's tiered-pricing structure
+        from litellm.llms.dashscope.cost_calculator import (
+            cost_per_token as tiered_cost_per_token,
+        )
+
+        return tiered_cost_per_token(
+            model=model, usage=usage_block, custom_llm_provider="volcengine"
+        )


Volcengine coupled to dashscope's cost calculator file

The volcengine branch imports from litellm.llms.dashscope.cost_calculator, which ties an unrelated provider to dashscope's internal module. Any future dashscope-specific changes to that file (e.g., dashscope-flavoured token handling) could silently affect volcengine cost calculations. Consider either creating a thin litellm/llms/volcengine/cost_calculator.py that re-exports the shared logic, or extracting the reusable tiered-pricing function into a provider-neutral location (e.g., litellm/llms/utils/tiered_cost_calculator.py) that both dashscope and volcengine import from.


vineethsaivs · 2026-06-15T01:20:19Z

Good catches, thanks. I've switched the test to monkeypatch.setattr(litellm, "model_cost", ...) so it no longer leaks global state into later tests, matching the other tests in this file.
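As a rough illustration of why scoped patching avoids leaking state, the sketch below uses the standard library's `unittest.mock.patch.object` as a stand-in for pytest's `monkeypatch.setattr`; both replace an attribute and put the original back afterwards. `fake_litellm` is a hypothetical placeholder object, not the real module.

```python
from types import SimpleNamespace
from unittest import mock

# Hypothetical stand-in for a module that holds a global model map.
fake_litellm = SimpleNamespace(model_cost={"existing-model": {}})
original = fake_litellm.model_cost

# Inside the block the attribute is replaced; on exit it is restored.
with mock.patch.object(fake_litellm, "model_cost", {"test-only-model": {}}):
    inside = dict(fake_litellm.model_cost)

# After the block, the very same original object is back in place.
restored = fake_litellm.model_cost is original
```

A direct assignment such as `fake_litellm.model_cost = {...}` would stay in effect for every later test unless it were undone by hand.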

On the cross-provider coupling: I kept the change minimal by reusing the existing tiered-pricing calculator via a custom_llm_provider argument (defaulting to dashscope, so existing behavior is unchanged) rather than duplicating the graduated-tier logic. The function only reads tiered_pricing from the model map, so it's already provider-agnostic. Happy to relocate it to a provider-neutral module (or add a thin volcengine/cost_calculator.py wrapper) if you'd prefer that boundary; just let me know which you'd like.

Sameerlite · 2026-06-15T03:43:00Z

Thanks for the fix — the before/after cost output in the description is great proof! The Greptile review is a bit stale (commits landed after the last review). Triggering a fresh pass:

@greptileai

vineethsaivs · 2026-06-17T02:53:29Z

Retargeted this PR onto litellm_oss_branch per the contribution guard and rebased it, so the diff is just the three intended files (the volcengine dispatch branch in cost_calculator.py, the custom_llm_provider argument on the shared tiered-pricing handler, and the regression test). @greptileai

Sameerlite · 2026-06-17T03:57:39Z

Thanks for the contribution! A few things to get this ready for review:

Wrong base branch: This PR targets litellm_oss_branch but community PRs should target litellm_internal_staging. Could you rebase?
```
git fetch origin
git rebase --onto origin/litellm_internal_staging origin/litellm_oss_branch <your-branch>
git push --force-with-lease
```
Then update the base in GitHub's UI (Edit → Base: litellm_internal_staging).
Unresolved Greptile review threads — there are open threads from Greptile's review that haven't been resolved yet. Could you address those?

Once those are addressed, we'll take a closer look — thanks again!

…st handler

Volcengine (Doubao) models define `tiered_pricing` but no flat per-token cost, so cost_per_token fell through to generic_cost_per_token (which only reads flat costs) and tracked them at $0.

Route custom_llm_provider == "volcengine" to the shared tiered-pricing handler in litellm/llms/dashscope/cost_calculator.py, which already computes graduated tier costs. Make that handler provider-agnostic by adding a custom_llm_provider argument (default "dashscope" preserves existing behavior) so get_model_info resolves the correct model map entry.

Fixes BerriAI#30346

vineethsaivs · 2026-06-17T04:09:52Z

Done. Rebased onto litellm_internal_staging and updated the base.

On the Greptile threads: the global-state one is resolved, the test now uses monkeypatch.setattr(litellm, "model_cost", ...) so it no longer leaks state into later tests. On the dashscope coupling: the shared cost_per_token only reads tiered_pricing from the model map and takes custom_llm_provider as a parameter, so it is already provider-agnostic; routing volcengine through it avoids duplicating the graduated-tier logic. I am happy to extract it into a provider-neutral module (e.g. litellm/llms/utils/tiered_cost_calculator.py) that both dashscope and volcengine import from if you prefer that boundary; I kept it in place for now since moving it would also relocate the private tier helpers and widen the diff. @greptileai

greptile-apps · 2026-06-17T04:10:31Z

Thanks for addressing both threads and the clear explanation.

The global-state fix is correct — monkeypatch.setattr is the right approach and matches the pattern used elsewhere in the file.

On the dashscope coupling: your reasoning holds. The function is already provider-agnostic in behavior — get_model_info is parameterized, the tier math is generic, and the dashscope module path is purely a naming artifact. Extracting to litellm/llms/utils/tiered_cost_calculator.py would be the cleaner long-term home if a third provider ever needs to share it, but it's not necessary to block this fix. The diff is already minimal and the regression test gives a clear signal if the shared function ever diverges.

LGTM — the fix is correct, the test is solid, and the approach is reasonable given the scope of the change.

…iered cost handler (#30357)" This reverts commit 172e302.

vineethsaivs · 2026-06-17T17:06:20Z

I noticed this was merged into litellm_oss_170626_1 and then reverted there. If the revert was due to something on my end (a failing test or an integration conflict in the batch), I am happy to dig in and re-submit against the right branch; just let me know what you saw and I will turn it around quickly.

@Sameerlite

* fix(proxy): allow non-admin virtual keys to call GA Realtime WebRTC HTTP routes (#30089) * fix(proxy): allow non-admin virtual keys to call GA Realtime WebRTC HTTP routes Add the realtime WebRTC HTTP sub-routes (/realtime/client_secrets, /realtime/calls and their /v1 + /openai/v1 variants) to LiteLLMRoutes.openai_routes so is_llm_api_route() classifies them as LLM API routes. Without this, non-admin virtual keys received 401 'Only proxy admin can be used to generate, delete, update info for new keys/users/teams' when calling these endpoints. Fixes #29923 * fix(proxy): validate session.model for realtime routes in model-access check The GA Realtime WebRTC HTTP routes resolve the effective model from the nested session.model (falling back to the top-level model), but the auth layer's get_model_from_request() only extracted the top-level model. A model-restricted virtual key could therefore place a disallowed model in session.model, leave the top-level model unset, and skip can_key_call_model() entirely - obtaining an ephemeral token for a model it is not allowed to use. Extract session.model for the realtime client_secrets/calls routes so the model-access check runs against the model the request will actually use. Legitimate callers are unaffected; their permitted model still validates. Relates to #29923 * fix(proxy): classify realtime transcription_sessions routes as LLM API routes Add the GA Realtime WebRTC transcription_sessions HTTP routes to openai_routes so is_llm_api_route() returns True for them, matching the client_secrets and calls routes already fixed. These endpoints are registered with user_api_key_auth in realtime_endpoints/endpoints.py, so without this a non-admin virtual key calling POST /v1/realtime/transcription_sessions would hit the admin-only 401 branch. Extends the regression test parametrization accordingly. 
--------- Co-authored-by: habonlaci <4699494+habonlaci@users.noreply.github.com> * feat(proxy): surface max_input_tokens/max_output_tokens on /v1/models (#30272) * feat(proxy): surface max_input_tokens/max_output_tokens on /v1/models * fix(proxy): degrade /v1/models gracefully when model-group lookup fails --------- Co-authored-by: Sameer Kankute <sameer@berri.ai> * fix: sort tiered token-cost thresholds numerically (#30375) * fix: sort tiered token-cost thresholds numerically _get_token_base_cost iterated input_cost_per_token_above_<N>_tokens keys with a lexicographic sort, so for tiers whose thresholds have different digit lengths (e.g. 90k vs 128k) a request crossing both was billed at the lower tier that sorted first. Sort by the parsed numeric threshold instead, so the highest tier the request actually crosses is applied. * refactor: reuse _parse_above_token_threshold for inline threshold parse --------- Co-authored-by: Eric (GabiDevFamily) <271972409+santino18727-debug@users.noreply.github.com> * fix(openai): preserve cache_control for openai-compatible custom endpoints (#30387) * fix(openai): preserve cache_control for openai-compatible custom endpoints * fix(openai): use parsed hostname to detect real OpenAI for cache_control preservation * fix(proxy): drain all daily-spend batches per flush cycle (#30281) (#30505) * fix(types): prevent internal parallel_request_limiter fields from leaking to upstream providers (#30545) * fix(types): add internal parallel_request_limiter fields to all_litellm_params to prevent forwarding to upstream providers * test(types): add regression test for internal rate-limit fields in all_litellm_params * fix(init): add bool type annotation to suppress_debug_info (#30531) Module-level `suppress_debug_info = False` had no annotation, so strict type checkers (e.g. ty) infer it as `Literal[False]`. Reassigning it to `True` (as done in proxy_server.py and router.py) then fails with an invalid-assignment error. 
Annotate it as `bool` to match every other flag in this module. * fix: coalesce null aggregates in update_metrics for no-spend keys (#29945) * feat(team_endpoints): add query parameter `key_limit` to `/team/info` endpoint (#30006) * feat(team_endpoints): Add query parameter key_limit to /team/info * feat(team_endpoints): update schema.d.ts to include the new query parameter * feat(team_endpoints): add tests for limitting key count in /team/info response * feat(team_endpoints): Apply suggestions from greptile * Set greater-than constraint on key-limit * Fix type * fix(router): release aiohttp connection when stream iteration ends abnormally (#30271) * fix(router): release aiohttp connection when stream iteration ends abnormally A streaming response that terminates with a mid-stream read timeout, a task cancellation (client disconnect), or GeneratorExit never closed the underlying aiohttp ClientResponse. aiohttp only auto-releases the connector slot at body EOF, so each abnormally terminated stream permanently leaked one slot from the shared TCPConnector pool. During a backend traffic spike the pool drains; once exhausted every subsequent request to that host waits for a slot, times out and surfaces as a 408, indefinitely, even after the backend recovers. Only a proxy restart cleared the in-memory sessions, which matched the reported symptom of a router stuck returning 408 for a healthy vLLM backend. Close the response in a finally clause when iteration ends. On a fully read response the connection was already released at EOF and close() is a no-op, so keep-alive reuse for normal requests is unchanged. Fixes #30192 * test(aiohttp): cover GeneratorExit path with a mock instead of a live socket The previous slot-release test started a real aiohttp TCP server, which can flake in offline CI and does not exercise this fix's code path directly. 
Replace it with a dependency-injected mock that closes the stream generator (GeneratorExit) and asserts the response is closed, covering the third abnormal-exit path the finally block handles * feat(proxy): serve Anthropic-native /v1/models for Claude Code gateway discovery (#30273) * feat(proxy): serve Anthropic-native /v1/models for Claude Code gateway discovery * refactor(proxy): move Anthropic model-list formatter into llms/anthropic/common_utils * fix(proxy): make model_list request param optional for direct callers * feat(dashscope): add Responses API support (#30286) * feat(dashscope): add Responses API support DashScope's OpenAI-compatible endpoint serves /responses, so register a DashScopeResponsesAPIConfig that routes dashscope/* responses calls to {api_base}/responses without rewriting the upstream model id, instead of falling back to the chat-completions -> responses emulation pipeline. Closes #29780 * feat(dashscope): mark responses API as not supporting native websocket Matches the hosted_vllm/perplexity/openrouter responses configs, which all override supports_native_websocket() to False since the OpenAI-compatible endpoint has no native wss:// responses transport. --------- Co-authored-by: Sameer Kankute <sameer@berri.ai> * fix(spend-logs): preserve error_message on ProxyException failures (#30381) * fix(spend-logs): preserve error_message on ProxyException failures `StandardLoggingPayloadSetup.get_error_information` used `str(original_exception)` to populate the human-readable error message stored in `spend_logs.metadata.error_information.error_message`. `ProxyException` (litellm/proxy/_types.py:3453) sets `self.message` in its constructor but does NOT call `super().__init__(message)` and does NOT define `__str__`. As a result, `str(ProxyException(...))` returns the empty string, and every auth/budget/quota rejection was landing in spend_logs with `error_message=""` despite a fully populated traceback. 
Operator impact: dashboard "LLM Failure" rows became untriageable — the only way to tell a 401 from a 429 was to manually unpack the traceback JSON via psql. Burst failure patterns (e.g. a UI session polling with a stale token) produced 20-30 indistinguishable `error_code=401` rows per second. Fix: prefer the `.message` attribute (set by ProxyException and every litellm.exceptions.* class) over `str(exc)`. The `str(exc)` fallback is retained for non-litellm exception types, preserving prior behavior. Test plan: - 2 new unit tests in tests/test_litellm/litellm_core_utils/ test_litellm_logging.py: * test_get_error_information_prefers_message_attribute_over_str * test_get_error_information_falls_back_to_str_when_no_message_attr - Existing test_get_error_information_error_code_priority still passes - End-to-end verified: bad-key 401 now stores full "Authentication Error, Invalid proxy server token passed..." message in spend_logs.metadata.error_information.error_message * fix(spend-logs): preserve explicit empty .message + drop dead reference Greptile P2 on #30381. The truthiness check `if message_attr:` silently skipped an explicit empty-string `.message` and fell through to `str(original_exception)`. For ProxyException-shaped objects both produce empty, so the bug was latent; for other exception types it would inject a different string into error_information.error_message and corrupt the signal. Use `is not None` so an empty string survives verbatim. Also drop the stale `See e2e/cases/11.` comment reference — that path does not exist anywhere in the repo and confuses future readers. Regression test added: an exception with `.message=""` and a non-empty `super().__init__()` arg must yield error_message == "". 
* ci: retrigger workflows after base branch change to litellm_internal_staging * fix(anthropic): strip LiteLLM-injected total_tokens from /v1/messages response (#30382) * fix(anthropic): strip LiteLLM-injected total_tokens from /v1/messages response The non-streaming /v1/messages response carries a LiteLLM-injected usage.total_tokens = input_tokens + output_tokens that is not part of the Anthropic API spec. This caused three problems: 1. Shape divergence with streaming on the same endpoint. message_delta.usage in the SSE path never carries total_tokens. Clients parsing both paths get two different schemas from one endpoint. 2. Shape divergence with upstream. Direct calls to https://api.anthropic.com/v1/messages return no total_tokens field, so clients using the official Anthropic SDK couldn't rely on it, and clients that did rely on the LiteLLM-injected one broke when bypassing the proxy. 3. Numerical misuse. total = input + output undercounts when cache_read_input_tokens and cache_creation_input_tokens are non-zero, because cache tokens are reported in their own fields. A 100k-token cached prompt with 1 non-cache input token + 200 output tokens reports total_tokens = 201, off by ~99.8% from any reasonable definition of "total." Fix: add _strip_total_tokens_from_anthropic_response in litellm/proxy/anthropic_endpoints/endpoints.py and invoke it in the success path of anthropic_response right before returning. Only mutates dict-shaped responses; streaming (which already lacks the field) is left untouched. spend_logs / Prometheus continue to compute total_tokens internally for billing — this fix only strips the field from the wire response. Scope: only the Anthropic passthrough endpoint /v1/messages. The OpenAI-shape /v1/chat/completions is unaffected. 
* fix(anthropic): gate total_tokens strip behind flag + handle Pydantic .usage Two P1 greptile threads on #30382: P1 — **Backwards-incompatible removal without a feature flag** Stripping `usage.total_tokens` unconditionally breaks any client currently reading the LiteLLM-shaped non-streaming /v1/messages response. Per the codebase's policy (mirrors #30418), gate behind a new flag. - `litellm.strip_anthropic_total_tokens: bool = False` (default — backward-compat: clients keep seeing total_tokens). - Env override: `LITELLM_STRIP_ANTHROPIC_TOTAL_TOKENS=true`. - Docstring: planned to flip to True in a future major release; opt in early. P1 — **Silent no-op if `result` is a Pydantic model** `base_process_llm_request` may return a Pydantic-style object whose `.usage` is a plain dict (the most common shape — e.g. objects wrapping raw upstream JSON). The original `isinstance(response, dict)` guard skipped strip on those, so `total_tokens` would still hit the wire. Helper now also reads `getattr(response, "usage", None)` and strips when that's a dict. Strongly-typed Pydantic `Usage` sub-models with required `total_tokens` fields are still skipped — those impose type constraints the helper doesn't try to subvert. Tests: - `test_strips_total_tokens_on_pydantic_model_with_dict_usage` - `test_flag_defaults_off` 8/8 pass locally. * fix(anthropic): drop env var for strip flag (docs CI) Mirrors #30418's pattern (`expose_router_debug_in_errors: bool = True`, no `os.getenv`). 
The `LITELLM_STRIP_ANTHROPIC_TOTAL_TOKENS` env var introduced in the prior commit was flagged by `tests/documentation_tests/test_env_keys.py` because the documentation file `docs/my-website/docs/proxy/config_settings.md` lives in `BerriAI/litellm-docs` (separate repo) and registering a new env key requires a parallel docs PR — a friction we avoid here by exposing the flag only as a Python attribute + `litellm_settings` config key, both of which load through the existing proxy config plumbing without needing the env-var registry to be updated. No semantic change: default still False, behavior identical when set via `litellm.strip_anthropic_total_tokens = True` or `litellm_settings.strip_anthropic_total_tokens: true` in config.yaml. Verified locally: env scan no longer surfaces the key; 8/8 tests pass. * ci: retrigger workflows after base branch change to litellm_internal_staging * fix(pricing): correct swapped input/output token costs for command-r7b-12-2024 (#30413) * fix(pricing): correct swapped input/output token costs for command-r7b-12-2024 * test: resolve model prices JSON relative to test file for pip installs * fix(exception-mapping): map Gemini upstream-error body code 429 to RateLimitError (#30417) * fix(exception-mapping): map Gemini upstream-error body code 429 to RateLimitError Some Gemini-compatible gateways (e.g. new-api) wrap a 429 rate-limit signal from upstream inside an HTTP 500/503 envelope, with the real code only surfaced in the JSON body: {"error":{"message":"...high demand...","type":"upstream_error", "param":"","code":429}} Previously LiteLLM only looked at the HTTP status and mapped this to InternalServerError, which Router treats as non-retryable for many configs — so users got hard 500s instead of fallback/retry. Now the Gemini/Vertex exception mapper parses error.code from the body and routes code 429 to RateLimitError before falling through to the HTTP-status branches. Other body codes fall through unchanged. 
Tests cover: - new-api gateway's `code:429` payload now maps to RateLimitError - Genuine 500-body responses stay InternalServerError - Non-JSON body strings fall through to status-code mapping unchanged * fix(exception-mapping): scope body-code 429 promotion to 5xx envelopes Addresses greptile P1/P2 + @Sameerlite's review on #30417. The new elif branch was firing for any HTTP status, so a gateway response of HTTP 400 with body {"error":{"code":429,...}} would be incorrectly promoted to RateLimitError (retryable) instead of falling through to BadRequestError. Same trap for 401 -> AuthenticationError. Scoped the body-code 429 check to `500 <= status_code < 600` — covers 500/502/503/504 (gateways wrapping upstream 429 in any 5xx envelope) without inviting the 4xx misclassification. Tests: parametrized table now covers 5xx (500/502/503), 4xx (400/401), and the existing fall-through cases, asserting each maps to the exception type that matches the HTTP status code. 50/50 pass locally. * ci: retrigger workflows after base branch change to litellm_internal_staging * feat(router): add expose_router_debug_in_errors flag (default True) to redact internal model_group/fallback names (#30418) * feat(router)!: redact internal model_group/fallback names from exception messages The Router was unconditionally appending internal config names onto exception.message: - "Received Model Group=..." - "Available Model Group Fallbacks=..." - "No fallback model group found... Fallbacks={...}" - "context_window_fallbacks={...}" - Deployment-timeout messages including model_group - Fallback failure detail listing fallback chain ProxyException forwards .message verbatim to clients, so gateways were leaking their model_name / fallback wiring in every failed call. Fix: gate all five mutation sites on a new `litellm.expose_router_debug_in_errors` flag (default False). Set to True to restore upstream debug behavior for local debugging. 
Why: matches the redaction posture this codebase already has for upstream model identifiers (cf. _litellm_returned_model_name) and removes the last common error-path leak of internal model_group names. Breaking change marker (!): if anything parses "Received Model Group=" out of client error messages, flip the flag on or migrate to the x-litellm-* response headers instead. Tests: 7 cases covering each of the 5 redaction sites + the flag-on inverse path, plus a "default off" sanity check. * test(router): cover sites 1 + 3 of expose_router_debug_in_errors gate Addresses Greptile / codecov feedback on #30418: patch coverage was 55.6% with 4 lines uncovered in litellm/router.py. The existing tests exercised sites 2 (ContextWindowExceededError), 4 (no-fallback-found), and 5 (Received Model Group) — both default and flag-on. Sites 1 and 3 were declared in the PR description as covered by "site 5 also fires" but the gate body lines for each (the `e.message +=` inside the `if litellm.expose_router_debug_in_errors:` branch) only execute when the flag is on AND the specific exception path is taken, which neither existing test triggered. Added 4 new tests (default + flag-on × 2 sites): - test_default_does_not_leak_deployment_timeout_debug - test_flag_on_leaks_deployment_timeout_debug - test_default_does_not_leak_content_policy_fallback_hint - test_flag_on_leaks_content_policy_fallback_hint Trigger details: - Site 1 (litellm.Timeout in _acompletion) is reached via the Router-supported `mock_timeout=True` + `timeout=0.001` kwargs on `acompletion(...)`. Cannot embed a Timeout instance in model_list because Router.__init__ deep-copies it and Timeout.__reduce__ does not preserve the required positional args. - Site 3 (ContentPolicyViolationError without content_policy_fallbacks set, in async_function_with_fallbacks_common_utils) is reached by passing a `mock_response=litellm.ContentPolicyViolationError(...)` instance via the call-site kwarg — same deepcopy-avoidance reason. 
11/11 tests pass locally. Patch coverage on litellm/router.py for this PR's diff should now be 100%. * chore(router): flip expose_router_debug_in_errors default to True Addresses @Sameerlite's review on #30418 — maintain backward compat on the wire. Redact becomes opt-in via setting the flag to False; the historical behavior (leak internal model_group / fallback wiring through exception messages) is preserved as the default. - litellm/__init__.py: default flipped to True, docstring rewritten with deprecation note pointing at a future flip to False (redact by default) in a major release. - tests/test_litellm/test_router_exception_redaction.py: fixture resets to True (was False); the "off" tests now explicitly set False; the "default_leaks_*" tests rely on the fixture default. test_flag_defaults_off -> test_flag_defaults_on. - No router.py change needed; the gate keys off the same flag, only the default changes. - PR title no longer needs the breaking-change `!` marker — no client sees a behavior change at default settings. 11/11 pass locally. * ci: retrigger workflows after base branch change to litellm_internal_staging * feat(guardrails): integrate Repelloai Argus guardrail (#30465) * feat(guardrails): add RepelloAI Argus guardrail integration (#1) * feat(guardrails): add RepelloAI Argus guardrail integration Add a new guardrail hook backed by RepelloAI Argus, with dashboard-managed asset policies enforced via an asset_id and X-API-Key auth. 
* fix(guardrails): harden RepelloAI Argus guardrail - scan streaming responses on output (was bypassing the guardrail) - log blocked verdicts as guardrail_intervened instead of success - treat auth/config errors (401/403/404/422) as misconfiguration that always blocks, not a fail-open-able unreachable error - default unreachable_fallback to fail_closed and read it directly; block on unknown/malformed verdicts so an API change can't silently disable enforcement - type unreachable_fallback as a Literal, drop the duplicate config model, expose unreachable_fallback in the config schema, and stop leaking the raw provider response / exception strings to the client * fix(guardrails): address RepelloAI Argus review feedback - support ARGUS_API_KEY (with REPELLOAI_API_KEY fallback) - make asset_id required in the config model - normalize unreachable_fallback so only fail_open opens; block on 400 misconfig - correct the shared unreachable_fallback field description * docs(guardrails): add RepelloAI Argus docs page and dashboard listing - add docs page covering config, env vars, modes, verdicts, failure semantics - list RepelloAI Argus in the Guardrail Garden with provider/logo mappings - add a regression test for the provider logo and display-name resolution * fix(guardrails): keep RepelloAI asset_id optional in config model A required asset_id leaked onto the shared LitellmParams (which inherits RepelloAIGuardrailConfigModel), breaking validation for every other guardrail. Keep it optional like sibling models; the guardrail __init__ still raises when asset_id is missing, which is the real enforcement. * Add comment for last user turn scanning * feat(guardrails): harden repelloai scanning * feat(guardrails): expand repelloai scanning to include tool definitions Add extraction of tool definitions and tool call arguments to the RepelloAI guardrail scanning. Improves detection coverage by including function schemas and parameters in the prompt sent to the guardrail service. 
Also captures detailed error responses in logs and adds guardrail header to streaming responses. * refactor(guardrails): fix and harden repelloai schema text extraction - Fix duplicate text in _iter_schema_text: previously all dict values were re-queued onto the stack even after scalar/list keys were already extracted explicitly, causing names/descriptions to appear twice in the scanned prompt - Extract schema key frozensets to module-level constants so they are not reconstructed on every call - Change _iter_schema_text from @classmethod to @staticmethod (cls unused) - Narrow _call_analyze stage param from str to Literal["prompt", "response"] - Add HttpxResponse type annotation to _raise_for_config_error - Add LLMResponseTypes annotation to async_post_call_success_hook response param * fix(guardrails): resolve pyright type errors in repelloai guardrail - Narrow async_handler.post return from Response|None to Response with explicit None guard before calling raise_for_status/json - Fix list comprehension returning str|None by switching to explicit loop with isinstance guard so pyright tracks the narrowing - Cast model_dump() result to Dict since hasattr does not narrow object type in pyright * fix(guardrails/repello): include Responses API instructions field in prompt scan The /v1/responses top-level `instructions` field was not included in _extract_prompt_text, allowing a caller to bypass guardrail policy checks by putting blocked content in `instructions` while keeping `input` benign. * feat: add api_key to config model and read prompt from data dict * fix(guardrails/repello): plug input_text and tool-call response bypass gaps Responses API input content parts with type 'input_text' were silently dropped by build_inspection_messages (which only handles type='text'), allowing callers to send blocked content via that path without triggering the pre-call scan. 
Fix: add _extract_input_text_parts to RepelloAIGuardrail and call it when walking the Responses API input messages. Post-call scanning skipped responses whose choices contained only tool_calls or function_call (message.content=None), letting models put blocked output in function arguments undetected. Fix: _extract_chat_completion_text now calls _extract_tool_call_args_from_message on each choice message. Also replace typing.Dict/List with builtin dict/list to clear TID251 strict ruff violations introduced by this file. * fix(guardrails/repello): scan Responses API function_call output arguments Output items with type 'function_call' in a /v1/responses response were skipped by _extract_responses_api_text; only 'message' items were walked. A model could return blocked content in function_call.arguments undetected. Now extract arguments from function_call output items before scanning. * fix(anthropic): drop orphaned server_tool_use on multi-turn replay from generic OpenAI clients (#30486) * fix(anthropic): drop orphaned server_tool_use on multi-turn replay from generic OpenAI clients When an Anthropic server-side tool (web_search, id `srvtoolu_...`) is used, its result is carried in `provider_specific_fields.web_search_results` — PRs #17746 / #17798 restore it for callers that round-trip provider_specific_fields. A generic OpenAI client that does NOT preserve provider_specific_fields (e.g. Open WebUI talking to a Vertex/Anthropic model over /chat/completions) drops it on replay and instead sends back an assistant `tool_call` + a `tool` message both keyed to the `srvtoolu_` id. The transform then produced a bare `server_tool_use` (with no following *_tool_result) plus a user `tool_result` for the same id — both invalid, so the next turn 400s: messages.N.content.0: unexpected `tool_use_id` found in `tool_result` blocks: srvtoolu_... Each `tool_result` block must have a corresponding `tool_use` block in the previous message. 
This is the commonly-reported vertex_ai symptom where Gemini works but Claude 400s on the 2nd turn of a web-search chat. Fix (litellm/litellm_core_utils/prompt_templates/factory.py): - convert_to_anthropic_tool_invoke: only emit a server_tool_use when its matching *_tool_result is available to pair with it; otherwise skip it (a bare server_tool_use is itself rejected). - anthropic_messages_pt: drop a replayed `tool`/`function` message whose tool_call_id starts with `srvtoolu_` (a server-executed tool produces no client result; a user tool_result for it is invalid). The existing reconstruction path (provider_specific_fields present, e.g. the litellm SDK) is unchanged, as is regular client tool_use/tool_result. Tests (tests/llm_translation/test_prompt_factory.py): - update test_convert_to_anthropic_tool_invoke_server_tool -> test_convert_to_anthropic_tool_invoke_server_tool_without_result_is_dropped - add test_anthropic_messages_pt_generic_client_drops_orphan_server_tool Follow-up to #17746 / #17798; addresses the generic-client (no provider_specific_fields) case of #17737. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(anthropic): cover the srvtoolu_ round-trip fix in the test_litellm unit suite The regression tests added in tests/llm_translation/test_prompt_factory.py aren't run by the coverage CI job (it runs tests/test_litellm), so the new factory.py branches showed as uncovered (codecov patch coverage). Add equivalent focused tests in the unit suite so both new branches are exercised there: - convert_to_anthropic_tool_invoke drops a srvtoolu_ server_tool_use when no matching *_tool_result is available. - anthropic_messages_pt drops the orphaned srvtoolu_ tool message a generic OpenAI client replays. 
Refs #17737 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(anthropic): cover the server_tool_use + result valid-pair path in unit suite Covers the remaining patch-coverage lines codecov flagged: convert_to_anthropic_tool_invoke emitting server_tool_use followed by its web_search_tool_result when the matching result is present (the litellm-SDK round-trip path). Refs #17737 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * style(anthropic): flatten srvtoolu_ tool-message guard to a negated if Addresses the Greptile style nit: replace the if-pass/else with a single negated `if not (...)` guard around the tool_result append. Behavior unchanged. Refs #17737 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(proxy): require premium only when enabling premium metadata fields (#30285) (#30506) Co-authored-by: Sameer Kankute <sameer@berri.ai> * fix(perplexity): stop double-billing reasoning tokens in manual cost fallback (#30488) * fix(perplexity): stop double-billing reasoning tokens in manual cost fallback When perplexity_cost_per_token cannot use the API-provided usage.cost.total_cost short-circuit and falls back to manual calculation, it multiplies the full usage.completion_tokens by output_cost_per_token and then adds reasoning_tokens * output_cost_per_reasoning_token on top. Per the OpenAI/Perplexity usage convention codified for the central path in PR #18607, completion_tokens already INCLUDES reasoning_tokens, so the manual fallback double-bills reasoning at both the output and reasoning rate. 
Concrete impact on perplexity/sonar-deep-research (input 2e-6, output 8e-6, reasoning 3e-6): for the exact usage shape exercised by the live response fixture in tests/llm_translation/test_perplexity_reasoning.py (prompt_tokens=9, completion_tokens=20, reasoning_tokens=15) the current code charges 0.000223 vs the convention-correct 0.000103, a 2.165x overcharge. The bug is reachable whenever Perplexity omits the cost object (streaming chunks, fixture-driven paths, older API versions). Subtracts reasoning_tokens (clamped at zero) from completion_tokens before applying the output rate, mirroring how dashscope/cost_calculator.py and the central generic_cost_per_token already handle it. Preserves the existing fallback behaviour when output_cost_per_reasoning_token is unset (all completion_tokens stay at the output rate). Existing tests in tests/test_litellm/llms/perplexity/test_perplexity_cost_calculator.py asserted the buggy math and are updated to the convention-correct math. Adds a focused regression test using the exact usage shape from the live response fixture so this class of bug cannot be silently reintroduced. * style(perplexity): drop redundant type annotation on else branch to satisfy mypy mypy [no-redef] flagged 'completion_cost' as declared in both if and else arms; keeping the annotation only on the first declaration matches existing patterns in this file. * fix(perplexity): update integration test expected costs for non-double-billed math Three tests in test_perplexity_integration.py asserted the old buggy expectation that reasoning_tokens are billed in addition to the full completion_tokens count. After the fix in cost_per_token, reasoning_tokens are billed at the reasoning rate and the remaining (completion_tokens - reasoning_tokens) at the standard output rate, matching OpenAI/Perplexity convention (PR #18607). Updates: test_end_to_end_cost_calculation_with_transformation, test_main_cost_calculator_integration, test_high_volume_cost_calculation. 
The high-volume sanity threshold drops to 0.25 to reflect the corrected total.

* fix(ui): use dynamic proxy base URL in MCP usage examples (#30487)

Replace hardcoded http://localhost:4000 with getProxyBaseUrl() in the MCP server usage example and copy-to-clipboard snippet so the generated configuration works for non-local deployments. Fixes #30466

* feat: add missing UK PII entity types to Presidio guardrail (#30537)

* feat: add missing UK PII entity types to Presidio guardrail

Add UK_PASSPORT, UK_POSTCODE, and UK_VEHICLE_REGISTRATION to PiiEntityType enum and PII_ENTITY_CATEGORIES_MAP. These entity types are supported by Microsoft Presidio but were missing from litellm's type definitions, preventing users from configuring UK-specific PII detection.

* test: remove fragile hardcoded entity count test

Remove test_uk_category_entity_count which hardcodes len() == 5. The test_uk_entities_match_presidio_recognizers test already verifies exact set equality, making the count test redundant and fragile to future Presidio additions.

* style: apply Black formatting to match CI requirements

* fix: route volcengine (Doubao) tiered-pricing models to the tiered cost handler (#30357)

Volcengine (Doubao) models define `tiered_pricing` but no flat per-token cost, so cost_per_token fell through to generic_cost_per_token (which only reads flat costs) and tracked them at $0.

Route custom_llm_provider == "volcengine" to the shared tiered-pricing handler in litellm/llms/dashscope/cost_calculator.py, which already computes graduated tier costs. Make that handler provider-agnostic by adding a custom_llm_provider argument (default "dashscope" preserves existing behavior) so get_model_info resolves the correct model map entry.

Fixes #30346

* feat(mcp): make MCP gateway name and description configurable via env vars (#30473)

* feat(mcp): make MCP gateway name and description configurable via env vars

* Rename function _restore_env to _apply_env

* docs(mcp): document import-time capture of env-backed identity constants

Address Greptile review feedback: clarify that LITELLM_MCP_SERVER_NAME and LITELLM_MCP_SERVER_DESCRIPTION are read once at import and require a module reload to observe env changes after import.

Generated with AI assistance

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Yevhen Luhovtsov <yevhen.luhovtsov@intapp.com>
Co-authored-by: Claude <noreply@anthropic.com>

* fix(mcp): preserve native tools in semantic filter hook (#26650)

* fix(mcp): preserve native tools in semantic filter hook

The SemanticToolFilterHook.async_pre_call_hook passed ALL tools (MCP + native) to filter_tools(), which only knows MCP-registered tool names. Native tools silently failed the name match in _get_tools_by_names() and were dropped from the request.

Fix: partition tools into native and MCP-registered before filtering. Run the semantic filter only on MCP tools, then merge native tools back unconditionally.
Changes: - Robust _is_mcp_tool() using shape-based detection for OpenAI-format dicts, safe regardless of future _extract_tool_info changes - Single-pass partition loop (no double _is_mcp_tool calls) - Preserve native tools in MCP expansion path (mixed requests) - Track MCP expansion to prevent expanded tools bypassing filtering - filter_stats reports MCP-only counts for accurate metrics - Extracted _emit_filter_metadata() helper - Skip spurious filter headers for all-native tool requests Closes #26212 * remove stale docstring note referencing tools_expanded_from_mcp * fix: handle Responses API name collision and preserve tool ordering - Classify Responses API tools ({type: 'function', name: '...'}) as native to prevent name collisions with MCP canonical names - Preserve original request tool ordering using id()-based merge instead of naive native+mcp concatenation - Add 2 regression tests: name collision and ordering preservation * style: apply black formatting * fix(mcp): harden semantic filter — preserve all native tool formats, safe metadata access, graceful expansion failure, name-based merge * lint: suppress PLR0915 on async_pre_call_hook (matches codebase convention) * ci: retrigger checks after rebase onto litellm_internal_staging * feat(fireworks): sync Fireworks AI model registry with current platform catalog (#30616) Adds 12 new Fireworks serverless models and updates 3 existing entries in model_prices_and_context_window.json and its bundled backup to match the current Fireworks platform model list. New direct models: glm-5p2, qwen3p7-plus, minimax-m3, minimax-m2p7, kimi-k2p7-code, kimi-k2p6, deepseek-v4-pro, deepseek-v4-flash. New router endpoints: glm-5p1-fast, kimi-k2p6-fast, kimi-k2p7-code-fast. 
Updated: glm-5p1, gpt-oss-120b, and gpt-oss-20b now carry correct output token caps, cache-read pricing, and explicit capability flags. max_tokens is set equal to max_output_tokens (not the full context window) for models whose generation cap is below their context window. This avoids the shared input+output budget path in get_modified_max_tokens, which would otherwise let callers request output sizes the model cannot produce. The same fix corrects the pre-existing glm-5p1, gpt-oss-120b, and gpt-oss-20b entries that had max_tokens equal to the full context window. Short-form aliases (fireworks_ai/<model>) are added for every direct accounts/fireworks/models/ entry so cost attribution works for callers using bare model names. Router endpoints get short-form aliases too, and transform_request now routes bare names ending in -fast to the accounts/fireworks/routers/ path instead of defaulting every bare name to models/. This keeps the kimi-k2p6-fast router from being misrouted to the nonexistent models/kimi-k2p6-fast endpoint. kimi-k2p6-turbo is intentionally excluded; kimi-k2p6-fast is its replacement. Context windows for deepseek-v4 and kimi models use the power-of-two values (1048576 and 262144) published on the Fireworks model pages, matching the convention already used by existing entries. Two regression tests in test_utils.py assert the exact per-token costs, token limits, capability flags, and short-form-to-long-form equality for all 15 models against both the main and backup cost maps. 
Two routing tests in test_fireworks_ai_chat_transformation.py verify bare -fast names route to routers/ and bare direct-model names route to models/ * fix(bedrock): handle role:"system" inside the messages array on /v1/messages (#29698) (#30443) * feat(anthropic): hoist leading in-array system to top-level (helper) * test(anthropic): cover _system_content_to_blocks edge cases; deepcopy cache_control * test(anthropic): mid-conversation system normalization cases * feat: add supports_mid_conversation_system flag to Claude Opus 4.8 Add supports_mid_conversation_system: true to all 9 claude-opus-4-8 cost-map entries (Anthropic-native, Bedrock, Vertex, Azure AI) in both the root cost map and the bundled package backup, since the runtime helper and tests read the backup in local/offline mode. Pin the mid-system passthrough regression test to the local cost map via the existing local_model_cost_map fixture so it reads the branch-local flag rather than the network-fetched main copy. * fix(bedrock): normalize in-array system in /v1/messages handler (#29698) Wire normalize_system_messages_for_anthropic into anthropic_messages_handler so all Bedrock /v1/messages paths (Invoke / Mantle / ClaudePlatform / Converse-bridge) hoist leading in-array system entries (and demote mid-conversation ones on models lacking supports_mid_conversation_system) into the top-level system field. The normalized messages/system are written back into the local_vars snapshot the base_llm branch reads from, otherwise the Invoke/Mantle fix would silently no-op. Also fix the helper to resolve supports_mid_conversation_system through the prefix-aware AnthropicModelInfo._supports_model_capability resolver. The raw _supports_factory could not see the flag once get_llm_provider left the invoke/ prefix on the model id, which would have wrongly demoted mid-conversation system on a Bedrock invoke opus-4-8 path. 
* fix(bedrock): resolve mid-conversation-system flag through mantle/invoke/converse route prefixes; drop unused param * fix(types): widen system param to Union[str, List] for hoisted system blocks * refactor(bedrock): drop dead local_vars messages writeback * fix(bedrock/converse): translate in-array system in anthropic->openai adapter (#29698) * fix(bedrock/converse): preserve cache_control on in-array system; test drop-empty * fix(bedrock/converse): rename colliding local to satisfy mypy; test handler system-merge branches * fix(types): register supports_mid_conversation_system in model-info schema The cost-map JSON-schema validation test (test_aaamodel_prices_and_context_window_json_is_valid) rejects unknown properties, so adding supports_mid_conversation_system to the opus-4-8 cost-map entries failed CI with 'Additional properties are not allowed'. Register the flag in the INTENDED_SCHEMA allow-list and in the ProviderSpecificModelInfo TypedDict so it is a typed, first-class capability flag alongside its peers (supports_output_config, etc.). --------- Co-authored-by: Sameer Kankute <sameer@berri.ai> * fix(bedrock/agentcore): optionally forward multimodal content blocks in InvokeAgentRuntime payload (#28885) * fix(bedrock/agentcore): optionally forward multimodal content blocks in InvokeAgentRuntime payload By default the agentcore provider flattens the last message to a text-only {"prompt": "..."} payload via convert_content_list_to_str, silently dropping OpenAI multimodal blocks (image_url, file, input_audio, ...). This adds an opt-in `forward_multimodal_content` litellm param. When truthy and the last message's content is a list containing a non-text block, the original OpenAI content list is forwarded verbatim under a new "content" field so an attachment-aware AgentCore agent can read it. Default off keeps the payload byte-identical to the legacy {"prompt": "..."} shape — existing agents are unaffected. 
The flag is read from optional_params (where other AgentCore params land) with a litellm_params fallback, and accepts a bool or a config/env string ('true', '1', ...). AgentCore Runtime is schemaless on the agent side — the agent's @app.entrypoint parses arbitrary JSON up to 100 MB (per https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/runtime-invoke-agent.html), so this is a purely upstream change; no AgentCore-side schema is asserted. * fix(bedrock/agentcore): shallow-copy forwarded multimodal content list Address review feedback (Sameerlite): payload["content"] = last_content aliased the caller's mutable messages[-1]["content"] list. Harmless today because the payload is JSON-serialized immediately, but a latent footgun if a future caller mutates the returned payload before serialization. Forward list(last_content) so the payload owns its own list. Block dicts stay shared on purpose — a deep copy would clone potentially large base64 media on the request hot path, and the flagged risk was the shared list, not the blocks. Update the passthrough tests to assert equality + distinct identity, and add a regression test that mutating the payload list can't leak back into the original message content. * Revert "fix(mcp): preserve native tools in semantic filter hook (#26650)" This reverts commit 438c825. * Revert "feat(guardrails): integrate Repelloai Argus guardrail (#30465)" This reverts commit 54da785. * Revert "feat(dashscope): add Responses API support (#30286)" This reverts commit 6766256. * Revert "fix(bedrock): handle role:"system" inside the messages array on /v1/messages (#29698) (#30443)" This reverts commit b8a8083. * Revert "fix(anthropic): drop orphaned server_tool_use on multi-turn replay from generic OpenAI clients (#30486)" This reverts commit 6e9c0b0. * Revert "fix: route volcengine (Doubao) tiered-pricing models to the tiered cost handler (#30357)" This reverts commit 172e302. 
* Revert "feat(proxy): serve Anthropic-native /v1/models for Claude Code gateway discovery (#30273)" This reverts commit 4e31885. * fix: pass key_limit=None in team_member_update and patch model_cost in pricing test team_member_update called team_info without key_limit, so the fastapi.Query default object (not None) was passed through to get_data, which failed when serializing it. Pass key_limit=None explicitly to avoid this. test_get_model_info_costs patched litellm.model_cost from the local backup so the assertion holds before the PR is merged and the remote main URL is updated. * fix(security): validate resolved model in /realtime/client_secrets for non-transcription sessions (#30710) Omitting both model and session.model caused the endpoint to default to gpt-4o-realtime-preview without running can_key_call_resolved_model, so any key could access that model regardless of its allowed-model list. The transcription path already called can_key_call_resolved_model; this adds the same call for the realtime path before returning. 
* fix(lint): fix F821 undefined model_info and F841 unused metadata in create_model_info_response * fix: black formatting and stub get_model_group_info in third team translation test * fix: reformat utils.py with black 26.3.1 to match CI * fix: replace Optional[X] with X | None to satisfy UP045 ruff strict gate --------- Co-authored-by: Habon Laszlo <habonlaci@users.noreply.github.com> Co-authored-by: habonlaci <4699494+habonlaci@users.noreply.github.com> Co-authored-by: Armaan Sandhu <74664101+Ar-maan05@users.noreply.github.com> Co-authored-by: santino18727-debug <santino18727@gmail.com> Co-authored-by: Eric (GabiDevFamily) <271972409+santino18727-debug@users.noreply.github.com> Co-authored-by: Nitish Agarwal <1592163+nitishagar@users.noreply.github.com> Co-authored-by: jho1-godaddy <171078705+jho1-godaddy@users.noreply.github.com> Co-authored-by: 安妮的心动录 <74543653+anneheartrecord@users.noreply.github.com> Co-authored-by: Harshith Gujjeti <153299927+Harshxth@users.noreply.github.com> Co-authored-by: Tomoya Tabuchi <t@tomoyat1.com> Co-authored-by: Vedant Agarwal <43557509+Vedant-Agarwal@users.noreply.github.com> Co-authored-by: Prathamesh Jadhav <55660103+lollinng@users.noreply.github.com> Co-authored-by: songkuan-zheng <252822057+songkuan-zheng@users.noreply.github.com> Co-authored-by: Kropiunig <48442031+Kropiunig@users.noreply.github.com> Co-authored-by: Lavish Bansal <lavish.bansal619@gmail.com> Co-authored-by: Shane Emmons <27679+semmons99@users.noreply.github.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Anuj ojha <ojhaanuj224@gmail.com> Co-authored-by: Nahrin <nahrin@nahrinoda.com> Co-authored-by: Nbouyaa <67773915+FadelT@users.noreply.github.com> Co-authored-by: Vineeth Sai <vineethsai4444@gmail.com> Co-authored-by: Eugene Lugovtsov <34510252+EugeneLugovtsov@users.noreply.github.com> Co-authored-by: Yevhen Luhovtsov <yevhen.luhovtsov@intapp.com> Co-authored-by: Ayush Shekhar 
<106994833+ayushh0110@users.noreply.github.com> Co-authored-by: Ahmad Shahzad <107808273+shzdehmd@users.noreply.github.com> Co-authored-by: Kent <72616338+kingdoooo@users.noreply.github.com> Co-authored-by: Jón Levy <levy@apro.is>
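The volcengine entry in the commit message above depends on graduated tiered pricing, where each slice of tokens is charged at the rate of the tier it falls into, much like tax brackets. The following is only a minimal sketch of that idea and does not call litellm: the `Tier` shape, the `graduated_cost` helper, and the rates in `EXAMPLE_TIERS` are all hypothetical and are not taken from the model map or from litellm/llms/dashscope/cost_calculator.py.

```python
from typing import List, TypedDict


class Tier(TypedDict):
    # Hypothetical tier shape for illustration only; the real model map
    # entries may be structured differently.
    upper_bound: float      # token count at which this tier ends
    cost_per_token: float   # rate applied to tokens inside this tier


def graduated_cost(tokens: int, tiers: List[Tier]) -> float:
    """Charge each slice of tokens at the rate of the tier it falls in."""
    cost = 0.0
    lower = 0.0  # start of the current tier
    for tier in tiers:
        if tokens <= lower:
            break  # no tokens reach this tier
        in_tier = min(tokens, tier["upper_bound"]) - lower
        cost += in_tier * tier["cost_per_token"]
        lower = tier["upper_bound"]
    return cost


# Made-up rates, chosen only to keep the arithmetic easy to follow.
EXAMPLE_TIERS: List[Tier] = [
    {"upper_bound": 8_000, "cost_per_token": 1e-6},
    {"upper_bound": float("inf"), "cost_per_token": 2e-6},
]

# 10,000 tokens: the first 8,000 at 1e-6, the remaining 2,000 at 2e-6.
prompt_cost = graduated_cost(10_000, EXAMPLE_TIERS)
print(f"prompt_cost=${prompt_cost:.6f}")
```

A reader that only looks at flat per-token fields would find nothing to multiply for such a model, which is consistent with the $0 tracking the commit message describes.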

greptile-apps Bot reviewed Jun 13, 2026

vineethsaivs force-pushed the fix-volcengine-tiered-pricing-cost branch from be8cd02 to c0c062c Compare June 15, 2026 01:18

vineethsaivs force-pushed the fix-volcengine-tiered-pricing-cost branch from c0c062c to 57fbdbb Compare June 17, 2026 02:52

vineethsaivs changed the base branch from litellm_internal_staging to litellm_oss_branch June 17, 2026 02:53

vineethsaivs requested a review from a team June 17, 2026 02:53

vineethsaivs force-pushed the fix-volcengine-tiered-pricing-cost branch from 57fbdbb to 2cd483b Compare June 17, 2026 04:08

vineethsaivs changed the base branch from litellm_oss_branch to litellm_internal_staging June 17, 2026 04:09

Sameerlite changed the base branch from litellm_internal_staging to litellm_oss_170626_1 June 17, 2026 12:07

Sameerlite approved these changes Jun 17, 2026

Sameerlite merged commit 172e302 into BerriAI:litellm_oss_170626_1 Jun 17, 2026
3 checks passed

Sameerlite added a commit that referenced this pull request Jun 17, 2026

Revert "fix: route volcengine (Doubao) tiered-pricing models to the tiered cost handler (#30357)"

60bde6d

This reverts commit 172e302.

vineethsaivs commented Jun 13, 2026

CLAassistant commented Jun 13, 2026 • edited

codecov Bot commented Jun 13, 2026 • edited

Codecov Report

greptile-apps Bot commented Jun 13, 2026 • edited

Greptile Summary

Confidence Score: 5/5

Important Files Changed

greptile-apps Bot Jun 13, 2026

vineethsaivs commented Jun 15, 2026

Sameerlite commented Jun 15, 2026

vineethsaivs commented Jun 17, 2026

Sameerlite commented Jun 17, 2026

vineethsaivs commented Jun 17, 2026

greptile-apps Bot commented Jun 17, 2026

vineethsaivs commented Jun 17, 2026

3 participants
