Commit c23b19f

mateo-berri, claude, cursoragent, and yassin-berriai authored
feat(openai): apply regional-processing cost uplift for EU/US data residency (#28626)
* feat(openai): apply regional-processing cost uplift for EU/US data residency

  OpenAI charges a 10% uplift on the latest GPT models when requests are served from a regionalized hostname (eu./us.api.openai.com). Infer the region from `api_base`, expose it on `kwargs["litellm_params"]["data_residency"]`, and multiply the computed cost by a per-model `regional_processing_uplift_multiplier_<region>` field.

  https://claude.ai/code/session_012ebH44s7ohYxjoix5CXzTW

* test: allow regional_processing_uplift_multiplier_{eu,us} in model_prices schema

* fix(cost): tighten data_residency inference and restore model_cost in tests

  - Only infer OpenAI data_residency when custom_llm_provider == "openai"; drop the implicit None fallback so non-OpenAI callers can't accidentally pick up a regional tag from a stray OpenAI hostname.
  - _local_model_cost_map fixture now snapshots and restores litellm.model_cost and LITELLM_LOCAL_MODEL_COST_MAP so tests don't leak state across the session.

* refactor(openai): move data_residency helper under llms/openai

* fix: thread data_residency through realtime stream cost calculation

  Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(cost): thread data_residency through batch_cost_calculator

  Apply the OpenAI regional-processing uplift multiplier to retrieve_batch cost paths so Batch API requests served via eu./us.api.openai.com are priced at the same uplifted token rates as completions/transcriptions.

* refactor(openai): encapsulate provider check inside infer_openai_data_residency

  Move the custom_llm_provider == "openai" guard from get_litellm_params into the helper itself so the core utility no longer carries provider-specific dispatch logic. Callers pass through the provider unconditionally; the helper returns None for any non-OpenAI provider.

* fix(responses): thread data_residency through Responses logging params

  The Responses API paths build their logging litellm_params dict after provider resolution but did not include data_residency, so cost calc saw None even when the effective api_base was a regional OpenAI host.

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
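The pricing rule in the commit message comes down to one multiplication per cost component. The sketch below works through it for a 1.10 multiplier. The per-token prices and the `apply_uplift` helper are hypothetical, chosen only to make the arithmetic easy to follow; they are not taken from litellm's price map.

```python
# Hypothetical per-token prices in USD (illustrative only).
INPUT_COST_PER_TOKEN = 2e-06
OUTPUT_COST_PER_TOKEN = 8e-06


def apply_uplift(prompt_cost, completion_cost, multiplier=1.0):
    """Scale both cost components by a flat multiplier (1.0 means no uplift)."""
    return prompt_cost * multiplier, completion_cost * multiplier


# 1,000 prompt tokens and 500 completion tokens at the base rate.
base_prompt = 1_000 * INPUT_COST_PER_TOKEN
base_completion = 500 * OUTPUT_COST_PER_TOKEN

# Same request served from a regional host with a 10% uplift.
eu_prompt, eu_completion = apply_uplift(base_prompt, base_completion, 1.10)

print(f"base: {base_prompt + base_completion:.6f} USD")
print(f"eu:   {eu_prompt + eu_completion:.6f} USD")
```

Because the multiplier is flat, the total rises by the same 10% as each component, whatever the mix of prompt and completion tokens.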
1 parent f38c16c commit c23b19f

17 files changed

Lines changed: 652 additions & 13 deletions

litellm/cost_calculator.py

Lines changed: 35 additions & 2 deletions

@@ -24,6 +24,7 @@
 from litellm.litellm_core_utils.llm_cost_calc.utils import (
     CostCalculatorUtils,
     _generic_cost_per_character,
+    _get_regional_uplift_multiplier,
     _get_service_tier_cost_key,
     _parse_prompt_tokens_details,
     calculate_cost_component,
@@ -312,6 +313,10 @@ def cost_per_token( # noqa: PLR0915
     audio_transcription_file_duration: float = 0.0,  # for audio transcription calls - the file time in seconds
     ### SERVICE TIER ###
     service_tier: Optional[str] = None,  # for OpenAI service tier pricing
+    ### DATA RESIDENCY ###
+    data_residency: Optional[
+        str
+    ] = None,  # for OpenAI regional-processing uplift (e.g. "eu", "us")
     response: Optional[Any] = None,
     ### REQUEST MODEL ###
     request_model: Optional[str] = None,  # original request model for router detection
@@ -493,6 +498,7 @@ def cost_per_token( # noqa: PLR0915
                 usage=usage_block,
                 custom_llm_provider=custom_llm_provider,
                 service_tier=service_tier,
+                data_residency=data_residency,
             )

             return prompt_cost, completion_cost
@@ -521,14 +527,18 @@ def cost_per_token( # noqa: PLR0915
         or call_type == CallTypes.retrieve_batch
     ):
         return batch_cost_calculator(
-            usage=usage_block, model=model, custom_llm_provider=custom_llm_provider
+            usage=usage_block,
+            model=model,
+            custom_llm_provider=custom_llm_provider,
+            data_residency=data_residency,
         )
     elif call_type == "atranscription" or call_type == "transcription":
         if _transcription_usage_has_token_details(usage_block):
             return openai_cost_per_token(
                 model=model_without_prefix,
                 usage=usage_block,
                 service_tier=service_tier,
+                data_residency=data_residency,
             )

         return openai_cost_per_second(
@@ -579,7 +589,10 @@ def cost_per_token( # noqa: PLR0915
         )
     elif custom_llm_provider == "openai":
         return openai_cost_per_token(
-            model=model, usage=usage_block, service_tier=service_tier
+            model=model,
+            usage=usage_block,
+            service_tier=service_tier,
+            data_residency=data_residency,
         )
     elif custom_llm_provider == "databricks":
         return databricks_cost_per_token(model=model, usage=usage_block)
@@ -631,6 +644,7 @@ def cost_per_token( # noqa: PLR0915
             usage=usage_block,
             custom_llm_provider=custom_llm_provider,
             service_tier=service_tier,
+            data_residency=data_residency,
         )

     if (
@@ -1117,6 +1131,10 @@ def completion_cost( # noqa: PLR0915
     litellm_logging_obj: Optional[LitellmLoggingObject] = None,
     ### SERVICE TIER ###
     service_tier: Optional[str] = None,  # for OpenAI service tier pricing
+    ### DATA RESIDENCY ###
+    data_residency: Optional[
+        str
+    ] = None,  # for OpenAI regional-processing uplift (e.g. "eu", "us")
 ) -> float:
     """
     Calculate the cost of a given completion call fot GPT-3.5-turbo, llama2, any litellm supported llm.
@@ -1516,6 +1534,7 @@ def completion_cost( # noqa: PLR0915
                     combined_usage_object=cost_per_token_usage_object,
                     custom_llm_provider=custom_llm_provider,
                     litellm_model_name=model,
+                    data_residency=data_residency,
                 )
             elif call_type == _MCP_CALL_TYPE:
                 from litellm.proxy._experimental.mcp_server.cost_calculator import (
@@ -1600,6 +1619,7 @@ def completion_cost( # noqa: PLR0915
                     audio_transcription_file_duration=audio_transcription_file_duration,
                     rerank_billed_units=rerank_billed_units,
                     service_tier=service_tier,
+                    data_residency=data_residency,
                     response=completion_response,
                     request_model=request_model_for_cost,
                 )
@@ -1811,6 +1831,10 @@ def response_cost_calculator(
     litellm_logging_obj: Optional[LitellmLoggingObject] = None,
     ### SERVICE TIER ###
     service_tier: Optional[str] = None,  # for OpenAI service tier pricing
+    ### DATA RESIDENCY ###
+    data_residency: Optional[
+        str
+    ] = None,  # for OpenAI regional-processing uplift (e.g. "eu", "us")
 ) -> float:
     """
     Returns
@@ -1844,6 +1868,7 @@ def response_cost_calculator(
             router_model_id=router_model_id,
             litellm_logging_obj=litellm_logging_obj,
             service_tier=service_tier,
+            data_residency=data_residency,
         )
         return response_cost
     except Exception as e:
@@ -2202,6 +2227,7 @@ def batch_cost_calculator(
     model: str,
     custom_llm_provider: Optional[str] = None,
     model_info: Optional[ModelInfo] = None,
+    data_residency: Optional[str] = None,
 ) -> Tuple[float, float]:
     """
     Calculate the cost of a batch job.
@@ -2286,6 +2312,11 @@ def batch_cost_calculator(
         usage.completion_tokens * (output_cost_per_token) / 2
     )  # batch cost is usually half of the regular token cost

+    uplift = _get_regional_uplift_multiplier(model_info, data_residency)
+    if uplift != 1.0:
+        total_prompt_cost *= uplift
+        total_completion_cost *= uplift
+
     return total_prompt_cost, total_completion_cost


@@ -2431,6 +2462,7 @@ def handle_realtime_stream_cost_calculation(
     combined_usage_object: Usage,
     custom_llm_provider: str,
     litellm_model_name: str,
+    data_residency: Optional[str] = None,
 ) -> float:
     """
     Handles the cost calculation for realtime stream responses.
@@ -2461,6 +2493,7 @@ def handle_realtime_stream_cost_calculation(
                 model=model_name,
                 usage=combined_usage_object,
                 custom_llm_provider=custom_llm_provider,
+                data_residency=data_residency,
             )
         except Exception:
             continue
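The `batch_cost_calculator` hunk applies the uplift after the batch discount, so the two factors compose. A minimal standalone sketch of that ordering follows. The function name `batch_cost_with_uplift` and the prices are hypothetical, and the halving mirrors the "usually half" comment in the hunk rather than any guaranteed rate.

```python
from typing import Tuple


def batch_cost_with_uplift(
    prompt_tokens: int,
    completion_tokens: int,
    input_cost_per_token: float,
    output_cost_per_token: float,
    uplift: float = 1.0,
) -> Tuple[float, float]:
    """Halve the regular token cost for batch, then scale by the regional uplift."""
    # Batch discount first (assumed here to be 50% of the regular rate).
    prompt_cost = prompt_tokens * input_cost_per_token / 2
    completion_cost = completion_tokens * output_cost_per_token / 2

    # Regional uplift second; skipped entirely when there is nothing to apply.
    if uplift != 1.0:
        prompt_cost *= uplift
        completion_cost *= uplift
    return prompt_cost, completion_cost


# Hypothetical prices: 2e-06 USD per input token, 4e-06 USD per output token.
global_cost = batch_cost_with_uplift(1_000, 1_000, 2e-06, 4e-06)
eu_cost = batch_cost_with_uplift(1_000, 1_000, 2e-06, 4e-06, uplift=1.10)
print(global_cost, eu_cost)
```

Since both steps are multiplications, the order does not change the result; applying the uplift last simply keeps the regional adjustment in one place at the end of the function.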

litellm/litellm_core_utils/get_litellm_params.py

Lines changed: 7 additions & 0 deletions

@@ -1,5 +1,7 @@
 from typing import Optional

+from litellm.llms.openai.data_residency import infer_openai_data_residency
+
 # Pre-define optional kwargs keys as frozenset for O(1) lookups
 # These are extracted from kwargs only if present, avoiding unnecessary .get() calls
 _OPTIONAL_KWARGS_KEYS = frozenset(
@@ -103,6 +105,10 @@ def get_litellm_params(
     if litellm_trace_id is None:
         litellm_trace_id = _meta.get("trace_id") or _meta.get("session_id")

+    data_residency: Optional[str] = infer_openai_data_residency(
+        custom_llm_provider, api_base
+    )
+
     # Build base dict with explicit parameters (always included)
     litellm_params = {
         "acompletion": acompletion,
@@ -112,6 +118,7 @@ def get_litellm_params(
         "verbose": verbose,
         "custom_llm_provider": custom_llm_provider,
         "api_base": api_base,
+        "data_residency": data_residency,
         "litellm_call_id": litellm_call_id,
         "model_alias_map": model_alias_map,
         "completion_call_id": completion_call_id,

litellm/litellm_core_utils/litellm_logging.py

Lines changed: 5 additions & 0 deletions

@@ -1546,6 +1546,11 @@ def _response_cost_calculator(
                     if self.optional_params
                     else None
                 ),
+                "data_residency": (
+                    self.litellm_params.get("data_residency")
+                    if hasattr(self, "litellm_params") and self.litellm_params
+                    else None
+                ),
             }
         except Exception as e:  # error creating kwargs for cost calculation
             debug_info = StandardLoggingModelCostFailureDebugInformation(

litellm/litellm_core_utils/llm_cost_calc/utils.py

Lines changed: 46 additions & 0 deletions

@@ -9,6 +9,7 @@
     CacheCreationTokenDetails,
     CallTypes,
     CompletionTokensDetailsWrapper,
+    DataResidency,
     ImageResponse,
     ModelInfo,
     PassthroughCallTypes,
@@ -617,11 +618,46 @@ def _calculate_input_cost(
     return prompt_cost


+def _get_regional_uplift_multiplier(
+    model_info: ModelInfo, data_residency: Optional[str]
+) -> float:
+    """
+    Resolve the per-model regional-processing uplift multiplier for a given
+    data-residency region.
+
+    OpenAI applies a flat percentage uplift (e.g. +10%) on all token costs for
+    requests served from a regionalized hostname (eu./us.api.openai.com). The
+    multiplier is stored on the model entry as
+    ``regional_processing_uplift_multiplier_<region>`` (e.g. 1.10).
+
+    Returns 1.0 (no uplift) when ``data_residency`` is ``None`` or when the
+    model has no multiplier configured for the given region.
+    """
+    if data_residency is None:
+        return 1.0
+    residency = data_residency.lower()
+    if residency not in {r.value for r in DataResidency}:
+        return 1.0
+    multiplier = model_info.get(f"regional_processing_uplift_multiplier_{residency}")
+    if multiplier is None:
+        return 1.0
+    try:
+        return float(cast(float, multiplier))
+    except (TypeError, ValueError):
+        verbose_logger.exception(
+            "Invalid regional_processing_uplift_multiplier_%s for model; "
+            "defaulting to 1.0",
+            residency,
+        )
+        return 1.0
+
+
 def generic_cost_per_token( # noqa: PLR0915
     model: str,
     usage: Usage,
     custom_llm_provider: str,
     service_tier: Optional[str] = None,
+    data_residency: Optional[str] = None,
 ) -> Tuple[float, float]:
     """
     Calculates the cost per token for a given model, prompt tokens, and completion tokens.
@@ -631,6 +667,8 @@ def generic_cost_per_token( # noqa: PLR0915
     Input:
         - model: str, the model name without provider prefix
         - usage: LiteLLM Usage block, containing anthropic caching information
+        - data_residency: optional OpenAI data-residency region (e.g. "eu", "us"),
+            used to apply the per-model regional-processing uplift multiplier.

     Returns:
         Tuple[float, float] - prompt_cost_in_usd, completion_cost_in_usd
@@ -781,6 +819,14 @@ def generic_cost_per_token( # noqa: PLR0915
             )
             completion_cost += float(image_tokens) * _output_cost_per_image_token

+    ## REGIONAL DATA-RESIDENCY UPLIFT
+    # Applied as a flat multiplier across all token costs for the request
+    # when the upstream is a regionalized OpenAI host (eu./us.api.openai.com).
+    uplift = _get_regional_uplift_multiplier(model_info, data_residency)
+    if uplift != 1.0:
+        prompt_cost *= uplift
+        completion_cost *= uplift
+
     return prompt_cost, completion_cost


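The fallback behaviour of `_get_regional_uplift_multiplier` is easiest to see against a plain dictionary. The sketch below re-creates the same decision chain without importing litellm. `SUPPORTED_REGIONS` stands in for the `DataResidency` enum values and is an assumption, as are the `lookup_uplift` name and the sample model entry; logging is omitted.

```python
from typing import Any, Mapping, Optional

# Assumption: stands in for {r.value for r in DataResidency}.
SUPPORTED_REGIONS = {"eu", "us"}


def lookup_uplift(model_info: Mapping[str, Any], data_residency: Optional[str]) -> float:
    """Return the configured multiplier, or 1.0 whenever anything is missing or invalid."""
    if data_residency is None:
        return 1.0
    region = data_residency.lower()
    if region not in SUPPORTED_REGIONS:
        return 1.0
    raw = model_info.get(f"regional_processing_uplift_multiplier_{region}")
    if raw is None:
        return 1.0
    try:
        return float(raw)
    except (TypeError, ValueError):
        # A malformed entry degrades to "no uplift" instead of failing cost tracking.
        return 1.0


# Hypothetical model entry with only an EU multiplier configured.
example_model_info = {"regional_processing_uplift_multiplier_eu": 1.1}

for region in ("EU", "us", "apac", None):
    print(region, lookup_uplift(example_model_info, region))
```

Every failure path returns 1.0, so a missing or malformed price-map entry can only under-report the uplift and never raises during cost calculation.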

litellm/llms/openai/cost_calculation.py

Lines changed: 8 additions & 1 deletion

@@ -19,14 +19,20 @@ def cost_router(call_type: CallTypes) -> Literal["cost_per_token", "cost_per_sec


 def cost_per_token(
-    model: str, usage: Usage, service_tier: Optional[str] = None
+    model: str,
+    usage: Usage,
+    service_tier: Optional[str] = None,
+    data_residency: Optional[str] = None,
 ) -> Tuple[float, float]:
     """
     Calculates the cost per token for a given model, prompt tokens, and completion tokens.

     Input:
         - model: str, the model name without provider prefix
         - usage: LiteLLM Usage block, containing anthropic caching information
+        - data_residency: optional OpenAI data-residency region (e.g. "eu", "us"),
+            inferred from api_base. Applies the model's regional-processing
+            uplift multiplier when set.

     Returns:
         Tuple[float, float] - prompt_cost_in_usd, completion_cost_in_usd
@@ -37,6 +43,7 @@ def cost_per_token(
         usage=usage,
         custom_llm_provider="openai",
         service_tier=service_tier,
+        data_residency=data_residency,
     )
     # ### Non-cached text tokens
     # non_cached_text_tokens = usage.prompt_tokens
litellm/llms/openai/data_residency.py

Lines changed: 41 additions & 0 deletions

@@ -0,0 +1,41 @@
+"""
+Helpers for resolving OpenAI data-residency (regional processing) from an
+api_base URL.
+
+OpenAI enforces hostname-per-region for projects with geography restrictions
+enabled and rejects requests sent to the wrong host, so the api_base hostname
+is the authoritative signal of which region a request was processed in.
+"""
+
+from typing import Dict, Optional
+from urllib.parse import urlparse
+
+# Mapping of OpenAI regional hostnames to the corresponding data-residency
+# value used by the cost calculator. See
+# https://developers.openai.com/api/docs/pricing for the regional-processing
+# uplift these hostnames trigger.
+_OPENAI_REGIONAL_HOSTS: Dict[str, str] = {
+    "eu.api.openai.com": "eu",
+    "us.api.openai.com": "us",
+}
+
+
+def infer_openai_data_residency(
+    custom_llm_provider: Optional[str], api_base: Optional[str]
+) -> Optional[str]:
+    """
+    Derive the OpenAI data-residency region from an api_base URL.
+
+    Returns ``"eu"`` for the EU regional host, ``"us"`` for the US regional
+    host, and ``None`` for the default global host, any non-OpenAI provider,
+    or any non-OpenAI URL.
+    """
+    if custom_llm_provider != "openai" or not api_base:
+        return None
+    try:
+        host = urlparse(api_base).hostname
+    except (TypeError, ValueError):
+        return None
+    if not host:
+        return None
+    return _OPENAI_REGIONAL_HOSTS.get(host.lower())
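To see which `api_base` values the hostname lookup would and would not match, the sketch below runs the same `urlparse`-based check over a few sample URLs. `REGIONAL_HOSTS` and `region_from_api_base` are illustrative stand-ins rather than the litellm helper, the provider guard is left out, and the sample URLs are made up.

```python
from typing import Optional
from urllib.parse import urlparse

# Same two regional hosts as the helper in the diff.
REGIONAL_HOSTS = {"eu.api.openai.com": "eu", "us.api.openai.com": "us"}


def region_from_api_base(api_base: Optional[str]) -> Optional[str]:
    """Exact-match the parsed hostname against the regional host table."""
    if not api_base:
        return None
    try:
        host = urlparse(api_base).hostname
    except ValueError:
        # urlparse raises ValueError for some malformed URLs (e.g. bad IPv6 brackets).
        return None
    if not host:
        return None
    return REGIONAL_HOSTS.get(host.lower())


samples = [
    "https://eu.api.openai.com/v1",               # regional host
    "https://us.api.openai.com:443/v1",           # port is not part of .hostname
    "https://api.openai.com/v1",                  # global host, no region
    "eu.api.openai.com/v1",                       # no scheme, so no hostname is parsed
    "https://eu.api.openai.com.example.test/v1",  # look-alike suffix, not an exact match
]
for url in samples:
    print(url, "->", region_from_api_base(url))
```

Two properties follow from exact hostname matching: a look-alike domain that merely starts with a regional host is not tagged, and an `api_base` written without a scheme is not tagged either, because `urlparse` treats it as a path.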
