fix: apply cache_read_input_token_cost to cached tokens in Fireworks AI cost calculation #26016
VANDRANKI wants to merge 1 commit into
Greptile Summary
This PR fixes Fireworks AI cost calculation so that cached tokens are billed at the cache-read rate (cache_read_input_token_cost) instead of the full input rate.
Confidence Score: 5/5
Safe to merge: the fix is correct and the only remaining finding is a minor defensive-programming suggestion. The core logic is sound and consistent with how other providers (Dashscope) handle the same bucketing. The single P2 note concerns clamping non_cached_tokens to zero.
No files require special attention.
| Filename | Overview |
|---|---|
| litellm/llms/fireworks_ai/cost_calculator.py | Correctly separates cached, cache-creation, and non-cached input tokens and applies their respective per-token rates; one minor concern about non_cached_tokens going negative with malformed usage data. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[cost_per_token called] --> B{model in model_prices?}
B -- yes --> C[get_model_info direct]
B -- no --> D[get_base_model_for_pricing]
D --> E[get_model_info for base model]
C & E --> F{prompt_tokens_details present?}
F -- yes --> G[extract cached_tokens\ncache_creation_tokens]
F -- no --> H[cached_tokens = 0\ncache_creation_tokens = 0]
G & H --> I[non_cached_tokens =\nprompt_tokens - cached - creation]
I --> J[prompt_cost =\nnon_cached x input_cost\n+ cached x cache_read_cost\n+ creation x cache_creation_cost]
J --> K[completion_cost =\ncompletion_tokens x output_cost]
K --> L[return prompt_cost, completion_cost]
    model_info.get("cache_creation_input_token_cost") or 0.0
)
# Non-cached tokens are billed at the standard input rate
non_cached_tokens = usage["prompt_tokens"] - cached_tokens - cache_creation_tokens
non_cached_tokens can go negative with malformed usage data
If the API ever returns cached_tokens + cache_creation_tokens > prompt_tokens (e.g. a rounding discrepancy or inconsistent response), non_cached_tokens is negative and prompt_cost ends up negative. Clamping to zero makes the cost calculation resilient to such inconsistencies.
Suggested change:
- non_cached_tokens = usage["prompt_tokens"] - cached_tokens - cache_creation_tokens
+ non_cached_tokens = max(0, usage["prompt_tokens"] - cached_tokens - cache_creation_tokens)
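To illustrate the failure mode described in this review comment, here is a minimal sketch. The helper name `non_cached` and the token counts are hypothetical, chosen only to produce an inconsistent usage payload; they are not taken from the PR or from any real Fireworks response.

```python
def non_cached(prompt_tokens: int, cached: int, creation: int, clamp: bool) -> int:
    """Tokens billed at the standard input rate (hypothetical helper)."""
    remaining = prompt_tokens - cached - creation
    # With clamp=True, an inconsistent payload yields 0 instead of a negative count.
    return max(0, remaining) if clamp else remaining


# Hypothetical malformed usage: cached + creation exceeds prompt_tokens.
unclamped = non_cached(100, 90, 20, clamp=False)
clamped = non_cached(100, 90, 20, clamp=True)
print(unclamped, clamped)
```

Without the clamp, the negative count would be multiplied by the input rate and subtract from the prompt cost; with it, the non-cached bucket simply contributes nothing.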
What
litellm/llms/fireworks_ai/cost_calculator.py ignores cached_tokens when computing prompt cost. All input tokens are charged at input_cost_per_token even when cache_read_input_token_cost is configured and cache hits are reported in prompt_tokens_details.
Fixes #25950.
Why
The cost_per_token function previously multiplied all of prompt_tokens by input_cost_per_token. This charges every token at the full input rate, ignoring cached_tokens in prompt_tokens_details and the cache_read_input_token_cost / cache_creation_input_token_cost fields that are correctly loaded into model_info.
Fix
Extract cached_tokens and cache_creation_tokens from usage.prompt_tokens_details, then apply the appropriate rate to each bucket: non-cached tokens at input_cost_per_token, cached tokens at cache_read_input_token_cost, and cache-creation tokens at cache_creation_input_token_cost. Each cache rate falls back to 0.0 if its cost field is not set, so standard Fireworks serverless tier pricing is unchanged.
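The bucketing described in the Fix section can be sketched roughly as follows. This is not the code from the PR: plain dicts stand in for LiteLLM's usage and model_info objects, the function name `bucketed_cost` is hypothetical, and the `output_cost_per_token` key and the example rates are assumptions made for illustration.

```python
from typing import Any, Dict, Tuple


def bucketed_cost(usage: Dict[str, Any], model_info: Dict[str, Any]) -> Tuple[float, float]:
    """Return (prompt_cost, completion_cost) with per-bucket input rates (illustrative)."""
    details = usage.get("prompt_tokens_details") or {}
    cached_tokens = details.get("cached_tokens") or 0
    cache_creation_tokens = details.get("cache_creation_tokens") or 0

    input_rate = model_info.get("input_cost_per_token") or 0.0
    # Cache rates fall back to 0.0 when the fields are not configured.
    cache_read_rate = model_info.get("cache_read_input_token_cost") or 0.0
    cache_creation_rate = model_info.get("cache_creation_input_token_cost") or 0.0
    output_rate = model_info.get("output_cost_per_token") or 0.0  # assumed key

    # Non-cached tokens are billed at the standard input rate.
    non_cached_tokens = usage["prompt_tokens"] - cached_tokens - cache_creation_tokens

    prompt_cost = (
        non_cached_tokens * input_rate
        + cached_tokens * cache_read_rate
        + cache_creation_tokens * cache_creation_rate
    )
    completion_cost = (usage.get("completion_tokens") or 0) * output_rate
    return prompt_cost, completion_cost


# Hypothetical usage payload and rates, for illustration only.
example_usage = {
    "prompt_tokens": 1000,
    "completion_tokens": 10,
    "prompt_tokens_details": {"cached_tokens": 600},
}
example_info = {
    "input_cost_per_token": 2e-6,
    "cache_read_input_token_cost": 5e-7,
    "output_cost_per_token": 3e-6,
}
example_prompt_cost, example_completion_cost = bucketed_cost(example_usage, example_info)
```

When prompt_tokens_details is absent, both cache buckets are zero and the result reduces to the old behaviour of prompt_tokens times input_cost_per_token.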
Test
Using the numbers from the issue report:
prompt_tokens = 44341, cached_tokens = 41518
input_cost_per_token = 1.4e-6, cache_read_input_token_cost = 2.6e-7
Before: 44341 * 1.4e-6 = $0.0621 (wrong)
After: 2823 * 1.4e-6 + 41518 * 2.6e-7 = $0.00395 + $0.01079 = $0.0147 (correct)
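The before/after figures can be reproduced with a few lines of plain arithmetic. This is a standalone check using only the numbers quoted in the Test section, not the test added by the PR.

```python
# Values quoted from the issue report.
prompt_tokens = 44341
cached_tokens = 41518
input_cost_per_token = 1.4e-6
cache_read_input_token_cost = 2.6e-7

# Before the fix: every prompt token billed at the full input rate.
before = prompt_tokens * input_cost_per_token

# After the fix: cached tokens billed at the cache-read rate.
non_cached_tokens = prompt_tokens - cached_tokens
after = (
    non_cached_tokens * input_cost_per_token
    + cached_tokens * cache_read_input_token_cost
)

print(f"before=${before:.4f} after=${after:.4f}")  # before=$0.0621 after=$0.0147
```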