fix: apply cache_read_input_token_cost to cached tokens in Fireworks AI cost calculation #26016

Open
VANDRANKI wants to merge 1 commit into BerriAI:main from VANDRANKI:fix/fireworks-cache-cost
@VANDRANKI

Contributor

What

litellm/llms/fireworks_ai/cost_calculator.py ignores cached_tokens when computing prompt cost. All input tokens are charged at input_cost_per_token even when cache_read_input_token_cost is configured and cache hits are reported in prompt_tokens_details.

Fixes #25950.

Why

Previously, the cost_per_token function computed the prompt cost as:

prompt_cost = usage["prompt_tokens"] * model_info["input_cost_per_token"]

This charges every token at the full input rate, ignoring cached_tokens in prompt_tokens_details and the cache_read_input_token_cost / cache_creation_input_token_cost fields that are correctly loaded into model_info.

Fix

Extract cached_tokens and cache_creation_tokens from usage.prompt_tokens_details, then apply the appropriate rate to each bucket:

non_cached_tokens * input_cost_per_token
+ cached_tokens * cache_read_input_token_cost
+ cache_creation_tokens * cache_creation_input_token_cost

The cache rates fall back to 0.0 if the cache cost fields are not set, so standard Fireworks serverless tier pricing is unchanged.
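
The bucketing described above can be sketched as a small standalone function. This is a minimal illustration, not the code in the PR: `bucketed_cost` is a hypothetical name, `usage` and `model_info` are assumed to be plain dicts, and the `cache_creation_tokens` key inside `prompt_tokens_details` is an assumption about where that count is reported.

```python
from typing import Any, Dict, Tuple


def bucketed_cost(usage: Dict[str, Any], model_info: Dict[str, Any]) -> Tuple[float, float]:
    """Illustrative stand-in for the patched calculation (hypothetical helper)."""
    details = usage.get("prompt_tokens_details") or {}
    cached_tokens = details.get("cached_tokens") or 0
    # Assumed key name for cache-write tokens; the real field may differ.
    cache_creation_tokens = details.get("cache_creation_tokens") or 0

    input_rate = model_info["input_cost_per_token"]
    # Missing or None cache rates fall back to 0.0.
    cache_read_rate = model_info.get("cache_read_input_token_cost") or 0.0
    cache_creation_rate = model_info.get("cache_creation_input_token_cost") or 0.0

    # Whatever is neither a cache read nor a cache write is billed at the input rate.
    non_cached_tokens = usage["prompt_tokens"] - cached_tokens - cache_creation_tokens

    prompt_cost = (
        non_cached_tokens * input_rate
        + cached_tokens * cache_read_rate
        + cache_creation_tokens * cache_creation_rate
    )
    completion_cost = usage.get("completion_tokens", 0) * model_info.get(
        "output_cost_per_token", 0.0
    )
    return prompt_cost, completion_cost
```

When no cache tokens are reported, the sketch reduces to prompt_tokens * input_cost_per_token. Note that in this sketch, cached tokens reported without a configured cache rate contribute zero cost.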

Test

Using the numbers from the issue report:

  • prompt_tokens = 44341, cached_tokens = 41518
  • input_cost_per_token = 1.4e-6, cache_read_input_token_cost = 2.6e-7

Before: 44341 * 1.4e-6 = $0.0621 (wrong)
After: 2823 * 1.4e-6 + 41518 * 2.6e-7 = $0.00395 + $0.01079 = $0.0147 (correct)
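
The before/after figures can be reproduced with plain arithmetic. The snippet below is only a check of the numbers quoted above and does not call LiteLLM.

```python
prompt_tokens = 44341
cached_tokens = 41518
input_cost_per_token = 1.4e-6
cache_read_input_token_cost = 2.6e-7

# Tokens that missed the cache are billed at the full input rate.
non_cached_tokens = prompt_tokens - cached_tokens  # 2823

before = prompt_tokens * input_cost_per_token
after = (
    non_cached_tokens * input_cost_per_token
    + cached_tokens * cache_read_input_token_cost
)

print(f"before: ${before:.4f}")  # before: $0.0621
print(f"after:  ${after:.4f}")   # after:  $0.0147
```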

@greptile-apps

greptile-apps Bot commented Apr 18, 2026

Contributor

Greptile Summary

This PR fixes Fireworks AI cost calculation so that cached tokens are billed at cache_read_input_token_cost (and cache-creation tokens at cache_creation_input_token_cost) rather than at the full input_cost_per_token rate. The change correctly extracts token buckets from usage.prompt_tokens_details and falls back to 0.0 when the cache cost fields are absent, leaving standard serverless pricing unaffected.

Confidence Score: 5/5

Safe to merge — the fix is correct and the only remaining finding is a minor defensive-programming suggestion.

The core logic is sound and consistent with how other providers (Dashscope) handle the same bucketing. The single P2 note about clamping non_cached_tokens at zero is a robustness improvement, not a blocker.

No files require special attention.

Important Files Changed

Filename | Overview
litellm/llms/fireworks_ai/cost_calculator.py | Correctly separates cached, cache-creation, and non-cached input tokens and applies their respective per-token rates; one minor concern about non_cached_tokens going negative with malformed usage data.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[cost_per_token called] --> B{model in model_prices?}
    B -- yes --> C[get_model_info direct]
    B -- no --> D[get_base_model_for_pricing]
    D --> E[get_model_info for base model]
    C & E --> F{prompt_tokens_details present?}
    F -- yes --> G[extract cached_tokens\ncache_creation_tokens]
    F -- no --> H[cached_tokens = 0\ncache_creation_tokens = 0]
    G & H --> I[non_cached_tokens =\nprompt_tokens - cached - creation]
    I --> J[prompt_cost =\nnon_cached x input_cost\n+ cached x cache_read_cost\n+ creation x cache_creation_cost]
    J --> K[completion_cost =\ncompletion_tokens x output_cost]
    K --> L[return prompt_cost, completion_cost]

Reviews (1): Last reviewed commit: "fix: apply cache_read_input_token_cost t..."

model_info.get("cache_creation_input_token_cost") or 0.0
)
# Non-cached tokens are billed at the standard input rate
non_cached_tokens = usage["prompt_tokens"] - cached_tokens - cache_creation_tokens

Contributor

P2 non_cached_tokens can go negative with malformed usage data

If the API ever returns cached_tokens + cache_creation_tokens > prompt_tokens (e.g. a rounding discrepancy or inconsistent response), non_cached_tokens is negative and prompt_cost ends up negative. Clamping to zero makes the cost calculation resilient to such inconsistencies.

Suggested change
- non_cached_tokens = usage["prompt_tokens"] - cached_tokens - cache_creation_tokens
+ non_cached_tokens = max(0, usage["prompt_tokens"] - cached_tokens - cache_creation_tokens)
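
To illustrate the effect of the suggested clamp, here is a small sketch with a hypothetical helper (`non_cached`) and made-up, deliberately inconsistent token counts. It is not taken from the PR.

```python
def non_cached(
    prompt_tokens: int,
    cached_tokens: int,
    cache_creation_tokens: int,
    clamp: bool,
) -> int:
    """Return the tokens billed at the standard input rate (hypothetical helper)."""
    remainder = prompt_tokens - cached_tokens - cache_creation_tokens
    return max(0, remainder) if clamp else remainder


# Hypothetical malformed usage: cache buckets add up to more than prompt_tokens.
unclamped = non_cached(100, 90, 20, clamp=False)  # -10, would yield a negative cost term
clamped = non_cached(100, 90, 20, clamp=True)     # 0
```

With consistent usage data both variants return the same value, so the clamp only changes behavior in the malformed case.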

@codspeed-hq

codspeed-hq Bot commented Apr 18, 2026

Contributor

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing VANDRANKI:fix/fireworks-cache-cost (d0b62e7) with main (850fe59)

@codecov

codecov Bot commented Apr 18, 2026


Codecov Report

❌ Patch coverage is 0% with 9 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
litellm/llms/fireworks_ai/cost_calculator.py | 0.00% | 9 Missing ⚠️
