fix: apply cache_read_input_token_cost to cached tokens in Fireworks AI cost calculation by VANDRANKI · Pull Request #26016 · BerriAI/litellm

VANDRANKI · 2026-04-18T17:21:57Z

What

litellm/llms/fireworks_ai/cost_calculator.py ignores cached_tokens when computing prompt cost. All input tokens are charged at input_cost_per_token even when cache_read_input_token_cost is configured and cache hits are reported in prompt_tokens_details.

Fixes #25950.

Why

The cost_per_token function computed the prompt cost as:

prompt_cost = usage["prompt_tokens"] * model_info["input_cost_per_token"]

This charges every token at the full input rate, ignoring cached_tokens in prompt_tokens_details and the cache_read_input_token_cost / cache_creation_input_token_cost fields that are correctly loaded into model_info.

Fix

Extract cached_tokens and cache_creation_tokens from usage.prompt_tokens_details, then apply the appropriate rate to each bucket:

prompt_cost = non_cached_tokens * input_cost_per_token
            + cached_tokens * cache_read_input_token_cost
            + cache_creation_tokens * cache_creation_input_token_cost

where non_cached_tokens = prompt_tokens - cached_tokens - cache_creation_tokens.

Falls back to 0.0 if the cache cost fields are not set, so standard Fireworks serverless tier pricing is unchanged.
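A minimal Python sketch of the bucketed calculation described above, assuming `usage` and `model_info` are plain dicts using the key names from this PR (the real function receives litellm objects, and `prompt_tokens_details` may be `None`). The function name `prompt_cost` is hypothetical; this is not the merged code.

```python
from typing import Any, Mapping


def prompt_cost(usage: Mapping[str, Any], model_info: Mapping[str, Any]) -> float:
    """Bill each prompt-token bucket at its own per-token rate."""
    # prompt_tokens_details may be absent or None; treat that as "no cache activity".
    details = usage.get("prompt_tokens_details") or {}
    cached_tokens = details.get("cached_tokens") or 0
    cache_creation_tokens = details.get("cache_creation_tokens") or 0

    input_rate = model_info["input_cost_per_token"]
    # Unset cache rates fall back to 0.0, as described in the PR.
    cache_read_rate = model_info.get("cache_read_input_token_cost") or 0.0
    cache_creation_rate = model_info.get("cache_creation_input_token_cost") or 0.0

    # Whatever is neither a cache read nor a cache write is billed at the full rate.
    non_cached_tokens = usage["prompt_tokens"] - cached_tokens - cache_creation_tokens
    return (
        non_cached_tokens * input_rate
        + cached_tokens * cache_read_rate
        + cache_creation_tokens * cache_creation_rate
    )


# A request with no cache details is billed exactly as before the fix.
print(prompt_cost({"prompt_tokens": 1000}, {"input_cost_per_token": 1.4e-6}))
```

With no cache details, the cached and cache-creation terms are both zero, so the result equals prompt_tokens * input_cost_per_token, the pre-fix formula.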

Test

Using the numbers from the issue report:

prompt_tokens = 44341, cached_tokens = 41518
input_cost_per_token = 1.4e-6, cache_read_input_token_cost = 2.6e-7

Before: 44341 * 1.4e-6 = $0.0621 (wrong)
After: 2823 * 1.4e-6 + 41518 * 2.6e-7 = $0.00395 + $0.01079 = $0.0147 (correct)
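The before/after figures can be re-derived with a few lines of plain Python; this only checks the arithmetic quoted above and does not exercise litellm itself.

```python
import math

# Figures from issue #25950, as quoted in the PR description.
prompt_tokens = 44341
cached_tokens = 41518
input_cost_per_token = 1.4e-6
cache_read_input_token_cost = 2.6e-7

# Before: every prompt token at the full input rate.
before = prompt_tokens * input_cost_per_token

# After: only the non-cached remainder pays the full rate.
non_cached_tokens = prompt_tokens - cached_tokens
after = (
    non_cached_tokens * input_cost_per_token
    + cached_tokens * cache_read_input_token_cost
)

assert non_cached_tokens == 2823
assert math.isclose(before, 0.0620774)
assert math.isclose(after, 0.01474688)
print(f"before=${before:.4f} after=${after:.4f}")  # before=$0.0621 after=$0.0147
```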


greptile-apps · 2026-04-18T17:23:40Z

Greptile Summary

This PR fixes Fireworks AI cost calculation so that cached tokens are billed at cache_read_input_token_cost (and cache-creation tokens at cache_creation_input_token_cost) rather than at the full input_cost_per_token rate. The change correctly extracts token buckets from usage.prompt_tokens_details and falls back to 0.0 when the cache cost fields are absent, leaving standard serverless pricing unaffected.

Confidence Score: 5/5

Safe to merge — the fix is correct and the only remaining finding is a minor defensive-programming suggestion.

The core logic is sound and consistent with how other providers (Dashscope) handle the same bucketing. The single P2 note about clamping non_cached_tokens at zero is a robustness improvement, not a blocker.

No files require special attention.

Important Files Changed

Filename	Overview
litellm/llms/fireworks_ai/cost_calculator.py	Correctly separates cached, cache-creation, and non-cached input tokens and applies their respective per-token rates; one minor concern about `non_cached_tokens` going negative with malformed usage data.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[cost_per_token called] --> B{model in model_prices?}
    B -- yes --> C[get_model_info direct]
    B -- no --> D[get_base_model_for_pricing]
    D --> E[get_model_info for base model]
    C & E --> F{prompt_tokens_details present?}
    F -- yes --> G[extract cached_tokens\ncache_creation_tokens]
    F -- no --> H[cached_tokens = 0\ncache_creation_tokens = 0]
    G & H --> I[non_cached_tokens =\nprompt_tokens - cached - creation]
    I --> J[prompt_cost =\nnon_cached x input_cost\n+ cached x cache_read_cost\n+ creation x cache_creation_cost]
    J --> K[completion_cost =\ncompletion_tokens x output_cost]
    K --> L[return prompt_cost, completion_cost]

Reviews (1): Last reviewed commit: "fix: apply cache_read_input_token_cost t..."

greptile-apps · 2026-04-18T17:23:44Z

+        model_info.get("cache_creation_input_token_cost") or 0.0
+    )
+    # Non-cached tokens are billed at the standard input rate
+    non_cached_tokens = usage["prompt_tokens"] - cached_tokens - cache_creation_tokens


non_cached_tokens can go negative with malformed usage data

If the API ever returns cached_tokens + cache_creation_tokens > prompt_tokens (e.g. a rounding discrepancy or inconsistent response), non_cached_tokens is negative and prompt_cost ends up negative. Clamping to zero makes the cost calculation resilient to such inconsistencies.

Suggested change

- non_cached_tokens = usage["prompt_tokens"] - cached_tokens - cache_creation_tokens
+ non_cached_tokens = max(0, usage["prompt_tokens"] - cached_tokens - cache_creation_tokens)
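The effect of the suggested clamp can be shown in isolation. A minimal sketch: `non_cached_tokens` as a standalone function and the malformed counts below are hypothetical, chosen only to trigger the edge case the reviewer describes.

```python
def non_cached_tokens(prompt_tokens: int, cached_tokens: int, cache_creation_tokens: int) -> int:
    """Tokens billed at the full input rate, clamped so the bucket is never negative."""
    return max(0, prompt_tokens - cached_tokens - cache_creation_tokens)


# Consistent usage: identical to the unclamped subtraction.
print(non_cached_tokens(44341, 41518, 0))  # 2823

# Hypothetical inconsistent usage: 90 + 20 > 100, so unclamped would be -10.
print(non_cached_tokens(100, 90, 20))  # 0
```

Clamping only removes the negative term. The cached and cache-creation buckets are still billed as reported, so in the inconsistent case above the total still covers 110 tokens rather than 100.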

codspeed-hq · 2026-04-18T17:24:38Z

Merging this PR will not alter performance

✅ 16 untouched benchmarks

Comparing VANDRANKI:fix/fireworks-cache-cost (d0b62e7) with main (850fe59)

codecov · 2026-04-18T17:24:47Z

Codecov Report

❌ Patch coverage is 0% with 9 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
litellm/llms/fireworks_ai/cost_calculator.py	0.00%	9 Missing ⚠️


Commit d0b62e7: fix: apply cache_read_input_token_cost to cached tokens in Fireworks AI cost calculation

greptile-apps Bot reviewed Apr 18, 2026


This was referenced Apr 19, 2026

fix(fireworks): add glm-5p1 metadata and parallel_tool_calls #26031 (Closed)

fix(fireworks): add glm-5p1 metadata and parallel_tool_calls #26069 (Merged)

ramezquitao mentioned this pull request May 13, 2026

[Bug]: Fireworks AI - cache_read_input_token_cost configured but not used in cost calculation #25950 (Open, 1 task)

VANDRANKI wants to merge 1 commit into BerriAI:main from VANDRANKI:fix/fireworks-cache-cost
