feat(prometheus): emit per-token-type detail metrics (LIT-3220) (#28372) by ishaan-berri · Pull Request #28378 · BerriAI/litellm

ishaan-berri · 2026-05-20T18:39:31Z

Adds five sparse counter metrics that break out the token detail fields providers already report in `usage.prompt_tokens_details` and `usage.completion_tokens_details`:

- litellm_input_cached_tokens_metric (provider prompt-cache reads)
- litellm_input_cache_creation_tokens_metric (Anthropic prompt-cache writes)
- litellm_input_audio_tokens_metric (audio input tokens)
- litellm_output_reasoning_tokens_metric (reasoning tokens)
- litellm_output_audio_tokens_metric (audio output tokens)

These are additive — existing input/output/total counters are unchanged, so no dashboards break. Each new counter is only incremented when the underlying detail is populated and > 0, keeping scrape output sparse for providers that don't report a given field.
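The "only incremented when populated and > 0" rule can be sketched roughly as below. This is an illustration, not the code in this PR: `SparseCounter`, `is_positive_number` and `inc_if_positive` are hypothetical names, the counter is a plain-Python stand-in rather than a real Prometheus client object, and excluding booleans is an assumption of this sketch.

```python
from collections import defaultdict
from typing import Any, Dict, Tuple

LabelKey = Tuple[Tuple[str, str], ...]


class SparseCounter:
    """Stand-in for a labelled counter: a series exists only once incremented."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.series: Dict[LabelKey, float] = defaultdict(float)

    def inc(self, labels: Dict[str, str], amount: float) -> None:
        # Sort the label items so the same label set always maps to one series.
        self.series[tuple(sorted(labels.items()))] += amount


def is_positive_number(value: Any) -> bool:
    """True only for ints/floats greater than zero (assumption: bools are excluded)."""
    if isinstance(value, bool):
        return False
    return isinstance(value, (int, float)) and value > 0


def inc_if_positive(counter: SparseCounter, labels: Dict[str, str], value: Any) -> bool:
    """Increment the counter only for a positive number; report whether it did."""
    if not is_positive_number(value):
        return False
    counter.inc(labels, value)
    return True
```

With this guard, a provider that reports `0`, `None`, or a negative value for a field never creates a series for that metric, which is the behaviour the zero/None/negative tests are described as covering.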

Data is read from the canonical Usage dict that `get_standard_logging_object_payload` already attaches at `standard_logging_payload["metadata"]["usage_object"]`, so no new plumbing through the logging pipeline is required.
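As a rough sketch of that lookup, the function below walks a payload dict down to the two detail dicts and returns only the positive values. The function name `extract_token_details` and the `TOKEN_DETAIL_FIELDS` table are hypothetical; the field names are taken from the test payload quoted in the review on this PR, and the pairing of each metric with a field is an assumption inferred from the metric names.

```python
from typing import Any, Dict, Mapping

# (metric name, details key in the usage dict, field inside that details dict)
# Assumed mapping, inferred from the metric names in the PR description.
TOKEN_DETAIL_FIELDS = (
    ("litellm_input_cached_tokens_metric", "prompt_tokens_details", "cached_tokens"),
    ("litellm_input_cache_creation_tokens_metric", "prompt_tokens_details", "cache_creation_tokens"),
    ("litellm_input_audio_tokens_metric", "prompt_tokens_details", "audio_tokens"),
    ("litellm_output_reasoning_tokens_metric", "completion_tokens_details", "reasoning_tokens"),
    ("litellm_output_audio_tokens_metric", "completion_tokens_details", "audio_tokens"),
)


def extract_token_details(payload: Mapping[str, Any]) -> Dict[str, float]:
    """Return {metric name: value} for every detail field that is a positive number."""
    metadata = payload.get("metadata")
    if not isinstance(metadata, dict):
        return {}  # no-metadata path: nothing to emit
    usage = metadata.get("usage_object")
    if not isinstance(usage, dict):
        return {}  # no-usage_object path: nothing to emit

    found: Dict[str, float] = {}
    for metric, details_key, field in TOKEN_DETAIL_FIELDS:
        details = usage.get(details_key)
        if not isinstance(details, dict):
            continue
        value = details.get(field)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_number and value > 0:
            found[metric] = value
    return found
```

A payload with `cached_tokens: 100` and `reasoning_tokens: 42` would yield two entries; a payload without `metadata` or `usage_object` yields an empty dict.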

Tests: 10 new unit tests covering registration, label-set parity, all-types increment, zero/None/negative skip behaviour, and the no-metadata/no-usage_object no-op paths.

Closes LIT-3220

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: Krrish Dholakia <krrishdholakia@berri.ai>
Co-authored-by: Claude <noreply@anthropic.com>

CLAassistant · 2026-05-20T18:39:38Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ ishaan-jaff
❌ oss-agent-shin
You have signed the CLA already but the status is still pending? Let us recheck it.

codecov · 2026-05-20T18:43:25Z

Codecov Report

❌ Patch coverage is 95.65217% with 1 line in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| litellm/integrations/prometheus.py | 94.44% | 1 Missing ⚠️ |


greptile-apps · 2026-05-20T18:44:19Z

Greptile Summary

This PR adds five sparse Prometheus counters that expose per-token-type details — cached, cache-creation, audio input, reasoning, and audio output tokens — reading directly from standard_logging_payload["metadata"]["usage_object"] without touching the existing input/output/total counters.

Five new counters registered in __init__, typed in DEFINED_PROMETHEUS_METRICS, and labelled identically to the parent input/output metrics so dashboards can join across them.
_increment_token_detail_metrics only increments a counter when the underlying value is a positive number, keeping scrape output sparse for providers that don't report these details.
10 unit tests cover registration, label-set parity, all-types increment, and zero/None/negative/no-metadata no-op paths.

Confidence Score: 4/5

Safe to merge; changes are purely additive and isolated to the Prometheus integration layer with no impact on existing counters or other logging paths.

The implementation is clean and the existing counters are untouched. The one non-trivial edge case is that usage_object prompt_tokens_details could arrive as a Pydantic model rather than a plain dict, and the current or-empty-dict fallback won't replace a truthy Pydantic object, causing those three input-detail metrics to be silently skipped. In practice this path is unlikely because JSON-parsed usage dicts contain plain dict values, but the defensive fix is straightforward.

litellm/integrations/prometheus.py — the prompt/completion detail dict normalisation in _increment_token_detail_metrics

Important Files Changed

| Filename | Overview |
| --- | --- |
| litellm/integrations/prometheus.py | Adds five new counter metrics and the _increment_token_detail_metrics helper; correctly wired into the success path with defensive null/type guards |
| litellm/types/integrations/prometheus.py | Adds five new metric names to DEFINED_PROMETHEUS_METRICS and reuses existing input/output label sets in PrometheusMetricLabels; no issues |
| tests/test_litellm/integrations/test_prometheus_token_detail_metrics.py | 10 mock-based unit tests covering registration, all-types increment, zero/None/negative skip, and no-metadata/no-usage_object no-op paths; assertions verify inc() value but not label kwargs |


greptile-apps · 2026-05-20T18:44:23Z

+        prompt_details = usage_object.get("prompt_tokens_details") or {}
+        completion_details = usage_object.get("completion_tokens_details") or {}


The `isinstance(prompt_details, dict)` guard is duplicated in every tuple entry, even though `prompt_details` is already assigned above. If `usage_object["prompt_tokens_details"]` is a non-empty Pydantic model (truthy), the `or {}` fallback won't replace it, so `prompt_details` becomes a Pydantic object and all three input-detail metrics are silently skipped. Normalising to a plain dict once at assignment avoids this silent data-loss edge case and removes the repeated inline guards.

Suggested change

-        prompt_details = usage_object.get("prompt_tokens_details") or {}
-        completion_details = usage_object.get("completion_tokens_details") or {}
+        _pd = usage_object.get("prompt_tokens_details") or {}
+        prompt_details: dict = _pd if isinstance(_pd, dict) else {}
+        _cd = usage_object.get("completion_tokens_details") or {}
+        completion_details: dict = _cd if isinstance(_cd, dict) else {}
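To illustrate why `or {}` alone does not guarantee a dict, here is a small sketch. `FakeDetails` is a hypothetical plain class standing in for a Pydantic model (ordinary objects are truthy unless they define `__bool__` or `__len__`), and `as_plain_dict` is a hypothetical helper that mirrors the suggested normalisation.

```python
from typing import Any, Dict


class FakeDetails:
    """Stand-in for a Pydantic model; like most objects it is always truthy."""

    def __init__(self, cached_tokens: int) -> None:
        self.cached_tokens = cached_tokens


def as_plain_dict(value: Any) -> Dict[str, Any]:
    """Keep real dicts, replace anything else (None, model objects) with {}."""
    return value if isinstance(value, dict) else {}


usage_object = {"prompt_tokens_details": FakeDetails(cached_tokens=100)}

# The `or {}` fallback only fires for falsy values, so the object passes through.
naive = usage_object.get("prompt_tokens_details") or {}

# Normalising once at assignment guarantees a dict for all later lookups.
normalised = as_plain_dict(usage_object.get("prompt_tokens_details") or {})
```

Here `naive` is still the `FakeDetails` instance, while `normalised` is `{}`. Note that in this sketch the normalised form still drops the model's values, so the metrics would be skipped either way; recovering them would require converting the model to a dict first, which the suggested change does not attempt.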

greptile-apps · 2026-05-20T18:44:24Z

+        payload = {
+            "metadata": {
+                "usage_object": {
+                    "prompt_tokens_details": {
+                        "cached_tokens": 0,
+                        "cache_creation_tokens": 0,
+                        "audio_tokens": 0,
+                    },
+                    "completion_tokens_details": {
+                        "reasoning_tokens": 0,
+                        "audio_tokens": 0,
+                    },
+                }
+            }
+        }
+
+        PrometheusLogger._increment_token_detail_metrics(
+            logger,
+            standard_logging_payload=payload,
+            enum_values=sample_enum_values,
+        )


Label kwargs not verified in increment tests

The assertions confirm that inc was called with the right amount, but do not verify that labels was called with the expected label key-value pairs (e.g., matching sample_enum_values). A regression that accidentally passes empty or wrong labels to counter.labels(...) would still pass these tests. Inspecting labels.call_args in at least one test case would catch label-wiring regressions.
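A label-wiring assertion of the kind described could look roughly like the sketch below. It uses only `unittest.mock`; `increment_with_labels` is a hypothetical helper that mimics a `counter.labels(...).inc(...)` call with keyword labels, and the label names are illustrative rather than taken from the LiteLLM label sets.

```python
from unittest.mock import MagicMock


def increment_with_labels(counter, labels: dict, amount: float) -> None:
    """Hypothetical helper mirroring the counter.labels(**labels).inc(amount) pattern."""
    counter.labels(**labels).inc(amount)


def test_labels_are_forwarded() -> None:
    counter = MagicMock()
    expected_labels = {"model": "model-a", "team": "team-a"}  # illustrative names

    increment_with_labels(counter, expected_labels, 100)

    # Check the label kwargs, not just the increment amount.
    counter.labels.assert_called_once_with(**expected_labels)
    assert counter.labels.call_args.kwargs == expected_labels
    counter.labels.return_value.inc.assert_called_once_with(100)
```

Inspecting `labels.call_args` this way would fail if empty or wrong labels were passed, which an assertion on `inc` alone would not catch.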

greptile-apps Bot reviewed May 20, 2026

chore: remove proof folder image

8aca429

yuneng-berri approved these changes May 21, 2026

ishaan-berri enabled auto-merge (squash) May 21, 2026 18:45

ishaan-berri merged commit 14c0a2b into litellm_internal_staging May 23, 2026
113 of 116 checks passed

ishaan-berri merged 2 commits into litellm_internal_staging from litellm_shin_may20

5 participants
