feat(prometheus): emit per-token-type detail metrics (LIT-3220) (#28372) #28378

Merged
ishaan-berri merged 2 commits into litellm_internal_staging from litellm_shin_may20
May 23, 2026

Conversation

@ishaan-berri

Contributor

Adds five sparse counter metrics that break out the token detail fields providers already report in usage.prompt_tokens_details and usage.completion_tokens_details:

  • litellm_input_cached_tokens_metric (provider prompt-cache reads)
  • litellm_input_cache_creation_tokens_metric (Anthropic prompt-cache writes)
  • litellm_input_audio_tokens_metric (audio input tokens)
  • litellm_output_reasoning_tokens_metric (reasoning tokens)
  • litellm_output_audio_tokens_metric (audio output tokens)

These are additive — existing input/output/total counters are unchanged, so no dashboards break. Each new counter is only incremented when the underlying detail is populated and > 0, keeping scrape output sparse for providers that don't report a given field.

Data is read from the canonical Usage dict that get_standard_logging_object_payload already attaches at standard_logging_payload["metadata"]["usage_object"], so no new plumbing through the logging pipeline is required.
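
The lookup and the "populated and > 0" rule described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `token_detail_increments` and `DETAIL_FIELDS` are hypothetical names, the field names are taken from the test payload quoted in the review, and the pairing of each field with a metric name is an assumption.

```python
from typing import Any, Dict, Mapping

# Assumed pairing of (details section, field) with the metric names in this PR.
# Field names follow the test payload; the pairing itself is an assumption.
DETAIL_FIELDS = (
    ("prompt_tokens_details", "cached_tokens", "litellm_input_cached_tokens_metric"),
    ("prompt_tokens_details", "cache_creation_tokens", "litellm_input_cache_creation_tokens_metric"),
    ("prompt_tokens_details", "audio_tokens", "litellm_input_audio_tokens_metric"),
    ("completion_tokens_details", "reasoning_tokens", "litellm_output_reasoning_tokens_metric"),
    ("completion_tokens_details", "audio_tokens", "litellm_output_audio_tokens_metric"),
)


def token_detail_increments(standard_logging_payload: Mapping[str, Any]) -> Dict[str, float]:
    """Return {metric_name: amount} for every detail that is populated and > 0."""
    metadata = standard_logging_payload.get("metadata")
    usage_object = metadata.get("usage_object") if isinstance(metadata, dict) else None
    if not isinstance(usage_object, dict):
        return {}  # no metadata or no usage_object: nothing to increment

    increments: Dict[str, float] = {}
    for section, field, metric_name in DETAIL_FIELDS:
        details = usage_object.get(section)
        if not isinstance(details, dict):
            continue  # section missing or not a plain dict
        value = details.get(field)
        # Skip None, zero, negatives and non-numeric values (bool excluded on purpose).
        if isinstance(value, (int, float)) and not isinstance(value, bool) and value > 0:
            increments[metric_name] = value
    return increments
```

Returning only the positive entries means a caller would touch a counter only for fields the provider actually reported, which is what keeps the scrape output sparse.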

Tests: 10 new unit tests covering registration, label-set parity, all-types increment, zero/None/negative skip behaviour, and the no-metadata/no-usage_object no-op paths.

Closes LIT-3220

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: Krrish Dholakia <krrishdholakia@berri.ai>
Co-authored-by: Claude <noreply@anthropic.com>
@CLAassistant

CLAassistant commented May 20, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ ishaan-jaff
❌ oss-agent-shin

@codecov

codecov Bot commented May 20, 2026

Codecov Report

❌ Patch coverage is 95.65217% with 1 line in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| litellm/integrations/prometheus.py | 94.44% | 1 Missing ⚠️ |

@greptile-apps

greptile-apps Bot commented May 20, 2026

Contributor

Greptile Summary

This PR adds five sparse Prometheus counters that expose per-token-type details — cached, cache-creation, audio input, reasoning, and audio output tokens — reading directly from standard_logging_payload["metadata"]["usage_object"] without touching the existing input/output/total counters.

  • Five new counters registered in __init__, typed in DEFINED_PROMETHEUS_METRICS, and labelled identically to the parent input/output metrics so dashboards can join across them.
  • _increment_token_detail_metrics only increments a counter when the underlying value is a positive number, keeping scrape output sparse for providers that don't report these details.
  • 10 unit tests cover registration, label-set parity, all-types increment, and zero/None/negative/no-metadata no-op paths.

Confidence Score: 4/5

Safe to merge; changes are purely additive and isolated to the Prometheus integration layer with no impact on existing counters or other logging paths.

The implementation is clean and the existing counters are untouched. The one non-trivial edge case is that usage_object["prompt_tokens_details"] could arrive as a Pydantic model rather than a plain dict. The current `or {}` fallback won't replace a truthy Pydantic object, so the three input-detail metrics would be silently skipped. In practice this path is unlikely because JSON-parsed usage dicts contain plain dict values, but the defensive fix is straightforward.

litellm/integrations/prometheus.py — the prompt/completion detail dict normalisation in _increment_token_detail_metrics

Important Files Changed

| Filename | Overview |
|---|---|
| litellm/integrations/prometheus.py | Adds five new counter metrics and the _increment_token_detail_metrics helper; correctly wired into the success path with defensive null/type guards |
| litellm/types/integrations/prometheus.py | Adds five new metric names to DEFINED_PROMETHEUS_METRICS and reuses existing input/output label sets in PrometheusMetricLabels; no issues |
| tests/test_litellm/integrations/test_prometheus_token_detail_metrics.py | 10 mock-based unit tests covering registration, all-types increment, zero/None/negative skip, and no-metadata/no-usage_object no-op paths; assertions verify inc() value but not label kwargs |

Reviews (1): Last reviewed commit: "feat(prometheus): emit per-token-type de..."

Comment on lines +1383 to +1384
prompt_details = usage_object.get("prompt_tokens_details") or {}
completion_details = usage_object.get("completion_tokens_details") or {}

P2 The isinstance(prompt_details, dict) guard is duplicated in every tuple entry, even though prompt_details is already assigned above. If usage_object["prompt_tokens_details"] is a non-empty Pydantic model (truthy), the `or {}` fallback won't replace it: prompt_details stays a Pydantic object and all three input-detail metrics are silently skipped. Normalising to a plain dict once at assignment avoids this silent data-loss edge case and removes the repeated inline guards.

Suggested change
- prompt_details = usage_object.get("prompt_tokens_details") or {}
- completion_details = usage_object.get("completion_tokens_details") or {}
+ _pd = usage_object.get("prompt_tokens_details") or {}
+ prompt_details: dict = _pd if isinstance(_pd, dict) else {}
+ _cd = usage_object.get("completion_tokens_details") or {}
+ completion_details: dict = _cd if isinstance(_cd, dict) else {}
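
To make the edge case concrete, here is a small sketch of why `or {}` does not replace a model object. `FakeDetails` is a hypothetical stand-in for a Pydantic instance and `as_plain_dict` is an illustrative helper name; neither comes from this PR.

```python
from typing import Any, Dict


class FakeDetails:
    """Hypothetical stand-in for a model object such as a Pydantic instance."""

    def __init__(self, cached_tokens: int) -> None:
        self.cached_tokens = cached_tokens


def as_plain_dict(value: Any) -> Dict[str, Any]:
    """Normalise once: keep real dicts, replace everything else with {}."""
    return value if isinstance(value, dict) else {}


usage_object = {"prompt_tokens_details": FakeDetails(cached_tokens=120)}

# A plain object is truthy, so `or {}` leaves it in place.
unguarded = usage_object.get("prompt_tokens_details") or {}
# Normalising at assignment always yields a dict.
guarded = as_plain_dict(usage_object.get("prompt_tokens_details"))

print(type(unguarded).__name__)  # FakeDetails
print(guarded)  # {}
```

Note that in this sketch the normalised value is an empty dict, so nothing would be counted for a model object either; the normalisation only makes the type uniform and removes the repeated guards. Counting those tokens would additionally require converting the model to a dict, which the suggested change as quoted does not do.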

Comment on lines +155 to +175
payload = {
    "metadata": {
        "usage_object": {
            "prompt_tokens_details": {
                "cached_tokens": 0,
                "cache_creation_tokens": 0,
                "audio_tokens": 0,
            },
            "completion_tokens_details": {
                "reasoning_tokens": 0,
                "audio_tokens": 0,
            },
        }
    }
}

PrometheusLogger._increment_token_detail_metrics(
    logger,
    standard_logging_payload=payload,
    enum_values=sample_enum_values,
)

P2 Label kwargs not verified in increment tests

The assertions confirm that inc was called with the right amount, but do not verify that labels was called with the expected label key-value pairs (e.g., matching sample_enum_values). A regression that accidentally passes empty or wrong labels to counter.labels(...) would still pass these tests. Inspecting labels.call_args in at least one test case would catch label-wiring regressions.
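
A label-wiring check of the kind described here might look like the sketch below. It uses only `unittest.mock`; the label names and values are hypothetical (the real test passes `sample_enum_values`), and the `counter.labels(...).inc(...)` line stands in for the code under test, assuming it passes labels as keyword arguments.

```python
from unittest.mock import MagicMock

# Hypothetical label values; the real test would derive these from sample_enum_values.
expected_labels = {"model": "gpt-4o", "team": "team-a"}

counter = MagicMock()

# Stand-in for the code under test: label the counter, then increment it.
counter.labels(**expected_labels).inc(120)

# Verify the increment amount...
counter.labels.return_value.inc.assert_called_once_with(120)
# ...and the label wiring, which an inc()-only assertion would miss.
counter.labels.assert_called_once_with(**expected_labels)
assert counter.labels.call_args.kwargs == expected_labels
```

If the code under test passed empty or wrong labels, the `inc` assertion would still pass but the `labels` assertions would fail, which is the regression the comment is concerned with.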

@ishaan-berri ishaan-berri enabled auto-merge (squash) May 21, 2026 18:45
@ishaan-berri ishaan-berri merged commit 14c0a2b into litellm_internal_staging May 23, 2026
113 of 116 checks passed

5 participants