feat(prometheus): emit per-token-type detail metrics (LIT-3220) (#28372) #28378
Adds five sparse counter metrics that break out the token detail fields providers already report in `usage.prompt_tokens_details` and `usage.completion_tokens_details`:

- litellm_input_cached_tokens_metric (provider prompt-cache reads)
- litellm_input_cache_creation_tokens_metric (Anthropic prompt-cache writes)
- litellm_input_audio_tokens_metric (audio input tokens)
- litellm_output_reasoning_tokens_metric (reasoning tokens)
- litellm_output_audio_tokens_metric (audio output tokens)

These are additive: existing input/output/total counters are unchanged, so no dashboards break. Each new counter is only incremented when the underlying detail is populated and > 0, keeping scrape output sparse for providers that don't report a given field.

Data is read from the canonical Usage dict that `get_standard_logging_object_payload` already attaches at `standard_logging_payload["metadata"]["usage_object"]`, so no new plumbing through the logging pipeline is required.

Tests: 10 new unit tests covering registration, label-set parity, all-types increment, zero/None/negative skip behaviour, and the no-metadata/no-usage_object no-op paths.

Closes LIT-3220

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: Krrish Dholakia <krrishdholakia@berri.ai>
Co-authored-by: Claude <noreply@anthropic.com>
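The sparse-increment rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in `litellm/integrations/prometheus.py`: the helper names `_as_dict` and `token_detail_increments` are hypothetical, and the mapping from detail keys to metric names is inferred from the metric list and the test payload in this PR. It only computes which counters would be incremented and by how much, without touching a real Prometheus client.

```python
from typing import Any, Dict

# Metric names come from the PR description; the detail keys come from the
# test payload in this PR. The pairing between them is an assumption.
PROMPT_DETAIL_METRICS = {
    "cached_tokens": "litellm_input_cached_tokens_metric",
    "cache_creation_tokens": "litellm_input_cache_creation_tokens_metric",
    "audio_tokens": "litellm_input_audio_tokens_metric",
}
COMPLETION_DETAIL_METRICS = {
    "reasoning_tokens": "litellm_output_reasoning_tokens_metric",
    "audio_tokens": "litellm_output_audio_tokens_metric",
}


def _as_dict(value: Any) -> Dict[str, Any]:
    """Return value unchanged if it is a plain dict, else an empty dict."""
    return value if isinstance(value, dict) else {}


def token_detail_increments(payload: Dict[str, Any]) -> Dict[str, int]:
    """Hypothetical helper: map metric name -> amount to increment."""
    usage = _as_dict(_as_dict(payload.get("metadata")).get("usage_object"))
    increments: Dict[str, int] = {}
    for details_key, metrics in (
        ("prompt_tokens_details", PROMPT_DETAIL_METRICS),
        ("completion_tokens_details", COMPLETION_DETAIL_METRICS),
    ):
        details = _as_dict(usage.get(details_key))
        for field, metric_name in metrics.items():
            amount = details.get(field)
            # Skip None, zero, negative, and non-integer values so that
            # providers which do not report a field emit nothing for it.
            if isinstance(amount, int) and not isinstance(amount, bool) and amount > 0:
                increments[metric_name] = amount
    return increments


example_payload = {
    "metadata": {
        "usage_object": {
            "prompt_tokens_details": {"cached_tokens": 120, "audio_tokens": 0},
            "completion_tokens_details": {"reasoning_tokens": 45},
        }
    }
}
example_increments = token_detail_increments(example_payload)
```

With this payload only the cached-token and reasoning-token metrics appear in `example_increments`; the zero-valued audio field and the missing fields produce no entry, which is what keeps the scrape output sparse.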
Codecov Report❌ Patch coverage is
Greptile Summary
This PR adds five sparse Prometheus counters that expose per-token-type details (cached, cache-creation, audio input, reasoning, and audio output tokens), reading directly from
Confidence Score: 4/5
Safe to merge; changes are purely additive and isolated to the Prometheus integration layer with no impact on existing counters or other logging paths.
The implementation is clean and the existing counters are untouched. The one non-trivial edge case is that the usage_object `prompt_tokens_details` value could arrive as a Pydantic model rather than a plain dict, and the current `or {}` fallback won't replace a truthy Pydantic object, causing those three input-detail metrics to be silently skipped. In practice this path is unlikely because JSON-parsed usage dicts contain plain dict values, but the defensive fix is straightforward.
litellm/integrations/prometheus.py: the prompt/completion detail dict normalisation in _increment_token_detail_metrics
| Filename | Overview |
|---|---|
| litellm/integrations/prometheus.py | Adds five new counter metrics and the _increment_token_detail_metrics helper; correctly wired into the success path with defensive null/type guards |
| litellm/types/integrations/prometheus.py | Adds five new metric names to DEFINED_PROMETHEUS_METRICS and reuses existing input/output label sets in PrometheusMetricLabels; no issues |
| tests/test_litellm/integrations/test_prometheus_token_detail_metrics.py | 10 mock-based unit tests covering registration, all-types increment, zero/None/negative skip, and no-metadata/no-usage_object no-op paths; assertions verify inc() value but not label kwargs |
    prompt_details = usage_object.get("prompt_tokens_details") or {}
    completion_details = usage_object.get("completion_tokens_details") or {}
The `isinstance(prompt_details, dict)` guard is duplicated in every tuple entry, even though `prompt_details` is already assigned above. If `usage_object["prompt_tokens_details"]` is a non-empty Pydantic model (truthy), the `or {}` fallback won't replace it, so `prompt_details` stays a Pydantic object, every `isinstance` guard fails, and all three input-detail metrics are silently skipped. Normalising to a plain dict once at assignment makes this edge case explicit and removes the repeated inline guards.
Suggested change:
    - prompt_details = usage_object.get("prompt_tokens_details") or {}
    - completion_details = usage_object.get("completion_tokens_details") or {}
    + _pd = usage_object.get("prompt_tokens_details") or {}
    + prompt_details: dict = _pd if isinstance(_pd, dict) else {}
    + _cd = usage_object.get("completion_tokens_details") or {}
    + completion_details: dict = _cd if isinstance(_cd, dict) else {}
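The truthiness pitfall in this comment can be reproduced in isolation. The sketch below is illustrative only: `FakeDetails` is a hypothetical stand-in for a Pydantic-style model, and `normalise_details` is a hypothetical helper that goes one step beyond the suggested change by converting model-like objects to a dict (via `model_dump()` as in Pydantic v2, or `dict()` as in Pydantic v1) rather than replacing them with `{}`. Whether the real code should convert or discard such objects is a design decision for the PR author.

```python
from typing import Any, Dict


class FakeDetails:
    """Hypothetical stand-in for a Pydantic-style model (not a real model)."""

    def __init__(self, cached_tokens: int) -> None:
        self.cached_tokens = cached_tokens

    def model_dump(self) -> Dict[str, Any]:
        # Mirrors the method name Pydantic v2 models expose.
        return {"cached_tokens": self.cached_tokens}


def normalise_details(raw: Any) -> Dict[str, Any]:
    """Return a plain dict for None, dicts, and model-like objects."""
    if raw is None:
        return {}
    if isinstance(raw, dict):
        return raw
    # Try the Pydantic v2 method first, then the v1 method.
    for method_name in ("model_dump", "dict"):
        method = getattr(raw, method_name, None)
        if callable(method):
            dumped = method()
            return dumped if isinstance(dumped, dict) else {}
    return {}


usage_object = {"prompt_tokens_details": FakeDetails(cached_tokens=12)}

# The naive fallback keeps the model object because it is truthy,
# so a later isinstance(..., dict) guard would skip it.
naive = usage_object.get("prompt_tokens_details") or {}
naive_is_dict = isinstance(naive, dict)

# Normalising once at assignment yields a plain dict either way.
normalised = normalise_details(usage_object.get("prompt_tokens_details"))
```

Here `naive_is_dict` is `False`, which is the silent-skip case the reviewer describes, while `normalised` carries the cached-token count through as a plain dict.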
    payload = {
        "metadata": {
            "usage_object": {
                "prompt_tokens_details": {
                    "cached_tokens": 0,
                    "cache_creation_tokens": 0,
                    "audio_tokens": 0,
                },
                "completion_tokens_details": {
                    "reasoning_tokens": 0,
                    "audio_tokens": 0,
                },
            }
        }
    }

    PrometheusLogger._increment_token_detail_metrics(
        logger,
        standard_logging_payload=payload,
        enum_values=sample_enum_values,
    )
Label kwargs not verified in increment tests
The assertions confirm that inc was called with the right amount, but do not verify that labels was called with the expected label key-value pairs (e.g., matching sample_enum_values). A regression that accidentally passes empty or wrong labels to counter.labels(...) would still pass these tests. Inspecting labels.call_args in at least one test case would catch label-wiring regressions.
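A label-wiring check of the kind suggested here might look like the sketch below. It uses only `unittest.mock` and a hypothetical `increment_with_labels` function standing in for the code under test; the label names and values are made up for illustration and are not the label set used by LiteLLM.

```python
from unittest.mock import MagicMock


def increment_with_labels(counter, labels: dict, amount: int) -> None:
    """Hypothetical stand-in for the code under test."""
    counter.labels(**labels).inc(amount)


def check_label_wiring() -> bool:
    """Assert both the label kwargs and the increment amount."""
    counter = MagicMock()
    expected_labels = {"model": "example-model", "team": "example-team"}

    increment_with_labels(counter, expected_labels, 7)

    # Fails if labels() was called with missing, extra, or wrong kwargs.
    counter.labels.assert_called_once_with(**expected_labels)
    # Fails if inc() was called with the wrong amount.
    counter.labels.return_value.inc.assert_called_once_with(7)
    # call_args.kwargs exposes the keyword arguments of the last call.
    return counter.labels.call_args.kwargs == expected_labels


def detects_empty_labels() -> bool:
    """Show that a regression passing empty labels would be caught."""
    counter = MagicMock()
    counter.labels().inc(7)  # simulated regression: labels dropped
    try:
        counter.labels.assert_called_once_with(model="example-model")
    except AssertionError:
        return True
    return False


wiring_ok = check_label_wiring()
regression_caught = detects_empty_labels()
```

Checking `labels.call_args` in even one test case is enough to catch a regression where the amount is right but the labels are empty or mismatched.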
Merged commit 14c0a2b into litellm_internal_staging
Relevant issues
Linear ticket
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
- tests/test_litellm/ directory. Adding at least 1 test is a hard requirement (see details)
- make test-unit
- @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review
Delays in PR merge?
If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).
Type
🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test
Changes