feat(otel): OTel-standard attributes on the proxy SERVER span (status code, route/path, preprocessing latency) by ryan-crabbe-berri · Pull Request #28040 · BerriAI/litellm

ryan-crabbe-berri · 2026-05-16T06:12:44Z

Summary

Adds the three OTel-standard attributes the FIL telemetry ask needs, all on the proxy SERVER span (Received Proxy Server Request) — the logging handlers write a child span, so each is set where the SERVER span is in hand (auth path / post-call hooks). Squash-and-merge.

http.response.status_code (int) — on failures; legacy error.code kept
http.route (route template) + url.path (literal path)
litellm.preprocessing.duration_ms — proxy-receive → first provider handoff (excludes retries)

Screenshots

[Screenshots: error curl, success curl, error trace, success trace]

Test plan

Automated: test_opentelemetry.py, test_auth_utils.py, test_litellm_logging.py — 447 passed, no regression. Lint (Black/Ruff/MyPy) clean.
Manual: console-exporter run; all three attributes land on one SERVER span for both success and failure.

Resolves LIT-3086

Set the OTel-standard http.response.status_code (integer) on failure spans alongside the existing OpenInference error.code (kept for back-compat). error.type is already emitted via ERROR_TYPE.

Crucially, also record structured error attributes on the proxy SERVER span ('Received Proxy Server Request') from async_post_call_failure_hook - the only place the SERVER span is in hand. _handle_failure records on the litellm_request child span (the parent span is not propagated into its kwargs), so prior to this change the SERVER span that dashboards query carried only span status, never error.code/error.type. Reuses _record_exception_on_span + StandardLoggingPayloadSetup.get_error_information so values match the child span.

Tests: recorder unit coverage + a hook-driven test asserting the SERVER span is stamped (the gap recorder-only tests missed). Full test_opentelemetry.py suite: 197 passed.
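For readers who want to see the shape of the status-code handling, here is a minimal sketch, not the code from this PR. `RecordingSpan` and `record_error_attributes` are hypothetical names introduced for illustration; the stand-in models only a `set_attribute(key, value)` method. The sketch assumes the legacy `error.code` is still written when the code is non-numeric, which the PR text does not state.

```python
from typing import Any, Dict, Optional


class RecordingSpan:
    """Hypothetical stand-in for an OTel span; only set_attribute is modeled."""

    def __init__(self) -> None:
        self.attributes: Dict[str, Any] = {}

    def set_attribute(self, key: str, value: Any) -> None:
        self.attributes[key] = value


def record_error_attributes(
    span: Optional[RecordingSpan],
    error_code: Optional[str],
    error_type: Optional[str],
) -> None:
    """Stamp error attributes; skip the integer status code if it cannot be parsed."""
    if span is None:
        return  # no SERVER span in hand: nothing to do
    if error_type:
        span.set_attribute("error.type", error_type)
    if error_code is None:
        return
    # Legacy string attribute (assumed here to be kept even for non-numeric codes).
    span.set_attribute("error.code", error_code)
    try:
        span.set_attribute("http.response.status_code", int(error_code))
    except ValueError:
        pass  # non-numeric code: leave the OTel-standard attribute unset


span = RecordingSpan()
record_error_attributes(span, "429", "RateLimitError")
```

Narrowing `error_code` through a `None` check before calling `int()` also keeps a type checker satisfied when the field is typed `str | None`.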

codecov · 2026-05-16T06:16:35Z

Codecov Report

❌ Patch coverage is 89.70588% with 7 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
litellm/integrations/opentelemetry.py	90.90%	4 Missing ⚠️
litellm/proxy/auth/user_api_key_auth.py	40.00%	3 Missing ⚠️

greptile-apps · 2026-05-16T06:16:55Z

Greptile Summary

Adds three OTel-standard attributes to the proxy SERVER span (Received Proxy Server Request): http.response.status_code (int) on failures, http.route + url.path stamped in the auth path, and litellm.preprocessing.duration_ms (proxy-receive → first provider handoff, retries excluded) on both success and failure paths.

Error attributes (http.response.status_code, error.type, legacy error.code) are now set directly on the SERVER span via async_post_call_failure_hook, which was previously only marking span status without structured attributes. The http.response.status_code coerces the string error_code to int and silently omits it for non-numeric codes.
Route attributes are set immediately after the SERVER span is created in the auth builder — the only point where both the span and the full Request object are available together.
Preprocessing duration is anchored by a set-once first_api_call_start_time (written on the first pre_call only, not on retries) and a litellm_received_at datetime propagated through request state → internal metadata → failure-path top-level key, with careful guard rails to avoid injecting datetimes into user-facing metadata.

Confidence Score: 5/5

Additive-only changes to the OTel layer; all new code is guarded with None checks and broad exception handlers so a missing span, missing timestamp, or non-numeric error code degrades gracefully to a no-op rather than surfacing an error.

The three new attributes are set on spans that already exist, the timing anchors are propagated through channels that already carry similar internal metadata (headers, parent_otel_span), and all edge cases (pre-auth failure, missing logging obj, clock skew, non-numeric error code, None span) have explicit tests. No auth logic, routing logic, or DB access is modified.

No files require special attention. The most non-trivial logic is in set_preprocessing_duration_attribute and the lift-before-pop in proxy/utils.py, both of which are directly covered by the new tests.

Important Files Changed

Filename	Overview
litellm/integrations/opentelemetry.py	Adds four module-level attribute-name constants, exposes http.response.status_code (as int) in _record_exception_on_span, stamps error attrs + preprocessing duration on the SERVER span in async_post_call_failure_hook, adds the same duration in async_post_call_success_hook, and introduces two new well-guarded helpers (set_proxy_request_route_attributes, set_preprocessing_duration_attribute).
litellm/litellm_core_utils/litellm_logging.py	Adds set-once first_api_call_start_time to model_call_details in pre_call(); deliberately does not touch litellm_params["metadata"] to avoid echoing a datetime into provider request bodies or batch objects.
litellm/proxy/auth/user_api_key_auth.py	Captures the true proxy-receive instant into request.state immediately at the top of _user_api_key_auth_builder, then calls set_proxy_request_route_attributes on the freshly-created SERVER span inside the existing open_telemetry_logger guard.
litellm/proxy/litellm_pre_call_utils.py	Propagates litellm_received_at (datetime or None) from request.state into the internal metadata dict via the same channel as endpoint/headers, making it accessible to the OTel layer for preprocessing latency.
litellm/proxy/utils.py	Lifts first_api_call_start_time off the logging object into the top level of request_data before the non-serialisable logging object is popped, so failure-path OTel callbacks can still compute preprocessing latency.

Add the OTel-standard http.route (low-cardinality route template, e.g. /v1/threads/{thread_id}/runs) and url.path (literal path) to the SERVER span ('Received Proxy Server Request') so dashboards can group traffic by endpoint instead of seeing every path param as a unique value.

Same architectural gap as the status-code commit: the success/failure logging handlers write the litellm_request CHILD span, and _handle_success explicitly refuses to copy to the SERVER span. Verified with a console-exporter run that the SERVER span was bare on success.

Unlike error info, route/path are known at request time, so set them directly on the freshly-created SERVER span in user_api_key_auth (one edit point, works for success and failure, no hook-ordering risk):
- http.route from the matched FastAPI route (scope['route'].path), empirically confirmed populated at auth-dependency time.
- url.path from the existing literal-path variable.

New get_request_route_template helper + set_proxy_request_route_attributes (no-op on None span, so the Langfuse override stays safe).

Tests: route-attribute setter + route-template helper edges. Full test_opentelemetry.py and test_auth_utils.py green.
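The difference between the route template and the literal path can be illustrated with a small sketch. It is an assumption-laden illustration rather than the helper added in this commit: `FakeRoute`, `get_route_template`, and `route_attributes` are hypothetical names, and the scope is modeled as a plain mapping whose `"route"` entry exposes a `path` attribute, as the commit message describes.

```python
from typing import Any, Dict, Mapping, Optional


class FakeRoute:
    """Hypothetical stand-in for a matched route object exposing .path."""

    def __init__(self, path: str) -> None:
        self.path = path


def get_route_template(scope: Mapping[str, Any]) -> Optional[str]:
    """Return the low-cardinality route template, or None if no route matched."""
    route = scope.get("route")
    path = getattr(route, "path", None)
    return path if isinstance(path, str) and path else None


def route_attributes(scope: Mapping[str, Any], literal_path: str) -> Dict[str, str]:
    """Build the attributes to stamp on the span; http.route is omitted when unknown."""
    attrs = {"url.path": literal_path}
    template = get_route_template(scope)
    if template is not None:
        attrs["http.route"] = template
    return attrs


matched = route_attributes(
    {"route": FakeRoute("/v1/threads/{thread_id}/runs")},
    "/v1/threads/abc123/runs",
)
unmatched = route_attributes({}, "/health")
```

Grouping by `http.route` collapses every thread ID into one series, while `url.path` keeps the exact request path for debugging.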

… span

Expose the total time LiteLLM spends before the upstream provider request begins (auth + parsing + pre-call hooks) as a single number on the SERVER span ('Received Proxy Server Request').

Window: proxy-receive -> FIRST provider handoff. Retry semantics: first attempt only (pure preprocessing, excludes retry loops + backoff). api_call_start_time is overwritten on every attempt, so a set-once first_api_call_start_time pins the first handoff.

Same architectural gap as the prior two commits: the success/failure logging handlers write the litellm_request CHILD span, not the SERVER span. Set it instead from the post-call hooks on user_api_key_dict.parent_otel_span.

Failure-path subtlety: request_data.pop('litellm_logging_obj') runs before the failure-hook loop, so the failure hook can't read the logging object. litellm_received_at is propagated via the existing request->metadata channel, and first_api_call_start_time is mirrored onto litellm_params.metadata, so both anchors survive into request_data and the OTel helper reads them uniformly for success and failure.

Edits: user_api_key_auth (stash receive instant), litellm_pre_call_utils (propagate it), litellm_logging (set-once first handoff + metadata mirror), opentelemetry (constant + set_preprocessing_duration_attribute, called from both post-call hooks).

Tests: duration helper (both container shapes, missing/negative/None edges) + set-once invariant (retry doesn't overwrite, metadata mirror). test_opentelemetry.py + test_auth_utils.py + test_litellm_logging.py: 447 passed. Verified live: SERVER span carries the attribute on success and failure, coexisting with the status-code and route attributes.
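The set-once anchor and the duration window described above can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `mark_api_call_start` and `preprocessing_duration_ms` are hypothetical names, and returning `None` for a negative window (clock skew) is assumed from the "missing/negative/None edges" mentioned in the tests.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional


def mark_api_call_start(model_call_details: Dict[str, Any], now: datetime) -> None:
    """Record a provider handoff; only the first one pins the preprocessing anchor."""
    model_call_details["api_call_start_time"] = now  # overwritten on every attempt
    model_call_details.setdefault("first_api_call_start_time", now)  # set once


def preprocessing_duration_ms(
    received_at: Optional[datetime],
    first_handoff: Optional[datetime],
) -> Optional[float]:
    """Milliseconds from proxy-receive to first handoff, or None if not computable."""
    if received_at is None or first_handoff is None:
        return None  # a missing anchor means the attribute is simply not set
    elapsed_ms = (first_handoff - received_at) / timedelta(milliseconds=1)
    return elapsed_ms if elapsed_ms >= 0 else None  # assumed: drop negative windows


received = datetime(2026, 5, 16, 6, 0, 0, tzinfo=timezone.utc)
details: Dict[str, Any] = {}
mark_api_call_start(details, received + timedelta(milliseconds=40))   # first attempt
mark_api_call_start(details, received + timedelta(milliseconds=900))  # retry
duration = preprocessing_duration_ms(received, details["first_api_call_start_time"])
```

Because the retry only moves `api_call_start_time`, the computed window reflects the first attempt and excludes retry loops and backoff.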

No behavior change. MyPy (CI lint) flagged:
- error_information["error_code"] is str|None: narrow via a None-checked local before int().
- _to_timestamp returns Optional[float]: resolve both anchors and return early if either is None instead of subtracting possibly-None floats.

ryan-crabbe-berri · 2026-05-16T07:15:39Z

@greptileai re review

cursor

Cursor Bugbot has reviewed your changes using high mode and found 1 potential issue.

^{Reviewed by Cursor Bugbot for commit 6bf8db3. Configure here.}

cursor · 2026-05-16T07:20:00Z

+                # logging object directly).
+                _lp = self.model_call_details.get("litellm_params")
+                if isinstance(_lp, dict) and isinstance(_lp.get("metadata"), dict):
+                    _lp["metadata"]["first_api_call_start_time"] = _first_handoff


Preprocessing duration lost on thread/assistant failure path

Low Severity

For thread/assistant endpoints, first_api_call_start_time is written to a copy of litellm_params["metadata"] instead of the original request_data["litellm_metadata"]. This prevents set_preprocessing_duration_attribute from finding the start time on the failure path, causing the preprocessing duration to be omitted.

Additional Locations (1)

litellm/integrations/opentelemetry.py#L2995-L3005
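The failure mode Bugbot describes comes down to writing into a copied dict. The sketch below is a simplified illustration with hypothetical data; it assumes, as the comment states, that the logging layer holds a copy of the metadata rather than the caller's original `litellm_metadata` dict.

```python
from typing import Any, Dict

# The original request payload that the failure hook later inspects.
request_data: Dict[str, Any] = {"litellm_metadata": {"thread_id": "t-1"}}

# Assumed for illustration: the logging layer works on a copy of that metadata.
litellm_params: Dict[str, Any] = {"metadata": dict(request_data["litellm_metadata"])}

# The anchor is written to the copy only.
litellm_params["metadata"]["first_api_call_start_time"] = "2026-05-16T06:00:00Z"

# The failure hook reads the original, so the anchor is not visible there.
visible_to_failure_hook = (
    "first_api_call_start_time" in request_data["litellm_metadata"]
)
```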

…tart_time

The PR3 set-once preprocessing anchor was mirrored into litellm_params["metadata"] from core litellm_logging.py. That dict is the caller's request metadata, mutated in place and shared across every call path including pure SDK (litellm.acreate_batch). It got echoed into LiteLLMBatch(metadata=...), which the OpenAI batch schema types as Dict[str, str] -> pydantic ValidationError on a datetime value.

- litellm_logging.py: set first_api_call_start_time only on model_call_details (success path reads it there directly).
- proxy/utils.py: post_call_failure_hook lifts it off the logging object into request_data (internal top-level key, same convention as the other proxy-internal request_data keys) right before the existing litellm_logging_obj pop. Never touches user metadata.
- opentelemetry.py: read the anchor from the container top level (model_call_details on success, request_data on failure).
- Tests updated; add TestPostCallFailureHookLiftsFirstApiCallStartTime.

Fixes the batches_testing regression introduced on this branch.
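The lift-before-pop step can be sketched in a few lines. This is a simplified illustration, not the code in proxy/utils.py: `LoggingObjectStub` and `lift_first_handoff_then_pop` are hypothetical names, and the logging object is modeled as anything carrying a `model_call_details` dict.

```python
from datetime import datetime, timezone
from typing import Any, Dict


class LoggingObjectStub:
    """Hypothetical stand-in for the non-serialisable logging object."""

    def __init__(self, model_call_details: Dict[str, Any]) -> None:
        self.model_call_details = model_call_details


def lift_first_handoff_then_pop(request_data: Dict[str, Any]) -> None:
    """Copy the anchor to a top-level internal key, then drop the logging object."""
    logging_obj = request_data.get("litellm_logging_obj")
    details = getattr(logging_obj, "model_call_details", None)
    if isinstance(details, dict):
        first_handoff = details.get("first_api_call_start_time")
        if first_handoff is not None:
            # Top-level key only: user-facing metadata is never modified.
            request_data["first_api_call_start_time"] = first_handoff
    request_data.pop("litellm_logging_obj", None)


handoff = datetime(2026, 5, 16, 6, 0, 0, tzinfo=timezone.utc)
request_data: Dict[str, Any] = {
    "metadata": {"team": "search"},
    "litellm_logging_obj": LoggingObjectStub({"first_api_call_start_time": handoff}),
}
lift_first_handoff_then_pop(request_data)
```

Keeping the datetime out of `metadata` is the point of the fix: a schema that types metadata as `Dict[str, str]` never sees a non-string value.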

ryan-crabbe-berri · 2026-05-16T17:56:56Z

@greptile re review

ryan-crabbe-berri changed the title ~~feat(otel): expose http.response.status_code on failure spans (incl. SERVER span)~~ feat(otel): OTel-standard HTTP attributes on the proxy SERVER span (status code, http.route, url.path) May 16, 2026

ryan-crabbe-berri changed the title ~~feat(otel): OTel-standard HTTP attributes on the proxy SERVER span (status code, http.route, url.path)~~ feat(otel): OTel-standard attributes on the proxy SERVER span (status code, route/path, preprocessing latency) May 16, 2026

cursor Bot reviewed May 16, 2026

Merge remote-tracking branch 'origin/litellm_internal_staging' into l…

900397e

…itellm_otel_status_code_attr

chore(otel): trim verbose comments to concise rationale

cea9224

Collapse multi-line why-blocks to one or two lines and drop process/plan references (PR-numbering, "the plan") from test comments. No behavior change.

yassin-berriai approved these changes May 16, 2026

yassin-berriai merged commit 0300333 into litellm_internal_staging May 16, 2026
112 of 114 checks passed

yassin-berriai deleted the litellm_otel_status_code_attr branch May 16, 2026 20:45

ryan-crabbe-berri mentioned this pull request May 16, 2026

feat(otel): set http.response.status_code on the success SERVER span #28090

Merged

3 tasks

yassin-berriai merged 7 commits into litellm_internal_staging from litellm_otel_status_code_attr
