fix: completion_cost AttributeError on streaming Anthropic web_search responses (#26153) #27346
…ropic web search test
Greptile Summary
Fixes
Confidence Score: 5/5
Safe to merge: changes are narrowly scoped to the streaming cost-calculation path with no effect on request handling or auth. The fix touches only the streaming chunk accumulator and two cost-calculation call sites. The defensive helper is trivially correct, the coercion in the chunk processor is well-guarded with isinstance checks, and the existing Usage.__init__ dict→ServerToolUse coercion already provided a backstop at the final round-trip. The modified existing test is strictly stronger (type check added), new tests cover None/dict/pydantic inputs end-to-end, and no real-network calls are made. No files require special attention.
| Filename | Overview |
|---|---|
| litellm/litellm_core_utils/llm_cost_calc/utils.py | Added _get_web_search_requests helper that safely reads web_search_requests from None, dict, or ServerToolUse — canonical definition used by both downstream modules. |
| litellm/litellm_core_utils/streaming_chunk_builder_utils.py | Coerces incoming dict/arbitrary server_tool_use to ServerToolUse pydantic during chunk accumulation, fixing the root cause of the AttributeError. |
| litellm/litellm_core_utils/llm_cost_calc/tool_call_cost_tracking.py | Two attribute-access guard chains replaced with _get_web_search_requests helper; logic is equivalent and now dict-safe. |
| litellm/llms/anthropic/cost_calculation.py | Replaced chained usage.server_tool_use.web_search_requests attribute access with _get_web_search_requests and getattr guard; no logic change. |
| tests/test_litellm/litellm_core_utils/test_streaming_chunk_builder_utils.py | Existing test strengthened: dict-index assertion replaced with explicit isinstance(ServerToolUse) + attribute access, correctly catching the regression. |
| tests/test_litellm/litellm_core_utils/test_streaming_chunk_builder_server_tool_use.py | New regression test file; exercises the full stream_chunk_builder → completion_cost path with a dict-shaped server_tool_use chunk. |
| tests/test_litellm/litellm_core_utils/llm_cost_calc/test_tool_call_cost_tracking_dict_safety.py | New unit tests for dict/pydantic/None handling in StandardBuiltInToolCostTracking; imports _get_web_search_requests via tool_call_cost_tracking re-export rather than its canonical utils module. |
| tests/test_litellm/llms/anthropic/test_cost_calculation_dict_safety.py | New unit tests for get_cost_for_anthropic_web_search with dict/pydantic/None inputs; imports _get_web_search_requests via the cost_calculation re-export rather than its canonical utils module. |
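
The dict-safe read described for `_get_web_search_requests` can be sketched roughly as follows. This is a minimal illustration, not the code from `llm_cost_calc/utils.py`: the `ServerToolUse` dataclass is a simplified stand-in for the real pydantic model, and the function name and body are assumptions based only on the table's description (None, dict, or `ServerToolUse` input).

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ServerToolUse:
    """Simplified stand-in for litellm's ServerToolUse model (assumption)."""

    web_search_requests: Optional[int] = None


def get_web_search_requests(server_tool_use: Any) -> Optional[int]:
    """Read web_search_requests from None, a dict, or an object with that attribute."""
    if server_tool_use is None:
        return None
    if isinstance(server_tool_use, dict):
        # Dict-shaped usage, e.g. after a serialization round-trip.
        return server_tool_use.get("web_search_requests")
    # Model-shaped usage; getattr avoids AttributeError on unexpected objects.
    return getattr(server_tool_use, "web_search_requests", None)
```

A caller can then treat a missing count as zero, for example `count = get_web_search_requests(value) or 0`, without caring which shape arrived.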
Reviews (5): Last reviewed commit: "refactor(types): drop redundant server_t..."
@ishaan-berri this fixes the prior regression (spend calculation no longer fails), but the cost of Anthropic Web Search is still not included in the total spend.
OpenAI shut down the gpt-4o-realtime-preview family (incl. the undated alias) on 2026-05-07, causing the live realtime test to fail with a 4000 invalid_request_error.invalid_model close. gpt-realtime is the GA successor; switch the live-call tests to it, matching the base branch.
…itellm_fix-issue-26153
… responses (#26153) (#27346) Cherry-picked from staging squash 4a3860d. stable/1.88.x predates the Usage.__init__ server_tool_use dict->ServerToolUse coercion that staging carries (it landed via the squashed OSS sync #29932 / 32c88ca, not as a standalone commit). The calculate_usage Usage(**returned_usage.model_dump()) round-trip on this line re-serializes server_tool_use to a plain dict, so without that coercion the rebuilt usage holds a dict and the regression test asserting a ServerToolUse type fails. Restored the coercion in litellm/types/utils.py to satisfy the prerequisite -- it matches #27346's own first commit (coerce server_tool_use dict to ServerToolUse in Usage.__init__), which was dropped from the squash only because staging already carried it.
… responses (#26153) (#27346) Cherry-picked from staging squash 4a3860d. The rc line predates the Usage.__init__ server_tool_use dict->ServerToolUse coercion that staging carries (it landed via the squashed OSS sync #29932 / 32c88ca, not as a standalone commit). The calculate_usage Usage(**returned_usage.model_dump()) round-trip re-serializes server_tool_use to a plain dict, so without that coercion the rebuilt usage holds a dict and the regression test asserting a ServerToolUse type fails. Restored the coercion in litellm/types/utils.py to satisfy the prerequisite -- it matches #27346's own first commit (coerce server_tool_use dict to ServerToolUse in Usage.__init__), which was dropped from the squash only because staging already carried it.
…AIDR, Mantle SigV4, NetApp streaming-cost fix, and team-scoped Datadog toward v1.89.0-rc.3 (#30179) * fix(proxy): authorize batch files using upload target_model_names (LIT-3593) (#30009) * fix(proxy): authorize batch files using upload target_model_names (LIT-3593) After replace_model_in_jsonl, body.model is a stripped provider id. Reverse-mapping it via resolve_model_name_from_model_id is first-match on model_list and caused false 403s when multiple deployments share the same stripped name. Use target_model_names from the unified file id instead. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(proxy): restore resolve_model_name_from_model_id for JSONL fallback path (LIT-3593) Restores the reverse-lookup for the JSONL body.model fallback path so that legacy/pre-target_model_names managed files still map stripped provider IDs back to proxy aliases before auth. Also cleans up redundant `or None`. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Revert "fix(proxy): restore resolve_model_name_from_model_id for JSONL fallback path (LIT-3593)" This reverts commit 30d2e96. --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> (cherry picked from commit 2cd7e87) * feat(guardrails): capture user and model metadata in CrowdStrike AIDR (cherry picked from commit 6fc715c) * fix(guardrails): read CrowdStrike AIDR identity from both metadata bags (#29991) Capture user_id and extra_info from metadata or litellm_metadata. 
The single-bag read dropped identity whenever a request carried a present litellm_metadata field (null or a user-supplied dict), since /chat/completions routes the authenticated identity into metadata while the guardrail read litellm_metadata first (cherry picked from commit 1bbaf1c) * feat(bedrock_mantle): add SigV4/IAM auth to Responses API route (#29788) Applied as the squash diff of PR #29788 (head 9800b2f), which landed upstream inside the litellm_oss_staging_080626 sync (32c88ca, #29932) and has no standalone commit to cherry-pick. The rc line already carries the prerequisite #29490 Responses route via the 040626 sync. * fix: completion_cost AttributeError on streaming Anthropic web_search responses (#26153) (#27346) Cherry-picked from staging squash 4a3860d. The rc line predates the Usage.__init__ server_tool_use dict->ServerToolUse coercion that staging carries (it landed via the squashed OSS sync #29932 / 32c88ca, not as a standalone commit). The calculate_usage Usage(**returned_usage.model_dump()) round-trip re-serializes server_tool_use to a plain dict, so without that coercion the rebuilt usage holds a dict and the regression test asserting a ServerToolUse type fails. Restored the coercion in litellm/types/utils.py to satisfy the prerequisite -- it matches #27346's own first commit (coerce server_tool_use dict to ServerToolUse in Usage.__init__), which was dropped from the squash only because staging already carried it. * feat(datadog): add team-scoped Datadog callback support (#29947) Cherry-picked from the PR head 9c049da (single-commit PR, merged to litellm_oss_branch). Applied cleanly; no conflicts. Note: black --check in this worktree flags pre-existing multi-line string formatting in litellm_core_utils/litellm_logging.py (lines ~1006-1050) that is already present on the patch/v1.89.0-rc.1 base and is untouched by this pick -- left as-is to avoid reformatting unrelated lines. 
--------- Co-authored-by: Sameer Kankute <sameer@berri.ai> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Kenan Yildirim <kenan@kenany.me> Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: Kent <kingdooo@gmail.com> Co-authored-by: ishaan-berri <155045088+ishaan-berri@users.noreply.github.com> Co-authored-by: aanchal22 <12680748+aanchal22@users.noreply.github.com>
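
The "read identity from both metadata bags" fix mentioned in this commit message can be illustrated with a small sketch. Everything here is hypothetical: the function name, the `user_id` key, and the lookup order are assumptions for illustration and are not taken from the CrowdStrike AIDR guardrail code. The point is only that a present-but-null or unrelated `litellm_metadata` must not hide a value stored in `metadata`.

```python
from typing import Any, Dict, Optional


def read_from_metadata_bags(request_data: Dict[str, Any], key: str) -> Optional[Any]:
    """Return the first non-None value for `key` found in either metadata bag."""
    # Assumed order: check "metadata" first, then "litellm_metadata".
    for bag_name in ("metadata", "litellm_metadata"):
        bag = request_data.get(bag_name)
        # A bag may be absent, None, or a user-supplied dict without the key.
        if isinstance(bag, dict) and bag.get(key) is not None:
            return bag[key]
    return None
```

A single-bag read that stops at the first bag that merely exists would return nothing for a request such as `{"metadata": {"user_id": "u1"}, "litellm_metadata": None}` if it consulted `litellm_metadata` first; checking both bags for an actual value avoids that.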
… responses (BerriAI#26153) (BerriAI#27346)
* fix: coerce server_tool_use dict to ServerToolUse in Usage.__init__ (BerriAI#26153)
* fix: coerce server_tool_use to ServerToolUse in stream_chunk_builder (BerriAI#26153)
* fix: dict/pydantic-tolerant access in tool_call_cost_tracking (BerriAI#26153)
* fix: dict/pydantic-tolerant access in anthropic cost_calculation (BerriAI#26153)
* test: assert ServerToolUse type in existing stream_chunk_builder anthropic web search test
* test: regression test for BerriAI#26153 (stream_chunk_builder server_tool_use type)
* test: dict/pydantic safety for tool_call_cost_tracking helper
* test: dict/pydantic safety for anthropic web_search cost
* refactor: consolidate _get_web_search_requests into shared cost-calc utils
* test(realtime): use gpt-realtime; openai retired gpt-4o-realtime-preview
  OpenAI shut down the gpt-4o-realtime-preview family (incl. the undated alias) on 2026-05-07, causing the live realtime test to fail with a 4000 invalid_request_error.invalid_model close. gpt-realtime is the GA successor; switch the live-call tests to it, matching the base branch.
* refactor(types): drop redundant server_tool_use coercion in Usage.__init__
---------
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
… responses (BerriAI#26153) (BerriAI#27346) * fix: coerce server_tool_use dict to ServerToolUse in Usage.__init__ (BerriAI#26153) * fix: coerce server_tool_use to ServerToolUse in stream_chunk_builder (BerriAI#26153) * fix: dict/pydantic-tolerant access in tool_call_cost_tracking (BerriAI#26153) * fix: dict/pydantic-tolerant access in anthropic cost_calculation (BerriAI#26153) * test: assert ServerToolUse type in existing stream_chunk_builder anthropic web search test * test: regression test for BerriAI#26153 (stream_chunk_builder server_tool_use type) * test: dict/pydantic safety for tool_call_cost_tracking helper * test: dict/pydantic safety for anthropic web_search cost * refactor: consolidate _get_web_search_requests into shared cost-calc utils * test(realtime): use gpt-realtime; openai retired gpt-4o-realtime-preview OpenAI shut down the gpt-4o-realtime-preview family (incl. the undated alias) on 2026-05-07, causing the live realtime test to fail with a 4000 invalid_request_error.invalid_model close. gpt-realtime is the GA successor; switch the live-call tests to it, matching the base branch. * refactor(types): drop redundant server_tool_use coercion in Usage.__init__ --------- Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com> (cherry picked from commit 4a3860d)
Prerequisite for BerriAI#27346 (completion_cost AttributeError on streaming Anthropic web_search). BerriAI#27346's per-chunk coercion is undone by the existing Usage(**returned_usage.model_dump()) reconstruction in calculate_usage, which round-trips server_tool_use back to a plain dict; without this Usage.__init__ coercion the cost path still does attribute access on a dict and raises. The same prerequisite was bundled into the stable/1.88.x backport of BerriAI#27346 (24b9655). Content-verified present on litellm_internal_staging via aggregator 32c88ca (Litellm oss staging 080626, BerriAI#29932); this restores only the two-line coercion, not the rest of that aggregator. (cherry picked from commit 32c88ca)
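
The round-trip problem described here can be reproduced in miniature. The sketch below is an analogy under stated assumptions: it uses stdlib dataclasses and `dataclasses.asdict` in place of the pydantic models and `model_dump()`, and the class names `UsageNoCoercion` and `UsageWithCoercion` are hypothetical. It shows why rebuilding from a dump leaves a plain dict unless the constructor coerces it back.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class ServerToolUse:
    """Simplified stand-in for litellm's ServerToolUse model (assumption)."""

    web_search_requests: Optional[int] = None


@dataclass
class UsageNoCoercion:
    prompt_tokens: int = 0
    server_tool_use: Any = None


@dataclass
class UsageWithCoercion:
    prompt_tokens: int = 0
    server_tool_use: Any = None

    def __post_init__(self) -> None:
        # Coerce a dict back into the typed object at construction time.
        if isinstance(self.server_tool_use, dict):
            self.server_tool_use = ServerToolUse(**self.server_tool_use)


original = UsageNoCoercion(
    prompt_tokens=10,
    server_tool_use=ServerToolUse(web_search_requests=2),
)

# asdict() recursively converts the nested dataclass into a plain dict,
# much as a model dump serializes nested models.
dumped = asdict(original)

rebuilt_plain = UsageNoCoercion(**dumped)      # server_tool_use stays a dict
rebuilt_coerced = UsageWithCoercion(**dumped)  # server_tool_use is typed again
```

Attribute access such as `rebuilt_plain.server_tool_use.web_search_requests` raises `AttributeError` because the value is a dict, while the same access on `rebuilt_coerced` works.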
Relevant issues
Fixes #26153
Linear ticket
(no Linear ticket — bug fix from external user)
Pre-Submission checklist
make test-unit
@greptileai
CI (LiteLLM team)
Screenshots / Proof of Fix
Before:
After:
Type
Bug Fix
Changes
Two-pronged fix per the issuer's recommendation:
1. `Usage.__init__` and `stream_chunk_builder` now coerce `server_tool_use` to the `ServerToolUse` pydantic model when reconstructing `usage` from streaming chunks. Previously the streaming path round-tripped through `model_dump()` and rebuilt `Usage(**dump)`, which left `server_tool_use` as a plain dict.
2. A `_get_web_search_requests` helper handles None / dict / ServerToolUse uniformly. It is used in `tool_call_cost_tracking.py` and `anthropic/cost_calculation.py`.

Both fixes shipped together. Tests cover the regression and the helper's tolerance for all input types.
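
The chunk-side coercion can be sketched as below. This is an illustration under assumptions, not the `stream_chunk_builder` implementation: `ServerToolUse` is a simplified dataclass stand-in, the function names are hypothetical, and keeping the most recent non-empty value across chunks is an assumed accumulation rule.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass
class ServerToolUse:
    """Simplified stand-in for litellm's ServerToolUse model (assumption)."""

    web_search_requests: Optional[int] = None


def coerce_server_tool_use(value: Any) -> Optional[ServerToolUse]:
    """Normalize None, dict, or arbitrary objects to ServerToolUse (or None)."""
    if value is None or isinstance(value, ServerToolUse):
        return value
    if isinstance(value, dict):
        return ServerToolUse(web_search_requests=value.get("web_search_requests"))
    # Arbitrary object: copy the attribute if present.
    return ServerToolUse(
        web_search_requests=getattr(value, "web_search_requests", None)
    )


def last_server_tool_use(
    chunk_usages: Iterable[Dict[str, Any]],
) -> Optional[ServerToolUse]:
    """Return the most recent non-empty server_tool_use across chunk usages."""
    result: Optional[ServerToolUse] = None
    for usage in chunk_usages:
        coerced = coerce_server_tool_use(usage.get("server_tool_use"))
        if coerced is not None:
            result = coerced
    return result
```

With the value normalized at accumulation time, downstream cost code can rely on attribute access, and the tolerant reader remains as a second line of defense for any path that still hands over a dict.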