chore(ci): merge dev branch by yuneng-berri · Pull Request #28290 · BerriAI/litellm

yuneng-berri · 2026-05-19T22:08:26Z

fix: end user logs
fix(auth): address PR review feedback on end-user id validation

Gate DB validation behind litellm.validate_end_user_id_in_db (default False) so arbitrary client-supplied identifiers still pass through.
Reuse get_end_user_object / get_user_object / _get_fuzzy_user_object instead of issuing raw Prisma queries in the auth hot path.
Consolidate: builder does the resolution once and stores it on the auth obj; centralized checks reuse it, the outer user_api_key_auth copy is removed.
Preserve end_user_id when litellm.max_end_user_budget_id is set so the default end-user budget can still apply to new customers.

fix(auth): gate JSON-blob user-id rejection behind validate_end_user_id_in_db

Addresses PR review feedback: the JSON-encoded dict/list rejection in _coerce_user_id_to_str was unconditionally applied, which would silently stop tracking spend for deployments passing JSON-encoded user identifiers on upgrade. Per the backwards-compatibility rule, default-path behavior changes must be opt-in.

Now only strings that decode to a JSON object/array are dropped when litellm.validate_end_user_id_in_db is True. Non-string dict/list/tuple values are still always dropped, since stringifying them produces unusable "{'device_id': ...}"-shaped spend-log rows.

fix(auth): route email end-user lookup through get_user_object cache

The email-shaped end-user id branch called _get_fuzzy_user_object directly, bypassing get_user_object's _should_check_db throttle and user_api_key_cache. Every unique email would hit an unbudgeted raw Prisma query on the critical auth path. Collapsing the two calls into one get_user_object invocation with user_email=end_user_id routes through the cached helper per PR review feedback.

fix(auth): keep end-user safety net at user_api_key_auth tail

Krrish flagged that removing the tail-of-user_api_key_auth assignment was a regression risk: _user_api_key_auth_builder has multiple early-return paths (master_key=None, /user/auth, JWT short-circuits) that bypass the end-user resolution block, so dropping the safety net silently strips end-user attribution from those paths.

Restore the assignment but route it through resolve_and_validate_end_user_id so the same validation rules apply. Skip the second pass when the builder already set an id.

Adds two tests pinning the behaviour: one for the early-return safety net and one verifying we don't double-resolve when the builder set the id.

Relevant issues

Linear ticket

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

I have Added testing in the tests/test_litellm/ directory, Adding at least 1 test is a hard requirement - see details
My PR passes all unit tests on make test-unit
My PR's scope is as isolated as possible, it only solves 1 specific problem
I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

50-55 passing tests: main is stable with minor issues.

45-49 passing tests: acceptable but needs attention

<= 40 passing tests: unstable; be careful with your merges and assess the risk.

Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:

Screenshots / Proof of Fix

Type

🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test

Changes

* fix: end user logs * fix(auth): address PR review feedback on end-user id validation - Gate DB validation behind litellm.validate_end_user_id_in_db (default False) so arbitrary client-supplied identifiers still pass through. - Reuse get_end_user_object / get_user_object / _get_fuzzy_user_object instead of issuing raw Prisma queries in the auth hot path. - Consolidate: builder does the resolution once and stores it on the auth obj; centralized checks reuse it, the outer user_api_key_auth copy is removed. - Preserve end_user_id when litellm.max_end_user_budget_id is set so the default end-user budget can still apply to new customers. * fix(auth): gate JSON-blob user-id rejection behind validate_end_user_id_in_db Addresses PR review feedback: the JSON-encoded dict/list rejection in _coerce_user_id_to_str was unconditionally applied, which would silently stop tracking spend for deployments passing JSON-encoded user identifiers on upgrade. Per the backwards-compatibility rule, default-path behavior changes must be opt-in. Now only strings that decode to a JSON object/array are dropped when litellm.validate_end_user_id_in_db is True. Non-string dict/list/tuple values are still always dropped, since stringifying them produces unusable "{'device_id': ...}"-shaped spend-log rows. * fix(auth): route email end-user lookup through get_user_object cache The email-shaped end-user id branch called _get_fuzzy_user_object directly, bypassing get_user_object's _should_check_db throttle and user_api_key_cache. Every unique email would hit an unbudgeted raw Prisma query on the critical auth path. Collapsing the two calls into one get_user_object invocation with user_email=end_user_id routes through the cached helper per PR review feedback. * fix(auth): keep end-user safety net at user_api_key_auth tail Krrish flagged that removing the tail-of-user_api_key_auth assignment was a regression risk: ``_user_api_key_auth_builder`` has multiple early-return paths (master_key=None, /user/auth, JWT short-circuits) that bypass the end-user resolution block, so dropping the safety net silently strips end-user attribution from those paths. Restore the assignment but route it through resolve_and_validate_end_user_id so the same validation rules apply. Skip the second pass when the builder already set an id. Adds two tests pinning the behaviour: one for the early-return safety net and one verifying we don't double-resolve when the builder set the id.

codecov · 2026-05-19T22:11:50Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

greptile-apps · 2026-05-19T22:14:52Z

Greptile Summary

This PR refines the end-user ID attribution path in the proxy auth layer: it adds an opt-in validate_end_user_id_in_db flag to validate caller-supplied user IDs against known DB rows, introduces _coerce_user_id_to_str to drop non-string structured values from all extraction sites, and restores the tail safety-net in user_api_key_auth so early-return builder paths still attribute spend logs correctly.

resolve_and_validate_end_user_id and _end_user_id_exists_in_db are added to auth_checks.py; they reuse get_end_user_object / get_user_object so validation shares the existing cache and avoids raw Prisma calls on the hot auth path.
_coerce_user_id_to_str replaces all str() coercions in get_end_user_id_from_request_body, dropping dict/list/tuple values unconditionally and gating JSON-string rejection behind the new flag for backward compatibility.
The tail safety-net in user_api_key_auth re-routes through resolve_and_validate_end_user_id, and two new integration tests pin the no-double-resolve and early-return-fallback invariants.

Confidence Score: 4/5

Safe to merge for default configurations; the new opt-in validation path has a minor exception-handling gap worth addressing before enabling the flag in production

The core logic is well-structured with careful backward-compatibility gating throughout. Three findings are all non-blocking: a BudgetExceededError gap in the tail safety-net (only reachable with the opt-in flag plus a builder early-return), two trivially-true test assertions, and one test that misses its documented branch. No issues affect the default flag-off code path.

litellm/proxy/auth/user_api_key_auth.py tail section (lines 2188-2208) and the corresponding tests in test_auth_checks.py

Important Files Changed

Filename	Overview
litellm/init.py	Adds `validate_end_user_id_in_db: bool = False` global flag — clean, backwards-compatible addition with accurate inline documentation
litellm/proxy/auth/auth_checks.py	New validation helpers reuse existing cache/DB helpers correctly; BudgetExceededError propagation from the tail safety-net call site is outside the exception handler
litellm/proxy/auth/auth_utils.py	`_coerce_user_id_to_str` wired through all extraction sites with correct backward-compat gating
litellm/proxy/auth/user_api_key_auth.py	Builder, centralized checks, and tail safety-net correctly layered; BudgetExceededError from tail block bypasses exception handler
tests/test_litellm/proxy/auth/test_auth_checks.py	Good coverage but `_get_fuzzy_user_object.assert_not_awaited()` assertions are trivially true
tests/test_litellm/proxy/auth/test_auth_utils.py	Thorough coercion tests; brace-string test doesn't enable the validation flag
tests/test_litellm/proxy/auth/test_user_api_key_auth.py	Two new integration tests correctly pin the early-return and no-double-resolve invariants

Comments Outside Diff (1)

tests/test_litellm/proxy/auth/test_auth_checks.py, line 525 (link)

_get_fuzzy_user_object.assert_not_awaited() assertions are trivially true

_get_fuzzy_user_object is never called directly from _end_user_id_exists_in_db; it is only invoked as an internal detail of get_user_object, which is itself mocked. The assertion is therefore always satisfied regardless of what the production code does, and would not catch a regression where _end_user_id_exists_in_db is changed to call _get_fuzzy_user_object directly. The same trivially-true assertion appears at line 555. The meaningful check for the email-routing behaviour is already the user_kwargs["user_email"] assertion at lines 552-553 and 576.

_{Reviews (1): Last reviewed commit: "fix: end user logs (#27758)" | Re-trigger Greptile}

greptile-apps · 2026-05-19T22:14:59Z

+
+    def test_dict_metadata_user_id_returns_none(self):
+        request_body = {
+            "metadata": {"user_id": {"device_id": "abc"}},
+        }


Test doesn't exercise the JSON-parse branch it documents

The docstring says "A string starting with { but failing to parse stays as-is", implying that the safe_json_loads guard inside _coerce_user_id_to_str is being tested. However, the test doesn't set litellm.validate_end_user_id_in_db = True, so the if litellm.validate_end_user_id_in_db and stripped[:1] in ("{", "[") condition is never entered. A companion test with the flag enabled is needed to confirm that invalid-JSON brace strings are preserved after safe_json_loads returns None.

greptile-apps · 2026-05-19T22:15:00Z

+    if user_api_key_auth_obj.end_user_id is None:
+        from litellm.proxy.proxy_server import (
+            prisma_client,
+            proxy_logging_obj,
+            user_api_key_cache,
+        )
+
+        raw_end_user_id = get_end_user_id_from_request_body(
+            request_data, _safe_get_request_headers(request)
+        )
+        if raw_end_user_id is not None:
+            resolved_end_user_id = await resolve_and_validate_end_user_id(
+                raw_end_user_id=raw_end_user_id,
+                prisma_client=prisma_client,
+                user_api_key_cache=user_api_key_cache,
+                parent_otel_span=user_api_key_auth_obj.parent_otel_span,
+                proxy_logging_obj=proxy_logging_obj,
+                route=route,
+            )
+            if resolved_end_user_id is not None:
+                user_api_key_auth_obj.end_user_id = resolved_end_user_id


BudgetExceededError from tail safety-net bypasses the exception handler

When validate_end_user_id_in_db=True, the tail block calls resolve_and_validate_end_user_id, which calls _end_user_id_exists_in_db -> get_end_user_object. get_end_user_object re-raises litellm.BudgetExceededError for over-budget users, and _end_user_id_exists_in_db propagates it. Because this block sits outside the try/except around _run_centralized_common_checks (lines 2165-2180), a budget-exceeded exception here escapes user_api_key_auth without going through UserAPIKeyAuthExceptionHandler._handle_authentication_error, producing a raw 500 instead of a formatted 429/403 on builder early-return paths when the flag is enabled.

* fix: end user logs * fix(auth): address PR review feedback on end-user id validation - Gate DB validation behind litellm.validate_end_user_id_in_db (default False) so arbitrary client-supplied identifiers still pass through. - Reuse get_end_user_object / get_user_object / _get_fuzzy_user_object instead of issuing raw Prisma queries in the auth hot path. - Consolidate: builder does the resolution once and stores it on the auth obj; centralized checks reuse it, the outer user_api_key_auth copy is removed. - Preserve end_user_id when litellm.max_end_user_budget_id is set so the default end-user budget can still apply to new customers. * fix(auth): gate JSON-blob user-id rejection behind validate_end_user_id_in_db Addresses PR review feedback: the JSON-encoded dict/list rejection in _coerce_user_id_to_str was unconditionally applied, which would silently stop tracking spend for deployments passing JSON-encoded user identifiers on upgrade. Per the backwards-compatibility rule, default-path behavior changes must be opt-in. Now only strings that decode to a JSON object/array are dropped when litellm.validate_end_user_id_in_db is True. Non-string dict/list/tuple values are still always dropped, since stringifying them produces unusable "{'device_id': ...}"-shaped spend-log rows. * fix(auth): route email end-user lookup through get_user_object cache The email-shaped end-user id branch called _get_fuzzy_user_object directly, bypassing get_user_object's _should_check_db throttle and user_api_key_cache. Every unique email would hit an unbudgeted raw Prisma query on the critical auth path. Collapsing the two calls into one get_user_object invocation with user_email=end_user_id routes through the cached helper per PR review feedback. * fix(auth): keep end-user safety net at user_api_key_auth tail Krrish flagged that removing the tail-of-user_api_key_auth assignment was a regression risk: ``_user_api_key_auth_builder`` has multiple early-return paths (master_key=None, /user/auth, JWT short-circuits) that bypass the end-user resolution block, so dropping the safety net silently strips end-user attribution from those paths. Restore the assignment but route it through resolve_and_validate_end_user_id so the same validation rules apply. Skip the second pass when the builder already set an id. Adds two tests pinning the behaviour: one for the early-return safety net and one verifying we don't double-resolve when the builder set the id. Co-authored-by: Dennis Henry <dennis.henry@okta.com>

yuneng-berri changed the title ~~[Infra] Merge dev branch~~ chore(ci): merge dev branch May 19, 2026

greptile-apps Bot reviewed May 19, 2026

View reviewed changes

ryan-crabbe-berri approved these changes May 21, 2026

View reviewed changes

yuneng-berri merged commit 697a90e into litellm_internal_staging May 21, 2026
115 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

chore(ci): merge dev branch#28290

chore(ci): merge dev branch#28290
yuneng-berri merged 1 commit into
litellm_internal_stagingfrom
litellm_yj_may19

yuneng-berri commented May 19, 2026

Uh oh!

codecov Bot commented May 19, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented May 19, 2026 •

edited

Loading

Important Files Changed

Comments Outside Diff (1)

Uh oh!

greptile-apps Bot May 19, 2026

Uh oh!

greptile-apps Bot May 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

yuneng-berri commented May 19, 2026

Relevant issues

Linear ticket

Pre-Submission checklist

Delays in PR merge?

CI (LiteLLM team)

Screenshots / Proof of Fix

Type

Changes

Uh oh!

codecov Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

greptile-apps Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Comments Outside Diff (1)

Uh oh!

greptile-apps Bot May 19, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot May 19, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codecov Bot commented May 19, 2026 •

edited

Loading

greptile-apps Bot commented May 19, 2026 •

edited

Loading