chore(ci): merge dev branch#28290
Conversation
* fix: end user logs
* fix(auth): address PR review feedback on end-user id validation
- Gate DB validation behind litellm.validate_end_user_id_in_db (default
False) so arbitrary client-supplied identifiers still pass through.
- Reuse get_end_user_object / get_user_object / _get_fuzzy_user_object
instead of issuing raw Prisma queries in the auth hot path.
- Consolidate: builder does the resolution once and stores it on the
auth obj; centralized checks reuse it, the outer user_api_key_auth
copy is removed.
- Preserve end_user_id when litellm.max_end_user_budget_id is set so
the default end-user budget can still apply to new customers.
* fix(auth): gate JSON-blob user-id rejection behind validate_end_user_id_in_db
Addresses PR review feedback: the JSON-encoded dict/list rejection in
_coerce_user_id_to_str was unconditionally applied, which would silently
stop tracking spend for deployments passing JSON-encoded user identifiers
on upgrade. Per the backwards-compatibility rule, default-path behavior
changes must be opt-in.
Now only strings that decode to a JSON object/array are dropped when
litellm.validate_end_user_id_in_db is True. Non-string dict/list/tuple
values are still always dropped, since stringifying them produces
unusable "{'device_id': ...}"-shaped spend-log rows.
* fix(auth): route email end-user lookup through get_user_object cache
The email-shaped end-user id branch called _get_fuzzy_user_object directly,
bypassing get_user_object's _should_check_db throttle and user_api_key_cache.
Every unique email would hit an unbudgeted raw Prisma query on the critical
auth path. Collapsing the two calls into one get_user_object invocation
with user_email=end_user_id routes through the cached helper per PR review
feedback.
* fix(auth): keep end-user safety net at user_api_key_auth tail
Krrish flagged that removing the tail-of-user_api_key_auth assignment
was a regression risk: ``_user_api_key_auth_builder`` has multiple
early-return paths (master_key=None, /user/auth, JWT short-circuits)
that bypass the end-user resolution block, so dropping the safety net
silently strips end-user attribution from those paths.
Restore the assignment but route it through resolve_and_validate_end_user_id
so the same validation rules apply. Skip the second pass when the builder
already set an id.
Adds two tests pinning the behaviour: one for the early-return safety
net and one verifying we don't double-resolve when the builder set the id.
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
Greptile SummaryThis PR refines the end-user ID attribution path in the proxy auth layer: it adds an opt-in
Confidence Score: 4/5Safe to merge for default configurations; the new opt-in validation path has a minor exception-handling gap worth addressing before enabling the flag in production The core logic is well-structured with careful backward-compatibility gating throughout. Three findings are all non-blocking: a BudgetExceededError gap in the tail safety-net (only reachable with the opt-in flag plus a builder early-return), two trivially-true test assertions, and one test that misses its documented branch. No issues affect the default flag-off code path.
|
| Filename | Overview |
|---|---|
| litellm/init.py | Adds validate_end_user_id_in_db: bool = False global flag — clean, backwards-compatible addition with accurate inline documentation |
| litellm/proxy/auth/auth_checks.py | New validation helpers reuse existing cache/DB helpers correctly; BudgetExceededError propagation from the tail safety-net call site is outside the exception handler |
| litellm/proxy/auth/auth_utils.py | _coerce_user_id_to_str wired through all extraction sites with correct backward-compat gating |
| litellm/proxy/auth/user_api_key_auth.py | Builder, centralized checks, and tail safety-net correctly layered; BudgetExceededError from tail block bypasses exception handler |
| tests/test_litellm/proxy/auth/test_auth_checks.py | Good coverage but _get_fuzzy_user_object.assert_not_awaited() assertions are trivially true |
| tests/test_litellm/proxy/auth/test_auth_utils.py | Thorough coercion tests; brace-string test doesn't enable the validation flag |
| tests/test_litellm/proxy/auth/test_user_api_key_auth.py | Two new integration tests correctly pin the early-return and no-double-resolve invariants |
Comments Outside Diff (1)
-
tests/test_litellm/proxy/auth/test_auth_checks.py, line 525 (link)_get_fuzzy_user_object.assert_not_awaited()assertions are trivially true_get_fuzzy_user_objectis never called directly from_end_user_id_exists_in_db; it is only invoked as an internal detail ofget_user_object, which is itself mocked. The assertion is therefore always satisfied regardless of what the production code does, and would not catch a regression where_end_user_id_exists_in_dbis changed to call_get_fuzzy_user_objectdirectly. The same trivially-true assertion appears at line 555. The meaningful check for the email-routing behaviour is already theuser_kwargs["user_email"]assertion at lines 552-553 and 576.
Reviews (1): Last reviewed commit: "fix: end user logs (#27758)" | Re-trigger Greptile
|
|
||
| def test_dict_metadata_user_id_returns_none(self): | ||
| request_body = { | ||
| "metadata": {"user_id": {"device_id": "abc"}}, | ||
| } |
There was a problem hiding this comment.
Test doesn't exercise the JSON-parse branch it documents
The docstring says "A string starting with { but failing to parse stays as-is", implying that the safe_json_loads guard inside _coerce_user_id_to_str is being tested. However, the test doesn't set litellm.validate_end_user_id_in_db = True, so the if litellm.validate_end_user_id_in_db and stripped[:1] in ("{", "[") condition is never entered. A companion test with the flag enabled is needed to confirm that invalid-JSON brace strings are preserved after safe_json_loads returns None.
| if user_api_key_auth_obj.end_user_id is None: | ||
| from litellm.proxy.proxy_server import ( | ||
| prisma_client, | ||
| proxy_logging_obj, | ||
| user_api_key_cache, | ||
| ) | ||
|
|
||
| raw_end_user_id = get_end_user_id_from_request_body( | ||
| request_data, _safe_get_request_headers(request) | ||
| ) | ||
| if raw_end_user_id is not None: | ||
| resolved_end_user_id = await resolve_and_validate_end_user_id( | ||
| raw_end_user_id=raw_end_user_id, | ||
| prisma_client=prisma_client, | ||
| user_api_key_cache=user_api_key_cache, | ||
| parent_otel_span=user_api_key_auth_obj.parent_otel_span, | ||
| proxy_logging_obj=proxy_logging_obj, | ||
| route=route, | ||
| ) | ||
| if resolved_end_user_id is not None: | ||
| user_api_key_auth_obj.end_user_id = resolved_end_user_id |
There was a problem hiding this comment.
BudgetExceededError from tail safety-net bypasses the exception handler
When validate_end_user_id_in_db=True, the tail block calls resolve_and_validate_end_user_id, which calls _end_user_id_exists_in_db -> get_end_user_object. get_end_user_object re-raises litellm.BudgetExceededError for over-budget users, and _end_user_id_exists_in_db propagates it. Because this block sits outside the try/except around _run_centralized_common_checks (lines 2165-2180), a budget-exceeded exception here escapes user_api_key_auth without going through UserAPIKeyAuthExceptionHandler._handle_authentication_error, producing a raw 500 instead of a formatted 429/403 on builder early-return paths when the flag is enabled.
* fix: end user logs
* fix(auth): address PR review feedback on end-user id validation
- Gate DB validation behind litellm.validate_end_user_id_in_db (default
False) so arbitrary client-supplied identifiers still pass through.
- Reuse get_end_user_object / get_user_object / _get_fuzzy_user_object
instead of issuing raw Prisma queries in the auth hot path.
- Consolidate: builder does the resolution once and stores it on the
auth obj; centralized checks reuse it, the outer user_api_key_auth
copy is removed.
- Preserve end_user_id when litellm.max_end_user_budget_id is set so
the default end-user budget can still apply to new customers.
* fix(auth): gate JSON-blob user-id rejection behind validate_end_user_id_in_db
Addresses PR review feedback: the JSON-encoded dict/list rejection in
_coerce_user_id_to_str was unconditionally applied, which would silently
stop tracking spend for deployments passing JSON-encoded user identifiers
on upgrade. Per the backwards-compatibility rule, default-path behavior
changes must be opt-in.
Now only strings that decode to a JSON object/array are dropped when
litellm.validate_end_user_id_in_db is True. Non-string dict/list/tuple
values are still always dropped, since stringifying them produces
unusable "{'device_id': ...}"-shaped spend-log rows.
* fix(auth): route email end-user lookup through get_user_object cache
The email-shaped end-user id branch called _get_fuzzy_user_object directly,
bypassing get_user_object's _should_check_db throttle and user_api_key_cache.
Every unique email would hit an unbudgeted raw Prisma query on the critical
auth path. Collapsing the two calls into one get_user_object invocation
with user_email=end_user_id routes through the cached helper per PR review
feedback.
* fix(auth): keep end-user safety net at user_api_key_auth tail
Krrish flagged that removing the tail-of-user_api_key_auth assignment
was a regression risk: ``_user_api_key_auth_builder`` has multiple
early-return paths (master_key=None, /user/auth, JWT short-circuits)
that bypass the end-user resolution block, so dropping the safety net
silently strips end-user attribution from those paths.
Restore the assignment but route it through resolve_and_validate_end_user_id
so the same validation rules apply. Skip the second pass when the builder
already set an id.
Adds two tests pinning the behaviour: one for the early-return safety
net and one verifying we don't double-resolve when the builder set the id.
Co-authored-by: Dennis Henry <dennis.henry@okta.com>
fix: end user logs
fix(auth): address PR review feedback on end-user id validation
Addresses PR review feedback: the JSON-encoded dict/list rejection in _coerce_user_id_to_str was unconditionally applied, which would silently stop tracking spend for deployments passing JSON-encoded user identifiers on upgrade. Per the backwards-compatibility rule, default-path behavior changes must be opt-in.
Now only strings that decode to a JSON object/array are dropped when litellm.validate_end_user_id_in_db is True. Non-string dict/list/tuple values are still always dropped, since stringifying them produces unusable "{'device_id': ...}"-shaped spend-log rows.
The email-shaped end-user id branch called _get_fuzzy_user_object directly, bypassing get_user_object's _should_check_db throttle and user_api_key_cache. Every unique email would hit an unbudgeted raw Prisma query on the critical auth path. Collapsing the two calls into one get_user_object invocation with user_email=end_user_id routes through the cached helper per PR review feedback.
Krrish flagged that removing the tail-of-user_api_key_auth assignment was a regression risk:
_user_api_key_auth_builderhas multiple early-return paths (master_key=None, /user/auth, JWT short-circuits) that bypass the end-user resolution block, so dropping the safety net silently strips end-user attribution from those paths.Restore the assignment but route it through resolve_and_validate_end_user_id so the same validation rules apply. Skip the second pass when the builder already set an id.
Adds two tests pinning the behaviour: one for the early-return safety net and one verifying we don't double-resolve when the builder set the id.
Relevant issues
Linear ticket
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
tests/test_litellm/directory, Adding at least 1 test is a hard requirement - see detailsmake test-unit@greptileaiand received a Confidence Score of at least 4/5 before requesting a maintainer reviewDelays in PR merge?
If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).
CI (LiteLLM team)
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Screenshots / Proof of Fix
Type
🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test
Changes