test(proxy): behavior-pinning matrix for team management endpoints by yuneng-berri · Pull Request #28441 · BerriAI/litellm

yuneng-berri · 2026-05-21T05:07:13Z

PR2 — Team Tier-1 behavior pinning

Second slice of the management-endpoint behavior-pinning effort. Extends the
tests/proxy_behavior/management/ harness from PR1 (#28321) and adds the
actor × target-resource authorization matrix for the 7 team endpoints.
Tests-only — no production code changes.

Scope

/team/new, /team/info, /team/list, /team/update, /team/member_add,
/team/member_delete, /team/member_update.

Harness extensions (actors.py, conftest.py):

- ORG_B_ADMIN actor + TEAM_GAMMA (an ORG_A team with no actor members),
  so team-targeting endpoints get a clean own / same-org-other / cross-org
  target axis.
- create_scratch_team() raw-seeds target teams with no /team/new side
  effects; the scratch teardown now also strips dangling scratch-team refs
  from LiteLLM_UserTable.teams.
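
The actor × target axis described above can be sketched as a small cross product. This is a minimal illustration, not the suite's code: `ORG_A_ADMIN`, the `Target` enum, `build_matrix`, and the status values in `ILLUSTRATIVE_PINS` are assumptions made up for the example; only `PROXY_ADMIN`, `ORG_B_ADMIN`, and the own / same-org-other / cross-org axis come from the PR text.

```python
from enum import Enum
from itertools import product


class Actor(Enum):
    PROXY_ADMIN = "proxy_admin"
    ORG_A_ADMIN = "org_a_admin"  # hypothetical actor, for illustration only
    ORG_B_ADMIN = "org_b_admin"


class Target(Enum):
    OWN = "own"
    SAME_ORG_OTHER = "same_org_other"
    CROSS_ORG = "cross_org"


def scenario_id(actor: Actor, target: Target) -> str:
    """Readable id for one matrix cell, e.g. 'org_b_admin-cross_org'."""
    return f"{actor.value}-{target.value}"


def build_matrix(pinned: dict) -> list:
    """Expand every (actor, target) cell; fail loudly if a cell has no pin."""
    cells = []
    for actor, target in product(Actor, Target):
        if (actor, target) not in pinned:
            raise KeyError(f"no pinned status for {scenario_id(actor, target)}")
        cells.append((scenario_id(actor, target), actor, target, pinned[(actor, target)]))
    return cells


# Placeholder statuses, NOT the values the real suite pins.
ILLUSTRATIVE_PINS = {
    (actor, target): 200 if actor is Actor.PROXY_ADMIN or target is Target.OWN else 403
    for actor, target in product(Actor, Target)
}

MATRIX = build_matrix(ILLUSTRATIVE_PINS)
```

A list shaped like `MATRIX` could feed `pytest.mark.parametrize`, with the first element of each tuple passed as `ids`. Requiring a pin for every cell means a newly added actor cannot silently go untested.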

Status codes are pinned to observed handler behavior; surprising results
are surfaced in test comments, not "fixed" in the assertion.

Exit gates

G1 — CI green. New job test-unit-proxy-mgmt-behavior.yml runs on this PR
(no workflow change needed — it already globs tests/proxy_behavior). No
skipped tests.

G2 — wall-time. proxy-mgmt-behavior / Run tests ran in ~4m on this PR
(286 scenarios, PR1 + PR2) — inside the 10-min budget. workers: 0 unchanged.

G3 — strict imports. Both greps empty (also codified in
test_no_management_imports.py):

$ rg 'from litellm\.proxy\.management_endpoints' tests/proxy_behavior/   # empty
$ rg 'mock.*user_api_key_auth|patch.*user_api_key_auth' tests/proxy_behavior/  # empty
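
One way the two greps might be codified as a test is sketched below. The contents of test_no_management_imports.py are not shown in this PR, so `forbidden_lines` and `scan_tree` are hypothetical helpers; only the two regex patterns are taken from the commands above.

```python
import re
from pathlib import Path

# The same two patterns the rg commands search for.
FORBIDDEN = [
    re.compile(r"from litellm\.proxy\.management_endpoints"),
    re.compile(r"mock.*user_api_key_auth|patch.*user_api_key_auth"),
]


def forbidden_lines(source: str) -> list:
    """Return (line number, stripped line) for every line matching a pattern."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if any(pattern.search(line) for pattern in FORBIDDEN):
            hits.append((lineno, line.strip()))
    return hits


def scan_tree(root: str) -> dict:
    """Map each offending .py file under root to its matching lines."""
    offenders = {}
    for path in sorted(Path(root).rglob("*.py")):
        hits = forbidden_lines(path.read_text(encoding="utf-8"))
        if hits:
            offenders[str(path)] = hits
    return offenders
```

A guard test would then assert that `scan_tree("tests/proxy_behavior")` is empty. A file that spells the patterns out literally, as this sketch does, would match itself, so a real guard would have to exclude its own file or assemble the patterns indirectly.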

G4 — regression replay. In-scope behavior-fix candidates on
team_endpoints.py (~6mo): 09ffc87734 (added _verify_team_access),
662d05531d (relocation gate), 91bfbe6efe (org-boundary enforcement),
70126d9130/1b2ea270b4 (team/new org validation), 1d45cfd1fc/
f879b8b1cb/b10f71d583 (member_add).

Verified RED→GREEN replay against 09ffc87734:

Step	Action	`test_team_update_org_relocation_gate[org_b_admin]`
RED	revert `09ffc87734` (drop `_verify_team_access` call in `update_team`)	FAIL — 200, ORG_B_ADMIN relocates a team it does not administer
GREEN	restore the fix	PASS — 403
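
The replay logic can be modelled with a toy handler. Everything here is an assumption for illustration: `Caller`, `Team`, and `update_team_status` are made-up names, and the real `_verify_team_access` logic is not shown in this PR. The sketch only mirrors the two observed outcomes in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caller:
    name: str
    admin_of_orgs: frozenset


@dataclass(frozen=True)
class Team:
    team_id: str
    organization_id: str


def update_team_status(caller: Caller, team: Team, enforce_access_check: bool = True) -> int:
    """Toy stand-in for the handler: the HTTP status a relocation request gets.

    enforce_access_check=False models the RED step (the access check reverted).
    """
    if enforce_access_check and team.organization_id not in caller.admin_of_orgs:
        return 403
    return 200


org_b_admin = Caller("org_b_admin", frozenset({"ORG_B"}))
team_in_org_a = Team("team_gamma", "ORG_A")

red_status = update_team_status(org_b_admin, team_in_org_a, enforce_access_check=False)
green_status = update_team_status(org_b_admin, team_in_org_a)
```

A test pinned to 403 fails against `red_status` and passes against `green_status`, which is the flip the replay is meant to demonstrate.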

G5 — mutmut. Deferred manual follow-up, as in PR1: a whole-folder
mutation-test.yml run takes hours and is off the per-PR critical path.
[tool.mutmut].tests_dir already includes this suite. The binding pre-merge
signal is the behavior matrix (G1) plus the G4 regression-replay.

PR-specific metrics

PR2.M1 — 156 scenarios, 7/7 endpoints.

File	Scenarios
`test_team_info.py`	27
`test_team_list.py`	18
`test_team_new.py`	30
`test_team_update.py`	24
`test_team_member_add.py`	21
`test_team_member_delete.py`	18
`test_team_member_update.py`	18

PR2.M2 — cumulative wall-time ~4m on CI for all 286 PR1 + PR2 scenarios,
inside the 10-min budget.
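
The scenario counts can be cross-checked with a few lines of arithmetic. The per-file numbers and the 286 total are taken from this description; the PR1 figure is derived by subtraction and is not stated in the PR.

```python
# Per-file scenario counts from the PR2.M1 table.
PR2_SCENARIOS = {
    "test_team_info.py": 27,
    "test_team_list.py": 18,
    "test_team_new.py": 30,
    "test_team_update.py": 24,
    "test_team_member_add.py": 21,
    "test_team_member_delete.py": 18,
    "test_team_member_update.py": 18,
}

CUMULATIVE_TOTAL = 286  # PR1 + PR2, as reported for the CI run

pr2_total = sum(PR2_SCENARIOS.values())
pr1_total = CUMULATIVE_TOTAL - pr2_total  # derived, not stated in the PR

print(f"PR2: {pr2_total} scenarios across {len(PR2_SCENARIOS)} files")
print(f"PR1 (derived): {pr1_total} scenarios")
```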

Test plan

- uv run pytest tests/proxy_behavior/: 286 passed locally (Postgres 14);
  proxy-mgmt-behavior CI job green on this PR.
- Known coverage gap: the /team/update org-relocation allowed branch for
  a non-proxy-admin needs a caller who is org admin of both source and
  destination orgs; no seeded actor is one, so only the deny paths are
  pinned (noted in test_team_update.py).

PR2 (Team Tier-1) of the management-endpoint behavior-pinning effort. Extends the tests/proxy_behavior/management/ harness PR1 built and adds the actor x target-resource authz matrix for the 7 team endpoints: /team/new, /team/info, /team/list, /team/update, /team/member_add, /team/member_delete, /team/member_update. Tests-only, no production code changes.

Harness extensions:
- actors.py: ORG_B_ADMIN actor (org admin of ORG_B) and TEAM_GAMMA (an ORG_A team with no actor members), so team-targeting endpoints get a clean own / same-org-other / cross-org target axis.
- conftest.py: create_scratch_team() raw-seeds target teams without /team/new side effects; the scratch teardown now also strips dangling scratch-team refs from LiteLLM_UserTable.teams.

156 new scenarios; status codes pinned to observed handler behavior.

codecov · 2026-05-21T05:10:41Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

greptile-apps · 2026-05-21T05:13:33Z

Greptile Summary

This is a tests-only PR that extends the tests/proxy_behavior/management/ behavior-pinning harness from PR1 to cover the 7 team management endpoints. No production code is changed. The suite drives real HTTP endpoints against a live Postgres instance with the full auth layer — nothing is mocked.

Adds a 156-scenario actor × target-resource authorization matrix for /team/new, /team/info, /team/list, /team/update, /team/member_add, /team/member_delete, and /team/member_update, along with a verified G4 RED→GREEN regression replay against 09ffc87734.
Extends the world seed with a new ORG_B_ADMIN actor and TEAM_GAMMA (ORG_A, no members), and adds create_scratch_team to raw-seed target teams without /team/new side effects; the scratch teardown is extended to strip dangling scratch-team refs from LiteLLM_UserTable.teams arrays on world actors.

Confidence Score: 5/5

Tests-only change with no production code modifications; safe to merge.

Every changed file is under tests/proxy_behavior/management/. The world seed additions (ORG_B_ADMIN, TEAM_GAMMA) are consistent across actors.py, conftest.py, and the test matrices. The scratch teardown correctly handles the new side effects introduced by POST /team/new and POST /team/member_add (team rows, membership rows, and world-actor teams array refs). No production code paths are touched.

No files require special attention.

Important Files Changed

Filename	Overview
tests/proxy_behavior/management/actors.py	Adds ORG_B_ADMIN actor and TEAM_GAMMA team to the world seed; world struct, _actor_profile, seed_world, and _wipe_world all updated consistently
tests/proxy_behavior/management/conftest.py	Adds create_scratch_team helper and teardown logic to strip dangling scratch-team refs from world-actor LiteLLM_UserTable.teams arrays; existing teardown order (FK-safe: membership → teamtable → usertable) is preserved
tests/proxy_behavior/management/test_team_new.py	27-scenario authz matrix + 3 input-validation pins for POST /team/new; cleanup relies on scratch teardown for team rows, membership rows, and creator teams-array refs
tests/proxy_behavior/management/test_team_update.py	24 scenarios across basic authz matrix, no-org-context test, and org-relocation gate for POST /team/update; all use fresh scratch teams and verify DB state post-request
tests/proxy_behavior/management/test_team_member_add.py	21 scenarios covering authz matrix plus available-team self-join (role-escalation and other-user injection deny paths); monkeypatch of default_internal_user_params is safe under workers:0

Reviews (2): Last reviewed commit: "test(proxy): drop suite README + triage ..."

greptile-apps · 2026-05-21T05:13:37Z

+async def test_team_new_unknown_organization_is_500(
+    proxy_client, prisma, scratch, world
+):
+    """SURFACED, NOT ENDORSED: a /team/new with an organization_id that does
+    not exist currently fails 500 (the role-resolution layer raises before
+    the handler's own 400 'Organization not found' check is reached)."""
+    resp = await proxy_client.post(
+        "/team/new",
+        headers={"Authorization": f"Bearer {world.keys[Actor.PROXY_ADMIN].cleartext}"},
+        json={
+            "team_id": scratch.prefix,
+            "organization_id": scratch.tag("no-such-org"),
+        },
+    )
+    assert resp.status_code == 500, resp.text
+    row = await prisma.db.litellm_teamtable.find_unique(
+        where={"team_id": scratch.prefix}
+    )
+    assert row is None


500 pinned as expected behavior with no follow-up issue

test_team_new_unknown_organization_is_500 asserts resp.status_code == 500. The inline comment correctly labels this "SURFACED, NOT ENDORSED" and explains that the role-resolution layer raises an unhandled exception before the handler's own 400 guard is reached. Because the suite pins observed behavior, a future fix to return 400 would flip this test RED — which is the intended signal, but only if the developer knows to look here. Consider linking a tracking issue (e.g., a # TODO(#NNNN) reference) so the fix path is obvious when the test does flip.

greptile-apps · 2026-05-21T05:13:38Z

+        polluted = await prisma.db.litellm_usertable.find_many(
+            where={"teams": {"isEmpty": False}}
+        )
+        for user in polluted:
+            cleaned = [t for t in user.teams if not t.startswith(handle.prefix)]
+            if cleaned != list(user.teams):
+                await prisma.db.litellm_usertable.update(
+                    where={"user_id": user.user_id},
+                    data={"teams": {"set": cleaned}},
+                )


Teardown scans every user with a non-empty teams array

find_many(where={"teams": {"isEmpty": False}}) loads all matching rows from the entire test DB on every scratch teardown, not just the rows touched by the current test. With workers: 0 (serial) and the isolated test DB this is harmless today, but if the DB is ever shared with a larger data set the teardown could become slow and mask timing-related issues. A tighter filter would scope the query to the rows the current test touched. Note that Prisma's has/hasSome filters on a String[] field match whole elements, not prefixes: {"teams": {"has": handle.prefix}} only finds rows holding a team ID equal to the prefix itself, so catching every scratch-prefixed ID would need hasSome over the full scratch team IDs created during the test.
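
The difference between the teardown's prefix filter and an exact-element lookup can be shown in plain Python. This is a sketch under assumptions: the prefix format and team IDs are invented, and `has_exact` only models the whole-element semantics of a list-membership filter rather than calling Prisma.

```python
def strip_scratch_refs(teams: list, prefix: str) -> list:
    """Drop every team id that starts with the scratch prefix (what the teardown does)."""
    return [team_id for team_id in teams if not team_id.startswith(prefix)]


def has_exact(teams: list, value: str) -> bool:
    """Whole-element membership: true only if value equals one of the ids."""
    return value in teams


prefix = "scratch-abc123"  # hypothetical prefix format
user_teams = ["team_alpha", "scratch-abc123-target"]

cleaned = strip_scratch_refs(user_teams, prefix)
found_by_exact_match = has_exact(user_teams, prefix)
found_by_full_id = has_exact(user_teams, "scratch-abc123-target")
```

Here the prefix scan removes the scratch ref, while an exact lookup on the bare prefix would miss the row. A scoped query would therefore need the full scratch team IDs rather than the prefix alone.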

Attempted a scoped local mutmut run for G5; it did not complete. Record the three concrete blockers in mutmut_triage/pr2-team-tier1.md so the next attempt has a head start:

1. mutmut's mutants/ sandbox is import-shadowed by the worktree source.
2. the legacy mock suite and the real-DB behavior suite cannot share a pytest session (mock suite globally patches prisma_client).
3. the CI mutation-test.yml workflow starts no Postgres, so its stats phase now aborts on the behavior-suite tests PR1 added to tests_dir.

mutmut stays a deferred follow-up (as in PR1); the binding pre-merge signal remains the behavior matrix (G1) and the G4 regression-replay.

Remove the two prose docs from the behavior suite (README.md and mutmut_triage/pr2-team-tier1.md) and tighten the comment blocks on the team test files + harness down to the load-bearing parts (the gate each matrix pins, plus genuinely surprising results). No behavior change — all 286 scenarios still pass.

* feat: add guardrail violation span attributes and fix missing spans on pre-call blocks (#28364) - Fix missing guardrail child spans when a pre-call guardrail blocks the request before reaching the LLM provider; `async_post_call_failure_hook` now calls `_emit_guardrail_spans_from_request_data` to emit spans from `request_data["metadata"]` regardless of whether `_handle_failure` already fired - Add `guardrail_status`, `guardrail_action`, and `guardrail_violation_categories` as queryable top-level OTEL span attributes so trace backends can filter/group by violation type without parsing the redacted `guardrail_response` blob - Introduce `_emit_guardrail_spans_from_request_data` helper that constructs minimal kwargs from `request_data["metadata"]` and routes through `_create_guardrail_span`, sharing the same dedupe state to prevent double-emitting when both failure hooks fire - Extend `BedrockGuardrail` with `_build_tracing_detail` and `_extract_violation_category_names` which flatten BLOCKED assessments into human-readable category labels (topic names, content-filter types, PII entity types, named regex names) before redaction, and surface Bedrock's raw `action` field via `tracing_detail` - Security: violation category extraction deliberately omits `customWords.match` and unnamed regex `match` values because those fields carry the user-submitted content that triggered the rule; only operator-defined `name`/`type` labels are emitted - Add `violation_categories` and `guardrail_action` fields to `StandardLoggingGuardrailInformation` and `GuardrailTracingDetail` TypedDicts to carry the pre-redaction metadata through the logging pipeline - Add comprehensive test suite covering: guardrail span creation on failure, dedupe between `_handle_failure` and `async_post_call_failure_hook`, per-span status attributes for multi-guardrail sequences, Bedrock category extraction for all policy types, security leak prevention, and end-to-end `CustomGuardrail` violation path 
Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu> * test(proxy): behavior-pinning matrix for team management endpoints (#28441) * test(proxy): behavior-pinning matrix for team management endpoints PR2 (Team Tier-1) of the management-endpoint behavior-pinning effort. Extends the tests/proxy_behavior/management/ harness PR1 built and adds the actor x target-resource authz matrix for the 7 team endpoints: /team/new, /team/info, /team/list, /team/update, /team/member_add, /team/member_delete, /team/member_update. Tests-only, no production code changes. Harness extensions: - actors.py: ORG_B_ADMIN actor (org admin of ORG_B) and TEAM_GAMMA (an ORG_A team with no actor members), so team-targeting endpoints get a clean own / same-org-other / cross-org target axis. - conftest.py: create_scratch_team() raw-seeds target teams without /team/new side effects; the scratch teardown now also strips dangling scratch-team refs from LiteLLM_UserTable.teams. 156 new scenarios; status codes pinned to observed handler behavior. * test(proxy): record mutmut run blockers in PR2 triage doc Attempted a scoped local mutmut run for G5; it did not complete. Record the three concrete blockers in mutmut_triage/pr2-team-tier1.md so the next attempt has a head start: 1. mutmut's mutants/ sandbox is import-shadowed by the worktree source. 2. the legacy mock suite and the real-DB behavior suite cannot share a pytest session (mock suite globally patches prisma_client). 3. the CI mutation-test.yml workflow starts no Postgres, so its stats phase now aborts on the behavior-suite tests PR1 added to tests_dir. mutmut stays a deferred follow-up (as in PR1); the binding pre-merge signal remains the behavior matrix (G1) and the G4 regression-replay. 
* test(proxy): drop suite README + triage doc, trim test comments Remove the two prose docs from the behavior suite (README.md and mutmut_triage/pr2-team-tier1.md) and tighten the comment blocks on the team test files + harness down to the load-bearing parts (the gate each matrix pins, plus genuinely surprising results). No behavior change — all 286 scenarios still pass. * test(proxy): remove mutmut tests_dir comment * test(vertex_ai): tolerate transient 500 in google maps grounding test (#28503) test_gemini_google_maps_tool_simple makes live calls to Vertex AI's Google Maps grounding backend, which intermittently returns 500 INTERNAL ("Please retry") — a transient Google-side failure, not a LiteLLM bug. The request LiteLLM emits matches Google's published googleMaps grounding spec field-for-field, and the maps-platform 500 only occurs after Vertex accepts the request. The test already passes on RateLimitError; treat InternalServerError the same way so transient Vertex-side failures don't fail CI. * fix(docker): restore npm to non_root builder image (#28519) The non_root builder stage installs `nodejs` but not `npm`. Without `npm` on PATH, prisma-python falls back to downloading a Node runtime via nodeenv from nodejs.org, and that downloaded binary fails to load `libatomic.so.1` — breaking `prisma generate` and the image build. `npm` was dropped from this apk list in ca52e34. Restoring it lets prisma-python use the system Node + npm, matching docker/Dockerfile which already installs `npm` for the same reason. * build(deps): bump next from 16.2.4 to 16.2.6 in /ui/litellm-dashboard (#27665) (#28524) Bumps [next](https://github.com/vercel/next.js) from 16.2.4 to 16.2.6. - [Release notes](https://github.com/vercel/next.js/releases) - [Changelog](https://github.com/vercel/next.js/blob/canary/release.js) - [Commits](vercel/next.js@v16.2.4...v16.2.6) --- updated-dependencies: - dependency-name: next dependency-version: 16.2.6 dependency-type: direct:production ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * build(deps-dev): bump black to 26.3.1 and apply formatting (#28525) * build(deps-dev): bump black 24.10.0 -> 26.3.1 * style: apply black 26.3.1 formatting * chore: authorize black 26.3.1 license in liccheck.ini * chore(deps): bump deps (#28528) * build(deps): bump next from 16.2.4 to 16.2.6 in /ui/litellm-dashboard (#27665) Bumps [next](https://github.com/vercel/next.js) from 16.2.4 to 16.2.6. - [Release notes](https://github.com/vercel/next.js/releases) - [Changelog](https://github.com/vercel/next.js/blob/canary/release.js) - [Commits](vercel/next.js@v16.2.4...v16.2.6) --- updated-dependencies: - dependency-name: next dependency-version: 16.2.6 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * build(deps): bump protobufjs in /tests/pass_through_tests (#28296) Bumps [protobufjs](https://github.com/protobufjs/protobuf.js) from 7.5.6 to 7.6.0. - [Release notes](https://github.com/protobufjs/protobuf.js/releases) - [Changelog](https://github.com/protobufjs/protobuf.js/blob/protobufjs-v7.6.0/CHANGELOG.md) - [Commits](protobufjs/protobuf.js@protobufjs-v7.5.6...protobufjs-v7.6.0) --- updated-dependencies: - dependency-name: protobufjs dependency-version: 7.6.0 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * build(deps): bump ws from 8.20.0 to 8.20.1 in /tests/pass_through_tests (#28303) Bumps [ws](https://github.com/websockets/ws) from 8.20.0 to 8.20.1. - [Release notes](https://github.com/websockets/ws/releases) - [Commits](websockets/ws@8.20.0...8.20.1) --- updated-dependencies: - dependency-name: ws dependency-version: 8.20.1 dependency-type: indirect ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * test(e2e): forward LITELLM_LICENSE to UI e2e proxy (#28398) * test(e2e): forward LITELLM_LICENSE to UI e2e proxy The UI e2e job ran without LITELLM_LICENSE, so premium_user was always false in the issued login JWT and premium-gated UI surfaces (Team-BYOK Model switch, etc.) couldn't be driven through the UI. Forward the env var from run_e2e.sh and the CircleCI e2e_ui_testing job, and add a sanity test that decodes the admin storage state token and asserts premium_user=true so the wiring fails loudly if it ever regresses. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Update ui/litellm-dashboard/e2e_tests/tests/proxy-admin/license.spec.ts Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Add granian as a ASGI compliant web server. Provider better throughput stability, (#26027) * Add granian as a ASGI compliant web server. Provides better stability, 10-20 RPS improvement under standard LT conditions. TODO: Verify poetry lock details and add locust numbers to PR * Update granian version in license_cache.json and pyproject.toml to 2.5.7 * Enhance proxy CLI tests by adding SSL initialization checks for Granian server. Remove Python version skip conditions and implement tests to ensure SSL certificate and key are required for server initialization. 
* update uv lock to fix granian import error --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: Yassin Kortam <yassin@berri.ai> Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu> Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: ryan-crabbe-berri <ryan@berri.ai> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: harish-berri <harish@berri.ai>

greptile-apps Bot reviewed May 21, 2026

yuneng-berri closed this May 21, 2026

yuneng-berri reopened this May 21, 2026

yuneng-berri requested a review from a team May 21, 2026 06:57

test(proxy): remove mutmut tests_dir comment

60e7321

yuneng-berri force-pushed the litellm_/ecstatic-hugle-b094e4 branch from 6cb0605 to 60e7321 Compare May 21, 2026 17:20

ryan-crabbe-berri approved these changes May 21, 2026

yuneng-berri enabled auto-merge (squash) May 21, 2026 18:28

shin-berri approved these changes May 21, 2026

yuneng-berri merged commit 67e6e5e into litellm_internal_staging May 21, 2026
116 of 117 checks passed

yuneng-berri mentioned this pull request May 22, 2026

test(proxy): behavior-pinning matrix for tier-2/3 key + team management endpoints #28620

Merged

yuneng-berri merged 4 commits into litellm_internal_staging from litellm_/ecstatic-hugle-b094e4

3 participants
