test(proxy/proxy_server): pin forwarding routes (PR2) by yuneng-berri · Pull Request #28887 · BerriAI/litellm

yuneng-berri · 2026-05-26T18:32:56Z

Summary

PR2 of the proxy_server.py behavior-pinning project — fills the 12 PR2 placeholder files left empty by the harness PR (#28827) with happy + error tests for the 52 LLM-facing forwarding routes in the pin list. Pins HTTP-boundary behavior of models, chat/completions, completions, embeddings, moderations, audio, assistants, threads, llm-utils, model-info, model-metrics, and queue routes via normalize(response.json()) == {...} dict-equality assertions, so future refactors of litellm/proxy/proxy_server.py can't silently change response shapes.

Plan: https://www.notion.so/Plan-Pin-proxy_server-py-behavior-2026-05-25-36c43b8acdab81ee845fd5365128a2fc
PR2 pin list: https://www.notion.so/PR2-Pin-List-Forwarding-routes-36c43b8acdab81b09868fa69a95efbd3
Sibling PRs: PR1 test(proxy): pin proxy_server.py non-route surface behavior (PR1) #28856 (non-route surface) and PR3 test(proxy/proxy_server): pin control-plane routes (PR3) #28850 (control-plane routes) — independent, target the same base, expected to land in parallel.

What's covered

File	Routes	Tests
`test_routes_models.py`	4	8
`test_routes_chat_completions.py`	4	8
`test_routes_completions.py`	4	8
`test_routes_embeddings.py`	4	8
`test_routes_moderations.py`	2	4
`test_routes_audio.py`	4	8
`test_routes_assistants.py`	6	12
`test_routes_threads.py`	10	20
`test_routes_utils.py`	3	6
`test_routes_model_info.py`	4	8
`test_routes_model_metrics.py`	6	13
`test_routes_queue.py`	1	2
Total	52	105

Each pin has ≥1 happy-path test with a strong assertion (normalize(response.json()) == {...} or dict-eq with ≥3 keys) and ≥1 error-path test (4xx/5xx with body-presence assertion). Aliased route paths sharing one handler (e.g. /v1/chat/completions + /chat/completions + /engines/{model:path}/chat/completions + /openai/deployments/{model:path}/chat/completions) are tested via @pytest.mark.parametrize.

Mock contract

All routes are stubbed at one of three boundaries — no real LLM calls, no real Router, no real Prisma:

Chat/completions/embeddings: monkeypatch ProxyBaseLLMRequestProcessing.base_process_llm_request to return a deterministic dict (happy) or raise + _handle_llm_api_exception (error).
Moderations / audio: monkeypatch route_request to return a deterministic awaitable.
Assistants / threads / queue / models / model-info / model-metrics: monkeypatch the specific llm_router.<method> async call, or set llm_router / prisma_client to None for the error branch.

Test plan

uv run pytest tests/test_litellm/proxy/proxy_server/ -n 4 → 124 passed in ~11s (the 19 extra come from the harness's test_harness_smoke.py).
python tests/test_litellm/proxy/proxy_server/_pin_check.py --list .pin_list.txt → PASS (52 pins, 59 PR2 tests, zero status-only failures).
python tests/test_litellm/proxy/proxy_server/_coverage_check.py (CI mode, self-detected target) → PASS (24.27% line, 7.50% branch — exceeds the PR0 baseline; the 50% / 38% PR2 gate is additive on top of PR1 and self-activates once PR1 lands).
Runtime well under the 3-min PR2 budget.
uv run black tests/test_litellm/proxy/proxy_server/ — clean.

Notes for reviewers

All tests use only fixtures from tests/test_litellm/proxy/proxy_server/conftest.py (no new conftests, no fixtures defined inline; happy-path / error-path setup lives per-test-file as module-scoped @pytest.fixture helpers).
Greptile-bot review feedback applied to PR1 (test(proxy): pin proxy_server.py non-route surface behavior (PR1) #28856) and PR3 (test(proxy/proxy_server): pin control-plane routes (PR3) #28850) was pre-emptively addressed here: every error-path assertion pins the exact status code (== 500 / == 405) rather than a permissive >= 400 or in (X, Y), and every error test carries a body-presence assertion so _pin_check.py's no-status-only rule fires correctly.
No deletion of existing scattered tests (cleanup deferred per plan).

Out of scope

Non-route helpers (PR1) and control-plane routes (PR3) — separate PRs targeting the same base.
Real Router-backed end-to-end pinning — explicitly out of scope per the plan's mock-contract decision (Decision Added OpenRouter support #8).

Generated by Claude Code

PR2 of the proxy_server.py behavior-pinning project: fills the 12 forwarding-route test files added by the harness PR with happy + error pins for all 52 LLM-facing routes (models, chat/completions, completions, embeddings, moderations, audio, assistants, threads, utils, model-info, model-metrics, queue). Every happy-path test asserts the full response dict via normalize() so the gate enforces real shape pinning rather than status codes.

Apply the same review feedback Greptile gave on PR1 (#28856) and PR3 (#28850) to PR2's forwarding-route tests: - Replace permissive `>= 400` / `in (X, Y)` status assertions with the exact 500/405 the handler actually returns, so a regression that silently shifts the code now fails the pin. - Add a body-presence check alongside each tightened status assertion to satisfy _pin_check.py's no-status-only rule.

CLAassistant · 2026-05-26T18:33:04Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

codecov · 2026-05-26T18:37:00Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

greptile-apps · 2026-05-26T18:39:32Z

Greptile Summary

This PR fills in the 12 placeholder test files introduced by the harness PR, adding 105 behavior-pinning tests across 52 LLM-facing forwarding routes in proxy_server.py. All routes are stubbed at well-chosen mock boundaries (no real LLM calls, no real Router, no real Prisma), and every test covers both a happy path with strong dict-equality assertions via normalize() and an error path with an exact status-code check plus body-presence assertion.

Happy-path mocks for chat/completions/embeddings correctly intercept ProxyBaseLLMRequestProcessing.base_process_llm_request; audio/moderation routes use route_request with a double-coroutine pattern that mirrors production's own double-await (llm_call = await route_request(...) / response = await llm_call).
Assistants, threads, and queue routes stub individual llm_router async methods; model-info and model-metrics routes stub Prisma and router helpers.
Minor inconsistency in test_routes_completions.py: the error fixture omits the _handle_llm_api_exception stub (present in the sibling chat-completions fixture), causing the test to pin a 500 crash rather than the clean error response the handler produces when properly wired.

Confidence Score: 4/5

Safe to merge; all changes are new test files with no impact on production code paths.

All 12 files are pure test additions with no production-code changes. The mock contracts are accurate — the double-coroutine pattern in audio fixtures reflects real production behavior, and the litellm.utils.return_raw_request patch works because the handler does a local import at call time. Two small quality gaps exist: the completions error fixture omits the _handle_llm_api_exception stub (pinning a 500 crash instead of the clean error path), and a disjunct in one model-metrics assertion is always true for any error response.

test_routes_completions.py (incomplete error fixture) and test_routes_model_metrics.py (weak body assertion + redundant intermediate asserts)

Important Files Changed

Filename	Overview
tests/test_litellm/proxy/proxy_server/test_routes_chat_completions.py	Pins four aliased POST chat-completion routes; happy and error paths are well-structured with both base_process_llm_request and _handle_llm_api_exception stubbed.
tests/test_litellm/proxy/proxy_server/test_routes_completions.py	Pins four aliased text-completion routes; error fixture omits _handle_llm_api_exception stub (unlike its chat-completions counterpart), causing the error test to pin a 500 crash rather than a clean error response.
tests/test_litellm/proxy/proxy_server/test_routes_embeddings.py	Pins four aliased embedding routes with both ProxyBaseLLMRequestProcessing stubs and _handle_llm_api_exception stub; looks correct.
tests/test_litellm/proxy/proxy_server/test_routes_models.py	Pins GET /v1/models, /models, and /models/{id} aliases; fixture correctly stubs router, proxy_utils helpers, and llm provider lookup.
tests/test_litellm/proxy/proxy_server/test_routes_moderations.py	Pins two aliased moderation routes via route_request stub; double-await pattern matches production code; looks correct.
tests/test_litellm/proxy/proxy_server/test_routes_audio.py	Pins speech and transcription routes; the double-async pattern (_fake_route_request returns an unawaited coroutine) correctly mirrors production's double-await (llm_call = await route_request; response = await llm_call).
tests/test_litellm/proxy/proxy_server/test_routes_assistants.py	Pins six assistant routes (GET/POST/DELETE with /v1 and bare aliases); fixtures are clean and well-structured.
tests/test_litellm/proxy/proxy_server/test_routes_threads.py	Pins ten thread routes (create/get thread, add/get messages, run) with deterministic router stubs; good parametrize coverage.
tests/test_litellm/proxy/proxy_server/test_routes_utils.py	Pins token_counter, supported_openai_params, and transform_request utils; litellm.utils.return_raw_request patching works correctly because the function performs a local import at call time.
tests/test_litellm/proxy/proxy_server/test_routes_model_info.py	Pins v2/v1/model/info and model_group/info routes; fixtures cleanly stub router, model_list, and _get_proxy_model_info.
tests/test_litellm/proxy/proxy_server/test_routes_model_metrics.py	Pins six metric/settings/alerting routes; two issues: trivially-true error-body disjunct in test_alerting_settings_non_admin_error, and redundant intermediate assertions before summary dict checks.
tests/test_litellm/proxy/proxy_server/test_routes_queue.py	Pins the single queue/chat/completions route with both happy and no-router error paths; clean and correct.

_{Reviews (1): Last reviewed commit: "test(proxy): tighten PR2 error-path stat..." | Re-trigger Greptile}

greptile-apps · 2026-05-26T18:39:37Z

+    monkeypatch.setattr(proxy_server, "general_settings", {})
+
+    with auth_as(LitellmUserRoles.PROXY_ADMIN):
+        response = client.get("/alerting/settings")


Trivially-true disjunct weakens error-body pin

"error" in response.text is true for virtually every HTTP error response FastAPI emits, so the or branch makes the assertion pass regardless of the actual error message. The intent is to distinguish this non-admin 400 from a silent empty body, but the current guard doesn't verify any meaningful content.

Consider pinning the exact status code (done ✓) and replacing the loose disjunct with a single specific string that actually appears in the INTERNAL_USER permission-denied message, e.g. assert "internal_user" in response.text.lower().

greptile-apps · 2026-05-26T18:39:40Z

+        "body_length": 1,
+    }
+
+
+def test_model_settings_method_not_allowed(client, auth_as):
+    """Pins ``GET /model/settings`` (error: wrong method)."""
+    with auth_as():
+        response = client.post("/model/settings", json={})
+    assert response.status_code == 405
+    assert len(response.content) > 0
+
+
+# ---------------------------------------------------------------------------
+# GET /alerting/settings
+# ---------------------------------------------------------------------------
+
+
+def test_alerting_settings_no_db_error(client, auth_as, no_prisma):
+    """Pins ``GET /alerting/settings`` (error: db not connected)."""


Redundant intermediate assertion before summary dict check

assert body == [{"name": "openai", "fields": []}] is a complete, strong pin on the full response body. The summary dict and its assertion immediately afterwards re-derive a strict subset of the same information, making that block redundant. The same pattern appears in test_alerting_settings_happy (with assert body[0]["field_name"] == "slack_alerting" before the summary). Either keep the full-body assertion and drop the summary, or drop the full-body assertion and let the summary carry the pin — but having both is noisy and gives a false impression that the summary is an additional check.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

greptile-apps · 2026-05-26T18:39:43Z

+@pytest.fixture
+def completion_pipeline_raises(monkeypatch):
+    monkeypatch.setattr(proxy_server, "llm_router", MagicMock())
+    monkeypatch.setattr(
+        proxy_server, "proxy_logging_obj", MagicMock(post_call_failure_hook=AsyncMock())
+    )
+
+    async def _raise(self, *args, **kwargs):
+        raise ValueError("boom")
+
+    monkeypatch.setattr(
+        common_request_processing.ProxyBaseLLMRequestProcessing,
+        "base_process_llm_request",
+        _raise,
+    )
+    yield
+
+
+_COMPLETION_PATHS = [
+    "/v1/completions",
+    "/completions",
+    "/engines/gpt-3.5-turbo-instruct/completions",
+    "/openai/deployments/gpt-3.5-turbo-instruct/completions",
+]
+
+


_handle_llm_api_exception not stubbed — error path pins crash behavior, not clean 400

completion_pipeline_raises stubs only base_process_llm_request to raise ValueError, leaving _handle_llm_api_exception unstubbed. The parallel fixture patched_chat_error (in test_routes_chat_completions.py) explicitly stubs _handle_llm_api_exception to return a ProxyException(code=400) and asserts response.status_code == 400. Here the test asserts 500, meaning the exception escapes through the handler unprocessed. If the completions route does route through _handle_llm_api_exception in production (as its sibling chat-completions endpoint does), this pin is locking in an accidentally-uncleaned mock state rather than the real error contract. Verify that the completions endpoint genuinely bypasses _handle_llm_api_exception, or add the matching stub and update the expected status to 400.

* test(proxy/proxy_server): pin forwarding routes (PR2) (#28887) * test(proxy): pin proxy_server.py forwarding-route behavior PR2 of the proxy_server.py behavior-pinning project: fills the 12 forwarding-route test files added by the harness PR with happy + error pins for all 52 LLM-facing routes (models, chat/completions, completions, embeddings, moderations, audio, assistants, threads, utils, model-info, model-metrics, queue). Every happy-path test asserts the full response dict via normalize() so the gate enforces real shape pinning rather than status codes. * test(proxy): drop task-plumbing comments from PR2 test files * test(proxy): tighten PR2 error-path status-code pins Apply the same review feedback Greptile gave on PR1 (#28856) and PR3 (#28850) to PR2's forwarding-route tests: - Replace permissive `>= 400` / `in (X, Y)` status assertions with the exact 500/405 the handler actually returns, so a regression that silently shifts the code now fails the pin. - Add a body-presence check alongside each tightened status assertion to satisfy _pin_check.py's no-status-only rule. --------- Co-authored-by: Claude <noreply@anthropic.com> * test(proxy): pin proxy_server.py non-route surface behavior (PR1) (#28856) * test(proxy): pin proxy_server.py non-route surface behavior (PR1) Fills the 7 PR1 placeholder files under tests/test_litellm/proxy/proxy_server/ with behavior pins for the non-route surface of proxy_server.py: lifecycle/init/shutdown, ProxyConfig class methods, DB-overlay config scrubbers, spend counters, background-health helpers, OpenAPI customization, exception handlers, and streaming-generator helpers. 233 tests cover 101 pin-list symbols (1+ happy + 1+ error each). New-tests-only coverage on litellm/proxy/proxy_server.py: 32.80% line / 20.91% branch (PR1 gate: 25% line / 18% branch). Full directory runs in ~22s with -n 4. Plan: https://www.notion.so/Plan-Pin-proxy_server-py-behavior-2026-05-25-36c43b8acdab81ee845fd5365128a2fc * test(proxy): address Greptile review comments on test_lifecycle.py - test_initialize_signature_is_async_with_expected_params: hard-code expected_param_count so a signature change actually trips the gate (previously both sides of the comparison were len(sig.parameters)). - test_check_request_disconnection_invalid_when_connected_times_out: patch asyncio.sleep so the test no longer spins for ~1.2 s of real wall-clock; timeout lowered to 0.05 s. --------- Co-authored-by: Claude <noreply@anthropic.com> * test(proxy/proxy_server): pin control-plane routes (PR3) (#28850) * test(proxy/proxy_server): pin misc routes (PR3, partial) Adds happy + error tests for the misc control-plane routes: GET /, /routes, /adaptive_router/state, /get_logo_url, /get_image, /get_favicon. Also gitignores .pin_list.txt (used by the pin gate). * test(proxy/proxy_server): pin login/SSO routes (PR3, partial) Adds happy + error tests for the 5 login/SSO control-plane routes: GET /fallback/login, POST /login, POST /v2/login, POST /v3/login, POST /v3/login/exchange. Mocks authenticate_user and create_ui_token_object at their imported location. * test(proxy/proxy_server): pin onboarding routes (PR3, partial) Adds happy + error tests for the 2 onboarding control-plane routes: GET /onboarding/get_token, POST /onboarding/claim_token. Wires a MagicMock async context manager for prisma_client.db.tx() and signs the onboarding JWT with the patched master_key. * test(proxy/proxy_server): pin model_cost_map reload routes (PR3, partial) Adds happy + error tests for the 5 model-cost-map control-plane routes: POST /reload/model_cost_map, POST|DELETE|GET /schedule/model_cost_map_reload(/status), GET /model/cost_map/source. Attaches litellm_config to mock_prisma per-test (the table is not in the default _PRISMA_TABLES fixture). * test(proxy/proxy_server): pin anthropic_beta_headers reload routes (PR3, partial) Adds happy + error tests for the 4 anthropic-beta-headers control-plane routes: POST /reload/anthropic_beta_headers, POST|DELETE|GET /schedule/anthropic_beta_headers_reload(/status). Stubs db.litellm_config (not in default _PRISMA_TABLES) and monkeypatches reload_beta_headers_config so no network calls fire. * test(proxy/proxy_server): pin invitation routes (PR3, partial) Adds happy + error tests for the 4 invitation control-plane routes: POST /invitation/new, GET /invitation/info, POST /invitation/update, POST /invitation/delete. Patches _user_has_admin_privileges / _user_has_admin_view to avoid extensive get_user_object mocking. * test(proxy/proxy_server): pin config CRUD routes (PR3, partial) Adds happy + error tests for the 8 config-CRUD control-plane routes: POST /config/update, POST|GET /config/field/update|info, GET /config/list, POST /config/field/delete, POST /config/callback/delete, GET /get/config/callbacks, GET /config/yaml. Attaches litellm_config to mock_prisma per-test. * test(proxy/proxy_server): tighten pin assertions per review - test_routes_misc.py: `b"" in response.content` is trivially true; replace with `len(response.content) > 0` so an empty 405 body trips the gate. - test_routes_login_sso.py: `len(response.content) >= 0` is trivially true; tighten to `> 0`. - test_routes_anthropic_beta.py: replace brittle string-literal checks on the serialized JSON (`'"interval_hours": 12' in payload`) with `json.loads` + dict access so the assertion survives any serializer spacing. - test_routes_config.py: `assert status_code in (404, 500)` was too permissive; the handler re-raises HTTPException(404) verbatim, so pin 404 strictly. --------- Co-authored-by: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com>

claude added 3 commits May 26, 2026 06:58

test(proxy): drop task-plumbing comments from PR2 test files

3704c67

greptile-apps Bot reviewed May 26, 2026

View reviewed changes

yuneng-berri changed the base branch from litellm_internal_staging to litellm_yj_may29 May 29, 2026 22:40

yuneng-berri merged commit c3e5251 into litellm_yj_may29 May 29, 2026
46 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

test(proxy/proxy_server): pin forwarding routes (PR2)#28887

test(proxy/proxy_server): pin forwarding routes (PR2)#28887
yuneng-berri merged 3 commits into
litellm_yj_may29from
claude/affectionate-einstein-B7mKS

yuneng-berri commented May 26, 2026

Uh oh!

CLAassistant commented May 26, 2026

Uh oh!

codecov Bot commented May 26, 2026

Uh oh!

greptile-apps Bot commented May 26, 2026

Important Files Changed

Uh oh!

greptile-apps Bot May 26, 2026

Uh oh!

greptile-apps Bot May 26, 2026

Uh oh!

greptile-apps Bot May 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

yuneng-berri commented May 26, 2026

Summary

What's covered

Mock contract

Test plan

Notes for reviewers

Out of scope

Uh oh!

CLAassistant commented May 26, 2026

Uh oh!

codecov Bot commented May 26, 2026

Codecov Report

Uh oh!

greptile-apps Bot commented May 26, 2026

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Uh oh!

greptile-apps Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants