test(proxy/proxy_server): pin forwarding routes (PR2)#28887
Conversation
PR2 of the proxy_server.py behavior-pinning project: fills the 12 forwarding-route test files added by the harness PR with happy + error pins for all 52 LLM-facing routes (models, chat/completions, completions, embeddings, moderations, audio, assistants, threads, utils, model-info, model-metrics, queue). Every happy-path test asserts the full response dict via normalize() so the gate enforces real shape pinning rather than status codes.
Apply the same review feedback Greptile gave on PR1 (#28856) and PR3 (#28850) to PR2's forwarding-route tests: - Replace permissive `>= 400` / `in (X, Y)` status assertions with the exact 500/405 the handler actually returns, so a regression that silently shifts the code now fails the pin. - Add a body-presence check alongside each tightened status assertion to satisfy _pin_check.py's no-status-only rule.
|
|
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
Greptile SummaryThis PR fills in the 12 placeholder test files introduced by the harness PR, adding 105 behavior-pinning tests across 52 LLM-facing forwarding routes in
Confidence Score: 4/5Safe to merge; all changes are new test files with no impact on production code paths. All 12 files are pure test additions with no production-code changes. The mock contracts are accurate — the double-coroutine pattern in audio fixtures reflects real production behavior, and the litellm.utils.return_raw_request patch works because the handler does a local import at call time. Two small quality gaps exist: the completions error fixture omits the _handle_llm_api_exception stub (pinning a 500 crash instead of the clean error path), and a disjunct in one model-metrics assertion is always true for any error response. test_routes_completions.py (incomplete error fixture) and test_routes_model_metrics.py (weak body assertion + redundant intermediate asserts)
|
| Filename | Overview |
|---|---|
| tests/test_litellm/proxy/proxy_server/test_routes_chat_completions.py | Pins four aliased POST chat-completion routes; happy and error paths are well-structured with both base_process_llm_request and _handle_llm_api_exception stubbed. |
| tests/test_litellm/proxy/proxy_server/test_routes_completions.py | Pins four aliased text-completion routes; error fixture omits _handle_llm_api_exception stub (unlike its chat-completions counterpart), causing the error test to pin a 500 crash rather than a clean error response. |
| tests/test_litellm/proxy/proxy_server/test_routes_embeddings.py | Pins four aliased embedding routes with both ProxyBaseLLMRequestProcessing stubs and _handle_llm_api_exception stub; looks correct. |
| tests/test_litellm/proxy/proxy_server/test_routes_models.py | Pins GET /v1/models, /models, and /models/{id} aliases; fixture correctly stubs router, proxy_utils helpers, and llm provider lookup. |
| tests/test_litellm/proxy/proxy_server/test_routes_moderations.py | Pins two aliased moderation routes via route_request stub; double-await pattern matches production code; looks correct. |
| tests/test_litellm/proxy/proxy_server/test_routes_audio.py | Pins speech and transcription routes; the double-async pattern (_fake_route_request returns an unawaited coroutine) correctly mirrors production's double-await (llm_call = await route_request; response = await llm_call). |
| tests/test_litellm/proxy/proxy_server/test_routes_assistants.py | Pins six assistant routes (GET/POST/DELETE with /v1 and bare aliases); fixtures are clean and well-structured. |
| tests/test_litellm/proxy/proxy_server/test_routes_threads.py | Pins ten thread routes (create/get thread, add/get messages, run) with deterministic router stubs; good parametrize coverage. |
| tests/test_litellm/proxy/proxy_server/test_routes_utils.py | Pins token_counter, supported_openai_params, and transform_request utils; litellm.utils.return_raw_request patching works correctly because the function performs a local import at call time. |
| tests/test_litellm/proxy/proxy_server/test_routes_model_info.py | Pins v2/v1/model/info and model_group/info routes; fixtures cleanly stub router, model_list, and _get_proxy_model_info. |
| tests/test_litellm/proxy/proxy_server/test_routes_model_metrics.py | Pins six metric/settings/alerting routes; two issues: trivially-true error-body disjunct in test_alerting_settings_non_admin_error, and redundant intermediate assertions before summary dict checks. |
| tests/test_litellm/proxy/proxy_server/test_routes_queue.py | Pins the single queue/chat/completions route with both happy and no-router error paths; clean and correct. |
Reviews (1): Last reviewed commit: "test(proxy): tighten PR2 error-path stat..." | Re-trigger Greptile
| monkeypatch.setattr(proxy_server, "general_settings", {}) | ||
|
|
||
| with auth_as(LitellmUserRoles.PROXY_ADMIN): | ||
| response = client.get("/alerting/settings") |
There was a problem hiding this comment.
Trivially-true disjunct weakens error-body pin
"error" in response.text is true for virtually every HTTP error response FastAPI emits, so the or branch makes the assertion pass regardless of the actual error message. The intent is to distinguish this non-admin 400 from a silent empty body, but the current guard doesn't verify any meaningful content.
Consider pinning the exact status code (done ✓) and replacing the loose disjunct with a single specific string that actually appears in the INTERNAL_USER permission-denied message, e.g. assert "internal_user" in response.text.lower().
| "body_length": 1, | ||
| } | ||
|
|
||
|
|
||
| def test_model_settings_method_not_allowed(client, auth_as): | ||
| """Pins ``GET /model/settings`` (error: wrong method).""" | ||
| with auth_as(): | ||
| response = client.post("/model/settings", json={}) | ||
| assert response.status_code == 405 | ||
| assert len(response.content) > 0 | ||
|
|
||
|
|
||
| # --------------------------------------------------------------------------- | ||
| # GET /alerting/settings | ||
| # --------------------------------------------------------------------------- | ||
|
|
||
|
|
||
| def test_alerting_settings_no_db_error(client, auth_as, no_prisma): | ||
| """Pins ``GET /alerting/settings`` (error: db not connected).""" |
There was a problem hiding this comment.
Redundant intermediate assertion before summary dict check
assert body == [{"name": "openai", "fields": []}] is a complete, strong pin on the full response body. The summary dict and its assertion immediately afterwards re-derive a strict subset of the same information, making that block redundant. The same pattern appears in test_alerting_settings_happy (with assert body[0]["field_name"] == "slack_alerting" before the summary). Either keep the full-body assertion and drop the summary, or drop the full-body assertion and let the summary carry the pin — but having both is noisy and gives a false impression that the summary is an additional check.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
| @pytest.fixture | ||
| def completion_pipeline_raises(monkeypatch): | ||
| monkeypatch.setattr(proxy_server, "llm_router", MagicMock()) | ||
| monkeypatch.setattr( | ||
| proxy_server, "proxy_logging_obj", MagicMock(post_call_failure_hook=AsyncMock()) | ||
| ) | ||
|
|
||
| async def _raise(self, *args, **kwargs): | ||
| raise ValueError("boom") | ||
|
|
||
| monkeypatch.setattr( | ||
| common_request_processing.ProxyBaseLLMRequestProcessing, | ||
| "base_process_llm_request", | ||
| _raise, | ||
| ) | ||
| yield | ||
|
|
||
|
|
||
| _COMPLETION_PATHS = [ | ||
| "/v1/completions", | ||
| "/completions", | ||
| "/engines/gpt-3.5-turbo-instruct/completions", | ||
| "/openai/deployments/gpt-3.5-turbo-instruct/completions", | ||
| ] | ||
|
|
||
|
|
There was a problem hiding this comment.
_handle_llm_api_exception not stubbed — error path pins crash behavior, not clean 400
completion_pipeline_raises stubs only base_process_llm_request to raise ValueError, leaving _handle_llm_api_exception unstubbed. The parallel fixture patched_chat_error (in test_routes_chat_completions.py) explicitly stubs _handle_llm_api_exception to return a ProxyException(code=400) and asserts response.status_code == 400. Here the test asserts 500, meaning the exception escapes through the handler unprocessed. If the completions route does route through _handle_llm_api_exception in production (as its sibling chat-completions endpoint does), this pin is locking in an accidentally-uncleaned mock state rather than the real error contract. Verify that the completions endpoint genuinely bypasses _handle_llm_api_exception, or add the matching stub and update the expected status to 400.
* test(proxy/proxy_server): pin forwarding routes (PR2) (#28887) * test(proxy): pin proxy_server.py forwarding-route behavior PR2 of the proxy_server.py behavior-pinning project: fills the 12 forwarding-route test files added by the harness PR with happy + error pins for all 52 LLM-facing routes (models, chat/completions, completions, embeddings, moderations, audio, assistants, threads, utils, model-info, model-metrics, queue). Every happy-path test asserts the full response dict via normalize() so the gate enforces real shape pinning rather than status codes. * test(proxy): drop task-plumbing comments from PR2 test files * test(proxy): tighten PR2 error-path status-code pins Apply the same review feedback Greptile gave on PR1 (#28856) and PR3 (#28850) to PR2's forwarding-route tests: - Replace permissive `>= 400` / `in (X, Y)` status assertions with the exact 500/405 the handler actually returns, so a regression that silently shifts the code now fails the pin. - Add a body-presence check alongside each tightened status assertion to satisfy _pin_check.py's no-status-only rule. --------- Co-authored-by: Claude <noreply@anthropic.com> * test(proxy): pin proxy_server.py non-route surface behavior (PR1) (#28856) * test(proxy): pin proxy_server.py non-route surface behavior (PR1) Fills the 7 PR1 placeholder files under tests/test_litellm/proxy/proxy_server/ with behavior pins for the non-route surface of proxy_server.py: lifecycle/init/shutdown, ProxyConfig class methods, DB-overlay config scrubbers, spend counters, background-health helpers, OpenAPI customization, exception handlers, and streaming-generator helpers. 233 tests cover 101 pin-list symbols (1+ happy + 1+ error each). New-tests-only coverage on litellm/proxy/proxy_server.py: 32.80% line / 20.91% branch (PR1 gate: 25% line / 18% branch). Full directory runs in ~22s with -n 4. Plan: https://www.notion.so/Plan-Pin-proxy_server-py-behavior-2026-05-25-36c43b8acdab81ee845fd5365128a2fc * test(proxy): address Greptile review comments on test_lifecycle.py - test_initialize_signature_is_async_with_expected_params: hard-code expected_param_count so a signature change actually trips the gate (previously both sides of the comparison were len(sig.parameters)). - test_check_request_disconnection_invalid_when_connected_times_out: patch asyncio.sleep so the test no longer spins for ~1.2 s of real wall-clock; timeout lowered to 0.05 s. --------- Co-authored-by: Claude <noreply@anthropic.com> * test(proxy/proxy_server): pin control-plane routes (PR3) (#28850) * test(proxy/proxy_server): pin misc routes (PR3, partial) Adds happy + error tests for the misc control-plane routes: GET /, /routes, /adaptive_router/state, /get_logo_url, /get_image, /get_favicon. Also gitignores .pin_list.txt (used by the pin gate). * test(proxy/proxy_server): pin login/SSO routes (PR3, partial) Adds happy + error tests for the 5 login/SSO control-plane routes: GET /fallback/login, POST /login, POST /v2/login, POST /v3/login, POST /v3/login/exchange. Mocks authenticate_user and create_ui_token_object at their imported location. * test(proxy/proxy_server): pin onboarding routes (PR3, partial) Adds happy + error tests for the 2 onboarding control-plane routes: GET /onboarding/get_token, POST /onboarding/claim_token. Wires a MagicMock async context manager for prisma_client.db.tx() and signs the onboarding JWT with the patched master_key. * test(proxy/proxy_server): pin model_cost_map reload routes (PR3, partial) Adds happy + error tests for the 5 model-cost-map control-plane routes: POST /reload/model_cost_map, POST|DELETE|GET /schedule/model_cost_map_reload(/status), GET /model/cost_map/source. Attaches litellm_config to mock_prisma per-test (the table is not in the default _PRISMA_TABLES fixture). * test(proxy/proxy_server): pin anthropic_beta_headers reload routes (PR3, partial) Adds happy + error tests for the 4 anthropic-beta-headers control-plane routes: POST /reload/anthropic_beta_headers, POST|DELETE|GET /schedule/anthropic_beta_headers_reload(/status). Stubs db.litellm_config (not in default _PRISMA_TABLES) and monkeypatches reload_beta_headers_config so no network calls fire. * test(proxy/proxy_server): pin invitation routes (PR3, partial) Adds happy + error tests for the 4 invitation control-plane routes: POST /invitation/new, GET /invitation/info, POST /invitation/update, POST /invitation/delete. Patches _user_has_admin_privileges / _user_has_admin_view to avoid extensive get_user_object mocking. * test(proxy/proxy_server): pin config CRUD routes (PR3, partial) Adds happy + error tests for the 8 config-CRUD control-plane routes: POST /config/update, POST|GET /config/field/update|info, GET /config/list, POST /config/field/delete, POST /config/callback/delete, GET /get/config/callbacks, GET /config/yaml. Attaches litellm_config to mock_prisma per-test. * test(proxy/proxy_server): tighten pin assertions per review - test_routes_misc.py: `b"" in response.content` is trivially true; replace with `len(response.content) > 0` so an empty 405 body trips the gate. - test_routes_login_sso.py: `len(response.content) >= 0` is trivially true; tighten to `> 0`. - test_routes_anthropic_beta.py: replace brittle string-literal checks on the serialized JSON (`'"interval_hours": 12' in payload`) with `json.loads` + dict access so the assertion survives any serializer spacing. - test_routes_config.py: `assert status_code in (404, 500)` was too permissive; the handler re-raises HTTPException(404) verbatim, so pin 404 strictly. --------- Co-authored-by: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com>
Summary
PR2 of the proxy_server.py behavior-pinning project — fills the 12 PR2 placeholder files left empty by the harness PR (#28827) with happy + error tests for the 52 LLM-facing forwarding routes in the pin list. Pins HTTP-boundary behavior of models, chat/completions, completions, embeddings, moderations, audio, assistants, threads, llm-utils, model-info, model-metrics, and queue routes via
normalize(response.json()) == {...}dict-equality assertions, so future refactors oflitellm/proxy/proxy_server.pycan't silently change response shapes.What's covered
test_routes_models.pytest_routes_chat_completions.pytest_routes_completions.pytest_routes_embeddings.pytest_routes_moderations.pytest_routes_audio.pytest_routes_assistants.pytest_routes_threads.pytest_routes_utils.pytest_routes_model_info.pytest_routes_model_metrics.pytest_routes_queue.pyEach pin has ≥1 happy-path test with a strong assertion (
normalize(response.json()) == {...}or dict-eq with ≥3 keys) and ≥1 error-path test (4xx/5xx with body-presence assertion). Aliased route paths sharing one handler (e.g./v1/chat/completions+/chat/completions+/engines/{model:path}/chat/completions+/openai/deployments/{model:path}/chat/completions) are tested via@pytest.mark.parametrize.Mock contract
All routes are stubbed at one of three boundaries — no real LLM calls, no real Router, no real Prisma:
ProxyBaseLLMRequestProcessing.base_process_llm_requestto return a deterministic dict (happy) or raise +_handle_llm_api_exception(error).route_requestto return a deterministic awaitable.llm_router.<method>async call, or setllm_router/prisma_clienttoNonefor the error branch.Test plan
uv run pytest tests/test_litellm/proxy/proxy_server/ -n 4→ 124 passed in ~11s (the 19 extra come from the harness'stest_harness_smoke.py).python tests/test_litellm/proxy/proxy_server/_pin_check.py --list .pin_list.txt→ PASS (52 pins, 59 PR2 tests, zerostatus-onlyfailures).python tests/test_litellm/proxy/proxy_server/_coverage_check.py(CI mode, self-detected target) → PASS (24.27% line, 7.50% branch — exceeds the PR0 baseline; the 50% / 38% PR2 gate is additive on top of PR1 and self-activates once PR1 lands).uv run black tests/test_litellm/proxy/proxy_server/— clean.Notes for reviewers
tests/test_litellm/proxy/proxy_server/conftest.py(no new conftests, no fixtures defined inline; happy-path / error-path setup lives per-test-file as module-scoped@pytest.fixturehelpers).== 500/== 405) rather than a permissive>= 400orin (X, Y), and every error test carries a body-presence assertion so_pin_check.py's no-status-only rule fires correctly.Out of scope
Router-backed end-to-end pinning — explicitly out of scope per the plan's mock-contract decision (Decision Added OpenRouter support #8).Generated by Claude Code