Skip to content

test(proxy/proxy_server): pin forwarding routes (PR2)#28887

Merged
yuneng-berri merged 3 commits into
litellm_yj_may29from
claude/affectionate-einstein-B7mKS
May 29, 2026
Merged

test(proxy/proxy_server): pin forwarding routes (PR2)#28887
yuneng-berri merged 3 commits into
litellm_yj_may29from
claude/affectionate-einstein-B7mKS

Conversation

@yuneng-berri

Copy link
Copy Markdown
Collaborator

Summary

PR2 of the proxy_server.py behavior-pinning project — fills the 12 PR2 placeholder files left empty by the harness PR (#28827) with happy + error tests for the 52 LLM-facing forwarding routes in the pin list. Pins HTTP-boundary behavior of models, chat/completions, completions, embeddings, moderations, audio, assistants, threads, llm-utils, model-info, model-metrics, and queue routes via normalize(response.json()) == {...} dict-equality assertions, so future refactors of litellm/proxy/proxy_server.py can't silently change response shapes.

What's covered

File Routes Tests
test_routes_models.py 4 8
test_routes_chat_completions.py 4 8
test_routes_completions.py 4 8
test_routes_embeddings.py 4 8
test_routes_moderations.py 2 4
test_routes_audio.py 4 8
test_routes_assistants.py 6 12
test_routes_threads.py 10 20
test_routes_utils.py 3 6
test_routes_model_info.py 4 8
test_routes_model_metrics.py 6 13
test_routes_queue.py 1 2
Total 52 105

Each pin has ≥1 happy-path test with a strong assertion (normalize(response.json()) == {...} or dict-eq with ≥3 keys) and ≥1 error-path test (4xx/5xx with body-presence assertion). Aliased route paths sharing one handler (e.g. /v1/chat/completions + /chat/completions + /engines/{model:path}/chat/completions + /openai/deployments/{model:path}/chat/completions) are tested via @pytest.mark.parametrize.

Mock contract

All routes are stubbed at one of three boundaries — no real LLM calls, no real Router, no real Prisma:

  • Chat/completions/embeddings: monkeypatch ProxyBaseLLMRequestProcessing.base_process_llm_request to return a deterministic dict (happy) or raise + _handle_llm_api_exception (error).
  • Moderations / audio: monkeypatch route_request to return a deterministic awaitable.
  • Assistants / threads / queue / models / model-info / model-metrics: monkeypatch the specific llm_router.<method> async call, or set llm_router / prisma_client to None for the error branch.

Test plan

  • uv run pytest tests/test_litellm/proxy/proxy_server/ -n 4124 passed in ~11s (the 19 extra come from the harness's test_harness_smoke.py).
  • python tests/test_litellm/proxy/proxy_server/_pin_check.py --list .pin_list.txtPASS (52 pins, 59 PR2 tests, zero status-only failures).
  • python tests/test_litellm/proxy/proxy_server/_coverage_check.py (CI mode, self-detected target) → PASS (24.27% line, 7.50% branch — exceeds the PR0 baseline; the 50% / 38% PR2 gate is additive on top of PR1 and self-activates once PR1 lands).
  • Runtime well under the 3-min PR2 budget.
  • uv run black tests/test_litellm/proxy/proxy_server/ — clean.

Notes for reviewers

  • All tests use only fixtures from tests/test_litellm/proxy/proxy_server/conftest.py (no new conftests, no fixtures defined inline; happy-path / error-path setup lives per-test-file as module-scoped @pytest.fixture helpers).
  • Greptile-bot review feedback applied to PR1 (test(proxy): pin proxy_server.py non-route surface behavior (PR1) #28856) and PR3 (test(proxy/proxy_server): pin control-plane routes (PR3) #28850) was pre-emptively addressed here: every error-path assertion pins the exact status code (== 500 / == 405) rather than a permissive >= 400 or in (X, Y), and every error test carries a body-presence assertion so _pin_check.py's no-status-only rule fires correctly.
  • No deletion of existing scattered tests (cleanup deferred per plan).

Out of scope

  • Non-route helpers (PR1) and control-plane routes (PR3) — separate PRs targeting the same base.
  • Real Router-backed end-to-end pinning — explicitly out of scope per the plan's mock-contract decision (Decision Added OpenRouter support #8).

Generated by Claude Code

claude added 3 commits May 26, 2026 06:58
PR2 of the proxy_server.py behavior-pinning project: fills the 12
forwarding-route test files added by the harness PR with happy + error
pins for all 52 LLM-facing routes (models, chat/completions, completions,
embeddings, moderations, audio, assistants, threads, utils, model-info,
model-metrics, queue). Every happy-path test asserts the full response
dict via normalize() so the gate enforces real shape pinning rather
than status codes.
Apply the same review feedback Greptile gave on PR1 (#28856) and PR3
(#28850) to PR2's forwarding-route tests:

- Replace permissive `>= 400` / `in (X, Y)` status assertions with the
  exact 500/405 the handler actually returns, so a regression that
  silently shifts the code now fails the pin.
- Add a body-presence check alongside each tightened status assertion
  to satisfy _pin_check.py's no-status-only rule.
@CLAassistant

Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@codecov

codecov Bot commented May 26, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@greptile-apps

greptile-apps Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR fills in the 12 placeholder test files introduced by the harness PR, adding 105 behavior-pinning tests across 52 LLM-facing forwarding routes in proxy_server.py. All routes are stubbed at well-chosen mock boundaries (no real LLM calls, no real Router, no real Prisma), and every test covers both a happy path with strong dict-equality assertions via normalize() and an error path with an exact status-code check plus body-presence assertion.

  • Happy-path mocks for chat/completions/embeddings correctly intercept ProxyBaseLLMRequestProcessing.base_process_llm_request; audio/moderation routes use route_request with a double-coroutine pattern that mirrors production's own double-await (llm_call = await route_request(...) / response = await llm_call).
  • Assistants, threads, and queue routes stub individual llm_router async methods; model-info and model-metrics routes stub Prisma and router helpers.
  • Minor inconsistency in test_routes_completions.py: the error fixture omits the _handle_llm_api_exception stub (present in the sibling chat-completions fixture), causing the test to pin a 500 crash rather than the clean error response the handler produces when properly wired.

Confidence Score: 4/5

Safe to merge; all changes are new test files with no impact on production code paths.

All 12 files are pure test additions with no production-code changes. The mock contracts are accurate — the double-coroutine pattern in audio fixtures reflects real production behavior, and the litellm.utils.return_raw_request patch works because the handler does a local import at call time. Two small quality gaps exist: the completions error fixture omits the _handle_llm_api_exception stub (pinning a 500 crash instead of the clean error path), and a disjunct in one model-metrics assertion is always true for any error response.

test_routes_completions.py (incomplete error fixture) and test_routes_model_metrics.py (weak body assertion + redundant intermediate asserts)

Important Files Changed

Filename Overview
tests/test_litellm/proxy/proxy_server/test_routes_chat_completions.py Pins four aliased POST chat-completion routes; happy and error paths are well-structured with both base_process_llm_request and _handle_llm_api_exception stubbed.
tests/test_litellm/proxy/proxy_server/test_routes_completions.py Pins four aliased text-completion routes; error fixture omits _handle_llm_api_exception stub (unlike its chat-completions counterpart), causing the error test to pin a 500 crash rather than a clean error response.
tests/test_litellm/proxy/proxy_server/test_routes_embeddings.py Pins four aliased embedding routes with both ProxyBaseLLMRequestProcessing stubs and _handle_llm_api_exception stub; looks correct.
tests/test_litellm/proxy/proxy_server/test_routes_models.py Pins GET /v1/models, /models, and /models/{id} aliases; fixture correctly stubs router, proxy_utils helpers, and llm provider lookup.
tests/test_litellm/proxy/proxy_server/test_routes_moderations.py Pins two aliased moderation routes via route_request stub; double-await pattern matches production code; looks correct.
tests/test_litellm/proxy/proxy_server/test_routes_audio.py Pins speech and transcription routes; the double-async pattern (_fake_route_request returns an unawaited coroutine) correctly mirrors production's double-await (llm_call = await route_request; response = await llm_call).
tests/test_litellm/proxy/proxy_server/test_routes_assistants.py Pins six assistant routes (GET/POST/DELETE with /v1 and bare aliases); fixtures are clean and well-structured.
tests/test_litellm/proxy/proxy_server/test_routes_threads.py Pins ten thread routes (create/get thread, add/get messages, run) with deterministic router stubs; good parametrize coverage.
tests/test_litellm/proxy/proxy_server/test_routes_utils.py Pins token_counter, supported_openai_params, and transform_request utils; litellm.utils.return_raw_request patching works correctly because the function performs a local import at call time.
tests/test_litellm/proxy/proxy_server/test_routes_model_info.py Pins v2/v1/model/info and model_group/info routes; fixtures cleanly stub router, model_list, and _get_proxy_model_info.
tests/test_litellm/proxy/proxy_server/test_routes_model_metrics.py Pins six metric/settings/alerting routes; two issues: trivially-true error-body disjunct in test_alerting_settings_non_admin_error, and redundant intermediate assertions before summary dict checks.
tests/test_litellm/proxy/proxy_server/test_routes_queue.py Pins the single queue/chat/completions route with both happy and no-router error paths; clean and correct.

Reviews (1): Last reviewed commit: "test(proxy): tighten PR2 error-path stat..." | Re-trigger Greptile

monkeypatch.setattr(proxy_server, "general_settings", {})

with auth_as(LitellmUserRoles.PROXY_ADMIN):
response = client.get("/alerting/settings")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Trivially-true disjunct weakens error-body pin

"error" in response.text is true for virtually every HTTP error response FastAPI emits, so the or branch makes the assertion pass regardless of the actual error message. The intent is to distinguish this non-admin 400 from a silent empty body, but the current guard doesn't verify any meaningful content.

Consider pinning the exact status code (done ✓) and replacing the loose disjunct with a single specific string that actually appears in the INTERNAL_USER permission-denied message, e.g. assert "internal_user" in response.text.lower().

Comment on lines +165 to +183
"body_length": 1,
}


def test_model_settings_method_not_allowed(client, auth_as):
"""Pins ``GET /model/settings`` (error: wrong method)."""
with auth_as():
response = client.post("/model/settings", json={})
assert response.status_code == 405
assert len(response.content) > 0


# ---------------------------------------------------------------------------
# GET /alerting/settings
# ---------------------------------------------------------------------------


def test_alerting_settings_no_db_error(client, auth_as, no_prisma):
"""Pins ``GET /alerting/settings`` (error: db not connected)."""

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Redundant intermediate assertion before summary dict check

assert body == [{"name": "openai", "fields": []}] is a complete, strong pin on the full response body. The summary dict and its assertion immediately afterwards re-derive a strict subset of the same information, making that block redundant. The same pattern appears in test_alerting_settings_happy (with assert body[0]["field_name"] == "slack_alerting" before the summary). Either keep the full-body assertion and drop the summary, or drop the full-body assertion and let the summary carry the pin — but having both is noisy and gives a false impression that the summary is an additional check.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +55 to +80
@pytest.fixture
def completion_pipeline_raises(monkeypatch):
monkeypatch.setattr(proxy_server, "llm_router", MagicMock())
monkeypatch.setattr(
proxy_server, "proxy_logging_obj", MagicMock(post_call_failure_hook=AsyncMock())
)

async def _raise(self, *args, **kwargs):
raise ValueError("boom")

monkeypatch.setattr(
common_request_processing.ProxyBaseLLMRequestProcessing,
"base_process_llm_request",
_raise,
)
yield


_COMPLETION_PATHS = [
"/v1/completions",
"/completions",
"/engines/gpt-3.5-turbo-instruct/completions",
"/openai/deployments/gpt-3.5-turbo-instruct/completions",
]


Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 _handle_llm_api_exception not stubbed — error path pins crash behavior, not clean 400

completion_pipeline_raises stubs only base_process_llm_request to raise ValueError, leaving _handle_llm_api_exception unstubbed. The parallel fixture patched_chat_error (in test_routes_chat_completions.py) explicitly stubs _handle_llm_api_exception to return a ProxyException(code=400) and asserts response.status_code == 400. Here the test asserts 500, meaning the exception escapes through the handler unprocessed. If the completions route does route through _handle_llm_api_exception in production (as its sibling chat-completions endpoint does), this pin is locking in an accidentally-uncleaned mock state rather than the real error contract. Verify that the completions endpoint genuinely bypasses _handle_llm_api_exception, or add the matching stub and update the expected status to 400.

@yuneng-berri yuneng-berri changed the base branch from litellm_internal_staging to litellm_yj_may29 May 29, 2026 22:40
@yuneng-berri yuneng-berri merged commit c3e5251 into litellm_yj_may29 May 29, 2026
46 checks passed
yuneng-berri added a commit that referenced this pull request May 30, 2026
* test(proxy/proxy_server): pin forwarding routes (PR2) (#28887)

* test(proxy): pin proxy_server.py forwarding-route behavior

PR2 of the proxy_server.py behavior-pinning project: fills the 12
forwarding-route test files added by the harness PR with happy + error
pins for all 52 LLM-facing routes (models, chat/completions, completions,
embeddings, moderations, audio, assistants, threads, utils, model-info,
model-metrics, queue). Every happy-path test asserts the full response
dict via normalize() so the gate enforces real shape pinning rather
than status codes.

* test(proxy): drop task-plumbing comments from PR2 test files

* test(proxy): tighten PR2 error-path status-code pins

Apply the same review feedback Greptile gave on PR1 (#28856) and PR3
(#28850) to PR2's forwarding-route tests:

- Replace permissive `>= 400` / `in (X, Y)` status assertions with the
  exact 500/405 the handler actually returns, so a regression that
  silently shifts the code now fails the pin.
- Add a body-presence check alongside each tightened status assertion
  to satisfy _pin_check.py's no-status-only rule.

---------

Co-authored-by: Claude <noreply@anthropic.com>

* test(proxy): pin proxy_server.py non-route surface behavior (PR1) (#28856)

* test(proxy): pin proxy_server.py non-route surface behavior (PR1)

Fills the 7 PR1 placeholder files under tests/test_litellm/proxy/proxy_server/
with behavior pins for the non-route surface of proxy_server.py:
lifecycle/init/shutdown, ProxyConfig class methods, DB-overlay config scrubbers,
spend counters, background-health helpers, OpenAPI customization, exception
handlers, and streaming-generator helpers.

233 tests cover 101 pin-list symbols (1+ happy + 1+ error each). New-tests-only
coverage on litellm/proxy/proxy_server.py: 32.80% line / 20.91% branch (PR1
gate: 25% line / 18% branch). Full directory runs in ~22s with -n 4.

Plan: https://www.notion.so/Plan-Pin-proxy_server-py-behavior-2026-05-25-36c43b8acdab81ee845fd5365128a2fc

* test(proxy): address Greptile review comments on test_lifecycle.py

- test_initialize_signature_is_async_with_expected_params: hard-code
  expected_param_count so a signature change actually trips the gate
  (previously both sides of the comparison were len(sig.parameters)).
- test_check_request_disconnection_invalid_when_connected_times_out:
  patch asyncio.sleep so the test no longer spins for ~1.2 s of real
  wall-clock; timeout lowered to 0.05 s.

---------

Co-authored-by: Claude <noreply@anthropic.com>

* test(proxy/proxy_server): pin control-plane routes (PR3) (#28850)

* test(proxy/proxy_server): pin misc routes (PR3, partial)

Adds happy + error tests for the misc control-plane routes:
GET /, /routes, /adaptive_router/state, /get_logo_url,
/get_image, /get_favicon.

Also gitignores .pin_list.txt (used by the pin gate).

* test(proxy/proxy_server): pin login/SSO routes (PR3, partial)

Adds happy + error tests for the 5 login/SSO control-plane routes:
GET /fallback/login, POST /login, POST /v2/login, POST /v3/login,
POST /v3/login/exchange. Mocks authenticate_user and
create_ui_token_object at their imported location.

* test(proxy/proxy_server): pin onboarding routes (PR3, partial)

Adds happy + error tests for the 2 onboarding control-plane routes:
GET /onboarding/get_token, POST /onboarding/claim_token. Wires a
MagicMock async context manager for prisma_client.db.tx() and
signs the onboarding JWT with the patched master_key.

* test(proxy/proxy_server): pin model_cost_map reload routes (PR3, partial)

Adds happy + error tests for the 5 model-cost-map control-plane routes:
POST /reload/model_cost_map, POST|DELETE|GET
/schedule/model_cost_map_reload(/status), GET /model/cost_map/source.
Attaches litellm_config to mock_prisma per-test (the table is not in
the default _PRISMA_TABLES fixture).

* test(proxy/proxy_server): pin anthropic_beta_headers reload routes (PR3, partial)

Adds happy + error tests for the 4 anthropic-beta-headers control-plane
routes: POST /reload/anthropic_beta_headers, POST|DELETE|GET
/schedule/anthropic_beta_headers_reload(/status). Stubs
db.litellm_config (not in default _PRISMA_TABLES) and monkeypatches
reload_beta_headers_config so no network calls fire.

* test(proxy/proxy_server): pin invitation routes (PR3, partial)

Adds happy + error tests for the 4 invitation control-plane routes:
POST /invitation/new, GET /invitation/info, POST /invitation/update,
POST /invitation/delete. Patches _user_has_admin_privileges /
_user_has_admin_view to avoid extensive get_user_object mocking.

* test(proxy/proxy_server): pin config CRUD routes (PR3, partial)

Adds happy + error tests for the 8 config-CRUD control-plane routes:
POST /config/update, POST|GET /config/field/update|info, GET /config/list,
POST /config/field/delete, POST /config/callback/delete,
GET /get/config/callbacks, GET /config/yaml. Attaches litellm_config
to mock_prisma per-test.

* test(proxy/proxy_server): tighten pin assertions per review

- test_routes_misc.py: `b"" in response.content` is trivially true;
  replace with `len(response.content) > 0` so an empty 405 body trips
  the gate.
- test_routes_login_sso.py: `len(response.content) >= 0` is trivially
  true; tighten to `> 0`.
- test_routes_anthropic_beta.py: replace brittle string-literal checks
  on the serialized JSON (`'"interval_hours": 12' in payload`) with
  `json.loads` + dict access so the assertion survives any serializer
  spacing.
- test_routes_config.py: `assert status_code in (404, 500)` was too
  permissive; the handler re-raises HTTPException(404) verbatim, so
  pin 404 strictly.

---------

Co-authored-by: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants