fix(v1.87.0): unblock e2e --mock-only — 3 fails → 0 (Layer 1)#67

Merged
songkuan-zheng merged 1 commit into ship/v1.87.0 from fix/v1.87.0-e2e-mock-only-greens on Jun 4, 2026

@songkuan-zheng

Layer 1 of the post-bump audit: `e2e --mock-only` had 3 failures. Agents diagnosed the root causes: one real production regression (case 20, streaming fast_path) and two e2e harness fixes (case 07 auth default, case 23 in_flight drain). After: 15 PASS / 0 FAIL / 7 SKIP.

Layer 1 of the post-bump audit. The e2e mock-only run had 3 failures. The
root causes were independent: one is a real production regression (case 20),
and the other two are e2e-harness wiring.

After this commit: `e2e/tools/run-all-cases --mock-only` reports
15 PASS / 0 FAIL / 7 SKIP (skips are Tier=real cases that require a
real provider).

## 1. case 07 metrics-endpoint-smoke (e2e config)

Upstream v1.84.0 flipped the default of
`litellm.require_auth_for_metrics_endpoint` to True. Anonymous
`curl /metrics` now returns 401 in the e2e harness, so the test's
`families=0 ct_ok=0` was a misread auth failure rather than a Prometheus
emission bug.
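To make the `families=0 ct_ok=0` misread concrete, here is a minimal sketch of a check that reports an auth rejection separately from an empty scrape. It is not the case 07 implementation: `MetricsCheck`, `check_metrics_response`, and the verdict strings are hypothetical names, and the sketch assumes the endpoint serves the Prometheus text exposition format with a `text/plain` content type.

```python
from dataclasses import dataclass


@dataclass
class MetricsCheck:
    status: int
    families: int  # number of "# TYPE" declarations seen in the body
    ct_ok: bool    # Content-Type looks like Prometheus text exposition
    verdict: str   # hypothetical labels: "auth_rejected", "ok", "no_metrics"


def check_metrics_response(status: int, content_type: str, body: str) -> MetricsCheck:
    """Classify a /metrics response (hypothetical helper, not the e2e harness code)."""
    if status in (401, 403):
        # The body is an error payload, so counting families would only
        # produce a misleading families=0 result.
        return MetricsCheck(status, 0, False, "auth_rejected")
    # Each metric family is declared once with a "# TYPE <name> <kind>" line.
    families = sum(1 for line in body.splitlines() if line.startswith("# TYPE "))
    ct_ok = content_type.lower().startswith("text/plain")
    verdict = "ok" if (status == 200 and ct_ok and families > 0) else "no_metrics"
    return MetricsCheck(status, families, ct_ok, verdict)
```

With a check shaped like this, an anonymous scrape that receives a 401 surfaces as `auth_rejected` instead of looking like missing Prometheus output.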

The case 07 runbook explicitly assumes anonymous /metrics access (the
scrape posture for a trusted-network local Prometheus). Restore that
posture in the e2e config — production unaffected (this only ships in
the autogenerated `e2e/_config/.litellm.rendered.yaml`).

File: `e2e/tools/proxy` — added `require_auth_for_metrics_endpoint: false`
to the rendered `litellm_settings` block.

## 2. case 20 returned_model_name streaming /v1/messages (real bug)

**Production regression.** Wave 5b placed the
`message_start.message.model` SSE rewrite AFTER upstream's
`async_post_call_streaming_hook` call. Upstream PR BerriAI#28289 (in v1.87.0)
introduced a `fast_path` short-circuit before that hook for the dominant
config (no guardrails, default `include_cost_in_streaming_usage`), so
the rewrite was being skipped on every streaming /v1/messages request
where `returned_model_name` is set. The upstream model id leaked.

Fix: move the rewrite block BEFORE the `fast_path` short-circuit. The
unset-override case pays near-zero overhead (one dict get + one
substring test in the SSE byte rewriter).

File: `litellm/proxy/common_request_processing.py:2113-2173`.
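The ordering fix can be sketched roughly as follows. This is an illustration under assumptions, not the code in `common_request_processing.py`: `rewrite_message_start_model`, `stream_chunks`, the `settings` dict, and `slow_hook` are hypothetical, the real path is async, and the sketch assumes each chunk carries whole SSE lines.

```python
import json
from typing import Callable, Iterable, Iterator, Mapping, Optional


def rewrite_message_start_model(chunk: bytes, returned_model_name: Optional[str]) -> bytes:
    """Replace message.model in a message_start SSE event (hypothetical helper)."""
    if not returned_model_name:
        return chunk  # override unset: nothing to do
    if b"message_start" not in chunk:
        return chunk  # cheap substring test skips every other event type
    out_lines = []
    for line in chunk.split(b"\n"):
        if line.startswith(b"data:"):
            try:
                payload = json.loads(line[len(b"data:"):])
            except ValueError:
                payload = None  # not JSON; pass the line through untouched
            if (
                isinstance(payload, dict)
                and payload.get("type") == "message_start"
                and isinstance(payload.get("message"), dict)
            ):
                payload["message"]["model"] = returned_model_name
                line = b"data: " + json.dumps(payload).encode("utf-8")
        out_lines.append(line)
    return b"\n".join(out_lines)


def stream_chunks(
    chunks: Iterable[bytes],
    settings: Mapping[str, str],
    fast_path: bool,
    slow_hook: Callable[[bytes], bytes] = lambda c: c,
) -> Iterator[bytes]:
    """Apply the rewrite before the fast-path branch so both paths get it."""
    name = settings.get("returned_model_name")  # the single dict get
    for chunk in chunks:
        chunk = rewrite_message_start_model(chunk, name)
        if fast_path:
            yield chunk  # short-circuit: per-chunk hooks are skipped
            continue
        yield slow_hook(chunk)
```

Because the rewrite runs ahead of the branch, a short-circuited stream and a hooked stream both emit the overridden model name.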

## 3. case 23 mock-memory-pressure (e2e harness ordering)

Case 23 reset the mock counter without waiting for the mock-side
in_flight to drain. Case 20's last assertion (A4 streaming
/v1/chat/completions) returns `[DONE]` to the client well before the
mock-side handler finishes writing chunks (the mock counter is bumped
post-`wfile.flush()`). The leaked stream from case 20 finished during
case 23's burst window, bumping the counter and producing a false
6-of-5 fail.

Fix: in case 23, poll `/__mock__/state` for `in_flight == 0` (50 × 100ms,
bounded ~5s) before issuing the reset.

File: `e2e/cases/data/23_mock_memory_pressure.sh`.
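The actual fix lives in a shell script; the same bounded poll can be sketched in Python for illustration. `fetch_mock_state` and `wait_for_drain` are hypothetical names, and the sketch assumes `/__mock__/state` returns a JSON object with an integer `in_flight` field and that the caller supplies the mock's base URL.

```python
import json
import time
import urllib.request
from typing import Callable, Mapping


def fetch_mock_state(base_url: str) -> Mapping:
    """GET /__mock__/state and decode it (assumes a JSON object response)."""
    with urllib.request.urlopen(f"{base_url}/__mock__/state", timeout=2) as resp:
        return json.load(resp)


def wait_for_drain(
    fetch_state: Callable[[], Mapping],
    attempts: int = 50,
    interval_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until in_flight == 0; give up after attempts * interval_s (~5s)."""
    for _ in range(attempts):
        if fetch_state().get("in_flight") == 0:
            return True  # safe to reset the counter now
        sleep(interval_s)
    return False  # still busy; the caller decides whether to fail or proceed
```

A caller would run something like `wait_for_drain(lambda: fetch_mock_state(base_url))` before issuing the reset, so that a stream left over from an earlier case cannot bump the counter inside the burst window.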

## Verification

```bash
e2e/tools/run-all-cases --mock-only
# → 15 PASS / 0 FAIL / 7 SKIP (skipped are Tier=real, expected)
```

Tier: C (case 20 — universal bug fix on streaming SSE) + B (case 07, 23
— internal e2e infrastructure).
@songkuan-zheng songkuan-zheng merged commit e1031db into ship/v1.87.0 Jun 4, 2026
@songkuan-zheng songkuan-zheng deleted the fix/v1.87.0-e2e-mock-only-greens branch June 4, 2026 13:15