fix(v1.87.0): unblock e2e --mock-only — 3 fails → 0 (Layer 1)#67
Merged
Layer 1 of the post-bump audit. The e2e mock-only run found 3 fails. The root causes were independent: one is a real production regression (case 20), and the other two are e2e-harness wiring.

After this commit, `e2e/tools/run-all-cases --mock-only` reports 15 PASS / 0 FAIL / 7 SKIP (the skips are Tier=real cases that require a real provider).

## 1. case 07 metrics-endpoint-smoke (e2e config)

Upstream v1.84.0 flipped the default of `litellm.require_auth_for_metrics_endpoint` to True. Anonymous `curl /metrics` now returns 401 in the e2e harness, so the test's `families=0 ct_ok=0` was a misread auth failure rather than a Prometheus emission bug.

The case 07 runbook explicitly assumes anonymous /metrics access (the scrape posture for a trusted-network local Prometheus). This commit restores that posture in the e2e config. Production is unaffected, because the setting only ships in the autogenerated `e2e/_config/.litellm.rendered.yaml`.

File: `e2e/tools/proxy`. Added `require_auth_for_metrics_endpoint: false` to the rendered `litellm_settings` block.

## 2. case 20 returned_model_name streaming /v1/messages (real bug)

**Production regression.** Wave 5b placed the `message_start.message.model` SSE rewrite AFTER upstream's `async_post_call_streaming_hook` call. Upstream PR BerriAI#28289 (in v1.87.0) introduced a `fast_path` short-circuit before that hook for the dominant config (no guardrails, default `include_cost_in_streaming_usage`). As a result, the rewrite was skipped on every streaming /v1/messages request where `returned_model_name` is set, and the upstream model id leaked to the client.

Fix: move the rewrite block BEFORE the `fast_path` short-circuit. The overhead in the unset-override case is near zero (one dict get plus one substring test in the SSE byte rewriter).

File: `litellm/proxy/common_request_processing.py:2113-2173`.

## 3. case 23 mock-memory-pressure (e2e harness ordering)

Case 23 reset the mock counter without waiting for the mock-side in_flight to drain. Case 20's last assertion (A4 streaming /v1/chat/completions) returns `[DONE]` to the client well before the mock-side handler finishes writing chunks, because the mock counter is bumped post-`wfile.flush()`. The leaked stream from case 20 finished during case 23's burst window, bumping the counter and producing a false 6-of-5 fail.

Fix: in case 23, poll `/__mock__/state` for `in_flight == 0` (50 × 100ms, bounded at ~5s) before issuing the reset.

File: `e2e/cases/data/23_mock_memory_pressure.sh`.

## Verification

```bash
e2e/tools/run-all-cases --mock-only
# → 15 PASS / 0 FAIL / 7 SKIP (skipped are Tier=real, expected)
```

Tier: C (case 20, a universal bug fix on streaming SSE) + B (cases 07 and 23, internal e2e infrastructure).
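For readers who want to see the shape of the case 23 drain wait, here is a minimal sketch of a bounded poll. It is not the code from `23_mock_memory_pressure.sh`: the `MOCK_URL` variable, the function names, and the assumption that `/__mock__/state` returns JSON with an integer `in_flight` field are all hypothetical and used only for illustration.

```bash
# Sketch only. MOCK_URL, read_in_flight, wait_for_drain and the JSON shape
# of /__mock__/state are assumptions, not taken from the actual harness.
MOCK_URL="${MOCK_URL:-http://127.0.0.1:9999}"

# Print the mock's current in_flight count, or nothing if the request fails.
# Assumes the response contains a field like "in_flight": 3.
read_in_flight() {
  curl -fsS "${MOCK_URL}/__mock__/state" 2>/dev/null \
    | sed -n 's/.*"in_flight"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Poll until the probe reports 0, or give up after a bounded number of tries.
#   $1 attempts (default 50), $2 sleep seconds (default 0.1),
#   $3 probe command (default read_in_flight)
# Returns 0 once drained, 1 on timeout. Fractional sleep needs GNU or BSD sleep.
wait_for_drain() {
  local attempts="${1:-50}" interval="${2:-0.1}" probe="${3:-read_in_flight}"
  local i=0 n
  while [ "$i" -lt "$attempts" ]; do
    n="$("$probe")"
    if [ "$n" = "0" ]; then
      return 0
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  return 1
}
```

A case script could call `wait_for_drain 50 0.1` immediately before resetting the counter and decide for itself whether a timeout should fail the case or only log a warning. Passing the probe as a parameter keeps the loop testable without a running mock.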
Layer 1 of post-bump audit: e2e --mock-only had 3 fails. Root causes diagnosed by agents, one real prod regression (case 20 streaming fast_path) + 2 e2e harness fixes (case 07 auth default, case 23 in_flight drain). After: 15 PASS / 0 FAIL / 7 SKIP.