fix(proxy): run model-level post_call guardrails on streaming requests#26922
Conversation
Greptile SummaryThis PR fixes the streaming path of Confidence Score: 5/5Safe to merge — minimal, well-targeted fix with appropriate test coverage and no backward-incompatible changes. The fix is a direct mirror of an already-reviewed pattern used at two other call sites in the same file. No new logic is introduced; the local import style matches convention. Tests are mock-only and correctly cover both the should-fire and should-not-fire paths. No security, schema, or backwards-compatibility concerns. No files require special attention.
|
| Filename | Overview |
|---|---|
| litellm/proxy/utils.py | Adds _check_and_merge_model_level_guardrails call at the top of async_post_call_streaming_iterator_hook, mirroring the existing pattern at the two other call sites in the file; change is minimal and correct. |
| tests/test_litellm/proxy/test_model_level_guardrails.py | Adds two new async tests for the streaming iterator path that mirror the existing non-streaming tests; tests use only mocks (no real network calls), are well-structured, and cover both the positive (guardrail fires) and negative (guardrail correctly skipped) cases. |
Reviews (1): Last reviewed commit: "fix(proxy): run model-level post_call gu..." | Re-trigger Greptile
|
bugbot run |
There was a problem hiding this comment.
✅ Bugbot reviewed your changes and found no new issues!
Comment @cursor review or bugbot run to trigger another review on this PR
Reviewed by Cursor Bugbot for commit b1558ae. Configure here.
E2E reproduction & verification (real proxy, real providers — no mocks)Drove a local proxy ( Setup
model_list:
- model_name: gpt-4o-mini-guarded
litellm_params:
model: openai/gpt-4o-mini
api_key: os.environ/OPENAI_API_KEY
guardrails:
- e2e-streaming-tracker # custom guardrail: own iterator hook (Branch 1)
- openai-mod-post # openai_moderation: apply_guardrail (Branch 2)
- model_name: claude-haiku-guarded
litellm_params:
model: anthropic/claude-haiku-4-5
api_key: os.environ/ANTHROPIC_API_KEY
guardrails: [e2e-streaming-tracker, openai-mod-post]
- model_name: gpt-4o-mini-bare # control: no model-level guardrails
litellm_params:
model: openai/gpt-4o-mini
api_key: os.environ/OPENAI_API_KEY
guardrails:
- guardrail_name: e2e-streaming-tracker
litellm_params:
guardrail: e2e_streaming_guardrail.E2EStreamingGuardrail
mode: post_call
default_on: false # critical
- guardrail_name: openai-mod-post
litellm_params:
guardrail: openai_moderation
mode: post_call
api_key: os.environ/OPENAI_API_KEY
default_on: false
model: omni-moderation-latestThe custom Methodology
curl requests (identical across all three runs)# A: non-streaming, model with guardrails
curl http://localhost:4000/v1/chat/completions -H 'Authorization: Bearer sk-1234' \
-d '{"model":"gpt-4o-mini-guarded","stream":false,"messages":[...],"max_tokens":20}'
# B: streaming, OpenAI
curl -N http://localhost:4000/v1/chat/completions -H 'Authorization: Bearer sk-1234' \
-d '{"model":"gpt-4o-mini-guarded","stream":true,"messages":[...],"max_tokens":20}'
# C: streaming, Anthropic
curl -N http://localhost:4000/v1/chat/completions -H 'Authorization: Bearer sk-1234' \
-d '{"model":"claude-haiku-guarded","stream":true,"messages":[...],"max_tokens":20}'
# D: streaming, control model (no guardrails)
curl -N http://localhost:4000/v1/chat/completions -H 'Authorization: Bearer sk-1234' \
-d '{"model":"gpt-4o-mini-bare","stream":true,"messages":[...],"max_tokens":20}'Results — custom-guardrail JSONL trackerWith PR fix applied ( With PR diff reverted (everything else identical, same prompts, same proxy config): Results —
|
| Run | Time | Model | stream | guardrail_information |
|---|---|---|---|---|
| buggy | 18:05:37 | openai/gpt-4o-mini | false | YES (openai-mod-post) |
| buggy | 18:05:38 | openai/gpt-4o-mini | true | NO ✗ ← bug |
| buggy | 18:05:39 | anthropic/claude-haiku-4-5 | true | NO ✗ ← bug |
| fixed | 18:06:45 | openai/gpt-4o-mini | false | YES |
| fixed | 18:06:47 | openai/gpt-4o-mini | true | YES ✓ |
| fixed | 18:06:49 | anthropic/claude-haiku-4-5 | true | YES ✓ |
| fixed | 18:06:50 | openai/gpt-4o-mini (bare control) | true | NO (correctly skipped) |
The fix works on both Branch 1 (custom guardrail with own iterator hook) and Branch 2 (apply_guardrail via unified guardrail) of the dispatcher in ProxyLogging.async_post_call_streaming_iterator_hook, and is provider-agnostic (verified against both real OpenAI and real Anthropic streaming responses).
Failing CI checks — investigation
The four red checks are persistent, but they are all unrelated to the 7-line streaming-guardrail change. Re-running each (POST /api/v2/workflow/<id>/rerun?from_failed=true and gh run rerun --failed) reproduced the same failures, which themselves match what's failing on the PR's base branch litellm_internal_staging:
| CircleCI job | Failing test | Root cause |
|---|---|---|
proxy_logging_guardrails_model_info_tests |
test_e2e_model_access.py::test_model_access_patterns[bedrock/anthropic.claude-3] |
404 from exampleopenaiendpoint-production.up.railway.app mock — bedrock wildcard test, unrelated to streaming/guardrails |
llm_responses_api_testing |
test_azure_responses_api.py::test_basic_openai_responses_delete_endpoint |
Azure API rejects DELETE with body — Unexpected body with size 2. This API method does not accept a request body |
llm_translation_testing |
test_anthropic_completion.py::test_async_pdf_handling_with_file_id |
Anthropic upstream error: Unable to download the file. Please verify the URL |
proxy-infra / Run tests (GH Actions) |
test_proxy_server.py::TestLazyFeaturesNotImportedAtStartup::test_heavy_modules_absent_at_startup |
Subprocess timeout at 120s during from litellm.proxy.proxy_server import app. Passes locally in 11 s. Pure CI runner perf issue. |
Sanity checks:
- The PR's added import
from litellm.proxy.proxy_server import llm_routeris function-local insideasync_post_call_streaming_iterator_hook— it can't affect startup-timesys.modules. - The PR doesn't touch bedrock routing, Azure responses, anthropic file handling, or proxy startup.
- The PR's two new unit tests pass locally:
tests/test_litellm/proxy/test_model_level_guardrails.py::test_streaming_iterator_hook_runs_model_level_guardrail PASSED tests/test_litellm/proxy/test_model_level_guardrails.py::test_streaming_iterator_hook_skips_guardrail_not_on_model PASSED TestLazyFeaturesNotImportedAtStartup::test_heavy_modules_absent_at_startuppasses locally in 11 s.
LGTM; thanks!
b1558ae to
47d184d
Compare
Re-confirmed e2e on the rebased commit (
Force-push preserved the approval. Waiting on the fresh CI sweep. |
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
9f1b41d
into
litellm_internal_staging
Relevant issues
Follow-up to #23774 — same bug class, missed call site on the streaming path.
Linear ticket
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
tests/test_litellm/directory, Adding at least 1 test is a hard requirement - see detailsmake test-unit@greptileaiand received a Confidence Score of at least 4/5 before requesting a maintainer reviewDelays in PR merge?
If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).
CI (LiteLLM team)
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Screenshots / Proof of Fix
The new unit test
test_streaming_iterator_hook_runs_model_level_guardrailis bisect-confirmed: it fails withassert False is Trueagainstlitellm_internal_stagingwithout this change and passes with it. The companiontest_streaming_iterator_hook_skips_guardrail_not_on_modelguards against false positives by asserting the dispatcher gate stays closed for guardrails not attached at the model level.Type
🐛 Bug Fix
Changes
Bug
For a request with
stream: trueagainst a model that has a post_call guardrail attached at the model level (litellm_params.guardrails) and configured withdefault_on: false, the guardrail is silently skipped — no log entry, no error,metadata.guardrail_informationends upnull. The same configuration on the same model fires correctly for non-streaming after #23774.Root cause
ProxyLogging.async_post_call_streaming_iterator_hookinlitellm/proxy/utils.pyis the streaming-side dispatcher that wraps each callback's iterator hook into the response stream. It gates each callback by callingshould_run_guardrail(data=request_data, ...)against the rawrequest_data— without first mergingdeployment.litellm_params.guardrailsintometadata.guardrails. Withdefault_on: false,should_run_guardrailconsultsmetadata.guardrails/data.guardrails(both empty for a no-body-attached request) and returnsFalse. None of the dispatcher's three downstream branches fire — so forapply_guardrailguardrails (OpenAI Moderation, Bedrock Guardrails, Lakera, etc.) theunified_guardrailend-of-stream block that writesguardrail_informationnever runs.#23774 patched the three non-streaming sites where
should_run_guardrailis called against request data. The streaming iterator dispatcher is the missed fourth site.Fix
Mirror #23774's pattern: at the top of
async_post_call_streaming_iterator_hook, call_check_and_merge_model_level_guardrails(...)and assign the result back torequest_databefore the per-callback loop. The merge helper's existing shallow-copy semantics propagate the mergedmetadata.guardrailsto all downstream wrappers (includingunified_guardrail.async_post_call_streaming_iterator_hook's own internalshould_run_guardrailcheck) through the sharedmetadatareference.The local
from litellm.proxy.proxy_server import llm_routerimport matches the established pattern at the two existing call sites inutils.py(post_call_success_hookand the per-chunkasync_post_call_streaming_hook), where the local import breaks a circular dependency withproxy_server.Tests
Two new async tests in
tests/test_litellm/proxy/test_model_level_guardrails.py, mirroring the file's existing non-streaming tests verbatim:test_streaming_iterator_hook_runs_model_level_guardrail— asserts the model-level guardrail fires for streaming withdefault_on: falseand no body-levelguardrailsfield.test_streaming_iterator_hook_skips_guardrail_not_on_model— regression guard: deployment configured with a different guardrail name, the dispatcher gate stays closed.Note
Medium Risk
Changes the streaming response dispatch path to run model-level
post_callguardrails, which can affect safety/compliance behavior for streamed completions. Low code churn, but touches guardrail gating logic that can change which callbacks execute in production.Overview
Fixes a gap where model-level
post_callguardrails were skipped for streaming responses by merging deployment guardrails intorequest_dataat the start ofProxyLogging.async_post_call_streaming_iterator_hookbeforeshould_run_guardrailchecks.Adds two async integration tests to verify that a model-attached streaming guardrail runs even when not specified in the request body, and that unrelated guardrails remain gated off.
Reviewed by Cursor Bugbot for commit b1558ae. Bugbot is set up for automated code reviews on this repo. Configure here.