fix(proxy): run model-level post_call guardrails on streaming requests by michelligabriele · Pull Request #26922 · BerriAI/litellm

michelligabriele · 2026-04-30T21:33:40Z

Relevant issues

Follow-up to #23774 — same bug class, missed call site on the streaming path.

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

I have added testing in the tests/test_litellm/ directory (adding at least 1 test is a hard requirement)
My PR passes all unit tests on make test-unit
My PR's scope is as isolated as possible, it only solves 1 specific problem
I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Screenshots / Proof of Fix

The new unit test test_streaming_iterator_hook_runs_model_level_guardrail is bisect-confirmed: it fails with assert False is True against litellm_internal_staging without this change and passes with it. The companion test_streaming_iterator_hook_skips_guardrail_not_on_model guards against false positives by asserting the dispatcher gate stays closed for guardrails not attached at the model level.

Type

🐛 Bug Fix

Changes

Bug

For a request with stream: true against a model that has a post_call guardrail attached at the model level (litellm_params.guardrails) and configured with default_on: false, the guardrail is silently skipped — no log entry, no error, metadata.guardrail_information ends up null. The same configuration on the same model fires correctly for non-streaming after #23774.

Root cause

ProxyLogging.async_post_call_streaming_iterator_hook in litellm/proxy/utils.py is the streaming-side dispatcher that wraps each callback's iterator hook around the response stream. It gates each callback by calling should_run_guardrail(data=request_data, ...) against the raw request_data, without first merging deployment.litellm_params.guardrails into metadata.guardrails. With default_on: false, should_run_guardrail consults metadata.guardrails / data.guardrails (both empty when the request body names no guardrails) and returns False. As a result, none of the dispatcher's three downstream branches fire, so for apply_guardrail guardrails (OpenAI Moderation, Bedrock Guardrails, Lakera, etc.) the unified_guardrail end-of-stream block that writes guardrail_information never runs.

#23774 patched the three non-streaming sites where should_run_guardrail is called against request data. The streaming iterator dispatcher is the missed fourth site.
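The gating behaviour described under Root cause can be sketched in plain Python. This is a simplified stand-in, not LiteLLM's actual should_run_guardrail: the gate and requested_guardrails functions are hypothetical names, and the only rule modelled is "a guardrail with default_on: false runs only when the request data names it".

```python
from typing import Any, Dict, List


def requested_guardrails(data: Dict[str, Any]) -> List[str]:
    """Collect guardrail names from data.guardrails and metadata.guardrails."""
    names = list(data.get("guardrails") or [])
    names += list((data.get("metadata") or {}).get("guardrails") or [])
    return names


def gate(guardrail_name: str, default_on: bool, data: Dict[str, Any]) -> bool:
    """Hypothetical gate: default-on guardrails always run, others run only
    when the request data names them."""
    return default_on or guardrail_name in requested_guardrails(data)


# Streaming request whose body attaches no guardrails.
raw_request = {"model": "gpt-4o-mini-guarded", "stream": True, "metadata": {}}
# Guardrail names configured on the deployment (litellm_params.guardrails).
model_level = ["openai-mod-post"]

# Gating against the raw request: the model-level guardrail is invisible.
before_merge = gate("openai-mod-post", default_on=False, data=raw_request)

# Merging the model-level names first makes the same gate open.
merged_request = {
    **raw_request,
    "metadata": {**raw_request["metadata"], "guardrails": model_level},
}
after_merge = gate("openai-mod-post", default_on=False, data=merged_request)

print(before_merge, after_merge)  # False True
```

Under these assumptions, the only difference between the skipped and the executed guardrail is whether the merge happens before the gate is consulted.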

Fix

Mirror #23774's pattern: at the top of async_post_call_streaming_iterator_hook, call _check_and_merge_model_level_guardrails(...) and assign the result back to request_data before the per-callback loop. The merge helper's existing shallow-copy semantics propagate the merged metadata.guardrails to all downstream wrappers (including unified_guardrail.async_post_call_streaming_iterator_hook's own internal should_run_guardrail check) through the shared metadata reference.

The function-local import (from litellm.proxy.proxy_server import llm_router) matches the established pattern at the two existing call sites in utils.py (post_call_success_hook and the per-chunk async_post_call_streaming_hook), where the local import breaks a circular dependency with proxy_server.
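The "shared metadata reference" point in the Fix section can be illustrated with a minimal sketch. The merge_model_guardrails helper below is hypothetical and only assumes what the description states, namely that the merge shallow-copies the request and writes into metadata.guardrails; it is not the body of _check_and_merge_model_level_guardrails.

```python
from typing import Any, Dict, List


def merge_model_guardrails(
    data: Dict[str, Any], model_level: List[str]
) -> Dict[str, Any]:
    """Hypothetical merge helper: shallow-copy the request, then extend
    metadata.guardrails with the deployment's guardrail names."""
    merged = data.copy()  # shallow copy: nested dicts are shared with `data`
    metadata = merged.setdefault("metadata", {})
    existing = metadata.get("guardrails") or []
    # Preserve order while dropping duplicates.
    metadata["guardrails"] = list(dict.fromkeys([*existing, *model_level]))
    return merged


request_data = {"model": "gpt-4o-mini-guarded", "stream": True, "metadata": {}}
held_by_wrapper = request_data["metadata"]  # reference captured elsewhere

# Assign the result back, as the fix does before the per-callback loop.
request_data = merge_model_guardrails(request_data, ["openai-mod-post"])

shared = held_by_wrapper is request_data["metadata"]
print(shared, held_by_wrapper["guardrails"])  # True ['openai-mod-post']
```

In this sketch the sharing only holds when the incoming request already carries a metadata dict; if the key were absent, setdefault would attach a new dict to the copy alone.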

Tests

Two new async tests in tests/test_litellm/proxy/test_model_level_guardrails.py, mirroring the file's existing non-streaming tests verbatim:

test_streaming_iterator_hook_runs_model_level_guardrail — asserts the model-level guardrail fires for streaming with default_on: false and no body-level guardrails field.
test_streaming_iterator_hook_skips_guardrail_not_on_model — regression guard: deployment configured with a different guardrail name, the dispatcher gate stays closed.

Note

Medium Risk
Changes the streaming response dispatch path to run model-level post_call guardrails, which can affect safety/compliance behavior for streamed completions. Low code churn, but touches guardrail gating logic that can change which callbacks execute in production.

Overview
Fixes a gap where model-level post_call guardrails were skipped for streaming responses by merging deployment guardrails into request_data at the start of ProxyLogging.async_post_call_streaming_iterator_hook before should_run_guardrail checks.

Adds two async integration tests to verify that a model-attached streaming guardrail runs even when not specified in the request body, and that unrelated guardrails remain gated off.

Reviewed by Cursor Bugbot for commit b1558ae. Bugbot is set up for automated code reviews on this repo.

greptile-apps · 2026-04-30T21:34:50Z

Greptile Summary

This PR fixes the streaming path of ProxyLogging.async_post_call_streaming_iterator_hook to merge model-level guardrails from deployment.litellm_params.guardrails into request_data before the per-callback gate check — the same fix applied to the non-streaming sites in #23774. The change is a 7-line addition that follows the established pattern used at the two other existing call sites in utils.py, and is validated by two new mock-only unit tests covering both the positive and negative guard conditions.

Confidence Score: 5/5

Safe to merge — minimal, well-targeted fix with appropriate test coverage and no backward-incompatible changes.

The fix is a direct mirror of an already-reviewed pattern used at two other call sites in the same file. No new logic is introduced; the local import style matches convention. Tests are mock-only and correctly cover both the should-fire and should-not-fire paths. No security, schema, or backwards-compatibility concerns.

No files require special attention.

Important Files Changed

Filename	Overview
litellm/proxy/utils.py	Adds `_check_and_merge_model_level_guardrails` call at the top of `async_post_call_streaming_iterator_hook`, mirroring the existing pattern at the two other call sites in the file; change is minimal and correct.
tests/test_litellm/proxy/test_model_level_guardrails.py	Adds two new async tests for the streaming iterator path that mirror the existing non-streaming tests; tests use only mocks (no real network calls), are well-structured, and cover both the positive (guardrail fires) and negative (guardrail correctly skipped) cases.

Reviews (1): Last reviewed commit: "fix(proxy): run model-level post_call gu..."

mateo-berri · 2026-05-01T21:46:35Z

bugbot run

cursor

✅ Bugbot reviewed your changes and found no new issues!

Reviewed by Cursor Bugbot for commit b1558ae.

mateo-berri · 2026-05-07T18:23:05Z

E2E reproduction & verification (real proxy, real providers — no mocks)

Drove a local proxy (uv run python litellm/proxy/proxy_cli.py --config e2e_pr26922/config.yaml --port 4000 --detailed_debug) against real OpenAI and Anthropic completions to confirm the bug and the fix.

Setup

e2e_pr26922/config.yaml defines two model-level guardrails attached to a deployment via litellm_params.guardrails, both with default_on: false:

model_list:
  - model_name: gpt-4o-mini-guarded
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY
      guardrails:
        - e2e-streaming-tracker        # custom guardrail: own iterator hook (Branch 1)
        - openai-mod-post              # openai_moderation: apply_guardrail (Branch 2)
  - model_name: claude-haiku-guarded
    litellm_params:
      model: anthropic/claude-haiku-4-5
      api_key: os.environ/ANTHROPIC_API_KEY
      guardrails: [e2e-streaming-tracker, openai-mod-post]
  - model_name: gpt-4o-mini-bare        # control: no model-level guardrails
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY

guardrails:
  - guardrail_name: e2e-streaming-tracker
    litellm_params:
      guardrail: e2e_streaming_guardrail.E2EStreamingGuardrail
      mode: post_call
      default_on: false                 # critical
  - guardrail_name: openai-mod-post
    litellm_params:
      guardrail: openai_moderation
      mode: post_call
      api_key: os.environ/OPENAI_API_KEY
      default_on: false
      model: omni-moderation-latest

The custom E2EStreamingGuardrail records every hook invocation to /tmp/pr26922_guardrail_calls.jsonl so we can prove which dispatcher branches actually fire.
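A tracker of this kind could look roughly like the sketch below. It is not the E2EStreamingGuardrail source, which is not shown in this PR: record, tracked_stream, and fake_upstream are hypothetical names, the log goes to a temporary file rather than the real tracker path, and the upstream stream is faked with three string chunks.

```python
import asyncio
import json
import os
import tempfile
from typing import Any, AsyncGenerator, Dict, List

# Temporary file so the sketch does not touch the real tracker path.
LOG_PATH = os.path.join(tempfile.mkdtemp(), "guardrail_calls.jsonl")


def record(event: str, **fields: Any) -> None:
    """Append one JSON object per line (JSONL)."""
    with open(LOG_PATH, "a", encoding="utf-8") as fh:
        fh.write(json.dumps({"event": event, **fields}) + "\n")


async def tracked_stream(
    response: AsyncGenerator[str, None], request_data: Dict[str, Any]
) -> AsyncGenerator[str, None]:
    """Pass chunks through unchanged, logging stream start and end."""
    guardrails = request_data.get("metadata", {}).get("guardrails", [])
    record(
        "post_call_streaming_iterator_START",
        model=request_data.get("model"),
        mg=guardrails,
    )
    chunks = 0
    async for chunk in response:
        chunks += 1
        yield chunk
    record("post_call_streaming_iterator_END", chunks=chunks)


async def fake_upstream() -> AsyncGenerator[str, None]:
    """Stand-in for a provider stream."""
    for piece in ("Hel", "lo", "!"):
        yield piece


async def main() -> List[str]:
    data = {
        "model": "gpt-4o-mini-guarded",
        "metadata": {"guardrails": ["e2e-streaming-tracker"]},
    }
    return [chunk async for chunk in tracked_stream(fake_upstream(), data)]


received = asyncio.run(main())
with open(LOG_PATH, encoding="utf-8") as fh:
    events = [json.loads(line) for line in fh]
print(received, [e["event"] for e in events])
```

Because the END event is written only after the wrapped stream is exhausted, a missing START/END pair in the log indicates that the wrapper was never installed for that request.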

Methodology

1. Start the proxy with the PR commit applied → run the 4 curl requests → inspect the tracker JSONL and /spend/logs for metadata.guardrail_information.
2. git apply -R the source-only diff (keeping the tests and commit history) → restart the proxy → re-run the same curls.
3. git checkout to restore the PR fix → restart the proxy → re-run the same curls.

curl requests (identical across all three runs)

# A: non-streaming, model with guardrails
curl http://localhost:4000/v1/chat/completions -H 'Authorization: Bearer sk-1234' \
  -d '{"model":"gpt-4o-mini-guarded","stream":false,"messages":[...],"max_tokens":20}'
# B: streaming, OpenAI
curl -N http://localhost:4000/v1/chat/completions -H 'Authorization: Bearer sk-1234' \
  -d '{"model":"gpt-4o-mini-guarded","stream":true,"messages":[...],"max_tokens":20}'
# C: streaming, Anthropic
curl -N http://localhost:4000/v1/chat/completions -H 'Authorization: Bearer sk-1234' \
  -d '{"model":"claude-haiku-guarded","stream":true,"messages":[...],"max_tokens":20}'
# D: streaming, control model (no guardrails)
curl -N http://localhost:4000/v1/chat/completions -H 'Authorization: Bearer sk-1234' \
  -d '{"model":"gpt-4o-mini-bare","stream":true,"messages":[...],"max_tokens":20}'

Results — custom-guardrail JSONL tracker

With PR fix applied (utils.py:2278 calls _check_and_merge_model_level_guardrails):

post_call_success                      | model=gpt-4o-mini-guarded   stream=False  mg=['openai-mod-post', 'e2e-streaming-tracker']
post_call_streaming_iterator_START     | model=gpt-4o-mini-guarded   stream=-      mg=['openai-mod-post', 'e2e-streaming-tracker']
post_call_streaming_iterator_END       | -                           -             chunks=23
post_call_streaming_iterator_START     | model=claude-haiku-guarded  stream=-      mg=['openai-mod-post', 'e2e-streaming-tracker']
post_call_streaming_iterator_END       | -                           -             chunks=3
# (TEST D bare model: no events — correctly skipped)

With PR diff reverted (everything else identical, same prompts, same proxy config):

post_call_success                      | model=gpt-4o-mini-guarded   stream=False  mg=['e2e-streaming-tracker', 'openai-mod-post']
# stream=True for B and C: NO post_call_streaming_iterator_* events ← BUG

Results — `/spend/logs` `metadata.guardrail_information`

Same proxy, same upstream LLM provider, same payloads — only variable is the 7-line PR diff:

Run	Time	Model	stream	`guardrail_information`
buggy	18:05:37	openai/gpt-4o-mini	false	YES (`openai-mod-post`)
buggy	18:05:38	openai/gpt-4o-mini	true	NO ✗ ← bug
buggy	18:05:39	anthropic/claude-haiku-4-5	true	NO ✗ ← bug
fixed	18:06:45	openai/gpt-4o-mini	false	YES
fixed	18:06:47	openai/gpt-4o-mini	true	YES ✓
fixed	18:06:49	anthropic/claude-haiku-4-5	true	YES ✓
fixed	18:06:50	openai/gpt-4o-mini (bare control)	true	NO (correctly skipped)

The fix works on both Branch 1 (custom guardrail with own iterator hook) and Branch 2 (apply_guardrail via unified guardrail) of the dispatcher in ProxyLogging.async_post_call_streaming_iterator_hook, and is provider-agnostic (verified against both real OpenAI and real Anthropic streaming responses).

Failing CI checks — investigation

The four red checks are persistent, but they are all unrelated to the 7-line streaming-guardrail change. Re-running each (POST /api/v2/workflow/<id>/rerun?from_failed=true and gh run rerun --failed) reproduced the same failures, which themselves match what's failing on the PR's base branch litellm_internal_staging:

CircleCI job	Failing test	Root cause
`proxy_logging_guardrails_model_info_tests`	`test_e2e_model_access.py::test_model_access_patterns[bedrock/anthropic.claude-3]`	404 from `exampleopenaiendpoint-production.up.railway.app` mock — bedrock wildcard test, unrelated to streaming/guardrails
`llm_responses_api_testing`	`test_azure_responses_api.py::test_basic_openai_responses_delete_endpoint`	Azure API rejects DELETE with body — `Unexpected body with size 2. This API method does not accept a request body`
`llm_translation_testing`	`test_anthropic_completion.py::test_async_pdf_handling_with_file_id`	Anthropic upstream error: `Unable to download the file. Please verify the URL`
`proxy-infra / Run tests` (GH Actions)	`test_proxy_server.py::TestLazyFeaturesNotImportedAtStartup::test_heavy_modules_absent_at_startup`	Subprocess timeout at 120s during `from litellm.proxy.proxy_server import app`. Passes locally in 11 s. Pure CI runner perf issue.

Sanity checks:

The PR's added import from litellm.proxy.proxy_server import llm_router is function-local inside async_post_call_streaming_iterator_hook — it can't affect startup-time sys.modules.
The PR doesn't touch bedrock routing, Azure responses, anthropic file handling, or proxy startup.
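The first sanity check rests on a general Python property: an import statement inside a function body runs when the function is called, not when it is defined. A minimal sketch, using a throwaway module named heavy_demo_module (a hypothetical name created on disk for this example) instead of litellm.proxy.proxy_server:

```python
import os
import sys
import tempfile

# Create a throwaway module on disk so the check does not depend on
# whatever the interpreter happened to import at startup.
module_dir = tempfile.mkdtemp()
module_path = os.path.join(module_dir, "heavy_demo_module.py")
with open(module_path, "w", encoding="utf-8") as fh:
    fh.write("VALUE = 42\n")
sys.path.insert(0, module_dir)


def hook() -> int:
    # Function-local import: executed on first call, not at definition time.
    import heavy_demo_module

    return heavy_demo_module.VALUE


loaded_before_call = "heavy_demo_module" in sys.modules
result = hook()
loaded_after_call = "heavy_demo_module" in sys.modules

print(loaded_before_call, result, loaded_after_call)  # False 42 True
```

Defining hook leaves sys.modules untouched; the module appears there only after the first call, which is why a function-local import cannot change what is loaded at startup.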

The PR's two new unit tests pass locally:

tests/test_litellm/proxy/test_model_level_guardrails.py::test_streaming_iterator_hook_runs_model_level_guardrail PASSED
tests/test_litellm/proxy/test_model_level_guardrails.py::test_streaming_iterator_hook_skips_guardrail_not_on_model PASSED

TestLazyFeaturesNotImportedAtStartup::test_heavy_modules_absent_at_startup passes locally in 11 s.

LGTM; thanks!

mateo-berri · 2026-05-07T18:39:17Z

Note: I rebased onto the current litellm_internal_staging head (fee5900acc) because the PR was 945 commits behind. The proxy-infra / Run tests job had been hitting a pre-existing 120s subprocess timeout on TestLazyFeaturesNotImportedAtStartup::test_heavy_modules_absent_at_startup; that timeout has since been resolved on staging, and the test passes locally on the rebased commit in 4.69 s.

Re-confirmed e2e on the rebased commit (47d184d3):

streaming gpt-4o-mini-guarded → post_call_streaming_iterator_START fires ✓
streaming claude-haiku-guarded (Anthropic) → post_call_streaming_iterator_START fires ✓
both unit tests still pass locally:
- test_streaming_iterator_hook_runs_model_level_guardrail PASSED
- test_streaming_iterator_hook_skips_guardrail_not_on_model PASSED

Force-push preserved the approval. Waiting on the fresh CI sweep.

codecov · 2026-05-07T18:42:24Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

mateo-berri self-requested a review May 1, 2026 21:46

cursor Bot reviewed May 1, 2026

mateo-berri approved these changes May 7, 2026

mateo-berri enabled auto-merge (squash) May 7, 2026 18:23

mateo-berri disabled auto-merge May 7, 2026 18:23

fix(proxy): run model-level post_call guardrails on streaming requests

47d184d

mateo-berri force-pushed the litellm_fix_streaming_model_level_guardrail branch from b1558ae to 47d184d Compare May 7, 2026 18:38

mateo-berri merged commit 9f1b41d into litellm_internal_staging May 7, 2026
113 of 114 checks passed

mateo-berri deleted the litellm_fix_streaming_model_level_guardrail branch May 7, 2026 18:53

cursor Bot mentioned this pull request May 13, 2026

Add chat completions streaming benchmark and fast paths #27816

Closed

7 tasks
