
fix(proxy): run model-level post_call guardrails on streaming requests #26922

Merged
mateo-berri merged 1 commit into litellm_internal_staging from litellm_fix_streaming_model_level_guardrail
May 7, 2026

Conversation

@michelligabriele

@michelligabriele michelligabriele commented Apr 30, 2026

Collaborator

Relevant issues

Follow-up to #23774 — same bug class, missed call site on the streaming path.

Linear ticket

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have Added testing in the tests/test_litellm/ directory, Adding at least 1 test is a hard requirement - see details
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable but needs attention
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Screenshots / Proof of Fix

The new unit test test_streaming_iterator_hook_runs_model_level_guardrail is bisect-confirmed: it fails with assert False is True against litellm_internal_staging without this change and passes with it. The companion test_streaming_iterator_hook_skips_guardrail_not_on_model guards against false positives by asserting the dispatcher gate stays closed for guardrails not attached at the model level.

Type

🐛 Bug Fix

Changes

Bug

For a request with stream: true against a model that has a post_call guardrail attached at the model level (litellm_params.guardrails) and configured with default_on: false, the guardrail is silently skipped — no log entry, no error, metadata.guardrail_information ends up null. The same configuration on the same model fires correctly for non-streaming after #23774.

Root cause

ProxyLogging.async_post_call_streaming_iterator_hook in litellm/proxy/utils.py is the streaming-side dispatcher that wraps each callback's iterator hook around the response stream. It gates each callback by calling should_run_guardrail(data=request_data, ...) against the raw request_data, without first merging deployment.litellm_params.guardrails into metadata.guardrails. With default_on: false, should_run_guardrail consults metadata.guardrails / data.guardrails (both empty when the request body names no guardrails) and returns False. As a result, none of the dispatcher's three downstream branches fire, so for apply_guardrail guardrails (OpenAI Moderation, Bedrock Guardrails, Lakera, etc.) the unified_guardrail end-of-stream block that writes guardrail_information never runs.
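The gating failure can be illustrated with a toy model. This is a minimal sketch, not LiteLLM's code: `should_run` and the dict shapes are simplified stand-ins for should_run_guardrail and request_data, and it assumes the gate looks only at default_on and at the guardrail names listed under metadata.guardrails or the top-level guardrails key.

```python
from typing import Any, Dict, List


def should_run(guardrail_name: str, default_on: bool, data: Dict[str, Any]) -> bool:
    """Toy gate: a simplified stand-in for should_run_guardrail (assumption)."""
    if default_on:
        return True
    metadata = data.get("metadata") or {}
    requested: List[str] = list(metadata.get("guardrails") or [])
    requested += list(data.get("guardrails") or [])
    return guardrail_name in requested


# Raw streaming request: the client did not name any guardrails in the body.
raw_request = {"model": "gpt-4o-mini-guarded", "stream": True, "metadata": {}}

# Guardrails attached to the deployment via litellm_params.guardrails.
deployment_guardrails = ["openai-mod-post"]

# Without a merge step, the gate sees only the raw request and stays closed.
gate_without_merge = should_run("openai-mod-post", False, raw_request)

# With deployment guardrails merged into metadata first, the gate opens.
merged_request = {
    **raw_request,
    "metadata": {**raw_request["metadata"], "guardrails": deployment_guardrails},
}
gate_with_merge = should_run("openai-mod-post", False, merged_request)

print(gate_without_merge, gate_with_merge)  # False True
```

The only difference between the two calls is whether the deployment's guardrail names were copied into the request metadata before the gate ran.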

#23774 patched the three non-streaming sites where should_run_guardrail is called against request data. The streaming iterator dispatcher is the missed fourth site.

Fix

Mirror #23774's pattern: at the top of async_post_call_streaming_iterator_hook, call _check_and_merge_model_level_guardrails(...) and assign the result back to request_data before the per-callback loop. The merge helper's existing shallow-copy semantics propagate the merged metadata.guardrails to all downstream wrappers (including unified_guardrail.async_post_call_streaming_iterator_hook's own internal should_run_guardrail check) through the shared metadata reference.
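The "shared metadata reference" point relies on ordinary Python shallow-copy behavior, which the sketch below demonstrates. `merge_model_guardrails` is a hypothetical helper written for illustration; it is not a reproduction of _check_and_merge_model_level_guardrails, whose internals are not shown in this PR.

```python
import copy
from typing import Any, Dict, List


def merge_model_guardrails(
    data: Dict[str, Any], model_guardrails: List[str]
) -> Dict[str, Any]:
    """Hypothetical merge helper (assumption, not the LiteLLM implementation)."""
    merged = copy.copy(data)  # shallow: the nested metadata dict is shared
    metadata = merged.setdefault("metadata", {})
    existing = metadata.get("guardrails") or []
    # De-duplicate while preserving order.
    metadata["guardrails"] = list(dict.fromkeys([*existing, *model_guardrails]))
    return merged


request_data = {"model": "gpt-4o-mini-guarded", "stream": True, "metadata": {}}

# A downstream wrapper that captured the metadata dict before the merge.
downstream_view = request_data["metadata"]

merged = merge_model_guardrails(request_data, ["openai-mod-post"])

# The top-level dict is a new object, but metadata is the same object,
# so anything holding the old reference sees the merged guardrails.
print(merged is request_data)                 # False
print(merged["metadata"] is downstream_view)  # True
print(downstream_view["guardrails"])          # ['openai-mod-post']
```

Under this assumption, any code that still holds the original metadata dict observes the merged guardrail list without being handed the new top-level dict.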

The local from litellm.proxy.proxy_server import llm_router import matches the established pattern at the two existing call sites in utils.py (post_call_success_hook and the per-chunk async_post_call_streaming_hook), where the local import breaks a circular dependency with proxy_server.
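A function-local import is resolved when the function runs, not when the enclosing module is loaded, which is why it can break an import cycle. The sketch below shows that deferral with a throwaway module; `pr_demo_lazy_module` and `hook` are hypothetical names and have nothing to do with proxy_server.

```python
import sys
import tempfile
from pathlib import Path

# Create a throwaway module on disk so the example is self-contained.
module_dir = tempfile.mkdtemp()
Path(module_dir, "pr_demo_lazy_module.py").write_text("ROUTER = 'demo-router'\n")
sys.path.insert(0, module_dir)


def hook() -> str:
    # Function-local import: executed only when hook() is called,
    # so defining hook() does not load the module.
    from pr_demo_lazy_module import ROUTER

    return ROUTER


loaded_before_call = "pr_demo_lazy_module" in sys.modules
result = hook()
loaded_after_call = "pr_demo_lazy_module" in sys.modules

print(loaded_before_call, result, loaded_after_call)  # False demo-router True
```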

Tests

Two new async tests in tests/test_litellm/proxy/test_model_level_guardrails.py, mirroring the file's existing non-streaming tests verbatim:

  • test_streaming_iterator_hook_runs_model_level_guardrail — asserts the model-level guardrail fires for streaming with default_on: false and no body-level guardrails field.
  • test_streaming_iterator_hook_skips_guardrail_not_on_model — regression guard: deployment configured with a different guardrail name, the dispatcher gate stays closed.
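The two cases these tests cover can be sketched against a toy dispatcher. Everything here is hypothetical (`ToyGuardrail`, `dispatch`, `run_case`); the real tests in test_model_level_guardrails.py use mocks against ProxyLogging and are not reproduced here. Note that this toy copies metadata instead of sharing it, which is a simplification.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, List, Tuple


class ToyGuardrail:
    """Hypothetical guardrail that behaves as if default_on were false."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.ran = False

    def should_run(self, data: Dict[str, Any]) -> bool:
        return self.name in (data.get("metadata", {}).get("guardrails") or [])

    async def wrap(self, stream: AsyncIterator[str]) -> AsyncIterator[str]:
        self.ran = True  # set once the wrapped stream starts being consumed
        async for chunk in stream:
            yield chunk


def dispatch(
    stream: AsyncIterator[str],
    data: Dict[str, Any],
    callbacks: List[ToyGuardrail],
    model_guardrails: List[str],
) -> AsyncIterator[str]:
    # Merge deployment guardrails first, then gate each callback.
    metadata = dict(data.get("metadata") or {})
    metadata["guardrails"] = [*(metadata.get("guardrails") or []), *model_guardrails]
    data = {**data, "metadata": metadata}
    for callback in callbacks:
        if callback.should_run(data):
            stream = callback.wrap(stream)
    return stream


async def chunks() -> AsyncIterator[str]:
    for piece in ("a", "b"):
        yield piece


async def run_case(name: str, model_guardrails: List[str]) -> Tuple[bool, List[str]]:
    guardrail = ToyGuardrail(name)
    stream = dispatch(chunks(), {"stream": True}, [guardrail], model_guardrails)
    out = [chunk async for chunk in stream]
    return guardrail.ran, out


# Positive case: the guardrail is attached to the deployment, so it fires.
fired = asyncio.run(run_case("openai-mod-post", ["openai-mod-post"]))
# Negative case: the deployment names a different guardrail, so the gate stays closed.
skipped = asyncio.run(run_case("openai-mod-post", ["some-other-guardrail"]))

print(fired)    # (True, ['a', 'b'])
print(skipped)  # (False, ['a', 'b'])
```

In both cases the stream content is unchanged; only whether the guardrail wrapper ran differs.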

Note

Medium Risk
Changes the streaming response dispatch path to run model-level post_call guardrails, which can affect safety/compliance behavior for streamed completions. Low code churn, but touches guardrail gating logic that can change which callbacks execute in production.

Overview
Fixes a gap where model-level post_call guardrails were skipped for streaming responses by merging deployment guardrails into request_data at the start of ProxyLogging.async_post_call_streaming_iterator_hook before should_run_guardrail checks.

Adds two async integration tests to verify that a model-attached streaming guardrail runs even when not specified in the request body, and that unrelated guardrails remain gated off.

Reviewed by Cursor Bugbot for commit b1558ae. Bugbot is set up for automated code reviews on this repo.

@greptile-apps

greptile-apps Bot commented Apr 30, 2026

Contributor

Greptile Summary

This PR fixes the streaming path of ProxyLogging.async_post_call_streaming_iterator_hook to merge model-level guardrails from deployment.litellm_params.guardrails into request_data before the per-callback gate check — the same fix applied to the non-streaming sites in #23774. The change is a 7-line addition that follows the established pattern used at the two other existing call sites in utils.py, and is validated by two new mock-only unit tests covering both the positive and negative guard conditions.

Confidence Score: 5/5

Safe to merge — minimal, well-targeted fix with appropriate test coverage and no backward-incompatible changes.

The fix is a direct mirror of an already-reviewed pattern used at two other call sites in the same file. No new logic is introduced; the local import style matches convention. Tests are mock-only and correctly cover both the should-fire and should-not-fire paths. No security, schema, or backwards-compatibility concerns.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| litellm/proxy/utils.py | Adds _check_and_merge_model_level_guardrails call at the top of async_post_call_streaming_iterator_hook, mirroring the existing pattern at the two other call sites in the file; change is minimal and correct. |
| tests/test_litellm/proxy/test_model_level_guardrails.py | Adds two new async tests for the streaming iterator path that mirror the existing non-streaming tests; tests use only mocks (no real network calls), are well-structured, and cover both the positive (guardrail fires) and negative (guardrail correctly skipped) cases. |

Reviews (1): Last reviewed commit: "fix(proxy): run model-level post_call gu..."

@mateo-berri

Collaborator

bugbot run

@mateo-berri mateo-berri self-requested a review May 1, 2026 21:46

@cursor cursor Bot left a comment


✅ Bugbot reviewed your changes and found no new issues!

Reviewed by Cursor Bugbot for commit b1558ae.

@mateo-berri

Collaborator

E2E reproduction & verification (real proxy, real providers — no mocks)

Drove a local proxy (uv run python litellm/proxy/proxy_cli.py --config e2e_pr26922/config.yaml --port 4000 --detailed_debug) against real OpenAI and Anthropic completions to confirm the bug and the fix.

Setup

e2e_pr26922/config.yaml defines two model-level guardrails attached to a deployment via litellm_params.guardrails, both with default_on: false:

model_list:
  - model_name: gpt-4o-mini-guarded
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY
      guardrails:
        - e2e-streaming-tracker        # custom guardrail: own iterator hook (Branch 1)
        - openai-mod-post              # openai_moderation: apply_guardrail (Branch 2)
  - model_name: claude-haiku-guarded
    litellm_params:
      model: anthropic/claude-haiku-4-5
      api_key: os.environ/ANTHROPIC_API_KEY
      guardrails: [e2e-streaming-tracker, openai-mod-post]
  - model_name: gpt-4o-mini-bare        # control: no model-level guardrails
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY

guardrails:
  - guardrail_name: e2e-streaming-tracker
    litellm_params:
      guardrail: e2e_streaming_guardrail.E2EStreamingGuardrail
      mode: post_call
      default_on: false                 # critical
  - guardrail_name: openai-mod-post
    litellm_params:
      guardrail: openai_moderation
      mode: post_call
      api_key: os.environ/OPENAI_API_KEY
      default_on: false
      model: omni-moderation-latest

The custom E2EStreamingGuardrail records every hook invocation to /tmp/pr26922_guardrail_calls.jsonl so we can prove which dispatcher branches actually fire.
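The source of E2EStreamingGuardrail is not included in this PR, so the following is only a sketch of the recording idea. `StreamingTracker`, its method names, and the record fields are hypothetical; the class deliberately does not subclass any LiteLLM base class and writes to a temp file, not /tmp/pr26922_guardrail_calls.jsonl. The event names are borrowed from the tracker output shown in the results.

```python
import asyncio
import json
import tempfile
from pathlib import Path
from typing import Any, AsyncGenerator, AsyncIterator, Dict, List


class StreamingTracker:
    """Hypothetical tracker that appends one JSON line per hook event."""

    def __init__(self, log_path: Path) -> None:
        self.log_path = log_path

    def _record(self, event: str, **fields: Any) -> None:
        with self.log_path.open("a", encoding="utf-8") as handle:
            handle.write(json.dumps({"event": event, **fields}) + "\n")

    async def iterator_hook(
        self, response: AsyncIterator[Any], request_data: Dict[str, Any]
    ) -> AsyncGenerator[Any, None]:
        # Log when the wrapper starts, pass chunks through, log the count at the end.
        self._record(
            "post_call_streaming_iterator_START", model=request_data.get("model")
        )
        chunk_count = 0
        async for chunk in response:
            chunk_count += 1
            yield chunk
        self._record("post_call_streaming_iterator_END", chunks=chunk_count)


async def fake_stream() -> AsyncGenerator[str, None]:
    for piece in ("Hel", "lo", "!"):
        yield piece


async def main() -> List[Dict[str, Any]]:
    log_path = Path(tempfile.mkdtemp()) / "guardrail_calls.jsonl"
    tracker = StreamingTracker(log_path)
    request = {"model": "gpt-4o-mini-guarded"}
    async for _ in tracker.iterator_hook(fake_stream(), request):
        pass
    return [json.loads(line) for line in log_path.read_text().splitlines()]


events = asyncio.run(main())
for event in events:
    print(event)
```

An empty log after a streaming request then indicates the dispatcher never invoked the hook, which is the signal the reverted run relies on.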

Methodology

  1. Start proxy with PR commit applied → run 4 curl requests → inspect tracker JSONL + /spend/logs for metadata.guardrail_information.
  2. git apply -R the source-only diff (keep tests + commit history) → restart proxy → re-run the same curls.
  3. git checkout to restore the PR fix → restart → re-run.

curl requests (identical across all three runs)

# A: non-streaming, model with guardrails
curl http://localhost:4000/v1/chat/completions -H 'Authorization: Bearer sk-1234' \
  -d '{"model":"gpt-4o-mini-guarded","stream":false,"messages":[...],"max_tokens":20}'
# B: streaming, OpenAI
curl -N http://localhost:4000/v1/chat/completions -H 'Authorization: Bearer sk-1234' \
  -d '{"model":"gpt-4o-mini-guarded","stream":true,"messages":[...],"max_tokens":20}'
# C: streaming, Anthropic
curl -N http://localhost:4000/v1/chat/completions -H 'Authorization: Bearer sk-1234' \
  -d '{"model":"claude-haiku-guarded","stream":true,"messages":[...],"max_tokens":20}'
# D: streaming, control model (no guardrails)
curl -N http://localhost:4000/v1/chat/completions -H 'Authorization: Bearer sk-1234' \
  -d '{"model":"gpt-4o-mini-bare","stream":true,"messages":[...],"max_tokens":20}'

Results — custom-guardrail JSONL tracker

With PR fix applied (utils.py:2278 calls _check_and_merge_model_level_guardrails):

post_call_success                      | model=gpt-4o-mini-guarded   stream=False  mg=['openai-mod-post', 'e2e-streaming-tracker']
post_call_streaming_iterator_START     | model=gpt-4o-mini-guarded   stream=-      mg=['openai-mod-post', 'e2e-streaming-tracker']
post_call_streaming_iterator_END       | -                           -             chunks=23
post_call_streaming_iterator_START     | model=claude-haiku-guarded  stream=-      mg=['openai-mod-post', 'e2e-streaming-tracker']
post_call_streaming_iterator_END       | -                           -             chunks=3
# (TEST D bare model: no events — correctly skipped)

With PR diff reverted (everything else identical, same prompts, same proxy config):

post_call_success                      | model=gpt-4o-mini-guarded   stream=False  mg=['e2e-streaming-tracker', 'openai-mod-post']
# stream=True for B and C: NO post_call_streaming_iterator_* events ← BUG

Results — /spend/logs metadata.guardrail_information

Same proxy, same upstream LLM provider, same payloads — only variable is the 7-line PR diff:

| Run | Time | Model | stream | guardrail_information |
| --- | --- | --- | --- | --- |
| buggy | 18:05:37 | openai/gpt-4o-mini | false | YES (openai-mod-post) |
| buggy | 18:05:38 | openai/gpt-4o-mini | true | NO ✗ ← bug |
| buggy | 18:05:39 | anthropic/claude-haiku-4-5 | true | NO ✗ ← bug |
| fixed | 18:06:45 | openai/gpt-4o-mini | false | YES |
| fixed | 18:06:47 | openai/gpt-4o-mini | true | YES |
| fixed | 18:06:49 | anthropic/claude-haiku-4-5 | true | YES |
| fixed | 18:06:50 | openai/gpt-4o-mini (bare control) | true | NO (correctly skipped) |

The fix works on both Branch 1 (custom guardrail with own iterator hook) and Branch 2 (apply_guardrail via unified guardrail) of the dispatcher in ProxyLogging.async_post_call_streaming_iterator_hook, and is provider-agnostic (verified against both real OpenAI and real Anthropic streaming responses).

Failing CI checks — investigation

The four red checks are persistent, but they are all unrelated to the 7-line streaming-guardrail change. Re-running each (POST /api/v2/workflow/<id>/rerun?from_failed=true and gh run rerun --failed) reproduced the same failures, which themselves match what's failing on the PR's base branch litellm_internal_staging:

| CircleCI job | Failing test | Root cause |
| --- | --- | --- |
| proxy_logging_guardrails_model_info_tests | test_e2e_model_access.py::test_model_access_patterns[bedrock/anthropic.claude-3] | 404 from exampleopenaiendpoint-production.up.railway.app mock; bedrock wildcard test, unrelated to streaming/guardrails |
| llm_responses_api_testing | test_azure_responses_api.py::test_basic_openai_responses_delete_endpoint | Azure API rejects DELETE with body: Unexpected body with size 2. This API method does not accept a request body |
| llm_translation_testing | test_anthropic_completion.py::test_async_pdf_handling_with_file_id | Anthropic upstream error: Unable to download the file. Please verify the URL |
| proxy-infra / Run tests (GH Actions) | test_proxy_server.py::TestLazyFeaturesNotImportedAtStartup::test_heavy_modules_absent_at_startup | Subprocess timeout at 120s during from litellm.proxy.proxy_server import app. Passes locally in 11 s. Pure CI runner perf issue. |

Sanity checks:

  • The PR's added import from litellm.proxy.proxy_server import llm_router is function-local inside async_post_call_streaming_iterator_hook — it can't affect startup-time sys.modules.
  • The PR doesn't touch bedrock routing, Azure responses, anthropic file handling, or proxy startup.
  • The PR's two new unit tests pass locally:
    tests/test_litellm/proxy/test_model_level_guardrails.py::test_streaming_iterator_hook_runs_model_level_guardrail PASSED
    tests/test_litellm/proxy/test_model_level_guardrails.py::test_streaming_iterator_hook_skips_guardrail_not_on_model PASSED
    
  • TestLazyFeaturesNotImportedAtStartup::test_heavy_modules_absent_at_startup passes locally in 11 s.

LGTM; thanks!

@mateo-berri mateo-berri left a comment

Collaborator

LGTM; thanks!

@mateo-berri mateo-berri enabled auto-merge (squash) May 7, 2026 18:23
@mateo-berri mateo-berri disabled auto-merge May 7, 2026 18:23
@mateo-berri mateo-berri force-pushed the litellm_fix_streaming_model_level_guardrail branch from b1558ae to 47d184d Compare May 7, 2026 18:38
@mateo-berri

Collaborator

Note: I rebased onto current litellm_internal_staging head (fee5900acc) since the PR was 945 commits behind and the proxy-infra / Run tests job was hitting a pre-existing 120s subprocess timeout on TestLazyFeaturesNotImportedAtStartup::test_heavy_modules_absent_at_startup that has since been resolved on staging (the test passes locally on the rebased commit in 4.69 s).

Re-confirmed e2e on the rebased commit (47d184d3):

  • streaming gpt-4o-mini-guarded → post_call_streaming_iterator_START fires ✓
  • streaming claude-haiku-guarded (Anthropic) → post_call_streaming_iterator_START fires ✓
  • both unit tests still pass locally:
    • test_streaming_iterator_hook_runs_model_level_guardrail PASSED
    • test_streaming_iterator_hook_skips_guardrail_not_on_model PASSED

Force-push preserved the approval. Waiting on the fresh CI sweep.

@codecov

codecov Bot commented May 7, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.


@mateo-berri mateo-berri merged commit 9f1b41d into litellm_internal_staging May 7, 2026
113 of 114 checks passed
@mateo-berri mateo-berri deleted the litellm_fix_streaming_model_level_guardrail branch May 7, 2026 18:53


2 participants