Skip to content

fix(bedrock_guardrails): select latest user message by original role in apply_guardrail#30482

Open
michelligabriele wants to merge 2 commits into
litellm_internal_stagingfrom
litellm_fix_bedrock_guardrail_role_leak_v2
Open

fix(bedrock_guardrails): select latest user message by original role in apply_guardrail#30482
michelligabriele wants to merge 2 commits into
litellm_internal_stagingfrom
litellm_fix_bedrock_guardrail_role_leak_v2

Conversation

@michelligabriele

Copy link
Copy Markdown
Collaborator

Relevant issues

Fixes #23476

Supersedes #25355 — that PR guarded at content-build time, but the role information is already lost upstream at message-selection time, so it did not stop the leak. This fixes the selection itself.

Linear ticket

Pre-Submission checklist

  • I have added meaningful tests
  • My PR passes all unit tests on make test-unit — ran the affected enterprise guardrail suite locally (14 passed); full make test-unit deferred to CI
  • My PR's scope is as isolated as possible; it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Screenshots / Proof of Fix

With experimental_use_latest_role_message_only enabled, the unified apply_guardrail path scanned the latest message of any role as the Bedrock INPUT, rather than the latest user-role message. In tool-calling conversations ending in a tool/assistant message, that non-user content was sent to the ApplyGuardrail INPUT scan.

The new regression tests reproduce this on the pre-fix code and pass with the fix.

Before the fix (conversation ending in a tool message — the INPUT scan receives the tool result):

>       assert kwargs["messages"] == [data["messages"][1]]   # expected the user message
E       AssertionError: At index 0 diff:
E         {'role': 'user', 'content': 'TOOL secret output'}        # what was actually scanned
E         != {'role': 'user', 'content': 'my SSN is 123-45-6789'}  # the real latest user message

After the fix — the latest original-role user message is the only content scanned, and masked content is written back to that message's position only (system/assistant/tool untouched):

tests/enterprise/litellm_enterprise/proxy/guardrails/test_bedrock_apply_guardrail.py
..............                                                            [100%]
14 passed

Type

🐛 Bug Fix
✅ Test

Changes

  • apply_guardrail now selects the latest message whose original role is user from inputs["structured_messages"] (falling back to request_data["messages"]), instead of wrapping every flattened text as role="user" and taking the latest of any role.
  • Skips the INPUT scan entirely when there is no user-role message or the latest user message has no text content.
  • Writes masked content back to the correct slice of the flat texts list, keeping it aligned with the translation handler's positional message↔text mapping (no whole-list clobber when only a subset is scanned).
  • Tests: latest-user selection on a tool-ending conversation; skip when no user message; masked write-back to the correct position; and an end-to-end test through the real OpenAIChatCompletionsHandler.process_input_messages (only the network call mocked) asserting the mask lands on the user message and other roles are left unchanged.

@codecov

codecov Bot commented Jun 15, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 28.81356% with 42 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...y/guardrails/guardrail_hooks/bedrock_guardrails.py 28.81% 42 Missing ⚠️

📢 Thoughts on this report? Let us know!

@greptile-apps

greptile-apps Bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR fixes a message-selection bug in BedrockGuardrail.apply_guardrail: when experimental_use_latest_role_message_only is enabled, agentic conversations ending in a tool or assistant message were leaking that non-user content to the Bedrock INPUT scan because the old code wrapped every flat text as a role="user" mock and then picked the "latest user message."

  • Introduces _select_messages_for_apply_guardrail which resolves role from inputs["structured_messages"] (or falls back to request_data["messages"]), finds the last true user-role message, and skips the scan entirely when none exists or when the user message has no text content.
  • Adds _locate_message_texts_slice to map the selected message back to its positional slice in the flat texts list, so masked content is written to the right indices and other roles are left untouched.
  • Five regression tests (all mocked) cover the tool-ending conversation, no-user-message skip, positional write-back, and an end-to-end handler integration path.

Confidence Score: 4/5

Safe to merge for the core bug fix; the new role-selection and slice write-back logic is well-tested, the fallback paths preserve pre-existing behaviour, and no real network calls are made in tests.

The fix is tightly scoped and the five new regression tests cover all the main scenarios. Two minor gaps exist: the write-back guard is not exhaustive for the edge case where scanned_role_subset=True, scanned_slice=None, and lengths happen to equal; and the fallback to request_data messages can silently skip masking when skip-message flags are active for direct API callers. Neither is a regression introduced by this PR, but they are worth closing before the feature is promoted beyond experimental.

bedrock_guardrails.py — specifically the write-back block after line 1846 and the structured_messages fallback at line 455.

Important Files Changed

Filename Overview
litellm/proxy/guardrails/guardrail_hooks/bedrock_guardrails.py Adds _select_messages_for_apply_guardrail and _locate_message_texts_slice helpers to fix role-based message selection in apply_guardrail; includes one gap in the write-back guard logic for an unlikely edge case.
tests/enterprise/litellm_enterprise/proxy/guardrails/test_bedrock_apply_guardrail.py Adds 5 new regression tests covering tool-ending conversations, no-user-message skip, positional write-back, and an end-to-end handler integration test; all mocked correctly with no real network calls.

Reviews (1): Last reviewed commit: "test(bedrock_guardrails): cover masking ..." | Re-trigger Greptile

Comment on lines +1858 to +1862
elif scanned_role_subset and len(masked_texts) != len(texts):
# Scanned a role-selected subset but could not map it back to
# flat-text positions — keep the original texts rather than
# misapply masked content to the wrong message.
verbose_proxy_logger.warning(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Unguarded write-back when scanned_role_subset=True, scanned_slice=None, and lengths happen to match

The guard at line 1858 catches the case where a role-subset scan produced a differently-sized masked_texts list. However, there is no branch for when scanned_role_subset=True, scanned_slice=None (i.e. _locate_message_texts_slice failed), and len(masked_texts) == len(texts). In that state the single masked text returned by Bedrock would silently propagate as the sole element of inputs["texts"], clobbering everything downstream in the handler's positional write-back.

In practice this requires len(texts) == 1 simultaneously with a total != len(texts) mismatch in _locate_message_texts_slice, which is a very narrow window. Replacing the length check with a blanket scanned_role_subset and scanned_slice is None guard would close the gap entirely.

Comment on lines +455 to +480
structured_messages = cast(
Optional[List[AllMessageValues]],
inputs.get("structured_messages") or request_data.get("messages"),
)
if input_type != "request" or not structured_messages:
# No role information available (e.g. raw-text callers like
# /guardrails/apply_guardrail) — keep the legacy behavior of
# scanning the latest text only.
filter_result = self._prepare_guardrail_messages_for_role(
messages=mock_messages
)
return ApplyGuardrailMessageSelection(
filtered_messages=filter_result.payload_messages or mock_messages,
scanned_slice=None,
scanned_role_subset=False,
)

latest_user_index = self._find_latest_message_index(
structured_messages, target_role="user"
)
if latest_user_index is None:
verbose_proxy_logger.debug(
"Bedrock Guardrail: no user-role message in request, skipping INPUT scan"
)
return ApplyGuardrailMessageSelection(None, None, True, skip_scan=True)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Fallback to request_data["messages"] may desync structured_messages from texts when skip flags are active

When inputs["structured_messages"] is absent (e.g., a direct /guardrails/apply_guardrail caller), the code falls back to request_data["messages"] — the unfiltered message list. If skip_system_message_in_guardrail or skip_tool_message_in_guardrail is set, texts will have fewer entries than the unfiltered message list, causing _locate_message_texts_slice to detect total != len(texts) and return None. The code then falls into the warning-and-skip branch, which discards the masking.

This is not a regression, but it means the fix silently does nothing for that caller+flag combination. A comment at the fallback site would help future maintainers understand why request_data["messages"] is only safe here when no skip flags are active.

@Sameerlite

Copy link
Copy Markdown
Collaborator

@michelligabriele can you fix the greptile comments?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: experimental_use_latest_role_message_only erroneously sending tool and assistant prompts to guardrail

2 participants