fix(bedrock_guardrails): select latest user message by original role in apply_guardrail#30482
Conversation
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
Greptile SummaryThis PR fixes a message-selection bug in
Confidence Score: 4/5Safe to merge for the core bug fix; the new role-selection and slice write-back logic is well-tested, the fallback paths preserve pre-existing behaviour, and no real network calls are made in tests. The fix is tightly scoped and the five new regression tests cover all the main scenarios. Two minor gaps exist: the write-back guard is not exhaustive for the edge case where scanned_role_subset=True, scanned_slice=None, and lengths happen to equal; and the fallback to request_data messages can silently skip masking when skip-message flags are active for direct API callers. Neither is a regression introduced by this PR, but they are worth closing before the feature is promoted beyond experimental. bedrock_guardrails.py — specifically the write-back block after line 1846 and the structured_messages fallback at line 455.
|
| Filename | Overview |
|---|---|
| litellm/proxy/guardrails/guardrail_hooks/bedrock_guardrails.py | Adds _select_messages_for_apply_guardrail and _locate_message_texts_slice helpers to fix role-based message selection in apply_guardrail; includes one gap in the write-back guard logic for an unlikely edge case. |
| tests/enterprise/litellm_enterprise/proxy/guardrails/test_bedrock_apply_guardrail.py | Adds 5 new regression tests covering tool-ending conversations, no-user-message skip, positional write-back, and an end-to-end handler integration test; all mocked correctly with no real network calls. |
Reviews (1): Last reviewed commit: "test(bedrock_guardrails): cover masking ..." | Re-trigger Greptile
| elif scanned_role_subset and len(masked_texts) != len(texts): | ||
| # Scanned a role-selected subset but could not map it back to | ||
| # flat-text positions — keep the original texts rather than | ||
| # misapply masked content to the wrong message. | ||
| verbose_proxy_logger.warning( |
There was a problem hiding this comment.
Unguarded write-back when
scanned_role_subset=True, scanned_slice=None, and lengths happen to match
The guard at line 1858 catches the case where a role-subset scan produced a differently-sized masked_texts list. However, there is no branch for when scanned_role_subset=True, scanned_slice=None (i.e. _locate_message_texts_slice failed), and len(masked_texts) == len(texts). In that state the single masked text returned by Bedrock would silently propagate as the sole element of inputs["texts"], clobbering everything downstream in the handler's positional write-back.
In practice this requires len(texts) == 1 simultaneously with a total != len(texts) mismatch in _locate_message_texts_slice, which is a very narrow window. Replacing the length check with a blanket scanned_role_subset and scanned_slice is None guard would close the gap entirely.
| structured_messages = cast( | ||
| Optional[List[AllMessageValues]], | ||
| inputs.get("structured_messages") or request_data.get("messages"), | ||
| ) | ||
| if input_type != "request" or not structured_messages: | ||
| # No role information available (e.g. raw-text callers like | ||
| # /guardrails/apply_guardrail) — keep the legacy behavior of | ||
| # scanning the latest text only. | ||
| filter_result = self._prepare_guardrail_messages_for_role( | ||
| messages=mock_messages | ||
| ) | ||
| return ApplyGuardrailMessageSelection( | ||
| filtered_messages=filter_result.payload_messages or mock_messages, | ||
| scanned_slice=None, | ||
| scanned_role_subset=False, | ||
| ) | ||
|
|
||
| latest_user_index = self._find_latest_message_index( | ||
| structured_messages, target_role="user" | ||
| ) | ||
| if latest_user_index is None: | ||
| verbose_proxy_logger.debug( | ||
| "Bedrock Guardrail: no user-role message in request, skipping INPUT scan" | ||
| ) | ||
| return ApplyGuardrailMessageSelection(None, None, True, skip_scan=True) | ||
|
|
There was a problem hiding this comment.
Fallback to
request_data["messages"] may desync structured_messages from texts when skip flags are active
When inputs["structured_messages"] is absent (e.g., a direct /guardrails/apply_guardrail caller), the code falls back to request_data["messages"] — the unfiltered message list. If skip_system_message_in_guardrail or skip_tool_message_in_guardrail is set, texts will have fewer entries than the unfiltered message list, causing _locate_message_texts_slice to detect total != len(texts) and return None. The code then falls into the warning-and-skip branch, which discards the masking.
This is not a regression, but it means the fix silently does nothing for that caller+flag combination. A comment at the fallback site would help future maintainers understand why request_data["messages"] is only safe here when no skip flags are active.
|
@michelligabriele can you fix the greptile comments? |
Relevant issues
Fixes #23476
Supersedes #25355 — that PR guarded at content-build time, but the role information is already lost upstream at message-selection time, so it did not stop the leak. This fixes the selection itself.
Linear ticket
Pre-Submission checklist
make test-unit— ran the affected enterprise guardrail suite locally (14 passed); fullmake test-unitdeferred to CI@greptileaiand received a Confidence Score of at least 4/5 before requesting a maintainer reviewScreenshots / Proof of Fix
With
experimental_use_latest_role_message_onlyenabled, the unifiedapply_guardrailpath scanned the latest message of any role as the BedrockINPUT, rather than the latest user-role message. In tool-calling conversations ending in atool/assistantmessage, that non-user content was sent to the ApplyGuardrailINPUTscan.The new regression tests reproduce this on the pre-fix code and pass with the fix.
Before the fix (conversation ending in a tool message — the INPUT scan receives the tool result):
After the fix — the latest original-role user message is the only content scanned, and masked content is written back to that message's position only (system/assistant/tool untouched):
Type
🐛 Bug Fix
✅ Test
Changes
apply_guardrailnow selects the latest message whose original role isuserfrominputs["structured_messages"](falling back torequest_data["messages"]), instead of wrapping every flattened text asrole="user"and taking the latest of any role.INPUTscan entirely when there is no user-role message or the latest user message has no text content.OpenAIChatCompletionsHandler.process_input_messages(only the network call mocked) asserting the mask lands on the user message and other roles are left unchanged.