Fix #8673: SD3 attention masks for text padding tokens #4
Standalone test avoids full transformer import chain; verifies padding invariance at processor level. Co-authored-by: Cursor <cursoragent@cursor.com>
Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Joint mask on image self-attention
- JointAttnProcessor2_0 now only prepares and applies the joint attention mask when encoder_hidden_states is present, so the attn2 image-only self-attention path ignores the text mask passed via joint_attention_kwargs.
Or push these changes by commenting:
@cursor push 01eaf44c8f
Preview (01eaf44c8f)
diff --git a/src/diffusers/models/attention_processor.py b/src/diffusers/models/attention_processor.py
--- a/src/diffusers/models/attention_processor.py
+++ b/src/diffusers/models/attention_processor.py
@@ -1513,7 +1513,10 @@
value = torch.cat([value, encoder_hidden_states_value_proj], dim=2)
if attention_mask is not None:
- attention_mask = attn.prepare_joint_attention_mask(attention_mask, key.shape[2], key.dtype)
+ if encoder_hidden_states is not None:
+ attention_mask = attn.prepare_joint_attention_mask(attention_mask, key.shape[2], key.dtype)
+ else:
+ attention_mask = None
hidden_states = F.scaled_dot_product_attention(
query, key, value, attn_mask=attention_mask, dropout_p=0.0, is_causal=False
Reviewed by Cursor Bugbot for commit f41465a.
hidden_states = F.scaled_dot_product_attention(query, key, value, dropout_p=0.0, is_causal=False)
if attention_mask is not None:
    attention_mask = attn.prepare_joint_attention_mask(attention_mask, key.shape[2], key.dtype)
Joint mask on image self-attention
High Severity
In JointAttnProcessor2_0, prepare_joint_attention_mask runs whenever attention_mask is set, even when encoder_hidden_states is None. SD3.5 dual-attention blocks pass the same joint_attention_kwargs (including the text mask) into the second JointAttnProcessor2_0 self-attention pass, so image-only keys get a wrongly padded joint mask and incorrect SDPA masking.
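A minimal sketch of the guard this comment asks for, assuming the text mask has shape (batch, text_len) with 1 for real tokens and 0 for padding. `resolve_attention_mask` is a hypothetical helper written for illustration; it is not the PR's `prepare_joint_attention_mask`, whose exact behavior is not shown here.

```python
import torch


def resolve_attention_mask(attention_mask, encoder_hidden_states, image_len):
    """Hypothetical helper: only build a joint mask on the joint (image + text) path.

    Returns None for the image-only self-attention path, so a text mask passed
    along in shared kwargs is ignored there.
    """
    if attention_mask is None or encoder_hidden_states is None:
        return None
    batch = attention_mask.shape[0]
    # Image tokens are never padding, so their key positions are always visible.
    image_part = torch.ones(batch, image_len, dtype=torch.bool, device=attention_mask.device)
    joint = torch.cat([image_part, attention_mask.bool()], dim=1)
    return joint[:, None, None, :]  # broadcast over heads and query positions


text_mask = torch.tensor([[1, 1, 0]])  # last text token is padding
text_states = torch.randn(1, 3, 8)

joint_path_mask = resolve_attention_mask(text_mask, text_states, image_len=4)
image_only_mask = resolve_attention_mask(text_mask, None, image_len=4)
```

On the joint path the mask covers all 7 keys (4 image, 3 text); on the image-only path the helper returns None, so SDPA runs unmasked over the 4 image keys.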



Summary
- Attention.prepare_joint_attention_mask() for SD3's [hidden_states, encoder_hidden_states] concat order
- JointAttnProcessor2_0 and FusedJointAttnProcessor2_0
Fixes huggingface#8673
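To illustrate the concat order and the padding invariance the test is meant to check, here is a rough sketch. `build_joint_mask` is a hypothetical stand-in, not the PR's method, and the tensor sizes are arbitrary. It places the always-visible image positions first and the text mask after them, then checks that perturbing a masked text key/value leaves the attention output unchanged.

```python
import torch
import torch.nn.functional as F


def build_joint_mask(text_mask: torch.Tensor, image_len: int) -> torch.Tensor:
    """Hypothetical helper: boolean key mask in [image, text] order.

    text_mask: (batch, text_len), 1 for real tokens and 0 for padding.
    Returns (batch, 1, 1, image_len + text_len); True means the key is visible.
    """
    batch = text_mask.shape[0]
    image_part = torch.ones(batch, image_len, dtype=torch.bool, device=text_mask.device)
    joint = torch.cat([image_part, text_mask.bool()], dim=1)
    return joint[:, None, None, :]


torch.manual_seed(0)
batch, heads, image_len, text_len, head_dim = 1, 2, 4, 3, 8
text_mask = torch.tensor([[1, 1, 0]])  # last text token is padding
mask = build_joint_mask(text_mask, image_len)

seq_len = image_len + text_len
q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)
v = torch.randn(batch, heads, seq_len, head_dim)
out_a = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)

# Perturb only the padded text key/value; a masked key should not affect the output.
k2, v2 = k.clone(), v.clone()
k2[:, :, -1] += 10.0
v2[:, :, -1] += 10.0
out_b = F.scaled_dot_product_attention(q, k2, v2, attn_mask=mask)

padding_invariant = torch.allclose(out_a, out_b, atol=1e-5)
```

A boolean mask is used here for simplicity; `F.scaled_dot_product_attention` also accepts an additive float mask, which is what a `dtype` argument like the one in the diff suggests the PR may produce.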
Test plan
- pytest tests/models/test_sd3_joint_attention_mask.py -q
- python utils/check_copies.py
Made with Cursor