
Fix #8673: SD3 attention masks for text padding tokens #4

Open: srlynch1 wants to merge 2 commits into main from e2e/diffusers-8673


Conversation

@srlynch1

Owner

Summary

  • Add Attention.prepare_joint_attention_mask() for SD3's [hidden_states, encoder_hidden_states] concat order
  • Wire mask into JointAttnProcessor2_0 and FusedJointAttnProcessor2_0
  • Add padding-invariance tests (full transformer + processor-level)
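The mask layout implied by the first bullet can be sketched without PyTorch. This is a minimal illustration under stated assumptions, not the PR's implementation: it assumes the text mask uses 1 for real tokens and 0 for padding, that the joint key sequence is image tokens followed by text tokens (the [hidden_states, encoder_hidden_states] order), and it uses a hypothetical helper name, build_joint_key_bias, rather than the real prepare_joint_attention_mask signature.

```python
import math
from typing import List


def build_joint_key_bias(text_mask: List[List[int]], image_len: int) -> List[List[float]]:
    """Hypothetical helper: per-sample additive bias over the joint key sequence.

    Assumes keys are laid out as [image tokens, text tokens] and that
    text_mask[b][j] is 1 for a real text token and 0 for padding.
    Image tokens are never masked.
    """
    biases = []
    for sample_mask in text_mask:
        keep = [1] * image_len + list(sample_mask)  # image first, then text
        # 0.0 leaves a score untouched; -inf removes that key after softmax.
        biases.append([0.0 if k else -math.inf for k in keep])
    return biases


# One sample: 3 image tokens, 4 text slots of which the last 2 are padding.
bias = build_joint_key_bias([[1, 1, 0, 0]], image_len=3)
print(bias)  # [[0.0, 0.0, 0.0, 0.0, 0.0, -inf, -inf]]
```

In PyTorch, each row would typically become a (batch, 1, 1, key_len) float tensor so that it broadcasts over heads and query positions when passed as attn_mask to F.scaled_dot_product_attention. Keeping every image key unmasked also guarantees that no query row is fully masked, which can otherwise produce NaNs in the softmax.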

Fixes huggingface#8673

Test plan

  • pytest tests/models/test_sd3_joint_attention_mask.py -q
  • python utils/check_copies.py

Made with Cursor

srlynch1 and others added 2 commits June 21, 2026 21:25
Standalone test avoids full transformer import chain; verifies padding invariance at processor level.

Co-authored-by: Cursor <cursoragent@cursor.com>
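The padding-invariance property that the standalone test checks can be illustrated with a toy, framework-free attention. This sketch is not the PR's test: it assumes a single head, identity Q/K/V projections, and a boolean key mask, and all names (attend, close, the toy vectors) are hypothetical. Masked padding should leave the outputs for real tokens unchanged, while unmasked padding should not.

```python
import math
from typing import List, Optional

Vector = List[float]


def attend(queries: List[Vector], keys: List[Vector], values: List[Vector],
           key_keep: Optional[List[bool]] = None) -> List[Vector]:
    """Single-head scaled dot-product attention with an optional boolean key mask."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        if key_keep is not None:
            # Masked keys get -inf, so exp() turns their weight into exactly 0.0.
            scores = [s if keep else -math.inf for s, keep in zip(scores, key_keep)]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        outputs.append([sum(w * v[d] for w, v in zip(weights, values)) / total
                        for d in range(len(values[0]))])
    return outputs


def close(x: List[Vector], y: List[Vector]) -> bool:
    return all(math.isclose(a, b, abs_tol=1e-12) for rx, ry in zip(x, y) for a, b in zip(rx, ry))


# Toy joint sequence in [image, text] order.
image = [[1.0, 0.0], [0.0, 1.0]]   # 2 image tokens
text = [[0.5, 0.5]]                # 1 real text token
pad = [[9.0, -3.0], [4.0, 4.0]]    # arbitrary embeddings in padding slots

short = image + text
padded = image + text + pad
keep = [True] * len(short) + [False] * len(pad)

reference = attend(short, short, short)
masked = attend(padded, padded, padded, key_keep=keep)[: len(short)]
unmasked = attend(padded, padded, padded)[: len(short)]

print(close(reference, masked))    # True: masked padding is invisible
print(close(reference, unmasked))  # False: unmasked padding leaks in
```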

@cursor (Cursor Bot) left a comment


Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.


Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Joint mask on image self-attention
    • JointAttnProcessor2_0 now only prepares and applies the joint attention mask when encoder_hidden_states is present, so the attn2 image-only self-attention path ignores the text mask passed via joint_attention_kwargs.


Or push these changes by commenting:

@cursor push 01eaf44c8f
Preview (01eaf44c8f)
diff --git a/src/diffusers/models/attention_processor.py b/src/diffusers/models/attention_processor.py
--- a/src/diffusers/models/attention_processor.py
+++ b/src/diffusers/models/attention_processor.py
@@ -1513,7 +1513,10 @@
             value = torch.cat([value, encoder_hidden_states_value_proj], dim=2)
 
         if attention_mask is not None:
-            attention_mask = attn.prepare_joint_attention_mask(attention_mask, key.shape[2], key.dtype)
+            if encoder_hidden_states is not None:
+                attention_mask = attn.prepare_joint_attention_mask(attention_mask, key.shape[2], key.dtype)
+            else:
+                attention_mask = None
 
         hidden_states = F.scaled_dot_product_attention(
             query, key, value, attn_mask=attention_mask, dropout_p=0.0, is_causal=False

You can send follow-ups to the cloud agent here.

Reviewed by Cursor Bugbot for commit f41465a.


-        hidden_states = F.scaled_dot_product_attention(query, key, value, dropout_p=0.0, is_causal=False)
+        if attention_mask is not None:
+            attention_mask = attn.prepare_joint_attention_mask(attention_mask, key.shape[2], key.dtype)


Joint mask on image self-attention

High Severity

In JointAttnProcessor2_0, prepare_joint_attention_mask runs whenever attention_mask is set, even when encoder_hidden_states is None. SD3.5 dual-attention blocks pass the same joint_attention_kwargs (including the text mask) to the second JointAttnProcessor2_0 call, which is an image-only self-attention pass. The keys there contain no text tokens, so the joint mask is padded against the wrong sequence and SDPA masks the wrong positions.
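A framework-free sketch can make this failure mode and the autofix's guard concrete. It rests on assumptions not confirmed by the PR: the helper names joint_key_keep and select_key_keep are hypothetical, the text mask is 1 for real tokens and 0 for padding, and the joint mask is assumed to be built by left-padding the text mask with "keep" entries up to the key length (the real prepare_joint_attention_mask may pad differently).

```python
from typing import List, Optional


def joint_key_keep(text_mask: List[int], key_len: int) -> List[bool]:
    """Hypothetical: left-pad a text mask with 'keep' entries up to key_len,
    assuming keys are ordered [image tokens, text tokens]."""
    n_image = key_len - len(text_mask)
    return [True] * n_image + [bool(m) for m in text_mask]


def select_key_keep(text_mask: Optional[List[int]], key_len: int,
                    has_encoder_states: bool) -> Optional[List[bool]]:
    """Mirrors the autofix's guard: only the joint pass receives a text mask."""
    if text_mask is None or not has_encoder_states:
        return None  # image-only self-attention: attend to every image token
    return joint_key_keep(text_mask, key_len)


text_mask = [1, 1, 0]  # 2 real text tokens, 1 padding slot
image_len = 4

# Joint pass: keys are 4 image tokens + 3 text tokens.
print(select_key_keep(text_mask, image_len + len(text_mask), has_encoder_states=True))
# [True, True, True, True, True, True, False]

# Image-only pass: keys are just the 4 image tokens, so no mask is applied.
print(select_key_keep(text_mask, image_len, has_encoder_states=False))
# None

# Without the guard (under the left-padding assumption), the text padding
# pattern lands on image keys and the last image token is wrongly masked.
print(joint_key_keep(text_mask, image_len))
# [True, True, True, False]
```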





Development

Successfully merging this pull request may close these issues.

Attention masks are missing in SD3 to mask out text padding tokens

1 participant