Describe the bug
In the SD3 attention implementation, attention masks are currently not used. As a result, outputs are inconsistent across different values of `max_seq_length` whenever the text tokens contain padding, because the padding tokens receive non-zero attention scores. This issue was discussed in #8628 and is opened to track progress on fixing the problem.
Thanks @sayakpaul for the discussion.
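To illustrate the effect described above, here is a minimal, self-contained sketch in plain Python. It is not the SD3 or diffusers code. It is a toy single-query scaled dot-product attention in which the embeddings, the padding vector `pad_vec`, and the helper names `attention` and `pad` are all hypothetical. Without a mask, padding keys get non-zero softmax weight, so the output shifts as `max_seq_length` grows. With a mask that sets padding scores to `-inf`, the output is the same for every padding length. In PyTorch, `torch.nn.functional.scaled_dot_product_attention` accepts an `attn_mask` argument that serves this purpose, but how a fix would thread the mask through SD3's joint attention is an open question for this issue.

```python
import math


def attention(q, keys, values, mask=None):
    """Toy scaled dot-product attention for a single query vector (illustrative only)."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    if mask is not None:
        # Masked (padding) positions get -inf, so softmax assigns them exactly zero weight.
        scores = [s if keep else float("-inf") for s, keep in zip(scores, mask)]
    m = max(scores)  # assumes at least one unmasked position
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
    return out, weights


def pad(tokens, max_seq_length, pad_vec):
    """Pad a token list to max_seq_length; return the padded list and a keep-mask."""
    n = len(tokens)
    padded = tokens + [pad_vec] * (max_seq_length - n)
    mask = [True] * n + [False] * (max_seq_length - n)
    return padded, mask


tokens = [[1.0, 0.0], [0.0, 1.0]]  # two real text-token embeddings (toy values)
pad_vec = [0.5, 0.5]               # hypothetical padding-token embedding (non-zero)
q = [1.0, 0.0]                     # a single query vector (toy value)

for seq_len in (4, 8):
    kv, mask = pad(tokens, seq_len, pad_vec)
    unmasked, _ = attention(q, kv, kv)
    masked, _ = attention(q, kv, kv, mask)
    print(seq_len, [round(x, 4) for x in unmasked], [round(x, 4) for x in masked])
```

In the printed rows, the unmasked output changes between `max_seq_length` 4 and 8, while the masked output stays fixed and matches attention over the real tokens alone.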
Reproduction
n/a
Logs
No response
System Info
n/a
Who can help?
No response