Attention masks are missing in SD3 to mask out text padding tokens #8673

Description

@reminisce

Describe the bug

The SD3 attention implementation currently does not use attention masks. As a result, outputs are inconsistent across different values of `max_seq_length` whenever the text tokens contain padding, because padding tokens still receive non-zero attention weights. This issue was discussed in #8628 and was created to track progress on fixing the problem.
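To illustrate the effect, here is a minimal sketch in plain Python. It is not the diffusers/SD3 code: it assumes a single-head scaled dot-product attention for one query, and a hypothetical non-zero padding embedding `pad_vec`. All helper names (`attention`, `pad`, `run`) are hypothetical. Without a mask, padding to a longer `max_seq_length` changes the output. With a mask that sets padding scores to `-inf`, the padding weights become exactly zero and the output does not depend on the padded length.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability; exp(-inf) evaluates to 0.0
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values, mask=None):
    """Single-head scaled dot-product attention for one query vector.

    mask[j] is True for tokens that may be attended to, False for padding.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    if mask is not None:
        # Masked (padding) positions get -inf, so their softmax weight is exactly 0
        scores = [s if keep else -math.inf for s, keep in zip(scores, mask)]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

def pad(tokens, length, pad_vec):
    # Append copies of the padding embedding up to the target sequence length
    return tokens + [pad_vec] * (length - len(tokens))

query = [1.0, 0.0]
real = [[1.0, 0.0], [0.0, 1.0]]  # two "real" text tokens
pad_vec = [0.5, 0.5]             # hypothetical non-zero padding embedding

def run(max_seq_length, use_mask):
    keys = pad(real, max_seq_length, pad_vec)
    mask = [i < len(real) for i in range(max_seq_length)] if use_mask else None
    return attention(query, keys, keys, mask)

unmasked_4, unmasked_6 = run(4, False), run(6, False)
masked_4, masked_6 = run(4, True), run(6, True)
print(unmasked_4 == unmasked_6)  # False: output depends on padded length
print(masked_4 == masked_6)      # True: padding is ignored
```

In PyTorch, the equivalent would be to pass an `attn_mask` (a boolean mask where `True` marks positions that take part in attention) to `torch.nn.functional.scaled_dot_product_attention`. How such a mask would be threaded through SD3's joint image/text attention is left open here.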

Thanks @sayakpaul for the discussion.

Reproduction

n/a

Logs

No response

System Info

n/a

Who can help?

No response
