Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

To-Dos

- Patch partition
- Patch merging
- Relative position bias
- Feature map padding
- Self-attention in non-overlapped windows
- Shifted Window based Self-Attention
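
The two window-related to-do items (self-attention in non-overlapped windows and shifted-window self-attention) both depend on splitting the feature map into fixed-size windows, with the shifted variant first rolling the map cyclically. A minimal pure-Python sketch of those two steps is given below. The function names `cyclic_shift` and `window_partition`, the 4 x 4 toy grid, and the window size of 2 are illustrative assumptions, not code from this repository, and the attention computation itself is omitted.

```python
def cyclic_shift(grid, shift):
    """Roll a 2D grid up and to the left by `shift` cells, wrapping around.

    Hypothetical helper: new[r][c] = old[(r + shift) % H][(c + shift) % W].
    """
    h, w = len(grid), len(grid[0])
    return [
        [grid[(r + shift) % h][(c + shift) % w] for c in range(w)]
        for r in range(h)
    ]


def window_partition(grid, window_size):
    """Split an H x W grid into non-overlapping window_size x window_size windows.

    Assumes H and W are multiples of window_size; a real feature map
    would need padding first (see the "Feature map padding" item).
    """
    h, w = len(grid), len(grid[0])
    if h % window_size or w % window_size:
        raise ValueError("grid dimensions must be multiples of window_size")
    windows = []
    for top in range(0, h, window_size):
        for left in range(0, w, window_size):
            windows.append(
                [row[left:left + window_size] for row in grid[top:top + window_size]]
            )
    return windows


# Toy 4 x 4 "feature map" whose cells hold their own flat index.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]

# Regular partition: four 2 x 2 windows.
regular = window_partition(grid, 2)

# Shifted partition: roll by window_size // 2, then partition again.
shifted = window_partition(cyclic_shift(grid, 1), 2)

print(regular[0])  # [[0, 1], [4, 5]]
print(shifted[3])  # [[15, 12], [3, 0]]
```

In this toy example the last shifted window gathers cells from all four corners of the original grid, which is why a shifted-window implementation typically masks attention between cells that were not adjacent before the roll.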