[WIP][Megatron-LM] feat: reduce extra qkv transpose in attn #625
Xiaoming-AMD merged 2 commits into main from
Conversation
RuibinCheung
commented
Mar 24, 2026
- Keep the input in SBHD layout to avoid extra q, k, v transposes in attention.
Pull request overview
This PR updates the Megatron Primus-Turbo attention wrapper to keep tensors in an SBHD-oriented flow and adjust the attention call/output reshaping to reduce redundant QKV layout transposes.
Changes:
- Tightens allowed qkv_format handling to a single supported layout.
- Updates Q/K/V layout conversion before calling flash_attn_*.
- Updates output layout conversion back to SBHD and reshapes to (S, B, hidden).
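A shape-level sketch of the flow the overview describes, using NumPy as a stand-in for torch (dimension sizes are illustrative, not from the PR):

```python
import numpy as np

S, B, H, D = 4, 2, 8, 16  # illustrative sizes

# Q/K/V arrive in SBHD layout.
q = np.zeros((S, B, H, D))

# Convert to BSHD for the flash_attn_* call.
q_bshd = q.transpose(1, 0, 2, 3)
assert q_bshd.shape == (B, S, H, D)

# The attention output comes back as (B, S, H, D); convert it back
# to SBHD and flatten the head dims to get (S, B, hidden).
out = q_bshd.transpose(1, 0, 2, 3).reshape(S, B, H * D)
assert out.shape == (S, B, H * D)
```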
# NOTE(ruibin): The layout of q, k and v is (S, B, H, D). But attn accept the shape of qkv is (B, S, H, D).
query, key, value = [x.permute(1, 0, 2, 3) for x in (query, key, value)]
The NOTE has a couple of grammar issues that make it harder to parse quickly (e.g., "attn accept" / "shape of qkv is"). Consider rephrasing to something like: "Q/K/V come in (S, B, H, D), but flash_attn_* expects (B, S, H, D)."
SUPPORTED_QKV_FORMATS = "sbhd"

packed_seq_kwargs = (
    {key: getattr(packed_seq_params, key) for key in self.kept_packed_seq_params}
    if packed_seq_params is not None
    else {}
)

qkv_format = packed_seq_kwargs.get("qkv_format", self.qkv_format)
assert qkv_format in ("sbhd", "bhsd"), "qkv_format only support bshd, but got {qkv_format}"
if qkv_format == "sbhd":
    query, key, value = [x.transpose(0, 1).contiguous() for x in (query, key, value)]
assert (
    qkv_format in SUPPORTED_QKV_FORMATS
), f"qkv_format only support {SUPPORTED_QKV_FORMATS}, but got {qkv_format}"
SUPPORTED_QKV_FORMATS is a string, so qkv_format in SUPPORTED_QKV_FORMATS performs substring matching (e.g., an invalid value like "bhd" would pass because it’s contained in "sbhd"). Make this a container of allowed formats (e.g., a tuple/set) or change to a single SUPPORTED_QKV_FORMAT equality check so validation is correct.
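A minimal standalone sketch of why the string-membership check is too permissive (plain Python, independent of the PR's module):

```python
# Membership on a string does substring matching, so an invalid
# format that happens to be a substring of "sbhd" slips through.
SUPPORTED_QKV_FORMATS = "sbhd"
assert "bhd" in SUPPORTED_QKV_FORMATS      # passes, but "bhd" is not a valid layout

# A tuple (or set) restores exact-match semantics.
SUPPORTED_QKV_FORMATS = ("sbhd",)
assert "bhd" not in SUPPORTED_QKV_FORMATS  # invalid value now rejected
assert "sbhd" in SUPPORTED_QKV_FORMATS     # valid value still accepted
```

An equality check against a single `SUPPORTED_QKV_FORMAT` string would work equally well if only one layout is ever supported.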
    qkv_format in SUPPORTED_QKV_FORMATS
), f"qkv_format only support {SUPPORTED_QKV_FORMATS}, but got {qkv_format}"
# NOTE(ruibin): The layout of q, k and v is (S, B, H, D). But attn accept the shape of qkv is (B, S, H, D).
query, key, value = [x.permute(1, 0, 2, 3) for x in (query, key, value)]
permute(1, 0, 2, 3) returns a non-contiguous view, but previously this path called .contiguous() after transposing. If pt.ops.flash_attn_* expects contiguous Q/K/V (or incurs an implicit copy), this can cause a runtime error or unexpected perf regressions. Consider using transpose(0, 1).contiguous() (same layout change) or permute(...).contiguous() before calling self.attn.
Suggested change:
- query, key, value = [x.permute(1, 0, 2, 3) for x in (query, key, value)]
+ query, key, value = [x.transpose(0, 1).contiguous() for x in (query, key, value)]
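The contiguity point can be demonstrated with NumPy, used here as a stand-in for torch since the stride semantics are analogous (`np.ascontiguousarray` playing the role of torch's `.contiguous()`):

```python
import numpy as np

# (S, B, H, D) dummy tensor; values are irrelevant, only memory layout matters.
x = np.zeros((4, 2, 8, 16))

# Swapping the first two axes produces a view with permuted strides,
# not a C-contiguous buffer.
view = x.transpose(1, 0, 2, 3)
assert not view.flags["C_CONTIGUOUS"]

# Forcing a copy restores row-major layout for the new axis order.
copied = np.ascontiguousarray(view)
assert copied.flags["C_CONTIGUOUS"]
assert copied.shape == (2, 4, 8, 16)
```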