[Kernel] feat: Add MXFP6-E2M3 activation support to mixed_moe_gemm_2stage#709
Open
amd-satre wants to merge 2 commits into
055c64e to
74651d6
…tage

Extends the MoE stage-1 (gate+up) and stage-2 (down) GEMM kernels to accept MXFP6-E2M3 (6-bit, E2M3, block-32 E8M0 scale) activations paired with MXFP4-E2M1 weights, exposed as a_dtype="fp6" in both compile_mixed_moe_gemm1 and compile_mixed_moe_gemm2.

Kernel changes (kernels/mixed_moe_gemm_2stage.py):
- is_f6_a / is_f4_or_f6_a flags; a_dtype validation extended to "fp6"
- cbsz=2 for MXFP6-E2M3 A (vs cbsz=4 for MXFP4, cbsz=0 for FP8)
- a_per_lane_kpack_bytes=32 for fp6: cbsz=2 MFMA reads A in FP8-padded layout, i.e. 24 B of packed FP6 codes + 8 B zero pad per K=32 block
- Three LDS loads per K-block to fill the 32-byte A register slot; 4th slot zero-filled (cbsz=2 MFMA discards it)
- a_elem_vec_pack stays 1 for fp6 (1 stored byte per logical element)

Test infrastructure (tests/):
- fp4_utils.py: fp6_e2m3_to_f32 (LUT-based E2M3 decoder) and per_1x32_f6_quant (returns a_pad, scale, a_unpacked)
- test_ref.py: _dequant_mxfp6_per_1x32; a2_kind override on torch_moe_gemm2 to select mxfp6 dequant without dtype-shape ambiguity
- test_moe_gemm.py: test_moe_stage2_standalone parametrized with a6w4 (gfx950+); reference comparison enabled via a_unpacked (no skip_ref); realworld shapes: Mixtral-8x7B, Qwen3-30B-A3B (T=128/512)

Signed-off-by: Shreyas Atre <satre@amd.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
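For readers unfamiliar with the format, the following is a minimal pure-Python sketch of what a LUT-based E2M3 decoder and a block-32 dequantization could look like, plus the 24 B + 8 B padded block size described in the commit message. It is not the code from this PR: the helper names (`e2m3_code_to_float`, `dequant_block`, `pack_fp6_block_padded`) are hypothetical, and the little-endian bit order used when packing the 6-bit codes is an assumption, since the commit message does not state the in-register bit layout. The E2M3 encoding itself (1 sign bit, 2 exponent bits with bias 1, 3 mantissa bits, no Inf/NaN) and the E8M0 scale (2^(byte - 127)) follow the OCP Microscaling format definitions.

```python
def e2m3_code_to_float(code: int) -> float:
    """Decode one 6-bit E2M3 code (sign | 2-bit exp | 3-bit mantissa, bias 1)."""
    sign = -1.0 if (code >> 5) & 0x1 else 1.0
    exp = (code >> 3) & 0x3
    man = code & 0x7
    if exp == 0:
        # Subnormal: no implicit leading one, effective exponent is 1 - bias = 0.
        magnitude = man / 8.0
    else:
        magnitude = (1.0 + man / 8.0) * 2.0 ** (exp - 1)
    return sign * magnitude


# 64-entry lookup table covering every possible 6-bit code.
E2M3_LUT = [e2m3_code_to_float(c) for c in range(64)]


def dequant_block(codes, scale_byte: int):
    """Dequantize one block of 32 E2M3 codes sharing a single E8M0 scale byte."""
    assert len(codes) == 32, "MX blocks hold 32 elements"
    assert 0 <= scale_byte < 255, "E8M0 value 255 encodes NaN"
    scale = 2.0 ** (scale_byte - 127)
    return [E2M3_LUT[c & 0x3F] * scale for c in codes]


def pack_fp6_block_padded(codes) -> bytes:
    """Pack 32 six-bit codes into 24 bytes, then append 8 zero bytes (32 B total).

    ASSUMPTION: codes are packed little-endian, element 0 in the lowest bits.
    The real kernel's bit layout may differ.
    """
    assert len(codes) == 32
    bits = 0
    for i, c in enumerate(codes):
        bits |= (c & 0x3F) << (6 * i)
    return bits.to_bytes(24, "little") + bytes(8)


if __name__ == "__main__":
    block = [0b001000] * 32                 # 32 copies of the code for 1.0
    print(dequant_block(block, 128)[:4])    # scale byte 128 means a scale of 2.0
    print(len(pack_fp6_block_padded(block)))
```

The size arithmetic is the point of the packing helper: 32 elements at 6 bits each occupy 192 bits (24 bytes), and the 8 zero bytes bring the block up to the 32-byte footprint an FP8 block of the same K would have.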
74651d6 to
780f08d
Collaborator
@amd-satre why was fp6 MoE added here? We have never heard of such model configs.
Motivation
Support 6-bit (MXFP6-E2M3) activations as the A operand in the MoE GEMM.
Technical Details
Adds a_dtype="fp6" (MXFP6-E2M3) activation support to both stage-1 (gate+up) and stage-2 (down) MoE grouped GEMMs in mixed_moe_gemm_2stage.py, enabling W_MXFP4_A_MXFP6 inference on gfx950 (CDNA4).
Test Plan
test_moe_stage2_standalone[a6w4-*]
Test Result
Passed
Submission Checklist
Signed-off-by: Shreyas Atre <satre@amd.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>