
[AMD] Add MiniMax-M3-FP8 MI355X ATOMESH #1865

Merged: functionstackx merged 34 commits into main from amd/atom_mesh_0619_m3_fp8 on Jun 25, 2026


@seungrokj (Collaborator) commented Jun 20, 2026

Summary

  • Add minimaxm3-fp8-mi355x-atom-disagg CI recipe: multi-node disaggregated prefill-decode on MI355X via ATOM for MiniMax-M3-MXFP8
  • Align server settings with slurm reference script: MEM_FRAC_STATIC=0.8, MAX_NUM_SEQS=128, BLOCK_SIZE=128, MAX_MODEL_LEN=32768, KV_CACHE_DTYPE=auto
  • server_atom.sh: fix the missing _MAX_CONC assignment before the cudagraph size check; gate ATOM_MOE_GU_ITLV and AITER_BF16_FP8_MOE_BOUND on DeepSeek-V4-Pro only; use ${KV_CACHE_DTYPE:-fp8} as the default
  • Search space: ISL=8192 and ISL=1024, 1P1D TP4, conc 1–512

Test plan

  • CI sweep on mi355x-disagg runner triggers correctly
  • --kv-cache-dtype is not passed when KV_CACHE_DTYPE=auto
  • Decode node cudagraph sizes scale with max concurrency
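The KV-cache behavior in the test plan can be sketched as below. This is a minimal illustration, not the code in server_atom.sh: build_kv_cache_arg is a hypothetical helper, and the sketch uses the hyphenated --kv-cache-dtype spelling from the test plan (one commit message spells it --kv_cache_dtype). It combines the ${KV_CACHE_DTYPE:-fp8} default from the summary with the rule that auto passes no flag.

```shell
#!/usr/bin/env bash
# Hypothetical helper: emit the KV-cache flag only when a concrete dtype is requested.
build_kv_cache_arg() {
  # ":-" falls back to fp8 when KV_CACHE_DTYPE is unset OR empty.
  local dtype="${KV_CACHE_DTYPE:-fp8}"
  if [[ "$dtype" == "auto" ]]; then
    # "auto": let the engine choose, so pass no flag at all.
    echo ""
  else
    echo "--kv-cache-dtype ${dtype}"
  fi
}

# Usage: exercise the two cases from the test plan.
KV_CACHE_DTYPE=auto
auto_arg="$(build_kv_cache_arg)"
unset KV_CACHE_DTYPE
default_arg="$(build_kv_cache_arg)"
echo "auto  -> '${auto_arg}'"
echo "unset -> '${default_arg}'"
```

Because ":-" also replaces an empty string, an empty KV_CACHE_DTYPE still yields fp8 in this sketch; only the explicit value auto suppresses the flag, which matches why the MiniMax-M3 launcher ends up setting KV_CACHE_DTYPE=auto.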

🤖 Generated with Claude Code


Note

Medium Risk
Touches shared multi-node ATOM launch paths (server_atom.sh, env_atom.sh, job.slurm) used by other disagg recipes; behavior changes are mostly gated by model name but could affect non–DSv4 atom-disagg runs.

Overview
Adds minimaxm3-fp8-mi355x-atom-disagg to AMD CI: multi-node 1P1D prefill–decode on MI355X with ATOM + mooncake, TP4, 1k/1k and 8k/1k, concurrency 1–512, via new launcher minimaxm3_fp8_mi355x_atom-disagg.sh (reference tuning: MEM_FRAC_STATIC=0.8, block size 128, MAX_MODEL_LEN=32768, KV_CACHE_DTYPE=auto, no MTP).
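The reference tuning above boils down to a few exported variables. A rough sketch of that part of the launcher, assuming it simply exports them for server_atom.sh to read; the real layout of minimaxm3_fp8_mi355x_atom-disagg.sh is not shown on this page, and SPEC_DECODING=none is borrowed from the FP4 launcher's commit message as the "no MTP" switch.

```shell
#!/usr/bin/env bash
# Sketch only: reference tuning for MiniMax-M3-MXFP8, 1P1D TP4 on MI355X.
export MEM_FRAC_STATIC=0.8   # renamed from MEM_FRACTION in server_atom.sh
export MAX_NUM_SEQS=128
export BLOCK_SIZE=128
export MAX_MODEL_LEN=32768   # set unconditionally per a later commit
export KV_CACHE_DTYPE=auto   # auto: server_atom.sh passes no KV-cache dtype flag
export SPEC_DECODING=none    # assumed variable name for "no MTP" (from the FP4 launcher)
```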

server_atom.sh is generalized for non–DeepSeek-V4 models: MEM_FRACTION → MEM_FRAC_STATIC, optional MAX_MODEL_LEN / MAX_NUM_BATCHED_TOKENS, MTP/spec args, KV dtype omitted when auto, model-specific parallel flags (DSv4 TBO/HF overrides vs AITER_QUICK_REDUCE_QUANTIZATION=INT4 elsewhere), and eval-built server commands. env_atom.sh applies ATOM_MOE_GU_ITLV / AITER_BF16_FP8_MOE_BOUND only for DeepSeek-V4-Pro. job.slurm passes BENCH_REQUEST_RATE and the new atom-disagg env vars; models_atom.yaml registers MiniMax-M3 MXFP4/MXFP8. bench.sh uses --dsv4 for MTP on DeepSeek-V4-Pro. perf-changelog.yaml documents the recipe.
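The model-specific branch could look roughly like this. It is a sketch under assumptions: select_parallel_config and PARALLEL_ARGS are hypothetical names, while --enable-tbo and AITER_QUICK_REDUCE_QUANTIZATION=INT4 come from the commit messages and overview. The DSv4 HF overrides and the env_atom.sh MoE variables are left as comments because their values are not shown on this page.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pick parallel flags / env by model name.
select_parallel_config() {
  local model="$1"
  PARALLEL_ARGS=""
  case "$model" in
    *DeepSeek-V4-Pro*)
      # DSv4 keeps TBO; its HF overrides (and, in env_atom.sh,
      # ATOM_MOE_GU_ITLV / AITER_BF16_FP8_MOE_BOUND) would also go here.
      PARALLEL_ARGS="--enable-tbo"
      ;;
    *)
      # Other models (e.g. MiniMax-M3): no TBO, INT4 quick reduce instead.
      export AITER_QUICK_REDUCE_QUANTIZATION=INT4
      ;;
  esac
}

select_parallel_config "MiniMax-M3-MXFP8"
echo "args='${PARALLEL_ARGS}' quick_reduce=${AITER_QUICK_REDUCE_QUANTIZATION:-unset}"
```

Gating on the model name keeps the DSv4 path unchanged, which is the property the "Medium Risk" note depends on for other atom-disagg recipes.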

Reviewed by Cursor Bugbot for commit c363d91. Bugbot is set up for automated code reviews on this repo.

@github-actions (Contributor)

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please first open a PR against the official recipes or cookbook; only then can we merge your single-node PR into the master branch. Let's make sure the documentation is first-class so that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that, after merging, all GitHub Actions jobs fully pass. Often, failures are just flakes, and simply re-running the failed jobs will fix them. If you re-run failed jobs, you as the PR author are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective company's CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.


3 similar comments
@seungrokj changed the title from "feat: MiniMax-M3 MXFP8 MI355X ATOM disaggregated PD benchmark" to "[AMD] Add MiniMax-M3-FP8 MI355X ATOMMESH" on Jun 20, 2026
Comment thread benchmarks/multi_node/minimaxm3_fp4_mi355x_atom-disagg.sh

@functionstackx (Collaborator) left a comment

Will review it in a bit, but it seems like we need to merge the vLLM disagg PR first?

I thought we chatted about this back on April 17 (SGLang/vLLM native engine first), and your thumbs-up meant you acknowledged the guidelines.

#1043 (comment)


Comment thread .github/configs/amd-master.yaml
Comment thread benchmarks/multi_node/amd_utils/server_atom.sh

@cursor (Bot) left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit cb168e4.

Comment thread .github/configs/amd-master.yaml

Comment thread benchmarks/multi_node/minimaxm3_fp4_mi355x_atom-disagg.sh

@functionstackx added the all-evals label (Expand eval selection to every fixed-sequence config) on Jun 21, 2026

@Oseltamivir added the evals-only label (Suppress throughput and run only eval jobs; combine with all-evals to expand selection) on Jun 21, 2026
seungrokj and others added 20 commits June 24, 2026 17:21
"${ARRAY[@]}" inside a double-quoted assignment breaks bash -n's quote
parser. Since all three CMD strings are passed to eval, ${ARRAY[*]}
is equivalent — eval handles word splitting.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
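The reasoning in this commit (join the array into the string with ${ARRAY[*]}, then let eval split it again) can be checked with a small standalone example. SERVER_ARGS, count_args, and the --opt-a/--opt-b flags are placeholders, not names from server_atom.sh.

```shell
#!/usr/bin/env bash
# Placeholder argument array; the real scripts build their own.
SERVER_ARGS=(--opt-a 1 --opt-b 2)

# Helper that reports how many arguments it received.
count_args() { echo "$#"; }

# ${SERVER_ARGS[*]} joins the elements with the first character of IFS
# (a space by default), giving one flat string in the quoted assignment.
CMD="count_args ${SERVER_ARGS[*]}"

# eval re-parses the string, so count_args sees four separate words again.
n_args="$(eval "$CMD")"
echo "${CMD} -> ${n_args} args"
```

The equivalence only holds while no element contains whitespace or shell metacharacters; an element such as "a b" would be split into two words by eval.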
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- benchmarks/multi_node/minimaxm3_fp4_mi355x_atom-disagg.sh: new CI entry
  point for MiniMax-M3-MXFP4, mirroring dsv4_fp4_mi355x_atom-disagg.sh.
  No MTP (SPEC_DECODING=none), KV_CACHE_DTYPE=auto (no fp8),
  MAX_MODEL_LEN/MAX_NUM_BATCHED_TOKENS=32768.

- server_atom.sh: make --kv_cache_dtype conditional (skipped when
  KV_CACHE_DTYPE is empty or "auto"); add MAX_MODEL_LEN,
  MAX_NUM_BATCHED_TOKENS, CUDAGRAPH_OPT support (prefill+decode for
  model-len args; decode-only for cudagraph).

- job.slurm: pass MAX_MODEL_LEN, MAX_NUM_BATCHED_TOKENS, CUDAGRAPH_OPT
  through Docker env for atom-disagg engine.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
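A minimal sketch of the "optional, prefill+decode" handling for the length limits described in this commit. build_len_args and the *_LEN_ARGS variables are hypothetical, and the flag spellings --max-model-len and --max-num-batched-tokens are assumed vLLM-style names, not confirmed ATOM flags.

```shell
#!/usr/bin/env bash
# Emit length-limit flags only for variables that are set and non-empty.
# Flag spellings are assumptions for illustration.
build_len_args() {
  local args=()
  if [[ -n "${MAX_MODEL_LEN:-}" ]]; then
    args+=(--max-model-len "$MAX_MODEL_LEN")
  fi
  if [[ -n "${MAX_NUM_BATCHED_TOKENS:-}" ]]; then
    args+=(--max-num-batched-tokens "$MAX_NUM_BATCHED_TOKENS")
  fi
  echo "${args[*]}"
}

# Values from the FP4 launcher's commit message; applied to both roles.
MAX_MODEL_LEN=32768
MAX_NUM_BATCHED_TOKENS=32768
LEN_ARGS="$(build_len_args)"
PREFILL_LEN_ARGS="$LEN_ARGS"
DECODE_LEN_ARGS="$LEN_ARGS"
echo "$PREFILL_LEN_ARGS"
```

Leaving the flags out entirely when unset keeps the engine defaults for other recipes that never export these variables.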
…UCE_QUANTIZATION=INT4

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Also remove CUDAGRAPH_OPT from job.slurm (linter cleanup).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…isagg.sh

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…r default)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…e --enable-tbo for non-DSv4 models

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…YPE default to empty for minimaxm3 disagg

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ode node

- Change runner from mi355x to mi355x-disagg in amd-master.yaml for minimaxm3-fp4 disagg
- Add dynamic CUDAGRAPH_SIZES selection in server_atom.sh based on max concurrency thresholds (512/1024/2048)
- Pass --cudagraph-capture-sizes to decode node server args

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
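A rough sketch of the threshold selection, including the _MAX_CONC fix listed in the summary (assign the value before comparing against it). The concurrency list, the power-of-two size ladder, and the cap variable are assumptions for illustration; the PR only states the 512/1024/2048 thresholds and that --cudagraph-capture-sizes goes to the decode node. The flag's value format is assumed here to be space-separated.

```shell
#!/usr/bin/env bash
# Hypothetical sweep: the concurrency levels this job will run.
CONC_LIST="1 2 4 8 16 32 64 128 256 512"

# Assign _MAX_CONC before any size check uses it.
_MAX_CONC=0
for c in $CONC_LIST; do
  if (( c > _MAX_CONC )); then
    _MAX_CONC=$c
  fi
done

# Smallest threshold covering the max concurrency (capped at 2048).
if (( _MAX_CONC <= 512 )); then
  cap=512
elif (( _MAX_CONC <= 1024 )); then
  cap=1024
else
  cap=2048
fi

# Assumed size ladder: powers of two up to the cap.
CUDAGRAPH_SIZES=""
s=1
while (( s <= cap )); do
  CUDAGRAPH_SIZES+="${CUDAGRAPH_SIZES:+ }${s}"
  s=$(( s * 2 ))
done

# Decode node only; prefill gets no cudagraph capture sizes.
DECODE_EXTRA_ARGS="--cudagraph-capture-sizes ${CUDAGRAPH_SIZES}"
echo "$DECODE_EXTRA_ARGS"
```

Without the _MAX_CONC assignment, the arithmetic comparisons would treat the unset variable as 0 and always pick the smallest threshold, which is consistent with the bug the summary describes.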
…4-Pro only

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Use ${KV_CACHE_DTYPE-fp8} so empty string (set by minimaxm3 script) is
left as-is, avoiding unintended --kv-cache-dtype pass-through.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
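The difference between the two expansions in this commit and its later revert is easy to see in isolation. This is plain Bash parameter expansion, independent of the ATOM scripts.

```shell
#!/usr/bin/env bash
# Set but empty, as the minimaxm3 script briefly did.
KV_CACHE_DTYPE=""

with_dash="${KV_CACHE_DTYPE-fp8}"         # "-": default only when UNSET, so stays ""
with_colon_dash="${KV_CACHE_DTYPE:-fp8}"  # ":-": default when unset OR empty, so "fp8"

unset KV_CACHE_DTYPE
unset_dash="${KV_CACHE_DTYPE-fp8}"        # unset: "fp8"
unset_colon="${KV_CACHE_DTYPE:-fp8}"      # unset: "fp8"

echo "empty: '-' -> '${with_dash}', ':-' -> '${with_colon_dash}'"
echo "unset: '-' -> '${unset_dash}', ':-' -> '${unset_colon}'"
```

Relying on the empty-vs-unset distinction is fragile, which is presumably why the next commit returned to ":-" and used the explicit value auto instead.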
…dtype flag

Set KV_CACHE_DTYPE to auto in minimaxm3_fp4_mi355x_atom-disagg.sh and
revert server_atom.sh to use :- expansion (auto is explicitly excluded
from KV_CACHE_ARG in server_atom.sh, so the flag is not passed).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- disagg.sh: export MEM_FRAC_STATIC=0.8 and MAX_NUM_SEQS=128
- server_atom.sh: fix missing _MAX_CONC assignment before cudagraph size check
- amd-master.yaml: trim ISL=8192 to 1P1D only, cap conc at 512 for both ISLs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ove stale perf-changelog entry

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…e config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- amd-master.yaml: bump image to rocm/atom-dev:MiniMax-M3-20260622
- minimaxm3_fp8_mi355x_atom-disagg.sh: unconditionally set MAX_MODEL_LEN=32768
- server_atom.sh: minor comment cleanup

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@functionstackx force-pushed the amd/atom_mesh_0619_m3_fp8 branch from 43ee069 to c0a813b on June 24, 2026 21:23
@functionstackx (Collaborator)

Hi @seungrokj @andyluo7 @chunfangamd

vLLM FP8 Disagg M3 has merged, so we can merge this PR once y'all have followed the PR review process and properly filled in the PR review form so that the sign-off CI verification can trigger: https://github.com/SemiAnalysisAI/InferenceX/blob/main/.github/workflows/codeowner-signoff-verify.yml

Can you please use this PR CHECKLIST template for the review: https://github.com/SemiAnalysisAI/InferenceX/blob/main/docs/PR_REVIEW_CHECKLIST.md

Here is an example of how to do it: #1891 (comment)

If your approval comment doesn't follow the PR CHECKLIST template, including the phrase "As a PR reviewer and CODEOWNER, I have reviewed this and have", it won't trigger our codeowner sign-off CI verification: https://github.com/SemiAnalysisAI/InferenceX/blob/main/.github/workflows/codeowner-signoff-verify.yml

@Klaud-Cold (Collaborator)

@functionstackx BLOCKED — not a CODEOWNER for the AMD config changed here, no executed sweep/eval on any in-PR commit, and no recipe link. The signed comment is also a request to the AMD reviewers to fill in the checklist, not a completed sign-off.

  • Check 0 (CODEOWNER): FAIL — PR changes .github/configs/amd-master.yaml, owned by @billishyahao @chunfangamd @seungrokj @yctseng0211 @1am9trash. functionstackx is not in that list and is not listed anywhere in CODEOWNERS. Sign-off must come from an AMD owner.
  • Check 1 (green executed sweep+eval on an in-PR commit): FAIL — No in-PR commit has green, executed multi-node */ + eval / check-runs. Head c0a813b has them all skipped (reuse run, executed nothing on this commit: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/28130510662); no other of the 33 PR commits has any benchmark/eval check-run. Fix: run (or re-anchor via /reuse-sweep-run) a passing full sweep on a commit currently in this PR.
  • Check 2 (eval accuracy): FAIL — no passing eval run exists on any in-PR commit, so accuracy cannot be verified.
  • Check 3 (recipe linked + complete): FAIL — the sign-off comment contains no recipe link (no vllm-project/recipes, sgl-project/sglang, recipes.vllm.ai, or docs.sglang.io). It also lacks the required checklist/As a PR reviewer and CODEOWNER, I have reviewed this sign-off text.

@billishyahao (Collaborator) commented Jun 25, 2026

As a PR reviewer and CODEOWNER, I have reviewed this and have:

Additional detail section:

This is a multi-node FP8 disaggregated ATOMesh config (week-1 MI355X support); no single-node recipe PR required. vLLM equivalent merged in #1762.

Signed: billishyahao

@seungrokj (Collaborator, Author)

@functionstackx can you please approve this?

@Klaud-Cold (Collaborator)

@billishyahao Not ready to merge: no passing sweep/eval exists on any commit in this PR, and the sign-off has no public recipe link.

  • Check 0 PASS — billishyahao is a named owner of .github/configs/amd-master.yaml; remaining paths fall under the * @InferenceX/core catch-all and are covered.
  • Check 1 FAIL — No passing sweep/eval on any commit currently in this PR. Every benchmark/eval check-run (single-node */, multi-node */, eval /) is skipped on all 34 PR commits (head a2e7439); only the reuse-sweep-gate gate is green. The signed-off run 27996171132 executed on fa89765aa9c838c6143e1b2d16dd71bccac40cbc, which is not in GET /pulls/1865/commits (rebased out) — validate_reusable_run will reject it. Fix: run or /reuse-sweep-run a full green sweep on a commit currently in the PR.
  • Check 2 BLOCKED — No valid in-PR eval run to verify accuracy against; depends on Check 1.
  • Check 3 FAIL — The Additional detail section contains no link to a public recipe (vllm-project/recipes or sgl-project/sglang). It only references InferenceX PR minimaxm3-fp8-mi355x-vllm-disagg #1762, which is an internal PR, not a published/public recipe. A recipe link is required even for a multi-node config.

@functionstackx merged commit 4c021b1 into main on Jun 25, 2026
26 checks passed
@functionstackx deleted the amd/atom_mesh_0619_m3_fp8 branch on June 25, 2026 00:39

Labels: all-evals (Expand eval selection to every fixed-sequence config), AMD, full-sweep-enabled


6 participants