[AMD] Update MiniMax-M3 FP8 MI355X vLLM image and enable INT6 quick-reduce#1946

Open
chunfangamd wants to merge 2 commits into
mainfrom
chun_hongxia/minimaxm3_fp8

Conversation

@chunfangamd

Collaborator

Update MiniMax-M3 FP8 MI355X vLLM image and enable INT6 quick-reduce

Pin the minimaxm3-fp8-mi355x-vllm config to nightly
3f5a1e1733200760169ff31ebe60a271072b199e, which includes the gfx950
mxfp8 moe/linear tuning for MiniMax-M3 (vllm-project/vllm#45725).

Export VLLM_ROCM_QUICK_REDUCE_QUANTIZATION=INT6 in the standard and MTP
bench scripts to use INT6 quick all-reduce on CDNA4/gfx950, improving
TP communication throughput for the mxfp8 workload.
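
A minimal sketch of the environment setup described above. Only the variable name and value come from this PR; where the line sits in the bench scripts, and the logging line that follows it, are assumptions for illustration.

```shell
# Sketch only: the real bench scripts may place or guard this differently.
# Request INT6 quantization for the ROCm quick all-reduce path.
export VLLM_ROCM_QUICK_REDUCE_QUANTIZATION=INT6

# Assumed addition: echo the value so the benchmark log records the mode used.
echo "VLLM_ROCM_QUICK_REDUCE_QUANTIZATION=${VLLM_ROCM_QUICK_REDUCE_QUANTIZATION}"
```

The export has to run before the vLLM server process starts so that the process inherits it.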

Co-authored with @hongxiayang

@github-actions

Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please create a PR first before we can merge your single-node PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from their respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

 # MXFP8 runs from TP=4 on gfx950; block size 128 is mandatory for MSA.
 minimaxm3-fp8-mi355x-vllm:
-  image: vllm/vllm-openai-rocm:minimax-m3
+  image: vllm/vllm-openai-rocm:nightly-3f5a1e1733200760169ff31ebe60a271072b199e

🔴 This PR bumps the minimaxm3-fp8-mi355x-vllm image and adds VLLM_ROCM_QUICK_REDUCE_QUANTIZATION=INT6 to both the non-MTP and MTP bench scripts, but does not append a perf-changelog.yaml entry — AGENTS.md (§Updating Docker images, lines 124-135) requires one for both kinds of change, and changelog entries are what trigger the benchmark sweep. Without an entry the new image+INT6 combination will land unbenchmarked, so the PR description's throughput claim cannot be validated. Append an entry under config-keys minimaxm3-fp8-mi355x-vllm (image pin + INT6) and minimaxm3-fp8-mi355x-vllm-mtp (the MTP script also gets the INT6 env var) — see #1941 (the directly analogous MTP image bump to the same nightly) for the precedent.

Extended reasoning...

What the bug is

AGENTS.md lines 124-135 (§Updating Docker images) state explicitly: "Update the image tag in the relevant .github/configs/*-master.yaml and/or benchmarks/*.sh, update any related env vars / config params, and append a perf-changelog.yaml entry (required - triggers benchmarks)". Line 58 of the same doc reiterates: "Changes to perf-changelog.yaml trigger benchmark runs".

This PR does both of the change classes the policy enumerates:

  1. Image bump in .github/configs/amd-master.yaml line 2528: vllm/vllm-openai-rocm:minimax-m3 → vllm/vllm-openai-rocm:nightly-3f5a1e1733200760169ff31ebe60a271072b199e.
  2. New env var VLLM_ROCM_QUICK_REDUCE_QUANTIZATION=INT6 exported in both benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_mi355x.sh (line 34) and benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_mi355x_mtp.sh (line 64).

The PR diff modifies exactly three files (amd-master.yaml + the two .sh scripts); no perf-changelog.yaml entry is added.

Why this matters / impact

perf-changelog.yaml is the trigger for the sweep generator. Without an entry, this PR will not produce a benchmark run for the new image+INT6 combination, so the PR description's claim — "improving TP communication throughput for the mxfp8 workload" — lands unvalidated. That is precisely the failure mode the policy is designed to prevent.

Sibling-PR precedent

The tail of perf-changelog.yaml shows that every recent sibling MiniMax-M3 PR followed this convention.

This PR is the missing twin to #1941 (it pins -vllm to the same nightly that #1941 pinned -vllm-mtp to), and additionally exports INT6 quick-reduce in both scripts — yet no changelog entry exists.

Step-by-step proof

  1. git diff for this PR returns three files: amd-master.yaml, minimaxm3_fp8_mi355x.sh, minimaxm3_fp8_mi355x_mtp.sh — no perf-changelog.yaml.
  2. Inspecting amd-master.yaml line 2528 confirms the image string change for the minimaxm3-fp8-mi355x-vllm config-key.
  3. grep -n VLLM_ROCM_QUICK_REDUCE_QUANTIZATION benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_mi355x*.sh shows the env var exported at line 34 of the non-MTP script and line 64 of the MTP script.
  4. AGENTS.md lines 124-126 say a perf-changelog.yaml entry is required and triggers benchmarks; line 58 confirms the trigger mechanism.
  5. The last entry in perf-changelog.yaml is PR #1941 ("[codex] update MiniMax M3 FP8 MI355X vLLM MTP image"), the analogous image bump to the same nightly hash on the sibling MTP config. It is on the list of sibling MiniMax-M3 PRs that all appended entries.
  6. Therefore the new image+INT6 combination will not be swept on merge, and the PR-description throughput claim cannot be validated before landing.
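
The check in steps 1 to 4 can be sketched as a small shell helper. This is an illustration, not part of the repository: the function name `check_changelog` is hypothetical, and the path patterns are assumptions taken from the AGENTS.md wording quoted above (`.github/configs/*-master.yaml`, `benchmarks/*.sh`, and a `perf-changelog.yaml` file).

```shell
#!/usr/bin/env bash
# Hypothetical helper: takes changed file paths as arguments and fails when a
# master config or bench script changed without a perf-changelog.yaml change.
check_changelog() {
  local touches_perf=0 has_changelog=0 f
  for f in "$@"; do
    case "$f" in
      # Changelog may sit at the repo root or below it (location is assumed).
      perf-changelog.yaml|*/perf-changelog.yaml) has_changelog=1 ;;
      # In case patterns, * also matches "/", so nested scripts are covered.
      .github/configs/*-master.yaml|benchmarks/*.sh) touches_perf=1 ;;
    esac
  done
  if [ "$touches_perf" -eq 1 ] && [ "$has_changelog" -eq 0 ]; then
    echo "missing perf-changelog.yaml entry" >&2
    return 1
  fi
  return 0
}
```

One way to use it would be to pass the output of `git diff --name-only` for the PR branch as the arguments. For the three files in this PR, the helper returns a non-zero status.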

Fix

Append an entry like the following (note the MTP script also picks up INT6, so the entry should cover both config-keys, or use a minimaxm3-fp8-mi355x-vllm* wildcard):

- config-keys:
    - minimaxm3-fp8-mi355x-vllm
    - minimaxm3-fp8-mi355x-vllm-mtp
  description:
    - "Pin minimaxm3-fp8-mi355x-vllm image to nightly-3f5a1e1733200760169ff31ebe60a271072b199e (includes gfx950 mxfp8 moe/linear tuning from vllm-project/vllm#45725)."
    - "Export VLLM_ROCM_QUICK_REDUCE_QUANTIZATION=INT6 in the standard and MTP bench scripts to use INT6 quick all-reduce on CDNA4/gfx950."
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1946
