-
Notifications
You must be signed in to change notification settings - Fork 208
[AMD] Update MiniMax-M3 FP8 MI355X vLLM image and enable INT6 quick-reduce #1946
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
chunfangamd
wants to merge
2
commits into
main
Choose a base branch
from
chun_hongxia/minimaxm3_fp8
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+12
−1
Open
Changes from all commits
Commits
Show all changes
2 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔴 This PR bumps the
minimaxm3-fp8-mi355x-vllmimage and addsVLLM_ROCM_QUICK_REDUCE_QUANTIZATION=INT6to both the non-MTP and MTP bench scripts, but does not append aperf-changelog.yamlentry — AGENTS.md (§Updating Docker images, lines 124-135) requires one for both kinds of change, and changelog entries are what trigger the benchmark sweep. Without an entry the new image+INT6 combination will land unbenchmarked, so the PR description's throughput claim cannot be validated. Append an entry under config-keysminimaxm3-fp8-mi355x-vllm(image pin + INT6) andminimaxm3-fp8-mi355x-vllm-mtp(the MTP script also gets the INT6 env var) — see #1941 (the directly analogous MTP image bump to the same nightly) for the precedent.Extended reasoning...
What the bug is
AGENTS.mdlines 124-135 (§Updating Docker images) state explicitly: "Update the image tag in the relevant.github/configs/*-master.yamland/orbenchmarks/*.sh, update any related env vars / config params, and append aperf-changelog.yamlentry (required - triggers benchmarks)". Line 58 of the same doc reiterates: "Changes toperf-changelog.yamltrigger benchmark runs".This PR does both of the change classes the policy enumerates:
.github/configs/amd-master.yamlline 2528:vllm/vllm-openai-rocm:minimax-m3→vllm/vllm-openai-rocm:nightly-3f5a1e1733200760169ff31ebe60a271072b199e.VLLM_ROCM_QUICK_REDUCE_QUANTIZATION=INT6exported in bothbenchmarks/single_node/fixed_seq_len/minimaxm3_fp8_mi355x.sh(line 34) andbenchmarks/single_node/fixed_seq_len/minimaxm3_fp8_mi355x_mtp.sh(line 64).The PR diff modifies exactly three files (
amd-master.yaml+ the two.shscripts); noperf-changelog.yamlentry is added.Why this matters / impact
perf-changelog.yamlis the trigger for the sweep generator. Without an entry, this PR will not produce a benchmark run for the new image+INT6 combination, so the PR description's claim — "improving TP communication throughput for the mxfp8 workload" — lands unvalidated. That is precisely the failure mode the policy is designed to prevent.Sibling-PR precedent
The tail of
perf-changelog.yamlshows every recent sibling MiniMax-M3 PR followed this convention:3f5a1e1733...) — appended an entry underminimaxm3-fp8-mi355x-vllm-mtp.This PR is the missing twin to #1941 (it pins
-vllmto the same nightly that #1941 pinned-vllm-mtpto), and additionally exports INT6 quick-reduce in both scripts — yet no changelog entry exists.Step-by-step proof
git difffor this PR returns three files:amd-master.yaml,minimaxm3_fp8_mi355x.sh,minimaxm3_fp8_mi355x_mtp.sh— noperf-changelog.yaml.amd-master.yamlline 2528 confirms the image string change for theminimaxm3-fp8-mi355x-vllmconfig-key.grep -n VLLM_ROCM_QUICK_REDUCE_QUANTIZATION benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_mi355x*.shshows the env var exported at line 34 of the non-MTP script and line 64 of the MTP script.perf-changelog.yamlentry is required and triggers benchmarks; line 58 confirms the trigger mechanism.perf-changelog.yamlis PR [codex] update MiniMax M3 FP8 MI355X vLLM MTP image #1941 — the analogous image bump to the same nightly hash on the sibling MTP config. It is on the list of sibling MiniMax-M3 PRs that all appended entries.Fix
Append an entry like the following (note the MTP script also picks up INT6, so the entry should cover both config-keys, or use a
minimaxm3-fp8-mi355x-vllm*wildcard):