2 changes: 1 addition & 1 deletion .github/configs/amd-master.yaml
@@ -2525,7 +2525,7 @@ dsv4-fp4-mi355x-atom-disagg:
# https://github.com/vllm-project/recipes/commit/2a3728ed9892debfd767a72a58ebc90b33f186e5
# MXFP8 runs from TP=4 on gfx950; block size 128 is mandatory for MSA.
minimaxm3-fp8-mi355x-vllm:
image: vllm/vllm-openai-rocm:minimax-m3
image: vllm/vllm-openai-rocm:nightly-3f5a1e1733200760169ff31ebe60a271072b199e

🔴 This PR bumps the minimaxm3-fp8-mi355x-vllm image and adds VLLM_ROCM_QUICK_REDUCE_QUANTIZATION=INT6 to both the non-MTP and MTP bench scripts, but does not append a perf-changelog.yaml entry — AGENTS.md (§Updating Docker images, lines 124-135) requires one for both kinds of change, and changelog entries are what trigger the benchmark sweep. Without an entry the new image+INT6 combination will land unbenchmarked, so the PR description's throughput claim cannot be validated. Append an entry under config-keys minimaxm3-fp8-mi355x-vllm (image pin + INT6) and minimaxm3-fp8-mi355x-vllm-mtp (the MTP script also gets the INT6 env var) — see #1941 (the directly analogous MTP image bump to the same nightly) for the precedent.

Extended reasoning...

What the bug is

AGENTS.md lines 124-135 (§Updating Docker images) state explicitly: "Update the image tag in the relevant .github/configs/*-master.yaml and/or benchmarks/*.sh, update any related env vars / config params, and append a perf-changelog.yaml entry (required - triggers benchmarks)". Line 58 of the same doc reiterates: "Changes to perf-changelog.yaml trigger benchmark runs".

This PR does both of the change classes the policy enumerates:

  1. Image bump in .github/configs/amd-master.yaml line 2528: vllm/vllm-openai-rocm:minimax-m3 → vllm/vllm-openai-rocm:nightly-3f5a1e1733200760169ff31ebe60a271072b199e.
  2. New env var VLLM_ROCM_QUICK_REDUCE_QUANTIZATION=INT6 exported in both benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_mi355x.sh (line 34) and benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_mi355x_mtp.sh (line 64).

The PR diff modifies exactly three files (amd-master.yaml + the two .sh scripts); no perf-changelog.yaml entry is added.

Why this matters / impact

perf-changelog.yaml is the trigger for the sweep generator. Without an entry, this PR will not produce a benchmark run for the new image+INT6 combination, so the PR description's claim — "improving TP communication throughput for the mxfp8 workload" — lands unvalidated. That is precisely the failure mode the policy is designed to prevent.

Sibling-PR precedent

The tail of perf-changelog.yaml shows every recent sibling MiniMax-M3 PR followed this convention:

This PR is the missing twin to #1941: it pins minimaxm3-fp8-mi355x-vllm to the same nightly that #1941 pinned minimaxm3-fp8-mi355x-vllm-mtp to, and additionally exports INT6 quick-reduce in both scripts, yet no changelog entry exists.

Step-by-step proof

  1. git diff for this PR returns three files: amd-master.yaml, minimaxm3_fp8_mi355x.sh, minimaxm3_fp8_mi355x_mtp.sh — no perf-changelog.yaml.
  2. Inspecting amd-master.yaml line 2528 confirms the image string change for the minimaxm3-fp8-mi355x-vllm config-key.
  3. grep -n VLLM_ROCM_QUICK_REDUCE_QUANTIZATION benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_mi355x*.sh shows the env var exported at line 34 of the non-MTP script and line 64 of the MTP script.
  4. AGENTS.md lines 124-126 say a perf-changelog.yaml entry is required and triggers benchmarks; line 58 confirms the trigger mechanism.
  5. The last entry in perf-changelog.yaml is for #1941 ("[codex] update MiniMax M3 FP8 MI355X vLLM MTP image"), the analogous image bump to the same nightly hash on the sibling MTP config. It is one of the sibling MiniMax-M3 PRs that all appended entries.
  6. Therefore the new image+INT6 combination will not be swept on merge, and the PR-description throughput claim cannot be validated before landing.
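
The policy check behind steps 1 to 4 can be automated. The shell function below is a hypothetical sketch, not existing InferenceX tooling: the name `check_changelog` and the path patterns (taken from the AGENTS.md wording `.github/configs/*-master.yaml` and `benchmarks/*.sh`) are assumptions. It takes the list of changed paths and fails when an image pin or bench script changed without a perf-changelog.yaml edit.

```bash
#!/usr/bin/env bash
# Hypothetical guard (not part of the repo): given changed file paths,
# fail if an image pin or bench script changed without touching
# perf-changelog.yaml, mirroring the AGENTS.md rule quoted above.
check_changelog() {
  local touched_config=0 touched_changelog=0 f
  for f in "$@"; do
    # In a case pattern, * also matches '/', so nested paths such as
    # benchmarks/single_node/fixed_seq_len/*.sh match benchmarks/*.sh.
    case "$f" in
      .github/configs/*-master.yaml|benchmarks/*.sh)
        touched_config=1 ;;
      perf-changelog.yaml)
        touched_changelog=1 ;;
    esac
  done
  if [ "$touched_config" -eq 1 ] && [ "$touched_changelog" -eq 0 ]; then
    echo "missing perf-changelog.yaml entry (required: triggers benchmarks)" >&2
    return 1
  fi
  return 0
}
```

For example, `mapfile -t files < <(git diff --name-only origin/main...HEAD); check_changelog "${files[@]}"` (assuming `main` is the base branch). Run against the three files this PR originally touched, it returns 1. It only checks that the changelog file was modified, not that the entry lists the right config-keys.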

Fix

Append an entry like the following (note the MTP script also picks up INT6, so the entry should cover both config-keys, or use a minimaxm3-fp8-mi355x-vllm* wildcard):

- config-keys:
    - minimaxm3-fp8-mi355x-vllm
    - minimaxm3-fp8-mi355x-vllm-mtp
  description:
    - "Pin minimaxm3-fp8-mi355x-vllm image to nightly-3f5a1e1733200760169ff31ebe60a271072b199e (includes gfx950 mxfp8 moe/linear tuning from vllm-project/vllm#45725)."
    - "Export VLLM_ROCM_QUICK_REDUCE_QUANTIZATION=INT6 in the standard and MTP bench scripts to use INT6 quick all-reduce on CDNA4/gfx950."
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1946

model: MiniMaxAI/MiniMax-M3-MXFP8
model-prefix: minimaxm3
runner: mi355x
1 change: 1 addition & 0 deletions benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_mi355x.sh
@@ -31,6 +31,7 @@ fi
SERVER_LOG=/workspace/server.log
export VLLM_ENGINE_READY_TIMEOUT_S=3600
export VLLM_USE_BREAKABLE_CUDAGRAPH=0
export VLLM_ROCM_QUICK_REDUCE_QUANTIZATION=INT6

if [ "${EVAL_ONLY}" = "true" ]; then
setup_eval_context
1 change: 1 addition & 0 deletions benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_mi355x_mtp.sh
@@ -61,6 +61,7 @@ export VLLM_ENGINE_READY_TIMEOUT_S=3600
# Run with CUDA graphs (no --enforce-eager): VLLM_USE_BREAKABLE_CUDAGRAPH=0
# avoids the M3-decode breakable-cudagraph path that previously forced eager.
export VLLM_USE_BREAKABLE_CUDAGRAPH=0
export VLLM_ROCM_QUICK_REDUCE_QUANTIZATION=INT6

if [ "${EVAL_ONLY}" = "true" ]; then
setup_eval_context
9 changes: 9 additions & 0 deletions perf-changelog.yaml
@@ -4280,3 +4280,12 @@
- "Update the MiniMax-M3 MXFP8 MI355X vLLM EAGLE3 benchmark image from vllm/vllm-openai-rocm:minimax-m3 to vllm/vllm-openai-rocm:nightly-3f5a1e1733200760169ff31ebe60a271072b199e."
- "Benchmark configuration, EAGLE3 draft model, serving flags, and search space are unchanged."
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1941

- config-keys:
- minimaxm3-fp8-mi355x-vllm
- minimaxm3-fp8-mi355x-vllm-mtp
description:
- "Update the MiniMax-M3 MXFP8 MI355X vLLM benchmark image from vllm/vllm-openai-rocm:minimax-m3 to vllm/vllm-openai-rocm:nightly-3f5a1e1733200760169ff31ebe60a271072b199e, which includes the gfx950 mxfp8 MoE/linear tuning for MiniMax-M3 (vllm-project/vllm#45725)."
- "Export VLLM_ROCM_QUICK_REDUCE_QUANTIZATION=INT6 in the standard and EAGLE3 (MTP) bench scripts to use INT6 quick all-reduce on CDNA4/gfx950, reducing TP all-reduce cost for the mxfp8 workload."
- "Benchmark serving flags and search space are otherwise unchanged."
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1946