[Klaud Cold][NVIDIA] feat: MiniMax M3 Day 0 support H200 by functionstackx · Pull Request #1728 · SemiAnalysisAI/InferenceX

functionstackx · 2026-06-12T21:10:26Z

Summary

Day-zero single-node vLLM recipe for MiniMaxAI/MiniMax-M3-MXFP8 on H200, following the official vLLM recipe (https://recipes.vllm.ai/MiniMaxAI/MiniMax-M3). Sibling of the B200 (#1723) / B300 (#1724) / MI355X (#1725) day-zero PRs.

New config key: minimaxm3-fp8-h200-vllm (runner pool: h200)
New script: benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_h200.sh
perf-changelog.yaml entry

Model & image

Model: MiniMaxAI/MiniMax-M3-MXFP8, a 427B-total / 26B-active MoE with MSA sparse attention, NVIDIA-quantized to MXFP8 (~444 GB checkpoint, roughly half of BF16). Verified on HF.
Image: vllm/vllm-openai:minimax-m3 — dedicated day-zero image (M3 support has not shipped in a stable vLLM release). Tag verified on Docker Hub (amd64 + arm64, pushed 2026-06-12).

Recipe details (per recipes.vllm.ai + repo conventions)

--block-size 128 is mandatory: MSA sparse_block_size is 128; the default 16 misaligns sparse indexing.
--language-model-only: the benchmark is text-only, so skipping the vision encoder frees VRAM for the KV cache.
Parallelism mapping: ep > 1 → TP+EP (--enable-expert-parallel); dp-attn: true → the recipe's "DP8 + Expert Parallel" mode (--data-parallel-size 8 --enable-expert-parallel).
Fixed-seq-len scenarios use the harness-provided MAX_MODEL_LEN = isl + osl + 256 (not the model's full 1M context), and CUDA graph capture is bounded at the next power of two ≥ CONC (--max-cudagraph-capture-size), matching the other day-zero M3 recipes.
VLLM_ENGINE_READY_TIMEOUT_S=3600, because loading ~444 GB of weights off the shared FS can exceed the default 600 s readiness window. The script also retries the hf download and then serves with HF_HUB_OFFLINE=1, to dodge the shared-FS WeakFileLock race when many jobs download concurrently on day zero.
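A minimal Python sketch of how the flags above might be assembled for one sweep point. The real logic lives in the shell script minimaxm3_fp8_h200.sh and is not reproduced here; SweepPoint, next_pow2, and build_serve_args are hypothetical names, only the flags listed above are emitted, and DEP8 is assumed to mean data-parallel size 8 with no extra tensor parallelism.

```python
from dataclasses import dataclass


@dataclass
class SweepPoint:
    """Hypothetical container for one benchmark configuration."""
    tp: int
    ep: int
    dp_attn: bool
    conc: int
    isl: int
    osl: int


def next_pow2(n: int) -> int:
    """Smallest power of two >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()


def build_serve_args(p: SweepPoint) -> list[str]:
    args = [
        "vllm", "serve", "MiniMaxAI/MiniMax-M3-MXFP8",
        # MSA sparse_block_size is 128; the default block size of 16 misaligns it.
        "--block-size", "128",
        # Text-only benchmark: skip the vision encoder to free VRAM for KV cache.
        "--language-model-only",
        # Harness-provided context length: isl + osl + 256, not the full 1M.
        "--max-model-len", str(p.isl + p.osl + 256),
        # Capture CUDA graphs up to the next power of two >= concurrency.
        "--max-cudagraph-capture-size", str(next_pow2(p.conc)),
    ]
    if p.dp_attn:
        # The recipe's "DP8 + Expert Parallel" mode on one 8-GPU node.
        args += ["--data-parallel-size", "8", "--enable-expert-parallel"]
    else:
        args += ["--tensor-parallel-size", str(p.tp)]
        if p.ep > 1:
            args.append("--enable-expert-parallel")
    return args
```

For example, a TP8+EP8 point at concurrency 300 with 1k/1k sequences would get --max-cudagraph-capture-size 512 and --enable-expert-parallel alongside --tensor-parallel-size 8.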

Sweep space

Concurrency and parallelism ranges were chosen from the official recipe's serve modes plus existing H200 large-MoE configs (dsv4-fp8-h200-vllm, minimaxm2.5-fp8-h200-vllm). On 8x H200 (1128 GB total), TP8 leaves ~70 GB/GPU of KV headroom; TP4 (~112 GB of weights per GPU) is memory-tight and is only swept at low/mid concurrency.
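The headroom figures above can be sanity-checked with back-of-the-envelope arithmetic. This sketch assumes the ~444 GB checkpoint is split evenly across TP ranks, 141 GB of HBM per H200, and vLLM's default --gpu-memory-utilization of 0.9 (the PR does not state the value used). It ignores activations, CUDA graph buffers, and any weights replicated across ranks, so treat the numbers as approximate.

```python
H200_GB = 141          # HBM per H200 GPU (8 x 141 = 1128 GB)
CHECKPOINT_GB = 444    # approximate MXFP8 checkpoint size quoted in this PR
GPU_MEM_UTIL = 0.9     # assumption: vLLM's default --gpu-memory-utilization


def weights_per_gpu(tp: int) -> float:
    """Weight bytes per GPU (GB) if the checkpoint is sharded evenly over tp ranks."""
    return CHECKPOINT_GB / tp


def kv_headroom_per_gpu(tp: int) -> float:
    """Memory left per GPU (GB) for KV cache after weights, ignoring other overheads."""
    return H200_GB * GPU_MEM_UTIL - weights_per_gpu(tp)


for tp in (4, 8):
    print(f"TP{tp}: weights {weights_per_gpu(tp):.1f} GB/GPU, "
          f"~{kv_headroom_per_gpu(tp):.1f} GB/GPU left for KV cache")
```

Under these assumptions TP8 carries 55.5 GB of weights per GPU and leaves about 71 GB for KV cache, in line with the ~70 GB quoted, while TP4 carries about 111 GB per GPU and leaves only about 16 GB, which is why it is limited to lower concurrency.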

Scenario	Search space
1k1k	TP4 (1–64), TP4+EP4 (128–256), TP8 (1–128), TP8+EP8 (256–512), DEP8 (256–1024)
8k1k	TP4 (1–32), TP8 (1–128), TP8+EP8 (256), DEP8 (256–512)
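One way to read the table is as concurrency ranges that step through powers of two. That stepping is an assumption (the table does not say so), and the snippet below is an illustration, not generate_sweep_configs.py. Under that assumption the table expands to 39 (scenario, mode, concurrency) points, the same total quoted by the validation run.

```python
# Search space from the table above: (mode, min concurrency, max concurrency).
SEARCH_SPACE = {
    "1k1k": [("TP4", 1, 64), ("TP4+EP4", 128, 256), ("TP8", 1, 128),
             ("TP8+EP8", 256, 512), ("DEP8", 256, 1024)],
    "8k1k": [("TP4", 1, 32), ("TP8", 1, 128), ("TP8+EP8", 256, 256),
             ("DEP8", 256, 512)],
}


def pow2_range(lo: int, hi: int):
    """Yield lo, 2*lo, 4*lo, ... while the value stays <= hi (assumed stepping)."""
    conc = lo
    while conc <= hi:
        yield conc
        conc *= 2


def expand(space: dict) -> list[tuple[str, str, int]]:
    return [(scenario, mode, conc)
            for scenario, modes in space.items()
            for mode, lo, hi in modes
            for conc in pow2_range(lo, hi)]


points = expand(SEARCH_SPACE)
print(len(points))
```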

Validated with generate_sweep_configs.py test-config → 39 sweep points, including eval entries.

🤖 Generated with Claude Code

Note

Low Risk
Additive benchmark harness and CI config only; no changes to production serving, auth, or application runtime paths.

Overview
Adds day-zero single-node benchmarking for MiniMax-M3 MXFP8 on H200 via vLLM, aligned with the official vLLM recipe.

A new config key minimaxm3-fp8-h200-vllm in nvidia-master.yaml points at vllm/vllm-openai:minimax-m3 and MiniMaxAI/MiniMax-M3-MXFP8, with fixed-seq-len sweeps at 1k/1k and 8k/1k over TP4/TP8, TP+EP, and DP-attention + EP concurrency ranges tuned for H200 memory headroom.

The companion script minimaxm3_fp8_h200.sh implements the serve flags required for M3 (--block-size 128, --language-model-only), maps dp-attn / EP to the right vLLM parallel args, bounds CUDA graph capture to concurrency, retries the large hf download on shared-FS lock races, and extends the engine readiness timeout for the ~444 GB checkpoint. perf-changelog.yaml documents the new config.
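The download retry and extended readiness timeout can be sketched in Python as follows. This is not the PR's shell code: retry, download_checkpoint, and serve_env are hypothetical helpers, the attempt count and backoff delays are arbitrary assumptions, and download_checkpoint assumes the `hf` CLI from huggingface_hub is on PATH. Only HF_HUB_OFFLINE=1 and VLLM_ENGINE_READY_TIMEOUT_S=3600 come from the PR.

```python
import os
import subprocess
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(fn: Callable[[], T], attempts: int = 5, base_delay_s: float = 30.0,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn until it succeeds, doubling the delay after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


def download_checkpoint(repo_id: str = "MiniMaxAI/MiniMax-M3-MXFP8") -> None:
    # check=True raises CalledProcessError on a non-zero exit, e.g. a lock race.
    subprocess.run(["hf", "download", repo_id], check=True)


def serve_env() -> dict[str, str]:
    env = dict(os.environ)
    env["HF_HUB_OFFLINE"] = "1"                  # serve from the local cache only
    env["VLLM_ENGINE_READY_TIMEOUT_S"] = "3600"  # tolerate slow shared-FS loads
    return env
```

A caller would run retry(download_checkpoint) once, then launch vllm serve with env=serve_env(), so the server reads only the already-downloaded files instead of racing other jobs for the Hub cache lock.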

Reviewed by Cursor Bugbot for commit 086f643. Bugbot is set up for automated code reviews on this repo.

Day-zero single-node vLLM recipe for MiniMaxAI/MiniMax-M3-MXFP8 on H200, following https://recipes.vllm.ai/MiniMaxAI/MiniMax-M3. Uses the dedicated vllm/vllm-openai:minimax-m3 image (M3 has not shipped in a stable vLLM release). Sweeps TP4/TP8, TP+EP, and DP-attention+EP at 1k1k and 8k1k. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

github-actions · 2026-06-12T21:10:34Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please first open a PR against the official recipes or cookbook; we can only merge your single-node PR into the master branch after that. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get approval from the respective company's CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

github-actions · 2026-06-12T21:14:21Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27443411976
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27443411976

github-actions · 2026-06-12T21:21:08Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27443433612
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27443433612

functionstackx requested a review from a team June 12, 2026 21:10

functionstackx requested review from jgangani and kedarpotdar-nv as code owners June 12, 2026 21:10

github-project-automation Bot added this to InferenceMAX Board Jun 12, 2026

perf-changelog: fill in PR link for minimaxm3-fp8-h200-vllm

086f643

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

functionstackx added the full-sweep-enabled label Jun 12, 2026

minimaxm3-fp8-h200-vllm: start TP-only sweeps at conc 1

18c9c26

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

functionstackx closed this Jun 12, 2026

github-project-automation Bot moved this to Done in InferenceMAX Board Jun 12, 2026

functionstackx wants to merge 3 commits into main from feat/minimax-m3-h200-dayzero