[NV] Update MiniMax M3 B200/B300 MTP settings #1784
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.
PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. Failures are often just flakes, and simply re-running the failed jobs will fix them. If failed jobs are re-run, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow
As a rule of thumb, PR authors should request a review and get a PR approval from their respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
Claude finished @jasonlizhengjian's task in 1m 44s
PR Review: Update MiniMax M3 B200/B300 MTP settings
LGTM, no blocking issues found. Both B200 and B300 MTP scripts now align with the non-MTP B200 serving settings (…
See the unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27572276756
/reuse-sweep-run
Updates MiniMax M3 B200/B300 EAGLE3 MTP recipes with the serving settings and TP4+EP4 coverage used for the non-MTP recipes.
Validation:
- `bash -n benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_b200_mtp.sh`
- `bash -n benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_b300_mtp.sh`
- `git diff --check`

Note: local matrix generation was not run because `pydantic` is not installed in this environment.

Note
Low Risk
Benchmark config and shell script tuning only; no application logic, auth, or data paths.
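The syntax checks listed under Validation could be wrapped in a small helper so that both scripts are checked in one pass. This is a minimal sketch, not part of the PR: `check_syntax` is a hypothetical function name, and only the two script paths come from the PR description.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the PR): run `bash -n` on each script given as
# an argument and report the result. Returns non-zero if any check fails.
check_syntax() {
  local failed=0
  local script
  for script in "$@"; do
    if bash -n "$script" 2>/dev/null; then
      echo "ok: $script"
    else
      echo "failed: $script"
      failed=1
    fi
  done
  return "$failed"
}

# Usage, with the paths from the PR description:
# check_syntax \
#   benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_b200_mtp.sh \
#   benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_b300_mtp.sh
```

Note that `bash -n` only parses the script; it does not execute it, so it catches syntax errors but not wrong flag values.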
Overview
Aligns MiniMax-M3 FP8 EAGLE3 MTP B200/B300 benchmark recipes with the non-MTP serving setup and broadens the fixed-seq-len matrix in `nvidia-master.yaml`.

The B200 and B300 MTP runner scripts now set `VLLM_FLOAT32_MATMUL_PRECISION=high` and use a fixed `--max-cudagraph-capture-size 2048` instead of computing capture size from concurrency and speculative token count.

For `minimaxm3-fp8-b200-vllm-mtp` and `minimaxm3-fp8-b300-vllm-mtp`, the search space adds TP4+EP4 rows: DP-attention sweeps on 1k1k (conc 128–512) and 8k1k (conc 64–128), plus the missing non-DP-attention TP4+EP4 row for 8k1k (conc 64–256).

`perf-changelog.yaml` documents these changes.

Reviewed by Cursor Bugbot for commit 52d37c0. Bugbot is set up for automated code reviews on this repo.
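The runner-script change described in the overview can be pictured roughly as follows. This is a sketch under stated assumptions: the actual scripts are not shown in this conversation, so `MAX_CUDAGRAPH_CAPTURE_SIZE`, `build_serve_args`, and `MODEL` are hypothetical names. Only the environment variable, the flag, and the value 2048 come from the PR text. The previous formula for the capture size is not given in the PR, so it is not reproduced here.

```shell
#!/usr/bin/env bash
# Sketch only: names other than the env var and the flag are hypothetical.

# Setting named in the PR overview.
export VLLM_FLOAT32_MATMUL_PRECISION=high

# Fixed capture size, replacing a value previously derived from
# concurrency and speculative token count.
MAX_CUDAGRAPH_CAPTURE_SIZE=2048

# Hypothetical helper: print the capture-size serving arguments, one per line.
build_serve_args() {
  printf '%s\n' "--max-cudagraph-capture-size" "$MAX_CUDAGRAPH_CAPTURE_SIZE"
}

# Hypothetical usage; MODEL is a placeholder, not a value from the PR:
# vllm serve "$MODEL" $(build_serve_args)
```

A fixed value keeps the B200 and B300 MTP scripts identical to the non-MTP serving setup on this setting, which is the alignment the PR describes.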