
[NV] Update MiniMax M3 B200/B300 MTP settings#1784

Merged
functionstackx merged 3 commits into main from nv/jasonli/minimaxm3-b200-b300-mtp-serving-settings on Jun 16, 2026


@jasonlizhengjian jasonlizhengjian commented Jun 15, 2026

Collaborator

Updates MiniMax M3 B200/B300 EAGLE3 MTP recipes with the serving settings and TP4+EP4 coverage used for the non-MTP recipes.

Validation:

  • bash -n benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_b200_mtp.sh
  • bash -n benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_b300_mtp.sh
  • git diff --check
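
The checks above can be wrapped in a small helper. This is a minimal sketch, not something taken from the repository: `check_scripts` is a hypothetical function name. `bash -n` only parses a script and never executes it, so this catches syntax errors but not runtime problems.

```shell
#!/usr/bin/env bash
# Sketch: parse-check each script with `bash -n` (no commands are executed).
# check_scripts is a hypothetical helper name, not part of the repository.
check_scripts() {
  local failed=0 script
  for script in "$@"; do
    if bash -n "$script"; then
      echo "ok: $script"
    else
      echo "syntax error: $script" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Example, using the paths from the PR description:
# check_scripts benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_b200_mtp.sh \
#               benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_b300_mtp.sh
# git diff --check   # reports whitespace errors and leftover conflict markers
```

The function returns non-zero if any script fails to parse, so it can gate a commit hook or a CI step.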

Note: local matrix generation was not run because pydantic is not installed in this environment.


Note

Low Risk
Benchmark config and shell script tuning only; no application logic, auth, or data paths.

Overview
Aligns MiniMax-M3 FP8 EAGLE3 MTP B200/B300 benchmark recipes with the non-MTP serving setup and broadens the fixed-seq-len matrix in nvidia-master.yaml.

The B200 and B300 MTP runner scripts now set VLLM_FLOAT32_MATMUL_PRECISION=high and use a fixed --max-cudagraph-capture-size 2048 instead of computing capture size from concurrency and speculative token count.
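
As a rough illustration of the two settings named above, the sketch below sets the environment variable and builds the fixed flag. The variable names `MAX_CUDAGRAPH_CAPTURE_SIZE` and `SERVER_ARGS` are assumptions made for the example, and the real scripts may be structured differently. The earlier formula based on concurrency and speculative token count is not shown on this page, so it is not reproduced here.

```shell
#!/usr/bin/env bash
# Sketch of the serving settings named in the overview.
# MAX_CUDAGRAPH_CAPTURE_SIZE and SERVER_ARGS are hypothetical names.

# Environment variable set by the updated runner scripts (value from the PR).
export VLLM_FLOAT32_MATMUL_PRECISION=high

# Fixed capture size, replacing a value previously derived from
# concurrency and the speculative token count.
MAX_CUDAGRAPH_CAPTURE_SIZE=2048

SERVER_ARGS=(
  --max-cudagraph-capture-size "$MAX_CUDAGRAPH_CAPTURE_SIZE"
)

# Print the flags instead of launching a server, so the sketch is safe to run.
printf '%s\n' "${SERVER_ARGS[*]}"
```

A fixed value keeps the captured graph sizes identical across every concurrency point in a sweep, which is one plausible reason to prefer it over a per-run computation.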

For minimaxm3-fp8-b200-vllm-mtp and minimaxm3-fp8-b300-vllm-mtp, the search space adds TP4+EP4 rows: DP-attention sweeps on 1k1k (conc 128–512) and 8k1k (conc 64–128), plus the missing non–DP-attention TP4+EP4 row for 8k1k (conc 64–256). perf-changelog.yaml documents these changes.
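
To make the added rows easier to picture, the sketch below expands each concurrency range into individual sweep points. It assumes the points double from the low bound to the high bound. The page gives only the ranges, not the step, so the doubling rule is a guess. `conc_points` is a hypothetical helper, and the row labels are shorthand for this example rather than keys from nvidia-master.yaml.

```shell
#!/usr/bin/env bash
# Sketch: expand a concurrency range into sweep points.
# ASSUMPTION: points double from the low to the high bound; the PR page
# gives only the ranges. conc_points is a hypothetical helper.
conc_points() {
  local c=$1 hi=$2 out=()
  while [ "$c" -le "$hi" ]; do
    out+=("$c")
    c=$((c * 2))
  done
  echo "${out[*]}"
}

# Ranges from the PR description: "<row label>|<seq len>|<low>|<high>"
rows=(
  "TP4+EP4 DP-attention|1k1k|128|512"
  "TP4+EP4 DP-attention|8k1k|64|128"
  "TP4+EP4 no DP-attention|8k1k|64|256"
)

for row in "${rows[@]}"; do
  IFS='|' read -r label seq lo hi <<< "$row"
  printf '%-24s %s conc: %s\n' "$label" "$seq" "$(conc_points "$lo" "$hi")"
done
```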

Reviewed by Cursor Bugbot for commit 52d37c0. Bugbot is set up for automated code reviews on this repo.

@github-actions

Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR there first, before we can merge your single node PR into the master branch. Let's ensure that the documentation is first-class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix them. If failed jobs are re-run, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

@jasonlizhengjian jasonlizhengjian marked this pull request as ready for review June 15, 2026 19:52
@jasonlizhengjian jasonlizhengjian requested a review from a team June 15, 2026 19:52

claude Bot commented Jun 15, 2026

Contributor

Claude finished @jasonlizhengjian's task in 1m 44s.


PR Review: Update MiniMax M3 B200/B300 MTP settings

  • Gather context and read changed files
  • Validate master config + perf-changelog
  • Validate perf-changelog entry position
  • Check benchmark script code style and MTP requirements
  • Post review findings

LGTM — no blocking issues found.

Both B200 and B300 MTP scripts now align with the non-MTP B200 serving settings (VLLM_FLOAT32_MATMUL_PRECISION=high, hardcoded --max-cudagraph-capture-size 2048), and the new TP4+EP4 config rows mirror the coverage pattern from the STP recipes. All required validations pass: --use-chat-template is present, expert parallelism is conditional on EP_SIZE, server commands use multi-line formatting, and the perf-changelog entry is correctly appended to the end.
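
One way to make expert parallelism conditional on `EP_SIZE` is sketched below. The exact condition used in the scripts is not shown on this page, so the greater-than-1 test and the `ep_args` helper name are assumptions. `--enable-expert-parallel` is the vLLM server flag that turns expert parallelism on.

```shell
#!/usr/bin/env bash
# Sketch: emit the expert-parallel flag only when EP_SIZE is greater than 1.
# ASSUMPTION: the real scripts may test EP_SIZE differently; ep_args is a
# hypothetical helper name.
ep_args() {
  local ep_size="${1:-1}"
  if [ "$ep_size" -gt 1 ]; then
    echo "--enable-expert-parallel"
  fi
}

# With EP_SIZE unset this prints nothing; with EP_SIZE=4 it prints the flag.
ep_args "${EP_SIZE:-1}"
```

Keeping the flag out of the command line when `EP_SIZE` is 1 lets the same script serve both the plain TP rows and the TP4+EP4 rows.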

@jasonlizhengjian jasonlizhengjian changed the title [WIP][NV] Update MiniMax M3 B200/B300 MTP settings [NV] Update MiniMax M3 B200/B300 MTP settings Jun 15, 2026

@functionstackx

Collaborator

/reuse-sweep-run

@functionstackx functionstackx merged commit e4fcda4 into main Jun 16, 2026
3 checks passed
@functionstackx functionstackx deleted the nv/jasonli/minimaxm3-b200-b300-mtp-serving-settings branch June 16, 2026 16:15