
[NV] Add MiniMax M3 B300 Dynamo vLLM recipes #1787

Closed
jasonlizhengjian wants to merge 7 commits into main from nv/jasonli/minimaxm3-fp8-b300-dynamo-vllm

Conversation

@jasonlizhengjian jasonlizhengjian commented Jun 15, 2026

Collaborator

Adds MiniMax M3 MXFP8 B300 disaggregated vLLM benchmarks via Dynamo for 1k1k STP.

Validation:

  • bash -n runners/launch_b300-nv.sh
  • git diff --check
  • PyYAML parse for touched YAML files
  • CONFIG_FILE path consistency check
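A CONFIG_FILE path consistency check of the kind listed above could look roughly like the following. This is a minimal sketch, not the script used for this PR: the `CONFIG_FILE=<path>` text pattern and the helper name `missing_config_files` are assumptions made for illustration, and the real entries in nvidia-master.yaml may be shaped differently.

```python
import re
from pathlib import Path

# Assumed pattern: entries of the form CONFIG_FILE=<relative/path>.
# The actual layout in nvidia-master.yaml may differ.
CONFIG_FILE_RE = re.compile(r"CONFIG_FILE=([^\s\"']+)")


def missing_config_files(master_text: str, repo_root: Path) -> list[str]:
    """Return referenced CONFIG_FILE paths that do not exist under repo_root."""
    referenced = CONFIG_FILE_RE.findall(master_text)
    return [p for p in referenced if not (repo_root / p).is_file()]
```

An empty result means every referenced recipe path resolves to a file in the checkout.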

Note: local matrix generation was not run because pydantic is not installed in this environment.


Note

Low Risk
Benchmark and CI launcher configuration only; no runtime application or auth logic changes.

Overview
Adds MiniMax-M3 MXFP8 disaggregated Dynamo + vLLM coverage on B300 for fixed-seq-len 1k1k and 8k1k STP, registered as minimaxm3-fp8-b300-dynamo-vllm in nvidia-master.yaml with a large prefill/decode search space (worker counts, TP/EP, dp-attn) and per-point CONFIG_FILE recipe paths.

Introduces local srt-slurm recipes under benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m3/b300-fp8/ (NixlConnector KV transfer, dep2 prefill, varied decode topologies including Marlin MoE decode for some low-concurrency points). runners/launch_b300-nv.sh now recognizes minimaxm3 + fp8 + dynamo-vllm, sets model paths, clones srt-slurm at sa-submission-q2-2026, and overlays those recipes into recipes/vllm/minimax-m3.

Documents the change in perf-changelog.yaml.

Reviewed by Cursor Bugbot for commit 1ac5daa.

@github-actions

Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get PR approval from their respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

@jasonlizhengjian jasonlizhengjian marked this pull request as ready for review June 16, 2026 17:46
@jasonlizhengjian jasonlizhengjian requested a review from a team June 16, 2026 17:46
stream-interval: 32
max-num-seqs: 4096
max-num-batched-tokens: 16384
max-cudagraph-capture-size: 8196


Wrong cudagraph capture size

Medium Severity

All six new MiniMax M3 B300 decode blocks set max-cudagraph-capture-size to 8196, while prefill uses 2048 and decode sets max-num-seqs to 4096. That value is not used elsewhere in the repo and sits four above the usual power-of-two 8192 paired with 4096-sequence decode configs, so CUDA graph capture may not align with intended batch sizes.
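The arithmetic behind this finding can be checked with a short sketch. It assumes only what the comment states, that capture sizes in this repo are normally powers of two; it does not claim that vLLM itself requires this. The helper names are hypothetical.

```python
def is_power_of_two(n: int) -> bool:
    """True when n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


def nearest_power_of_two(n: int) -> int:
    """Return the power of two closest to n (ties go to the lower one)."""
    if n < 1:
        raise ValueError("n must be positive")
    lower = 1 << (n.bit_length() - 1)
    upper = lower << 1
    return lower if n - lower <= upper - n else upper


flagged = 8196  # value reported in the decode blocks
print(is_power_of_two(flagged), nearest_power_of_two(flagged))
```

For 8196 the check fails and the nearest power of two is 8192, which matches the value the comment suggests was intended.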

Additional Locations (2)

Reviewed by Cursor Bugbot for commit 54b829a.

@jasonlizhengjian jasonlizhengjian changed the title from "[WIP][NV] Add MiniMax M3 B300 Dynamo vLLM recipes" to "[NV] Add MiniMax M3 B300 Dynamo vLLM recipes" Jun 16, 2026
@jasonlizhengjian jasonlizhengjian force-pushed the nv/jasonli/minimaxm3-fp8-b300-dynamo-vllm branch from 54b829a to a2d9824 June 16, 2026 17:59

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).


Reviewed by Cursor Bugbot for commit 8abe295.

stream-interval: 32
max-num-seqs: 1024
max-num-batched-tokens: 16384
max-cudagraph-capture-size: 4096


8k1k omits expert parallel flags

High Severity

The new 8k1k MiniMax-M3 disaggregated recipes use data-parallel decode (and dep2 prefill) without enable-expert-parallel, while the matching 1k1k recipes and MiniMax-M2.5 B300 dep recipes set it for the same MoE layout. That mismatch can prevent correct expert sharding or cause vLLM startup failures on 8k1k benchmark jobs.
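A consistency check for this kind of mismatch might be sketched as below. It works on already-parsed recipe arguments held in plain dictionaries. The key `enable-expert-parallel` comes from the comment above; the key `data-parallel-size`, the recipe names in the usage example, and the helper name are assumptions for illustration and may not match the recipe schema in this PR.

```python
def missing_expert_parallel(recipes: dict[str, dict]) -> list[str]:
    """Return names of recipes that use data parallelism without expert parallelism.

    `recipes` maps a recipe name to its parsed argument dictionary.
    The key names are assumed, not taken from the actual recipe files.
    """
    flagged = []
    for name, args in recipes.items():
        dp_size = int(args.get("data-parallel-size", 1))
        if dp_size > 1 and not args.get("enable-expert-parallel", False):
            flagged.append(name)
    return sorted(flagged)


# Hypothetical recipe arguments, for illustration only.
example = {
    "1k1k-decode": {"data-parallel-size": 8, "enable-expert-parallel": True},
    "8k1k-decode": {"data-parallel-size": 8},
    "8k1k-tp-only": {"tensor-parallel-size": 8},
}
print(missing_expert_parallel(example))
```

Only the data-parallel recipe that lacks the flag is reported; tensor-parallel-only recipes are left alone.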

Additional Locations (2)

Reviewed by Cursor Bugbot for commit 8abe295.


4 participants