[WIP][NV] Use Marlin for MiniMax M3 TP-only configs by jasonlizhengjian · Pull Request #1807 · SemiAnalysisAI/InferenceX

jasonlizhengjian · 2026-06-16T15:40:43Z

Stacked on #1781 and #1784.

Adds --moe-backend marlin for MiniMax-M3 B200/B300 TP-only vLLM launch paths when expert parallelism is disabled.

Validation:

bash -n benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_b200.sh benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_b300.sh benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_b200_mtp.sh benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_b300_mtp.sh
git diff --check
PyYAML parse for perf-changelog.yaml and .github/configs/nvidia-master.yaml
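One caveat on the syntax check above: `bash -n` treats only its first non-option argument as the script to read, and passes the remaining arguments to that script as positional parameters. A single invocation listing four files therefore checks only the first one. A minimal sketch of a per-file check follows; the helper name `check_scripts` is hypothetical and not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repo): syntax-check each script on its own,
# because `bash -n` only parses the first file argument it is given.
check_scripts() {
    local status=0 script
    for script in "$@"; do
        # -n parses the file without executing any commands
        if ! bash -n "$script"; then
            echo "syntax error: $script" >&2
            status=1
        fi
    done
    return "$status"
}
```

Called as `check_scripts benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_{b200,b300}{,_mtp}.sh`, this would cover all four scripts listed above and return non-zero if any of them fails to parse.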

Note

Low Risk
Benchmark-only vLLM serve flags for a specific MoE backend when expert parallelism is off; no auth, data, or core application logic changes.

Overview
For MiniMax-M3 MXFP8 B200/B300 vLLM fixed-sequence recipes, the TP-only launch path (DP_ATTENTION false and EP_SIZE 1) now passes --moe-backend marlin alongside --tensor-parallel-size=$TP. DP-attention and expert-parallel branches are unchanged.

The same PARALLEL_ARGS tweak is applied to the non-MTP and EAGLE3 MTP scripts (minimaxm3_fp8_b200.sh, minimaxm3_fp8_b300.sh, and their _mtp siblings). perf-changelog.yaml records the change for minimaxm3-fp8-b200-vllm, minimaxm3-fp8-b300-vllm, and the matching MTP config keys.
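The gating described in this overview can be sketched roughly as below. This is an illustration under stated assumptions, not the scripts' actual code: the helper names `is_tp_only` and `tp_only_args` are hypothetical, and the DP-attention and expert-parallel branches are left out because the PR does not change them.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the TP-only gate; the real scripts
# build PARALLEL_ARGS inline and may be structured differently.

# Succeeds only when DP attention is off and expert parallelism is disabled.
is_tp_only() {
    local ep_size="$1" dp_attention="$2"
    [ "$dp_attention" != "true" ] && [ "$ep_size" -le 1 ]
}

# Prints the flags the PR description names for the TP-only path.
tp_only_args() {
    local tp="$1"
    echo "--tensor-parallel-size=$tp --moe-backend marlin"
}
```

A launch script could then set `PARALLEL_ARGS="$(tp_only_args "$TP")"` when `is_tp_only "$EP_SIZE" "$DP_ATTENTION"` succeeds, and fall through to its existing branches otherwise.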

Reviewed by Cursor Bugbot for commit 32c5fe0. Bugbot is set up for automated code reviews on this repo.

github-actions · 2026-06-16T15:41:05Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. Failures are often just flakes, and simply re-running the failed jobs will fix them. If failed jobs are re-run, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

claude · 2026-06-16T15:43:24Z

Claude finished @jasonlizhengjian's task in 59s.

Review

Gather context and read changed files
Validate benchmark scripts
Validate perf-changelog.yaml
Post review summary

LGTM — no blocking issues found.

All four scripts consistently add --moe-backend marlin only in the TP-only else branch (when EP_SIZE is not > 1 and DP_ATTENTION is not true). Expert parallelism gating follows the correct if [ "$EP_SIZE" -gt 1 ] pattern. MTP scripts include --use-chat-template. Perf-changelog entry is correctly appended at the end of the file with proper config-keys and PR link.

jasonlizhengjian · 2026-06-16T15:50:32Z

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys minimaxm3-fp8-b200-vllm minimaxm3-fp8-b300-vllm minimaxm3-fp8-b200-vllm-mtp minimaxm3-fp8-b300-vllm-mtp

github-actions · 2026-06-16T15:50:54Z

@jasonlizhengjian Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/27630154690
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys minimaxm3-fp8-b200-vllm minimaxm3-fp8-b300-vllm minimaxm3-fp8-b200-vllm-mtp minimaxm3-fp8-b300-vllm-mtp
Pinned ref: 02f6f47
Approval: not required (trusted collaborator).

Use Marlin for MiniMax M3 TP-only configs

161adc0

github-project-automation Bot added this to InferenceMAX Board Jun 16, 2026

Update MiniMax M3 Marlin changelog link

02f6f47

jasonlizhengjian marked this pull request as ready for review June 16, 2026 15:42

jasonlizhengjian added the full-sweep-fail-fast label Jun 16, 2026

Update MiniMax M3 Marlin changelog link

32c5fe0

jasonlizhengjian closed this Jun 16, 2026

github-project-automation Bot moved this to Done in InferenceMAX Board Jun 16, 2026

jasonlizhengjian wants to merge 3 commits into nv/jasonli/minimaxm3-stack-base-1781-1784 from nv/jasonli/minimaxm3-marlin-tp-only
