[WIP][NV] Use Marlin for MiniMax M3 TP-only configs#1807

Closed
jasonlizhengjian wants to merge 3 commits into nv/jasonli/minimaxm3-stack-base-1781-1784 from nv/jasonli/minimaxm3-marlin-tp-only

@jasonlizhengjian jasonlizhengjian commented Jun 16, 2026

Collaborator

Stacked on #1781 and #1784.

Adds --moe-backend marlin for MiniMax-M3 B200/B300 TP-only vLLM launch paths when expert parallelism is disabled.

Validation:

  • bash -n benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_b200.sh benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_b300.sh benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_b200_mtp.sh benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_b300_mtp.sh
  • git diff --check
  • PyYAML parse for perf-changelog.yaml and .github/configs/nvidia-master.yaml
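The checks above can be sketched as a small helper script. This is a hedged illustration, not the author's actual commands: `check_scripts` and `check_yaml` are hypothetical names, and the `python3 -c` one-liner is only one assumed way to run a PyYAML parse.

```bash
#!/usr/bin/env bash
# Hypothetical helpers mirroring the validation list; not part of the PR.

# Parse each shell script with `bash -n` (syntax check only, nothing is executed).
check_scripts() {
  local status=0 f
  for f in "$@"; do
    if ! bash -n "$f"; then
      echo "syntax error: $f" >&2
      status=1
    fi
  done
  return "$status"
}

# Load each YAML file with PyYAML; fails if any file does not parse.
# Assumes python3 with the PyYAML package installed.
check_yaml() {
  python3 -c '
import sys, yaml
for path in sys.argv[1:]:
    with open(path) as fh:
        yaml.safe_load(fh)
' "$@"
}
```

From the repository root, usage would look like `check_scripts benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_b200.sh`, then `git diff --check` to flag whitespace errors, then `check_yaml perf-changelog.yaml .github/configs/nvidia-master.yaml`.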

Note

Low Risk
Benchmark-only vLLM serve flags for a specific MoE backend when expert parallelism is off; no auth, data, or core application logic changes.

Overview
For MiniMax-M3 MXFP8 B200/B300 vLLM fixed-sequence recipes, the TP-only launch path (DP_ATTENTION false and EP_SIZE 1) now passes --moe-backend marlin alongside --tensor-parallel-size=$TP. DP-attention and expert-parallel branches are unchanged.

The same PARALLEL_ARGS tweak is applied to the non-MTP and EAGLE3 MTP scripts (minimaxm3_fp8_b200.sh, minimaxm3_fp8_b300.sh, and their _mtp siblings). perf-changelog.yaml records the change for minimaxm3-fp8-b200-vllm, minimaxm3-fp8-b300-vllm, and the matching MTP config keys.
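As a rough sketch of the branching described above (not copied from the scripts): `TP`, `EP_SIZE`, `DP_ATTENTION`, and `PARALLEL_ARGS` are names from the PR text, while `select_parallel_args`, the branch order, and the placeholder strings for the unchanged branches are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Illustrative only. The real DP-attention and expert-parallel flags are not
# shown in the PR description, so placeholders stand in for them here.

select_parallel_args() {
  local tp="$1" ep_size="$2" dp_attention="$3"

  if [ "$dp_attention" = "true" ]; then
    # DP-attention branch: unchanged by this PR (placeholder).
    printf '%s\n' "<dp-attention args, unchanged>"
  elif [ "$ep_size" -gt 1 ]; then
    # Expert-parallel branch: unchanged by this PR (placeholder).
    printf '%s\n' "<expert-parallel args, unchanged>"
  else
    # TP-only branch: the one place the Marlin MoE backend is requested.
    printf '%s\n' "--tensor-parallel-size=$tp --moe-backend marlin"
  fi
}

# Example: TP=8, EP_SIZE=1, DP_ATTENTION=false selects the TP-only branch.
PARALLEL_ARGS="$(select_parallel_args 8 1 false)"
echo "$PARALLEL_ARGS"
```

Keeping the Marlin flag inside the final `else` means it is only passed when both DP attention and expert parallelism are off, which matches the "TP-only" scope in the PR title.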

Reviewed by Cursor Bugbot for commit 32c5fe0.

@github-actions

Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single-node PR into the master branch. Let's ensure that the documentation is first-class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get PR approval from their respective company's CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

@jasonlizhengjian jasonlizhengjian marked this pull request as ready for review June 16, 2026 15:42
@claude

claude Bot commented Jun 16, 2026

Contributor

Claude finished @jasonlizhengjian's task in 59s.


Review

  • Gather context and read changed files
  • Validate benchmark scripts
  • Validate perf-changelog.yaml
  • Post review summary

LGTM: no blocking issues found.

All four scripts consistently add --moe-backend marlin only in the TP-only else branch (when EP_SIZE is not > 1 and DP_ATTENTION is not true). Expert parallelism gating follows the correct if [ "$EP_SIZE" -gt 1 ] pattern. MTP scripts include --use-chat-template. Perf-changelog entry is correctly appended at the end of the file with proper config-keys and PR link.

Collaborator Author

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys minimaxm3-fp8-b200-vllm minimaxm3-fp8-b300-vllm minimaxm3-fp8-b200-vllm-mtp minimaxm3-fp8-b300-vllm-mtp

@github-actions

Contributor

@jasonlizhengjian Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/27630154690
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys minimaxm3-fp8-b200-vllm minimaxm3-fp8-b300-vllm minimaxm3-fp8-b200-vllm-mtp minimaxm3-fp8-b300-vllm-mtp
Pinned ref: 02f6f47
Approval: not required (trusted collaborator).
