
[Klaud Cold] MI300X MiniMax-M3 nightly image and FP8 KV cache#1837

Merged: cquil11 merged 2 commits into main from codex/minimaxm3-mi300x-fp8-kvcache on Jun 19, 2026

cquil11 (Collaborator) commented Jun 18, 2026

What changed

  • pin minimaxm3-fp8-mi300x-vllm to vllm/vllm-openai-rocm:nightly-b53b1c7ffe7aebdafd0876350f30e51d1226c92a
  • launch the MI300X MiniMax-M3 server with --kv-cache-dtype fp8
  • exclude unprovisioned chi-mi300x-121 from Slurm allocation
  • append an MI300X-only performance changelog entry

Why

Use the updated ROCm vLLM build and reduce KV-cache memory use for the MI300X MiniMax-M3 sweep. The node exclusion prevents Pyxis failures on a node missing required Enroot/RAID provisioning.
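The memory argument can be made concrete with a rough sizing sketch. It assumes a plain per-layer K/V cache (2 tensors x layers x KV heads x head dim x bytes per element) and ignores paged-allocation overhead and any architecture-specific attention variants. The model dimensions and the 40 GiB budget are hypothetical placeholders, not MiniMax-M3's real configuration.

```python
"""Rough KV-cache sizing: BF16 vs FP8 (hypothetical model dimensions)."""

# Bytes per stored element for each KV-cache dtype.
BYTES_PER_ELEM = {"bf16": 2, "fp8": 1}


def kv_bytes_per_token(num_layers: int, num_kv_heads: int, head_dim: int, dtype: str) -> int:
    """Bytes of K and V cached per token across all attention layers."""
    # Factor of 2: one cached tensor for keys, one for values.
    return 2 * num_layers * num_kv_heads * head_dim * BYTES_PER_ELEM[dtype]


def max_cached_tokens(budget_bytes: int, per_token: int) -> int:
    """How many whole tokens fit in a fixed KV-cache memory budget."""
    return budget_bytes // per_token


if __name__ == "__main__":
    # Hypothetical dimensions for illustration only; NOT MiniMax-M3's real config.
    layers, kv_heads, head_dim = 60, 8, 128
    budget = 40 * 1024**3  # hypothetical 40 GiB reserved for KV cache per GPU

    for dtype in ("bf16", "fp8"):
        per_tok = kv_bytes_per_token(layers, kv_heads, head_dim, dtype)
        print(f"{dtype}: {per_tok} B/token, {max_cached_tokens(budget, per_tok)} tokens fit")
```

Under these assumptions FP8 halves the per-token footprint, so roughly twice as many tokens fit in the same budget, which is what allows larger batches or longer contexts in the sweep.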

Validation

  • Bash syntax validation
  • targeted MI300X MiniMax-M3 full-sweep config generation

Note

Medium Risk
FP8 KV cache changes memory and accuracy behavior for a recipe that previously documented BF16 KV specifically to avoid bad FP8 attention scales; the nightly image pin also changes the serving stack, which affects comparability with earlier benchmark results.
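The concern about bad FP8 scales can be illustrated with a toy per-tensor scaling model. This is a simplification under stated assumptions: it treats an uncalibrated scale as 1.0, models only clamping to the FP8 range (not mantissa rounding), and uses made-up input values. It is not how vLLM or the ROCm attention kernels actually compute or apply KV-cache scales.

```python
"""Illustrative only: how a per-tensor FP8 scale affects saturation.

Models clamping to the FP8 range but NOT mantissa rounding, so it
isolates the effect of a badly chosen scale.
"""

# Largest finite magnitudes of the two FP8 E4M3 encodings.
E4M3FN_MAX = 448.0    # OCP E4M3
E4M3FNUZ_MAX = 240.0  # "fnuz" E4M3, the variant MI300-series GPUs use


def fake_quant(values, scale, fp8_max):
    """Scale down, clamp to [-fp8_max, fp8_max], then scale back up."""
    return [max(-fp8_max, min(fp8_max, v / scale)) * scale for v in values]


def amax_scale(values, fp8_max):
    """Per-tensor scale that maps the observed absolute max onto fp8_max."""
    amax = max(abs(v) for v in values)
    return amax / fp8_max if amax > 0 else 1.0


def max_abs_error(original, restored):
    """Largest element-wise absolute difference between two sequences."""
    return max(abs(a - b) for a, b in zip(original, restored))


if __name__ == "__main__":
    # Hypothetical attention-input values with a few large outliers.
    x = [0.5, -12.0, 300.0, -900.0]

    unit = fake_quant(x, 1.0, E4M3FNUZ_MAX)  # uncalibrated: scale left at 1.0
    calib = fake_quant(x, amax_scale(x, E4M3FNUZ_MAX), E4M3FNUZ_MAX)

    print("scale=1.0 max error:", max_abs_error(x, unit))
    print("amax scale max error:", max_abs_error(x, calib))
```

With the scale left at 1.0, the outliers saturate at the FNUZ limit of 240 and lose most of their magnitude; a scale derived from the observed absolute maximum keeps them in range, at the cost of coarser resolution for small values once rounding is taken into account.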

Overview
Updates the MI300X MiniMax-M3 MXFP8 vLLM benchmark to a pinned ROCm nightly (vllm/vllm-openai-rocm:nightly-b53b1c7…) and turns on --kv-cache-dtype fp8 in minimaxm3_fp8_mi300x.sh, replacing the prior BF16 KV default that avoided uncalibrated ROCm FP8 attention scales.

Slurm allocation for MI300X now also excludes chi-mi300x-121 (missing Enroot/RAID provisioning), alongside the existing bad-node list. Config comments and perf-changelog.yaml document the image, KV, and node changes for minimaxm3-fp8-mi300x-vllm.
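The node exclusion amounts to adding one hostname to the list passed to Slurm's `--exclude` option. The actual launcher is a Bash script; the Python below is only a sketch of the merge logic, and the pre-existing bad-node names are hypothetical placeholders, since the real list is not shown in this PR description.

```python
"""Sketch: merge an extra node into a Slurm --exclude list (illustrative)."""


def build_exclude_arg(existing_bad_nodes, extra_nodes):
    """Return an --exclude=... argument with duplicates removed, order kept."""
    # dict.fromkeys keeps first-seen order while dropping repeats.
    merged = list(dict.fromkeys([*existing_bad_nodes, *extra_nodes]))
    if not merged:
        return ""  # nothing to exclude; the caller should omit the flag
    return "--exclude=" + ",".join(merged)


if __name__ == "__main__":
    # Hypothetical existing list; the repo's real bad-node list is not shown here.
    existing = ["chi-mi300x-node-a", "chi-mi300x-node-b"]
    print(build_exclude_arg(existing, ["chi-mi300x-121"]))
```

Deduplicating keeps the argument stable if the same node is later added to the shared bad-node list as well.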

Reviewed by Cursor Bugbot for commit 771e633.

github-actions (Contributor) commented

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR there first before we can merge your single-node PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you!

PR authors are responsible for ensuring that, after merging, all GitHub Action jobs fully pass. Failures are often just flakes, and re-running the failed jobs usually fixes them. If you re-run failed jobs, you are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get approval from the respective company's CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.


cquil11 force-pushed the codex/minimaxm3-mi300x-fp8-kvcache branch from e8da035 to 37812d6 on June 18, 2026 at 22:34

cquil11 (Collaborator, Author) commented Jun 18, 2026

/reuse-sweep-run

cquil11 force-pushed the codex/minimaxm3-mi300x-fp8-kvcache branch from 37812d6 to 5ab2002 on June 18, 2026 at 22:47

cquil11 (Collaborator, Author) commented Jun 19, 2026

/reuse-sweep-run

cquil11 merged commit 0a60172 into main on Jun 19, 2026
26 checks passed
cquil11 deleted the codex/minimaxm3-mi300x-fp8-kvcache branch on June 19, 2026 at 00:29
cquil11 added a commit that referenced this pull request Jun 19, 2026
ZhengGong-amd added a commit to ZhengGong-amd/InferenceX that referenced this pull request Jun 26, 2026
Sync the branch with the latest upstream main (fork main force-synced to
upstream). Resolve the perf-changelog.yaml conflict by taking main's version
and re-appending the branch's own minimaxm3-fp8-mi300x-vllm AITER entry at the
tail. The AITER target benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_mi300x.sh
auto-merged cleanly (main's SemiAnalysisAI#1837 image/FP8-KV change was reverted by SemiAnalysisAI#1857, so
main's net change to that file is zero); the AITER env exports are preserved.

Co-authored-by: Cursor <cursoragent@cursor.com>