[Klaud Cold] MI300X MiniMax-M3 nightly image and FP8 KV cache by cquil11 · Pull Request #1837 · SemiAnalysisAI/InferenceX

cquil11 · 2026-06-18T22:33:44Z

What changed

pin minimaxm3-fp8-mi300x-vllm to vllm/vllm-openai-rocm:nightly-b53b1c7ffe7aebdafd0876350f30e51d1226c92a
launch the MI300X MiniMax-M3 server with --kv-cache-dtype fp8
exclude unprovisioned chi-mi300x-121 from Slurm allocation
append an MI300X-only performance changelog entry
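The two serving-side changes above boil down to a pinned container tag plus one extra `vllm serve` flag. The sketch below shows that shape as a Python argv builder. It is only an illustration, not the actual contents of minimaxm3_fp8_mi300x.sh (which is bash). The image tag and `--kv-cache-dtype fp8` come from this PR. `build_serve_cmd`, the `MODEL_PATH` placeholder and the default port are assumptions. In vLLM, `fp8` stores K/V in FP8, while `auto` keeps the KV cache in the model's dtype.

```python
import shlex

# Image tag copied verbatim from the PR description.
IMAGE = "vllm/vllm-openai-rocm:nightly-b53b1c7ffe7aebdafd0876350f30e51d1226c92a"


def build_serve_cmd(model: str, port: int = 8000, kv_cache_dtype: str = "fp8") -> list[str]:
    """Build a hypothetical argv list for `vllm serve` with an explicit KV-cache dtype.

    `model` and `port` are placeholders; the real recipe passes its own
    model path and additional flags that are not reproduced here.
    """
    return [
        "vllm", "serve", model,
        "--port", str(port),
        # "fp8" stores K/V in FP8; "auto" would follow the model's dtype.
        "--kv-cache-dtype", kv_cache_dtype,
    ]


if __name__ == "__main__":
    # MODEL_PATH is a hypothetical placeholder, not the recipe's real model path.
    print(f"# container image: {IMAGE}")
    print(shlex.join(build_serve_cmd("MODEL_PATH")))
```

Building an argv list instead of a single string keeps each flag and value separate, so the KV-cache dtype can be switched back to `auto` for a BF16 comparison run without string surgery.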

Why

Use the updated ROCm vLLM build and reduce KV-cache memory use for the MI300X MiniMax-M3 sweep. The node exclusion prevents Pyxis failures on a node missing required Enroot/RAID provisioning.
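The memory saving follows from element width. BF16 uses 2 bytes per element and FP8 uses 1, so the raw K/V storage for a given context is halved. The sketch below makes that arithmetic explicit. The layer count, KV-head count, head dimension and token count are hypothetical and are not MiniMax-M3's published configuration. The sketch also assumes every layer is a standard full-attention layer that caches K and V, and it ignores any FP8 scale or block-padding overhead.

```python
# Bytes per element for the two KV-cache dtypes discussed in this PR.
DTYPE_BYTES = {"bf16": 2, "fp8": 1}


def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   num_tokens: int, dtype: str) -> int:
    """Raw K+V storage for `num_tokens` cached tokens (ignores scales/padding)."""
    # Factor of 2: each attention layer caches one K and one V vector per KV head.
    per_token = 2 * num_layers * num_kv_heads * head_dim * DTYPE_BYTES[dtype]
    return per_token * num_tokens


if __name__ == "__main__":
    # Hypothetical shape, NOT MiniMax-M3's real config.
    shape = dict(num_layers=60, num_kv_heads=8, head_dim=128, num_tokens=131072)
    for dtype in ("bf16", "fp8"):
        gib = kv_cache_bytes(**shape, dtype=dtype) / 2**30
        print(f"{dtype}: {gib:.1f} GiB")
```

With this hypothetical shape the script prints 30.0 GiB for BF16 and 15.0 GiB for FP8. The freed memory is what lets the server hold more concurrent sequences or longer contexts during the sweep.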

Validation

Bash syntax validation
targeted MI300X MiniMax-M3 full-sweep config generation

Note

Medium Risk
FP8 KV cache shifts memory/accuracy behavior for a recipe that previously documented BF16 KV to avoid bad FP8 attention scales; the nightly image pin also changes the serving stack, which affects comparability with earlier benchmark results.

Overview
Updates the MI300X MiniMax-M3 MXFP8 vLLM benchmark to a pinned ROCm nightly (vllm/vllm-openai-rocm:nightly-b53b1c7…) and turns on --kv-cache-dtype fp8 in minimaxm3_fp8_mi300x.sh, replacing the prior BF16 KV default that avoided uncalibrated ROCm FP8 attention scales.

Slurm allocation for MI300X now also excludes chi-mi300x-121 (missing Enroot/RAID provisioning), alongside the existing bad-node list. Config comments and perf-changelog.yaml document the image, KV, and node changes for minimaxm3-fp8-mi300x-vllm.
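Adding a node to an existing exclusion list amounts to merging two host lists into the single comma-separated value that Slurm's `--exclude` option (accepted by `sbatch`, `srun` and `salloc`) expects. The sketch below does that merge in Python. It is not the repository's actual launcher code. `merge_exclude_list` and the `bad-node-a`/`bad-node-b` entries are hypothetical placeholders. Only `chi-mi300x-121` comes from this PR.

```python
# Hypothetical placeholders for the existing bad-node list; only
# chi-mi300x-121 is taken from the PR itself.
EXISTING_BAD_NODES = ["bad-node-a", "bad-node-b"]
NEW_EXCLUSIONS = ["chi-mi300x-121"]


def merge_exclude_list(existing: list[str], extra: list[str]) -> str:
    """Return a comma-separated, order-preserving, de-duplicated node list."""
    # dict keys keep insertion order (Python 3.7+) and silently drop duplicates.
    merged = dict.fromkeys(existing)
    for node in extra:
        merged.setdefault(node, None)
    return ",".join(merged)


if __name__ == "__main__":
    # Formatted as a Slurm --exclude=<hostlist> argument.
    print(f"--exclude={merge_exclude_list(EXISTING_BAD_NODES, NEW_EXCLUSIONS)}")
```

De-duplicating keeps the flag stable if a node that is already listed gets excluded again in a later change.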

Reviewed by Cursor Bugbot for commit 771e633. Bugbot is set up for automated code reviews on this repo.

github-actions · 2026-06-18T22:33:51Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please open a PR there first, before we can merge your single-node PR into the master branch. Let's make sure the documentation is first-class so that the entire ML community can benefit from your hard work! Thank you!

PR authors are responsible for ensuring that, after merging, all GitHub Actions jobs fully pass. Failures are often just flakes, and simply re-running the failed jobs will fix them. If you re-run failed jobs, you are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

cquil11 · 2026-06-18T22:34:17Z

/reuse-sweep-run

github-actions · 2026-06-18T22:48:26Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27793464928
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27793464928

github-actions · 2026-06-19T00:22:59Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27794193147
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27794193147

cquil11 · 2026-06-19T00:28:12Z

/reuse-sweep-run

Sync the branch with the latest upstream main (fork main force-synced to upstream). Resolve the perf-changelog.yaml conflict by taking main's version and re-appending the branch's own minimaxm3-fp8-mi300x-vllm AITER entry at the tail. The AITER target benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_mi300x.sh auto-merged cleanly (main's SemiAnalysisAI#1837 image/FP8-KV change was reverted by SemiAnalysisAI#1857, so main's net change to that file is zero); the AITER env exports are preserved. Co-authored-by: Cursor <cursoragent@cursor.com>

cquil11 requested a review from a team June 18, 2026 22:33

cquil11 requested review from billishyahao, chunfangamd and seungrokj as code owners June 18, 2026 22:33

github-project-automation Bot added this to InferenceMAX Board Jun 18, 2026

cquil11 requested review from 1am9trash and yctseng0211 as code owners June 18, 2026 22:33

cquil11 force-pushed the codex/minimaxm3-mi300x-fp8-kvcache branch from e8da035 to 37812d6 June 18, 2026 22:34

cquil11 added the full-sweep-fail-fast label Jun 18, 2026 — with ChatGPT Codex Connector

cquil11 mentioned this pull request Jun 18, 2026

[codex] update MI300X MiniMax-M3 image and FP8 KV cache #1830

Closed

perf: update MI300X MiniMax-M3 image and FP8 KV cache

5ab2002

cquil11 force-pushed the codex/minimaxm3-mi300x-fp8-kvcache branch from 37812d6 to 5ab2002 June 18, 2026 22:47

cquil11 removed the full-sweep-fail-fast label Jun 18, 2026

cquil11 added the full-sweep-fail-fast label Jun 18, 2026 — with ChatGPT Codex Connector

chore: validate PR #1837 changelog before reuse [skip-sweep]

771e633

cquil11 merged commit 0a60172 into main Jun 19, 2026
26 checks passed

cquil11 deleted the codex/minimaxm3-mi300x-fp8-kvcache branch June 19, 2026 00:29

github-project-automation Bot moved this to Done in InferenceMAX Board Jun 19, 2026

This was referenced Jun 19, 2026

Revert "[Klaud Cold] MI300X MiniMax-M3 nightly image and FP8 KV cache" #1857

Merged

[Klaud Cold] MI300X MiniMax-M3 nightly image and FP8 KV cache #1858

Open

cquil11 added a commit that referenced this pull request Jun 19, 2026

Revert "[Klaud Cold] MI300X MiniMax-M3 nightly image and FP8 KV cache (#1837)" (#1857) [skip-sweep]

This reverts commit 0a60172.

218bf61

cquil11 merged 2 commits into main from codex/minimaxm3-mi300x-fp8-kvcache