(ci)(recipe): Add DeepSeek-R1 FP4 TP4 validation and DS recipe for SGLang-ATOM by zhuyuhua-v · Pull Request #614 · ROCm/ATOM

zhuyuhua-v · 2026-04-20T08:59:31Z

Motivation

Add DeepSeek-R1-FP4 TP4 coverage to the SGLang-ATOM accuracy flows, including nightly/manual validation and dashboard metadata, with a GSM8K threshold of 0.91.
Align the DeepSeek-R1-FP8 TP4 GSM8K threshold to 0.91 across the ATOM SGLang PR and nightly accuracy workflows to tolerate run-to-run fluctuation in the score.
Add recipes/sglang_atom/DeepSeek-R1.md in the same style as the vLLM-ATOM recipe, covering server launch, benchmarking, accuracy validation, and profiling usage.
Update the aiter wheel download logic to align with PR #706 ([atom-vllm CI] align the aiter download logic with atom CI).

ATOM SGLang CI / Nightly / Benchmark Scope

Scope	Workflow	Trigger	Cases	Purpose
CI	`.github/workflows/atom-sglang-test.yaml`	PR to `main`, not draft, not closed	2	PR SGLang accuracy smoke
Nightly Accuracy	`.github/workflows/atom-sglang-accuracy-validation.yaml`	Daily at 18:00 UTC (02:00 Beijing time), or manual dispatch	4	Full SGLang GSM8K accuracy validation
Nightly Benchmark	`.github/workflows/atom-sglang-benchmark.yaml`	Daily at 17:00 UTC (01:00 Beijing time), or manual dispatch	nightly: 5 × 10 = 50	SGLang serving performance benchmark

Shared Accuracy Parameters

Item	Value
SGLang ref	`v0.5.10`
Task	`gsm8k`
Metric checked	`results.gsm8k["exact_match,flexible-extract"]`
Few-shot	`3`
LM Eval concurrency	`65`
Server args	`--trust-remote-code --kv-cache-dtype fp8_e4m3 --mem-fraction-static 0.8 --page-size 1 --disable-radix-cache`
Common env	`SGLANG_AITER_FP8_PREFILL_ATTN=0`, `SGLANG_USE_AITER=1`, `ATOM_ENABLE_DS_QKNORM_QUANT_FUSION=1`
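
As an illustration of how the checked metric relates to a threshold, the sketch below reads a parsed lm-eval style result and compares `exact_match,flexible-extract` for `gsm8k` against a limit. The top-level `results` key, the helper names, and the sample score are assumptions for illustration; the workflows may implement the check differently.

```python
import json

# Metric key named in the table above.
METRIC_KEY = "exact_match,flexible-extract"


def gsm8k_score(results: dict) -> float:
    """Return the GSM8K flexible-extract exact-match score.

    Assumes the parsed JSON has a top-level "results" mapping keyed by task.
    """
    return float(results["results"]["gsm8k"][METRIC_KEY])


def passes(results: dict, threshold: float) -> bool:
    """True when the score meets or exceeds the configured threshold."""
    return gsm8k_score(results) >= threshold


if __name__ == "__main__":
    # Hypothetical payload for illustration only; not a measured score.
    sample = json.loads(
        '{"results": {"gsm8k": {"exact_match,flexible-extract": 0.95}}}'
    )
    print(passes(sample, threshold=0.91))
```

A `>=` comparison means a run that lands exactly on the threshold counts as a pass; that choice is also an assumption here.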

CI Cases

Model	Weight	Runner	TP	Extra Args	Env Vars	Threshold
DeepSeek-R1-FP8 TP4	`deepseek-ai/DeepSeek-R1-0528`	`linux-atom-mi35x-4`	4	`--tensor-parallel-size 4`	`AITER_QUICK_REDUCE_QUANTIZATION=INT4`; common env	`0.91`
DeepSeek-R1-FP4 TP4	`amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4`	`linux-atom-mi35x-4`	4	`--tensor-parallel-size 4`	`AITER_QUICK_REDUCE_QUANTIZATION=INT4`; common env	`0.91`
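
To make the relationship between the shared parameters and a single CI case concrete, here is a minimal sketch that merges the common environment with the per-case variables and appends the extra args to the shared server args. The `python3 -m sglang.launch_server --model-path` entry point and the `build_launch` helper are assumptions and are not taken from the workflow files.

```python
import shlex

# Shared values copied from the "Shared Accuracy Parameters" table.
SHARED_SERVER_ARGS = (
    "--trust-remote-code --kv-cache-dtype fp8_e4m3 "
    "--mem-fraction-static 0.8 --page-size 1 --disable-radix-cache"
)
COMMON_ENV = {
    "SGLANG_AITER_FP8_PREFILL_ATTN": "0",
    "SGLANG_USE_AITER": "1",
    "ATOM_ENABLE_DS_QKNORM_QUANT_FUSION": "1",
}


def build_launch(weight: str, extra_args: str, case_env: dict):
    """Return (env, argv) for one accuracy case.

    Hypothetical helper: the entry point below is an assumption, not
    something the PR description states.
    """
    env = {**COMMON_ENV, **case_env}  # per-case values override common ones
    argv = ["python3", "-m", "sglang.launch_server", "--model-path", weight]
    argv += shlex.split(SHARED_SERVER_ARGS) + shlex.split(extra_args)
    return env, argv


if __name__ == "__main__":
    env, argv = build_launch(
        "amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4",
        "--tensor-parallel-size 4",
        {"AITER_QUICK_REDUCE_QUANTIZATION": "INT4"},
    )
    print(" ".join(f"{k}={v}" for k, v in env.items()), shlex.join(argv))
```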

Nightly Accuracy Cases

Model	Weight	Runner	TP	Extra Args	Threshold
DeepSeek-R1-FP8 TP4	`deepseek-ai/DeepSeek-R1-0528`	`linux-atom-mi35x-4`	4	`--tensor-parallel-size 4`	`0.91`
DeepSeek-R1-FP8 TP8	`deepseek-ai/DeepSeek-R1-0528`	`linux-atom-mi35x-8`	8	`--tensor-parallel-size 8`	`0.93`
DeepSeek-R1-FP4 TP4	`amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4`	`linux-atom-mi35x-4`	4	`--tensor-parallel-size 4`	`0.91`
DeepSeek-R1-FP4 TP8	`amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4`	`linux-atom-mi35x-8`	8	`--tensor-parallel-size 8`	`0.93`
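
The four nightly cases follow a regular pattern: two checkpoints crossed with two tensor-parallel sizes, where the runner label and threshold depend only on the TP size. The sketch below rebuilds the table from that pattern; the `AccuracyCase` dataclass and `nightly_cases` helper are illustrative names, not part of the workflow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccuracyCase:
    name: str
    weight: str
    runner: str
    tp: int
    extra_args: str
    threshold: float


# Values copied from the table above.
WEIGHTS = {
    "FP8": "deepseek-ai/DeepSeek-R1-0528",
    "FP4": "amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4",
}
THRESHOLD_BY_TP = {4: 0.91, 8: 0.93}


def nightly_cases() -> list:
    """Enumerate precision x TP combinations in table order."""
    return [
        AccuracyCase(
            name=f"DeepSeek-R1-{precision} TP{tp}",
            weight=weight,
            runner=f"linux-atom-mi35x-{tp}",
            tp=tp,
            extra_args=f"--tensor-parallel-size {tp}",
            threshold=THRESHOLD_BY_TP[tp],
        )
        for precision, weight in WEIGHTS.items()
        for tp in (4, 8)
    ]


if __name__ == "__main__":
    for case in nightly_cases():
        print(case.name, case.runner, case.threshold)
```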

Benchmark Schedule

The benchmark workflow currently supports two modes:

Mode	Model Selection	Param Selection	Dashboard
Scheduled nightly	Automatically selects all 5 SGLang benchmark models	Default 10 parameter sets	Published by default
Manual dispatch	Models selected via checkboxes	`param_lists` input, default 10 parameter sets	Controlled by `publish_to_dashboard`, default true

Schedule:

Cron: `0 17 * * *`
Beijing time: 01:00 every night
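
GitHub Actions evaluates cron schedules in UTC, so the Beijing times quoted here are the UTC hour plus eight. A small sketch of that conversion, using a fixed UTC+8 offset (China Standard Time has no daylight saving); the helper name is illustrative:

```python
from datetime import datetime, timedelta, timezone

# China Standard Time is a fixed UTC+8 offset.
BEIJING = timezone(timedelta(hours=8), name="CST")


def utc_hour_to_beijing(hour_utc: int, minute: int = 0) -> str:
    """Convert a UTC wall-clock time to Beijing time as HH:MM."""
    # The date is arbitrary; only the time of day matters here.
    utc_time = datetime(2026, 1, 1, hour_utc, minute, tzinfo=timezone.utc)
    return utc_time.astimezone(BEIJING).strftime("%H:%M")


print(utc_hour_to_beijing(17))  # benchmark cron "0 17 * * *" -> 01:00
print(utc_hour_to_beijing(18))  # accuracy validation at 18:00 UTC -> 02:00
```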

Benchmark Parameters

Default param sets:

ISL	OSL	Concurrency	Random Range Ratio
1024	1024	4, 8, 16, 32, 64	0.8
8192	1024	4, 8, 16, 32, 64	0.8
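
The "10 parameter sets" and the nightly total of 50 cases follow from this table: 2 ISL/OSL pairs × 5 concurrency levels = 10 sets, and 10 sets × 5 models = 50 runs. A short sketch of that enumeration, with illustrative names and dictionary keys:

```python
from itertools import product

# Values copied from the default parameter table.
SEQ_LENS = [(1024, 1024), (8192, 1024)]  # (ISL, OSL)
CONCURRENCIES = [4, 8, 16, 32, 64]
RANDOM_RANGE_RATIO = 0.8
NUM_NIGHTLY_MODELS = 5  # models selected by the scheduled nightly run


def default_param_sets() -> list:
    """Expand every (ISL, OSL) pair across every concurrency level."""
    return [
        {
            "isl": isl,
            "osl": osl,
            "concurrency": concurrency,
            "random_range_ratio": RANDOM_RANGE_RATIO,
        }
        for (isl, osl), concurrency in product(SEQ_LENS, CONCURRENCIES)
    ]


if __name__ == "__main__":
    sets = default_param_sets()
    print(len(sets), len(sets) * NUM_NIGHTLY_MODELS)
```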

Benchmark command:

backend: sglang
dataset: random
num-prompts: concurrency * 10
num-warmups: 2 * concurrency
request-rate: inf
metrics: ttft,tpot,itl,e2el
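
Since the prompt and warmup counts scale with concurrency, the per-run settings can be derived from a single number. The sketch below shows that derivation as a plain dictionary; the function name and key spellings are illustrative and do not claim to match the benchmark client's actual flags.

```python
def benchmark_settings(concurrency: int) -> dict:
    """Derive per-run benchmark settings from the concurrency level."""
    return {
        "backend": "sglang",
        "dataset": "random",
        "num_prompts": concurrency * 10,   # 10 prompts per concurrent slot
        "num_warmups": 2 * concurrency,    # 2 warmup requests per slot
        "request_rate": float("inf"),      # send requests without pacing
        "metrics": ["ttft", "tpot", "itl", "e2el"],
    }


if __name__ == "__main__":
    for c in (4, 8, 16, 32, 64):
        s = benchmark_settings(c)
        print(c, s["num_prompts"], s["num_warmups"])
```

With the default concurrency levels this gives 40 to 640 prompts and 8 to 128 warmup requests per run.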

Benchmark Models

Model	Weight	Serve Args	Runner
DeepSeek-R1-0528 FP8 TP8	`deepseek-ai/DeepSeek-R1-0528`	`--trust-remote-code --tensor-parallel-size 8`	`atom-mi355-8gpu-oot-benchmark`
DeepSeek-R1-0528 FP8 TP4	`deepseek-ai/DeepSeek-R1-0528`	`--trust-remote-code --tensor-parallel-size 4`	`atom-mi355-8gpu-oot-benchmark`
DeepSeek-R1-0528-MXFP4 FP4 TP8	`amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4`	`--trust-remote-code --tensor-parallel-size 8`	`atom-mi355-8gpu-oot-benchmark`
DeepSeek-R1-0528-MXFP4 FP4 TP4	`amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4`	`--trust-remote-code --tensor-parallel-size 4`	`atom-mi355-8gpu-oot-benchmark`
DeepSeek-R1-0528-MXFP4 FP4 TP8 EP8	`amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4`	`--trust-remote-code --tensor-parallel-size 8 --expert-parallel-size 8`	`atom-mi355-8gpu-oot-benchmark`

valarLip · 2026-04-23T03:12:43Z

still wip?

Copilot

Pull request overview

Adds DeepSeek-R1 FP4 (MXFP4 weights) TP4 accuracy coverage to the ATOM SGLang CI/validation flows and documents how to run/benchmark/validate DeepSeek-R1 using the SGLang-ATOM backend.

Changes:

Add DeepSeek-R1 FP4 TP4 (MXFP4 checkpoint) to PR CI accuracy matrix and to nightly/manual accuracy validation matrix.
Align DeepSeek-R1 FP8 TP4 GSM8K accuracy threshold from 0.92 to 0.91 across workflows and dashboard model metadata.
Add an SGLang-ATOM DeepSeek-R1 recipe covering server launch, benchmarking, profiling, and GSM8K validation.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File	Description
`recipes/sglang_atom/DeepSeek-R1.md`	New SGLang-ATOM DeepSeek-R1 recipe (launch, benchmark, profiling, lm-eval).
`.github/workflows/atom-sglang-test.yaml`	Updates PR CI accuracy threshold and adds DeepSeek-R1 FP4 TP4 to the matrix.
`.github/workflows/atom-sglang-accuracy-validation.yaml`	Adds manual toggle + nightly coverage for DeepSeek-R1 FP4 TP4; aligns FP8 TP4 threshold.
`.github/benchmark/sglang_models_accuracy.json`	Adds/updates dashboard metadata for the two DeepSeek-R1 TP4 accuracy entries (thresholds, baseline fields).

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

zhuyuhua-v · 2026-05-11T08:49:50Z

> still wip?

fixed in #747

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.

…itations in v0.1.3 (#1061)

* docs(release-notes): fix misattributed plugin PR citations in v0.1.3

Four citations in the vLLM-ATOM sections referenced PRs that actually belong to SGLang-ATOM or the native ATOM path. Verified each PR title against GitHub before correcting.

- Model Support / DeepSeek V4 / R1 FP4: dropped the bullet. #650 is the native DeepSeek V4 triton-MoE path (already cited under ATOM Server) and #614 is a SGLang-ATOM R1 FP4 PR (already cited under SGLang-ATOM); neither supports a vLLM-ATOM V4 / R1 FP4 claim.
- Model Support / Qwen3.5 / Qwen3-Next: dropped #532 (it adds Qwen3.5 / Qwen3-Next to SGLang, not vLLM); keep #772 (Qwen3-Next MTP for vLLM).
- H&P / vLLM-ATOM: dropped #528 + the "Q/K norm-quant fusion" claim; #528 is the SGLang+ATOM qk-norm fusion PR (already cited under SGLang).
- H&P / vLLM-ATOM: dropped #614 from the DeepSeek FP4 validation bullet (SGLang-ATOM PR), leaving the genuine #639 TP8/EP8 case.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(release-notes): correct Qwen3.5 vLLM citation and drop nonexistent SGLang V3.2

Follow-up to the citation audit, two more verified corrections in the plugin sections:

- vLLM-ATOM / Qwen3.5: the prior pass dropped Qwen3.5 along with the misattributed #532, but Qwen3.5 does have real vLLM-plugin support. Restore it with the correct PRs: #448 (fp8 functionality/accuracy, touches atom/plugin/vllm/model_wrapper.py) and #593 (Qwen3.5 FP4 nightly + benchmark, recipes/atom_vllm/Qwen3.5.md), keeping #772 (Qwen3-Next MTP).
- SGLang-ATOM: dropped "V3.2" from the DeepSeek model list. No SGLang DeepSeek V3.2 PR landed in v0.1.2..v0.1.3 (V3 MTP=#643, R1 FP4=#614, FP4 MTP=#834/#846); the cited PRs only cover V3 and R1 FP4.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(release-notes): fix 3 more cross-section PR misattributions

Verified each PR's changed files to confirm which engine path it belongs to:

- vLLM-ATOM Engine Core: drop #793 + "handles scalar KV scales". #793 only touches atom/model_ops/{attention_mha,base_attention}.py (native, no plugin files) and is already cited correctly in the native section.
- vLLM-ATOM H&P: drop the DeepSeek FP4 TP8/EP8 bullet and move #639 to SGLang-ATOM H&P. #639 only touches sglang_benchmark_models.json and atom-sglang-benchmark.yaml -> it is a SGLang benchmark PR, not vLLM.
- vLLM-ATOM H&P: drop "V4 DP benchmark coverage (#949)". #949 touches the native benchmark (.github/benchmark/models.json, atom-benchmark.yaml), already cited under ATOM Server; it is not a plugin PR.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

ZLkanyo009 marked this pull request as ready for review April 21, 2026 07:50

zhuyuhua-v requested review from ZLkanyo009, ZhiweiYan-96, qichu-yun, wuhuikx and zejunchen-zejun April 21, 2026 07:55

qichu-yun previously approved these changes Apr 21, 2026

wuhuikx previously approved these changes Apr 22, 2026

valarLip previously approved these changes Apr 23, 2026

Copilot AI review requested due to automatic review settings April 23, 2026 06:21

Copilot started reviewing on behalf of zhuyuhua-v April 23, 2026 06:23

Copilot AI reviewed Apr 23, 2026

Comment thread .github/benchmark/sglang_models_accuracy.json

Comment thread recipes/atom_sglang/DeepSeek-R1.md Outdated

Comment thread recipes/atom_sglang/DeepSeek-R1.md

Comment thread .github/workflows/atom-sglang-test.yaml

Comment thread .github/workflows/atom-sglang-accuracy-validation.yaml

zhuyuhua-v dismissed stale reviews from wuhuikx, valarLip, and qichu-yun via 91f30ab April 23, 2026 09:18

zhuyuhua-v marked this pull request as draft April 24, 2026 05:24

zhuyuhua-v marked this pull request as ready for review April 24, 2026 05:26

Copilot AI review requested due to automatic review settings April 24, 2026 05:26

Copilot started reviewing on behalf of zhuyuhua-v April 24, 2026 05:27

Copilot AI reviewed Apr 24, 2026

Comment thread .github/workflows/atom-sglang-test.yaml Outdated

Comment thread .github/workflows/atom-sglang-accuracy-validation.yaml

Comment thread .github/workflows/atom-sglang-accuracy-validation.yaml

Comment thread .github/benchmark/sglang_models_accuracy.json

zhuyuhua-v marked this pull request as draft April 30, 2026 06:37

(ci)(recipe): Add DeepSeek-R1 FP4 TP4 validation and DS recipe for SG…

1696e64

…Lang-ATOM Signed-off-by: zhuyuhua-v <yuhzhu@amd.com>

zhuyuhua-v force-pushed the yuhua/sgl-dsrecipe-fp4ci branch from f5d5175 to 1696e64 May 11, 2026 06:31

zhuyuhua-v added 3 commits May 11, 2026 06:36

update ci threshold

9a3e4c3

Signed-off-by: zhuyuhua-v <yuhzhu@amd.com>

update aiter whl download flow

76d6b9f

Signed-off-by: zhuyuhua-v <yuhzhu@amd.com>

Merge branch 'main' into yuhua/sgl-dsrecipe-fp4ci

488cced

zhuyuhua-v marked this pull request as ready for review May 11, 2026 07:13

Copilot AI review requested due to automatic review settings May 11, 2026 07:13

Copilot started reviewing on behalf of zhuyuhua-v May 11, 2026 07:15

zhuyuhua-v requested a review from Yuechguo May 11, 2026 07:16

Copilot AI reviewed May 11, 2026

Comment thread .github/workflows/atom-sglang-accuracy-validation.yaml

Comment thread .github/benchmark/sglang_models_accuracy.json

zhuyuhua-v marked this pull request as draft May 11, 2026 07:42

zhuyuhua-v added 2 commits May 11, 2026 07:44

remove int flag from ci cases

3fc1b07

Signed-off-by: zhuyuhua-v <yuhzhu@amd.com>

update recipe

a43916e

Signed-off-by: zhuyuhua-v <yuhzhu@amd.com>

zhuyuhua-v marked this pull request as ready for review May 11, 2026 08:42

Copilot AI review requested due to automatic review settings May 11, 2026 08:42

Copilot started reviewing on behalf of zhuyuhua-v May 11, 2026 08:43

Copilot AI reviewed May 11, 2026

Comment thread .github/workflows/atom-sglang-test.yaml

Comment thread .github/benchmark/sglang_benchmark_models.json

zhuyuhua-v mentioned this pull request May 11, 2026

[fix][acc][sgl-atom] fix accuracy of fp8 attn weights model using ptpc quant recipe #747

Merged

1 task

zhuyuhua-v marked this pull request as draft May 11, 2026 08:54

Merge branch 'main' into yuhua/sgl-dsrecipe-fp4ci

0913886

zhuyuhua-v marked this pull request as ready for review May 11, 2026 09:00

Copilot AI review requested due to automatic review settings May 11, 2026 09:00

Copilot started reviewing on behalf of zhuyuhua-v May 11, 2026 09:01

Copilot AI reviewed May 11, 2026

adjust threshold for fp4 tp4 to 0.91

ae99d0f

Signed-off-by: zhuyuhua-v <yuhzhu@amd.com>

Copilot AI review requested due to automatic review settings May 12, 2026 09:00

zhuyuhua-v force-pushed the yuhua/sgl-dsrecipe-fp4ci branch from 476b5dd to ae99d0f May 12, 2026 09:00

Copilot started reviewing on behalf of zhuyuhua-v May 12, 2026 09:02

Copilot AI reviewed May 12, 2026

zhuyuhua-v requested review from valarLip and wuhuikx May 12, 2026 14:16

valarLip approved these changes May 12, 2026

valarLip merged commit c615b35 into main May 12, 2026
53 of 58 checks passed

valarLip deleted the yuhua/sgl-dsrecipe-fp4ci branch May 12, 2026 14:18

zejunchen-zejun mentioned this pull request Jun 4, 2026

[to hattie branch] docs(release-notes): fix misattributed plugin PR citations in v0.1.3 #1061

Merged

5 participants
