(ci)(recipe): Add DeepSeek-R1 FP4 TP4 validation and DS recipe for SGLang-ATOM#614

Merged
valarLip merged 8 commits into main from yuhua/sgl-dsrecipe-fp4ci on May 12, 2026

@zhuyuhua-v zhuyuhua-v commented Apr 20, 2026

Collaborator

Motivation

  • Add DeepSeek-R1-FP4 TP4 coverage to the SGLang-ATOM accuracy flows, including nightly/manual validation and dashboard metadata, with a 0.91 GSM8K threshold.
  • Align the DeepSeek-R1-FP8 TP4 GSM8K threshold to 0.91 across the ATOM SGLang PR and nightly accuracy workflows, so that run-to-run score fluctuation does not cause spurious failures.
  • Add recipes/sglang_atom/DeepSeek-R1.md in the same style as the vLLM-ATOM recipe, covering server launch, benchmarking, accuracy validation, and profiling usage.
  • Update the aiter wheel download logic to align with PR #706 ([atom-vllm CI] align the aiter download logic with atom CI).

ATOM SGLang CI / Nightly / Benchmark Scope

| Scope | Workflow | Trigger | Case count | Purpose |
| --- | --- | --- | --- | --- |
| CI | .github/workflows/atom-sglang-test.yaml | PR to main, not draft, not closed | 2 | PR SGLang accuracy smoke test |
| Nightly Accuracy | .github/workflows/atom-sglang-accuracy-validation.yaml | Daily at 18:00 UTC (02:00 Beijing time), or manual dispatch | 4 | Full SGLang GSM8K accuracy validation |
| Nightly Benchmark | .github/workflows/atom-sglang-benchmark.yaml | Daily at 17:00 UTC (01:00 Beijing time), or manual dispatch | nightly: 5 × 10 = 50 | SGLang serving performance benchmark |

Shared Accuracy Parameters

| Item | Value |
| --- | --- |
| SGLang ref | v0.5.10 |
| Task | gsm8k |
| Metric checked | results.gsm8k["exact_match,flexible-extract"] |
| Few-shot | 3 |
| LM Eval concurrency | 65 |
| Server args | --trust-remote-code --kv-cache-dtype fp8_e4m3 --mem-fraction-static 0.8 --page-size 1 --disable-radix-cache |
| Common env | SGLANG_AITER_FP8_PREFILL_ATTN=0, SGLANG_USE_AITER=1, ATOM_ENABLE_DS_QKNORM_QUANT_FUSION=1 |
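
To illustrate how the checked metric relates to a threshold, here is a minimal Python sketch. It assumes the evaluation output is a JSON document with a top-level `results` object keyed by task name; the helper names `gsm8k_score` and `passes` and the sample payload are hypothetical and are not taken from the workflow files.

```python
import json

# Metric key from the "Metric checked" row above.
METRIC_KEY = "exact_match,flexible-extract"


def gsm8k_score(results_json: str) -> float:
    """Extract the GSM8K flexible-extract exact-match score from JSON text."""
    data = json.loads(results_json)
    return float(data["results"]["gsm8k"][METRIC_KEY])


def passes(score: float, threshold: float) -> bool:
    """A case passes when its score is at or above the threshold."""
    return score >= threshold


# Hypothetical payload for illustration only; not a measured result.
sample = json.dumps({"results": {"gsm8k": {METRIC_KEY: 0.9234}}})
score = gsm8k_score(sample)
print(f"gsm8k {METRIC_KEY} = {score:.4f}, pass = {passes(score, 0.91)}")
```

Whether the real workflow compares with `>=` or `>` is not stated in this description; the sketch assumes `>=`.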

CI Cases

| Model | Weight | Runner | TP | Extra Args | Env Vars | Threshold |
| --- | --- | --- | --- | --- | --- | --- |
| DeepSeek-R1-FP8 TP4 | deepseek-ai/DeepSeek-R1-0528 | linux-atom-mi35x-4 | 4 | --tensor-parallel-size 4 | AITER_QUICK_REDUCE_QUANTIZATION=INT4; common env | 0.91 |
| DeepSeek-R1-FP4 TP4 | amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4 | linux-atom-mi35x-4 | 4 | --tensor-parallel-size 4 | AITER_QUICK_REDUCE_QUANTIZATION=INT4; common env | 0.91 |
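
As a rough sketch of how one CI case combines the shared server args, the common env, and the per-case extras, the Python below assembles a launch line. The flags and environment variables are the ones listed in the tables above; the `python3 -m sglang.launch_server --model-path` entry point and the helper names `launch_command` and `shell_line` are assumptions, since the workflow's actual launch script is not shown here.

```python
import shlex

# Shared server args and env, copied from "Shared Accuracy Parameters".
COMMON_ARGS = [
    "--trust-remote-code",
    "--kv-cache-dtype", "fp8_e4m3",
    "--mem-fraction-static", "0.8",
    "--page-size", "1",
    "--disable-radix-cache",
]
COMMON_ENV = {
    "SGLANG_AITER_FP8_PREFILL_ATTN": "0",
    "SGLANG_USE_AITER": "1",
    "ATOM_ENABLE_DS_QKNORM_QUANT_FUSION": "1",
}
# Extra env listed for both CI cases.
CI_ENV = {"AITER_QUICK_REDUCE_QUANTIZATION": "INT4"}


def launch_command(weight: str, tp: int) -> list[str]:
    """Build the server argv. The entry point is an assumption."""
    return [
        "python3", "-m", "sglang.launch_server",
        "--model-path", weight,
        "--tensor-parallel-size", str(tp),
        *COMMON_ARGS,
    ]


def shell_line(weight: str, tp: int) -> str:
    """Render env assignments plus the command as a single shell line."""
    env = {**COMMON_ENV, **CI_ENV}
    prefix = " ".join(f"{k}={shlex.quote(v)}" for k, v in env.items())
    return f"{prefix} {shlex.join(launch_command(weight, tp))}"


print(shell_line("amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4", 4))
```

Keeping the common args in one list means the FP8 and FP4 cases differ only in the weight path, which mirrors the two table rows.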

Nightly Accuracy Cases

| Model | Weight | Runner | TP | Extra Args | Threshold |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-R1-FP8 TP4 | deepseek-ai/DeepSeek-R1-0528 | linux-atom-mi35x-4 | 4 | --tensor-parallel-size 4 | 0.91 |
| DeepSeek-R1-FP8 TP8 | deepseek-ai/DeepSeek-R1-0528 | linux-atom-mi35x-8 | 8 | --tensor-parallel-size 8 | 0.93 |
| DeepSeek-R1-FP4 TP4 | amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4 | linux-atom-mi35x-4 | 4 | --tensor-parallel-size 4 | 0.91 |
| DeepSeek-R1-FP4 TP8 | amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4 | linux-atom-mi35x-8 | 8 | --tensor-parallel-size 8 | 0.93 |
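
The nightly matrix can be expressed as data, which makes the per-case thresholds easy to audit. The sketch below is illustrative only: the `AccuracyCase` type and `failing_cases` helper are hypothetical, and the scores are made-up values chosen to show that TP8 cases are held to the stricter 0.93 bar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccuracyCase:
    name: str
    weight: str
    tp: int
    threshold: float


# Rows copied from the "Nightly Accuracy Cases" table.
NIGHTLY_CASES = [
    AccuracyCase("DeepSeek-R1-FP8 TP4", "deepseek-ai/DeepSeek-R1-0528", 4, 0.91),
    AccuracyCase("DeepSeek-R1-FP8 TP8", "deepseek-ai/DeepSeek-R1-0528", 8, 0.93),
    AccuracyCase("DeepSeek-R1-FP4 TP4", "amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4", 4, 0.91),
    AccuracyCase("DeepSeek-R1-FP4 TP8", "amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4", 8, 0.93),
]


def failing_cases(scores: dict[str, float]) -> list[str]:
    """Return names of cases whose score is missing or below its threshold."""
    return [c.name for c in NIGHTLY_CASES if scores.get(c.name, 0.0) < c.threshold]


# Hypothetical scores for illustration only; not measured results.
hypothetical_scores = {
    "DeepSeek-R1-FP8 TP4": 0.915,
    "DeepSeek-R1-FP8 TP8": 0.925,
    "DeepSeek-R1-FP4 TP4": 0.912,
    "DeepSeek-R1-FP4 TP8": 0.940,
}
print(failing_cases(hypothetical_scores))
```

With these made-up numbers, a 0.925 score passes at TP4 but would fail the FP8 TP8 case.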

Benchmark Schedule

The benchmark workflow currently supports two modes:

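| Mode | Model Selection | Param Selection | Dashboard |
| --- | --- | --- | --- |
| Scheduled nightly | Automatically selects all 5 SGLang benchmark models | Default 10 parameter sets | Publishes by default |
| Manual dispatch | Models selected via checkboxes | param_lists input; defaults to the 10 parameter sets | Controlled by publish_to_dashboard; defaults to true |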

Schedule:

  • Cron: 0 17 * * *
  • Beijing time: 01:00 nightly
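
The UTC-to-Beijing mapping for the two cron schedules can be checked with a few lines of Python. This sketch assumes a fixed UTC+8 offset for Beijing time (no daylight saving); the `beijing_time` helper is hypothetical, and the date used is arbitrary.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Beijing time is a fixed UTC+8 offset with no daylight saving.
BEIJING = timezone(timedelta(hours=8))


def beijing_time(utc_hour: int, utc_minute: int = 0) -> str:
    """Convert a UTC wall-clock time to HH:MM in Beijing time."""
    # The date is arbitrary; only the time of day matters here.
    utc = datetime(2026, 1, 1, utc_hour, utc_minute, tzinfo=timezone.utc)
    return utc.astimezone(BEIJING).strftime("%H:%M")


print(beijing_time(17))  # benchmark cron "0 17 * * *"
print(beijing_time(18))  # nightly accuracy, 18:00 UTC
```

Both runs land on the following calendar day in Beijing, which is why the schedule is described as a nightly job.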

Benchmark Parameters

Default param sets:

| ISL | OSL | Concurrency | Random Range Ratio |
| --- | --- | --- | --- |
| 1024 | 1024 | 4, 8, 16, 32, 64 | 0.8 |
| 8192 | 1024 | 4, 8, 16, 32, 64 | 0.8 |

Benchmark command:

  • backend: sglang
  • dataset: random
  • num-prompts = concurrency * 10
  • num-warmups = 2 * concurrency
  • request-rate = inf
  • metrics: ttft,tpot,itl,e2el
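
The "5 × 10 = 50" nightly case count follows from the defaults above: two ISL/OSL shapes times five concurrency levels gives 10 parameter sets, each run against 5 models. A small sketch, with hypothetical names (`param_sets`, `SHAPES`), that expands the matrix and applies the num-prompts and num-warmups formulas:

```python
from itertools import product

# Defaults copied from the "Benchmark Parameters" table.
SHAPES = [(1024, 1024), (8192, 1024)]  # (ISL, OSL)
CONCURRENCIES = [4, 8, 16, 32, 64]
RANDOM_RANGE_RATIO = 0.8
NUM_MODELS = 5  # models selected by the scheduled nightly run


def param_sets() -> list[dict]:
    """Expand the default matrix into one dict per benchmark run."""
    sets = []
    for (isl, osl), conc in product(SHAPES, CONCURRENCIES):
        sets.append({
            "isl": isl,
            "osl": osl,
            "concurrency": conc,
            "random_range_ratio": RANDOM_RANGE_RATIO,
            "num_prompts": conc * 10,   # num-prompts = concurrency * 10
            "num_warmups": 2 * conc,    # num-warmups = 2 * concurrency
        })
    return sets


sets = param_sets()
print(len(sets), len(sets) * NUM_MODELS)
```

For example, at concurrency 64 each run issues 640 prompts after 128 warmup requests.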

Benchmark Models

| Model | Weight | Serve Args | Runner |
| --- | --- | --- | --- |
| DeepSeek-R1-0528 FP8 TP8 | deepseek-ai/DeepSeek-R1-0528 | --trust-remote-code --tensor-parallel-size 8 | atom-mi355-8gpu-oot-benchmark |
| DeepSeek-R1-0528 FP8 TP4 | deepseek-ai/DeepSeek-R1-0528 | --trust-remote-code --tensor-parallel-size 4 | atom-mi355-8gpu-oot-benchmark |
| DeepSeek-R1-0528-MXFP4 FP4 TP8 | amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4 | --trust-remote-code --tensor-parallel-size 8 | atom-mi355-8gpu-oot-benchmark |
| DeepSeek-R1-0528-MXFP4 FP4 TP4 | amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4 | --trust-remote-code --tensor-parallel-size 4 | atom-mi355-8gpu-oot-benchmark |
| DeepSeek-R1-0528-MXFP4 FP4 TP8 EP8 | amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4 | --trust-remote-code --tensor-parallel-size 8 --expert-parallel-size 8 | atom-mi355-8gpu-oot-benchmark |

@ZLkanyo009 ZLkanyo009 marked this pull request as ready for review April 21, 2026 07:50
qichu-yun previously approved these changes Apr 21, 2026
wuhuikx previously approved these changes Apr 22, 2026
valarLip previously approved these changes Apr 23, 2026
@valarLip

Collaborator
image still wip?

Copilot AI review requested due to automatic review settings April 23, 2026 06:21

Copilot AI left a comment

Contributor

Pull request overview

Adds DeepSeek-R1 FP4 (MXFP4 weights) TP4 accuracy coverage to the ATOM SGLang CI/validation flows and documents how to run/benchmark/validate DeepSeek-R1 using the SGLang-ATOM backend.

Changes:

  • Add DeepSeek-R1 FP4 TP4 (MXFP4 checkpoint) to PR CI accuracy matrix and to nightly/manual accuracy validation matrix.
  • Align DeepSeek-R1 FP8 TP4 GSM8K accuracy threshold from 0.92 to 0.91 across workflows and dashboard model metadata.
  • Add an SGLang-ATOM DeepSeek-R1 recipe covering server launch, benchmarking, profiling, and GSM8K validation.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| recipes/sglang_atom/DeepSeek-R1.md | New SGLang-ATOM DeepSeek-R1 recipe (launch, benchmark, profiling, lm-eval). |
| .github/workflows/atom-sglang-test.yaml | Updates PR CI accuracy threshold and adds DeepSeek-R1 FP4 TP4 to the matrix. |
| .github/workflows/atom-sglang-accuracy-validation.yaml | Adds manual toggle + nightly coverage for DeepSeek-R1 FP4 TP4; aligns FP8 TP4 threshold. |
| .github/benchmark/sglang_models_accuracy.json | Adds/updates dashboard metadata for the two DeepSeek-R1 TP4 accuracy entries (thresholds, baseline fields). |

Comment thread .github/benchmark/sglang_models_accuracy.json
Comment thread recipes/atom_sglang/DeepSeek-R1.md Outdated
Comment thread recipes/atom_sglang/DeepSeek-R1.md
Comment thread .github/workflows/atom-sglang-test.yaml
Comment thread .github/workflows/atom-sglang-accuracy-validation.yaml
@zhuyuhua-v zhuyuhua-v dismissed stale reviews from wuhuikx, valarLip, and qichu-yun via 91f30ab April 23, 2026 09:18
@zhuyuhua-v zhuyuhua-v marked this pull request as draft April 24, 2026 05:24
@zhuyuhua-v zhuyuhua-v marked this pull request as ready for review April 24, 2026 05:26
Copilot AI review requested due to automatic review settings April 24, 2026 05:26

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.


Comment thread .github/workflows/atom-sglang-test.yaml Outdated
Comment thread .github/workflows/atom-sglang-accuracy-validation.yaml
Comment thread .github/workflows/atom-sglang-accuracy-validation.yaml
Comment thread .github/benchmark/sglang_models_accuracy.json
@zhuyuhua-v zhuyuhua-v marked this pull request as draft April 30, 2026 06:37
…Lang-ATOM

Signed-off-by: zhuyuhua-v <yuhzhu@amd.com>
@zhuyuhua-v zhuyuhua-v force-pushed the yuhua/sgl-dsrecipe-fp4ci branch from f5d5175 to 1696e64 Compare May 11, 2026 06:31
Signed-off-by: zhuyuhua-v <yuhzhu@amd.com>
Signed-off-by: zhuyuhua-v <yuhzhu@amd.com>
@zhuyuhua-v zhuyuhua-v marked this pull request as ready for review May 11, 2026 07:13
Copilot AI review requested due to automatic review settings May 11, 2026 07:13
@zhuyuhua-v zhuyuhua-v requested a review from Yuechguo May 11, 2026 07:16

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Comment thread .github/workflows/atom-sglang-accuracy-validation.yaml
Comment thread .github/benchmark/sglang_models_accuracy.json
@zhuyuhua-v zhuyuhua-v marked this pull request as draft May 11, 2026 07:42
Signed-off-by: zhuyuhua-v <yuhzhu@amd.com>
Signed-off-by: zhuyuhua-v <yuhzhu@amd.com>
@zhuyuhua-v zhuyuhua-v marked this pull request as ready for review May 11, 2026 08:42
Copilot AI review requested due to automatic review settings May 11, 2026 08:42

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Comment thread .github/workflows/atom-sglang-test.yaml
Comment thread .github/benchmark/sglang_benchmark_models.json
@zhuyuhua-v

Collaborator Author

> image still wip?

fixed in #747

@zhuyuhua-v zhuyuhua-v marked this pull request as ready for review May 11, 2026 09:00
Copilot AI review requested due to automatic review settings May 11, 2026 09:00

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

Comment thread .github/workflows/atom-sglang-benchmark.yaml
Comment thread .github/benchmark/sglang_benchmark_models.json
Comment thread .github/workflows/atom-sglang-test.yaml
Comment thread .github/workflows/atom-sglang-test.yaml
Comment thread .github/benchmark/sglang_models_accuracy.json
Comment thread recipes/atom_sglang/DeepSeek-R1.md
Signed-off-by: zhuyuhua-v <yuhzhu@amd.com>
Copilot AI review requested due to automatic review settings May 12, 2026 09:00
@zhuyuhua-v zhuyuhua-v force-pushed the yuhua/sgl-dsrecipe-fp4ci branch from 476b5dd to ae99d0f Compare May 12, 2026 09:00

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.

Comment thread .github/workflows/atom-sglang-test.yaml
Comment thread .github/workflows/atom-sglang-test.yaml
Comment thread .github/workflows/atom-sglang-test.yaml
Comment thread .github/workflows/atom-sglang-accuracy-validation.yaml
Comment thread .github/workflows/atom-sglang-accuracy-validation.yaml
Comment thread recipes/atom_sglang/DeepSeek-R1.md
Comment thread recipes/atom_sglang/DeepSeek-R1.md
Comment thread .github/benchmark/sglang_models_accuracy.json
Comment thread .github/benchmark/sglang_benchmark_models.json
@zhuyuhua-v zhuyuhua-v requested review from valarLip and wuhuikx May 12, 2026 14:16
@valarLip valarLip merged commit c615b35 into main May 12, 2026
53 of 58 checks passed
@valarLip valarLip deleted the yuhua/sgl-dsrecipe-fp4ci branch May 12, 2026 14:18
zejunchen-zejun added a commit that referenced this pull request Jun 4, 2026
…nt SGLang V3.2

Follow-up to the citation audit, two more verified corrections in the
plugin sections:

- vLLM-ATOM / Qwen3.5: the prior pass dropped Qwen3.5 along with the
  misattributed #532, but Qwen3.5 does have real vLLM-plugin support.
  Restore it with the correct PRs: #448 (fp8 functionality/accuracy,
  touches atom/plugin/vllm/model_wrapper.py) and #593 (Qwen3.5 FP4
  nightly + benchmark, recipes/atom_vllm/Qwen3.5.md), keeping #772
  (Qwen3-Next MTP).
- SGLang-ATOM: dropped "V3.2" from the DeepSeek model list. No SGLang
  DeepSeek V3.2 PR landed in v0.1.2..v0.1.3 (V3 MTP=#643, R1 FP4=#614,
  FP4 MTP=#834/#846); the cited PRs only cover V3 and R1 FP4.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
zejunchen-zejun added a commit that referenced this pull request Jun 4, 2026
…itations in v0.1.3 (#1061)

* docs(release-notes): fix misattributed plugin PR citations in v0.1.3

Four citations in the vLLM-ATOM sections referenced PRs that actually
belong to SGLang-ATOM or the native ATOM path. Verified each PR title
against GitHub before correcting.

- Model Support / DeepSeek V4 / R1 FP4: dropped the bullet. #650 is the
  native DeepSeek V4 triton-MoE path (already cited under ATOM Server)
  and #614 is a SGLang-ATOM R1 FP4 PR (already cited under SGLang-ATOM);
  neither supports a vLLM-ATOM V4 / R1 FP4 claim.
- Model Support / Qwen3.5 / Qwen3-Next: dropped #532 (it adds Qwen3.5 /
  Qwen3-Next to SGLang, not vLLM); keep #772 (Qwen3-Next MTP for vLLM).
- H&P / vLLM-ATOM: dropped #528 + the "Q/K norm-quant fusion" claim;
  #528 is the SGLang+ATOM qk-norm fusion PR (already cited under SGLang).
- H&P / vLLM-ATOM: dropped #614 from the DeepSeek FP4 validation bullet
  (SGLang-ATOM PR), leaving the genuine #639 TP8/EP8 case.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(release-notes): correct Qwen3.5 vLLM citation and drop nonexistent SGLang V3.2

Follow-up to the citation audit, two more verified corrections in the
plugin sections:

- vLLM-ATOM / Qwen3.5: the prior pass dropped Qwen3.5 along with the
  misattributed #532, but Qwen3.5 does have real vLLM-plugin support.
  Restore it with the correct PRs: #448 (fp8 functionality/accuracy,
  touches atom/plugin/vllm/model_wrapper.py) and #593 (Qwen3.5 FP4
  nightly + benchmark, recipes/atom_vllm/Qwen3.5.md), keeping #772
  (Qwen3-Next MTP).
- SGLang-ATOM: dropped "V3.2" from the DeepSeek model list. No SGLang
  DeepSeek V3.2 PR landed in v0.1.2..v0.1.3 (V3 MTP=#643, R1 FP4=#614,
  FP4 MTP=#834/#846); the cited PRs only cover V3 and R1 FP4.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(release-notes): fix 3 more cross-section PR misattributions

Verified each PR's changed files to confirm which engine path it belongs
to:

- vLLM-ATOM Engine Core: drop #793 + "handles scalar KV scales". #793
  only touches atom/model_ops/{attention_mha,base_attention}.py (native,
  no plugin files) and is already cited correctly in the native section.
- vLLM-ATOM H&P: drop the DeepSeek FP4 TP8/EP8 bullet and move #639 to
  SGLang-ATOM H&P. #639 only touches sglang_benchmark_models.json and
  atom-sglang-benchmark.yaml -> it is a SGLang benchmark PR, not vLLM.
- vLLM-ATOM H&P: drop "V4 DP benchmark coverage (#949)". #949 touches the
  native benchmark (.github/benchmark/models.json, atom-benchmark.yaml),
  already cited under ATOM Server; it is not a plugin PR.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
5 participants