Add Qwen3.5 FP4 to vLLM-ATOM nightly accuracy check and benchmark#593

Merged
zejunchen-zejun merged 5 commits into main from hattie/add_qwen3.5_fp4 on Apr 17, 2026

wuhuikx (Collaborator) commented Apr 17, 2026

Add Qwen3.5 FP4 for vLLM-ATOM, including:

  1. nightly accuracy test
  2. performance benchmark candidate list
  3. recipe update (recipes/atom_vllm/Qwen3.5.md)

Review comment thread on .github/benchmark/oot_benchmark_models.json
Two review comment threads (outdated) on .github/benchmark/oot_models_accuracy.json
wuhuikx marked this pull request as draft on April 17, 2026 at 04:48
wuhuikx marked this pull request as ready for review on April 17, 2026 at 09:01
zejunchen-zejun merged commit 173f3ee into main on Apr 17, 2026
33 of 44 checks passed
zejunchen-zejun deleted the hattie/add_qwen3.5_fp4 branch on April 17, 2026 at 10:07
zejunchen-zejun added a commit that referenced this pull request Jun 4, 2026
…nt SGLang V3.2

Follow-up to the citation audit with two more verified corrections in the
plugin sections:

- vLLM-ATOM / Qwen3.5: the prior pass dropped Qwen3.5 along with the
  misattributed #532, but Qwen3.5 does have real vLLM-plugin support.
  Restore it with the correct PRs: #448 (fp8 functionality/accuracy,
  touches atom/plugin/vllm/model_wrapper.py) and #593 (Qwen3.5 FP4
  nightly + benchmark, recipes/atom_vllm/Qwen3.5.md), keeping #772
  (Qwen3-Next MTP).
- SGLang-ATOM: dropped "V3.2" from the DeepSeek model list. No SGLang
  DeepSeek V3.2 PR landed in v0.1.2..v0.1.3 (V3 MTP=#643, R1 FP4=#614,
  FP4 MTP=#834/#846); the cited PRs only cover V3 and R1 FP4.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
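The V3.2 check in this commit amounts to scanning the PRs merged in the v0.1.2..v0.1.3 range. A rough first-pass filter can be sketched in Python, assuming `v0.1.2` and `v0.1.3` are git tags and that squash-merged commit subjects end in `(#NNN)`, as the #1061 merge commit on this page does. The helper names `commit_subjects` and `prs_mentioning` are hypothetical, not part of any ATOM tooling.

```python
import re
import subprocess

# Assumption: squash-merged commits end their subject with "(#NNN)".
PR_SUFFIX = re.compile(r"\(#(\d+)\)\s*$")


def commit_subjects(rev_range: str) -> list[str]:
    """Return one-line commit subjects for a git revision range such as "v0.1.2..v0.1.3"."""
    result = subprocess.run(
        ["git", "log", "--format=%s", rev_range],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


def prs_mentioning(subjects: list[str], *keywords: str) -> list[int]:
    """Return PR numbers whose subject contains every keyword (case-insensitive)."""
    hits = []
    for subject in subjects:
        match = PR_SUFFIX.search(subject)
        lowered = subject.lower()
        if match and all(k.lower() in lowered for k in keywords):
            hits.append(int(match.group(1)))
    return hits
```

Inside a clone of the repository this could be called as `prs_mentioning(commit_subjects("v0.1.2..v0.1.3"), "sglang", "v3.2")`. Keyword matching only narrows the candidates: an empty result means no subject used those words, so the audit's step of verifying PR titles and changed files on GitHub is still needed.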
zejunchen-zejun added a commit that referenced this pull request Jun 4, 2026
…itations in v0.1.3 (#1061)

* docs(release-notes): fix misattributed plugin PR citations in v0.1.3

Four citations in the vLLM-ATOM sections referenced PRs that actually
belong to SGLang-ATOM or the native ATOM path. Verified each PR title
against GitHub before correcting.

- Model Support / DeepSeek V4 / R1 FP4: dropped the bullet. #650 is the
  native DeepSeek V4 triton-MoE path (already cited under ATOM Server)
  and #614 is an SGLang-ATOM R1 FP4 PR (already cited under SGLang-ATOM);
  neither supports a vLLM-ATOM V4 / R1 FP4 claim.
- Model Support / Qwen3.5 / Qwen3-Next: dropped #532 (it adds Qwen3.5 /
  Qwen3-Next to SGLang, not vLLM); keep #772 (Qwen3-Next MTP for vLLM).
- H&P / vLLM-ATOM: dropped #528 + the "Q/K norm-quant fusion" claim;
  #528 is the SGLang+ATOM qk-norm fusion PR (already cited under SGLang).
- H&P / vLLM-ATOM: dropped #614 from the DeepSeek FP4 validation bullet
  (SGLang-ATOM PR), leaving the genuine #639 TP8/EP8 case.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(release-notes): correct Qwen3.5 vLLM citation and drop nonexistent SGLang V3.2

* docs(release-notes): fix 3 more cross-section PR misattributions

Verified each PR's changed files to confirm which engine path it belongs
to:

- vLLM-ATOM Engine Core: drop #793 + "handles scalar KV scales". #793
  only touches atom/model_ops/{attention_mha,base_attention}.py (native,
  no plugin files) and is already cited correctly in the native section.
- vLLM-ATOM H&P: drop the DeepSeek FP4 TP8/EP8 bullet and move #639 to
  SGLang-ATOM H&P. #639 only touches sglang_benchmark_models.json and
  atom-sglang-benchmark.yaml -> it is a SGLang benchmark PR, not vLLM.
- vLLM-ATOM H&P: drop "V4 DP benchmark coverage (#949)". #949 touches the
  native benchmark (.github/benchmark/models.json, atom-benchmark.yaml),
  already cited under ATOM Server; it is not a plugin PR.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
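The check described in the last commit of #1061, deciding which engine path a PR belongs to from the files it changes, can be sketched in Python. The path rules are assumptions inferred only from the file paths named on this page; in particular, treating `oot_*` benchmark files as vLLM-ATOM is a guess based on the title of #593, and the real repository may need different rules. `classify_path` and `classify_pr` are hypothetical helpers.

```python
import posixpath

# Hypothetical rules inferred only from file paths named on this page.
# Treating oot_* benchmark files as vLLM-ATOM is an assumption based on
# the title of #593; the real repository may need different rules.
VLLM_PREFIXES = ("atom/plugin/vllm/", "recipes/atom_vllm/")
NATIVE_PREFIXES = ("atom/model_ops/",)
NATIVE_BASENAMES = {"models.json", "atom-benchmark.yaml"}


def classify_path(path: str) -> str:
    """Map one changed file to the engine path it most likely belongs to."""
    base = posixpath.basename(path)
    if "sglang" in path.lower():
        return "SGLang-ATOM"
    if path.startswith(VLLM_PREFIXES) or base.startswith("oot_"):
        return "vLLM-ATOM"
    if path.startswith(NATIVE_PREFIXES) or base in NATIVE_BASENAMES:
        return "native ATOM"
    return "unknown"


def classify_pr(changed_files: list[str]) -> set[str]:
    """Return the engine paths a PR touches, ignoring unclassified files."""
    labels = {classify_path(p) for p in changed_files}
    labels.discard("unknown")
    return labels


if __name__ == "__main__":
    # File lists as quoted in the commit messages; directories not given there are omitted.
    cases = {
        "#793": ["atom/model_ops/attention_mha.py", "atom/model_ops/base_attention.py"],
        "#639": ["sglang_benchmark_models.json", "atom-sglang-benchmark.yaml"],
        "#949": [".github/benchmark/models.json", "atom-benchmark.yaml"],
        "#448": ["atom/plugin/vllm/model_wrapper.py"],
        "#593": ["recipes/atom_vllm/Qwen3.5.md", ".github/benchmark/oot_models_accuracy.json"],
    }
    for pr, files in cases.items():
        print(pr, sorted(classify_pr(files)))
```

Each case maps to the single engine path the commit messages assign to it. The `sglang` substring check runs first so that files such as `atom-sglang-benchmark.yaml` are not mistaken for native benchmark files; a PR that comes back with several labels, or none, would still need a manual look.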

4 participants