[OOT Plugin] Fix qwen3.5 fp8 functionality and accuracy issue by ganyi1996ppo · Pull Request #448 · ROCm/ATOM

ganyi1996ppo · 2026-03-30T07:07:26Z

Motivation

This PR fixes both the FP8 and BF16 Qwen3.5 MoE models.

Technical Details

Test Plan

Test Result

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Signed-off-by: ganyi <ygan@amd.com>

Copilot

Pull request overview

This PR updates the ATOM vLLM plugin path to better handle Qwen3.5 quantized checkpoints by remapping quantization layer patterns/exclusions using model-specific weight-name mappings, and extends Qwen3.5 MoE packed-module remapping to support fused BF16 checkpoints.

Changes:

Remap QuantizationConfig layer patterns / exclude_layers in vLLM plugin mode using model-provided packed_modules_mapping and hf_to_atom_mapper.
Add gate_up_proj fused→split mapping for Qwen3.5 MoE packed weights (BF16 fused format).
Extend QuantizationConfig.remap_layer_name() to optionally rewrite exclude_layers via a weights mapper.
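The remapping described in the changes above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the function name `remap_exclude_layers`, the `prefix_map` and `split_to_fused` dictionaries, their shapes, and the example layer names are all hypothetical, and the direction of the mapping (checkpoint names to fused runtime names) is an assumption.

```python
def remap_exclude_layers(exclude_layers, prefix_map, split_to_fused):
    """Rewrite checkpoint-style layer names into runtime-style names.

    prefix_map:     hypothetical {checkpoint_prefix: runtime_prefix} renames.
    split_to_fused: hypothetical {split_module: fused_module} renames.
    """
    remapped = []
    for name in exclude_layers:
        # 1) Apply the first matching prefix rename.
        for old, new in prefix_map.items():
            if name.startswith(old):
                name = new + name[len(old):]
                break
        # 2) Replace any split projection component with its fused module name.
        parts = [split_to_fused.get(part, part) for part in name.split(".")]
        name = ".".join(parts)
        # 3) Several split names can collapse to one fused name; keep it once.
        if name not in remapped:
            remapped.append(name)
    return remapped


# Hypothetical inputs, for illustration only.
exclude = [
    "model.language_model.layers.0.self_attn.q_proj",
    "model.language_model.layers.0.self_attn.k_proj",
    "lm_head",
]
prefix_map = {"model.language_model.": "language_model.model."}
split_to_fused = {"q_proj": "qkv_proj", "k_proj": "qkv_proj", "v_proj": "qkv_proj"}

result = remap_exclude_layers(exclude, prefix_map, split_to_fused)
print(result)
```

If the exclusion list is left in checkpoint naming while the runtime model uses fused or renamed modules, a layer meant to stay unquantized may no longer match, which is one plausible way a load or accuracy problem could arise.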

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| `atom/plugin/vllm/model_wrapper.py` | Calls quant-config remapping before constructing the ATOM model in vLLM plugin mode, using model-provided mappings. |
| `atom/models/qwen3_5.py` | Adds a packed-module mapping entry intended to split fused `gate_up_proj` weights (BF16) into `gate_proj`/`up_proj`. |
| `atom/config.py` | Adds an optional weights-mapper step to rewrite `exclude_layers` during layer-name remapping. |
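To make the fused-to-split idea for `gate_up_proj` concrete, here is a small sketch using plain Python lists in place of tensors. It assumes the fused weight stacks the gate rows first and the up rows second along the output dimension; the PR does not state the actual layout, and `split_gate_up` is a hypothetical helper name.

```python
def split_gate_up(fused_rows):
    """Split a fused weight of 2 * n rows into (gate, up), each with n rows.

    Assumption: gate occupies the first half and up the second half.
    """
    if len(fused_rows) % 2 != 0:
        raise ValueError("fused gate_up_proj must have an even number of rows")
    half = len(fused_rows) // 2
    return fused_rows[:half], fused_rows[half:]


# Toy 4 x 2 fused weight: two gate rows followed by two up rows.
fused = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
gate, up = split_gate_up(fused)
print(gate, up)
```

With real tensors the same slicing would be done along the output dimension of the weight; the list version only shows the indexing.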

Signed-off-by: ganyi <ygan@amd.com>

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

atom/config.py:413

remap_layer_name assigns self.packed_modules_mapping directly to the passed-in packed_modules_mapping dict. In plugin mode this dict is often a class attribute (e.g., model_cls.packed_modules_mapping), and later logic may mutate it (e.g., adding gate_up_proj for some model_types). This can unintentionally mutate shared class-level state across model instances/process lifetime. Consider copying the mapping (e.g., dict(packed_modules_mapping) / .copy()) before storing/mutating it.

    def remap_layer_name(
        self,
        hf_config: PretrainedConfig,
        packed_modules_mapping: dict | None = None,
        weights_mapper={},
    ):
        model_type = hf_config.model_type
        self.packed_modules_mapping = (
            packed_modules_mapping if packed_modules_mapping is not None else {}
        )
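The hazard raised in this comment, and the suggested defensive copy, can be illustrated with a small standalone sketch. `FakeModel` and `QuantConfigSketch` are hypothetical stand-ins, not ATOM classes, and the added `gate_proj` entry only simulates the later mutation the comment mentions.

```python
class FakeModel:
    # Class-level mapping shared by every instance
    # (stand-in for something like model_cls.packed_modules_mapping).
    packed_modules_mapping = {"q_proj": "qkv_proj"}


class QuantConfigSketch:
    def remap_layer_name(self, packed_modules_mapping=None, weights_mapper=None):
        # Copy the caller's dict so later mutation stays local to this object.
        # None defaults also avoid sharing one mutable default across calls.
        self.packed_modules_mapping = dict(packed_modules_mapping or {})
        self.weights_mapper = dict(weights_mapper or {})
        # Simulated model-specific addition made after the assignment.
        self.packed_modules_mapping["gate_proj"] = "gate_up_proj"


cfg = QuantConfigSketch()
cfg.remap_layer_name(FakeModel.packed_modules_mapping)

print(FakeModel.packed_modules_mapping)  # class attribute is unchanged
print(cfg.packed_modules_mapping)        # local copy carries the extra entry
```

Note that `dict(...)` makes a shallow copy: it protects against added or replaced keys, but nested mutable values would still be shared.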

Signed-off-by: ganyi <ygan@amd.com>

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Need to merge this for an urgent task; will refine the code later.

* fix qwen3.5 fp8 load and accuracy issue
  Signed-off-by: ganyi <ygan@amd.com>
* black
  Signed-off-by: ganyi <ygan@amd.com>
* ci fix
  Signed-off-by: ganyi <ygan@amd.com>
* fix mutimodal
  Signed-off-by: ganyi <ygan@amd.com>
* Apply suggestions from code review
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* update accuracy threshold for qwen3.5 35B fp8
  Signed-off-by: ganyi <ygan@amd.com>

---------

Signed-off-by: ganyi <ygan@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

…itations in v0.1.3 (#1061)

* docs(release-notes): fix misattributed plugin PR citations in v0.1.3

  Four citations in the vLLM-ATOM sections referenced PRs that actually belong to SGLang-ATOM or the native ATOM path. Verified each PR title against GitHub before correcting.

  - Model Support / DeepSeek V4 / R1 FP4: dropped the bullet. #650 is the native DeepSeek V4 triton-MoE path (already cited under ATOM Server) and #614 is a SGLang-ATOM R1 FP4 PR (already cited under SGLang-ATOM); neither supports a vLLM-ATOM V4 / R1 FP4 claim.
  - Model Support / Qwen3.5 / Qwen3-Next: dropped #532 (it adds Qwen3.5 / Qwen3-Next to SGLang, not vLLM); keep #772 (Qwen3-Next MTP for vLLM).
  - H&P / vLLM-ATOM: dropped #528 + the "Q/K norm-quant fusion" claim; #528 is the SGLang+ATOM qk-norm fusion PR (already cited under SGLang).
  - H&P / vLLM-ATOM: dropped #614 from the DeepSeek FP4 validation bullet (SGLang-ATOM PR), leaving the genuine #639 TP8/EP8 case.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(release-notes): correct Qwen3.5 vLLM citation and drop nonexistent SGLang V3.2

  Follow-up to the citation audit, two more verified corrections in the plugin sections:

  - vLLM-ATOM / Qwen3.5: the prior pass dropped Qwen3.5 along with the misattributed #532, but Qwen3.5 does have real vLLM-plugin support. Restore it with the correct PRs: #448 (fp8 functionality/accuracy, touches atom/plugin/vllm/model_wrapper.py) and #593 (Qwen3.5 FP4 nightly + benchmark, recipes/atom_vllm/Qwen3.5.md), keeping #772 (Qwen3-Next MTP).
  - SGLang-ATOM: dropped "V3.2" from the DeepSeek model list. No SGLang DeepSeek V3.2 PR landed in v0.1.2..v0.1.3 (V3 MTP=#643, R1 FP4=#614, FP4 MTP=#834/#846); the cited PRs only cover V3 and R1 FP4.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(release-notes): fix 3 more cross-section PR misattributions

  Verified each PR's changed files to confirm which engine path it belongs to:

  - vLLM-ATOM Engine Core: drop #793 + "handles scalar KV scales". #793 only touches atom/model_ops/{attention_mha,base_attention}.py (native, no plugin files) and is already cited correctly in the native section.
  - vLLM-ATOM H&P: drop the DeepSeek FP4 TP8/EP8 bullet and move #639 to SGLang-ATOM H&P. #639 only touches sglang_benchmark_models.json and atom-sglang-benchmark.yaml -> it is a SGLang benchmark PR, not vLLM.
  - vLLM-ATOM H&P: drop "V4 DP benchmark coverage (#949)". #949 touches the native benchmark (.github/benchmark/models.json, atom-benchmark.yaml), already cited under ATOM Server; it is not a plugin PR.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings March 30, 2026 07:07

Copilot started reviewing on behalf of ganyi1996ppo March 30, 2026 07:09

ganyi1996ppo added 2 commits March 30, 2026 07:09

fix qwen3.5 fp8 load and accuracy issue

0886bed

Signed-off-by: ganyi <ygan@amd.com>

black

a86f928

Signed-off-by: ganyi <ygan@amd.com>

wuhuikx reviewed Mar 30, 2026

Comment thread atom/models/qwen3_5.py Outdated

ci fix

f7143da

Signed-off-by: ganyi <ygan@amd.com>

ganyi1996ppo force-pushed the ganyi/fix_qwen3.5 branch from 1906428 to f7143da Compare March 30, 2026 07:12

Copilot AI reviewed Mar 30, 2026

Comment thread atom/models/qwen3_5.py

Comment thread atom/config.py

Comment thread atom/config.py

Comment thread atom/plugin/vllm/model_wrapper.py Outdated

ganyi1996ppo and others added 2 commits March 30, 2026 07:24

fix mutimodal

322cdd7

Signed-off-by: ganyi <ygan@amd.com>

Apply suggestions from code review

9f6689a

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings March 30, 2026 07:33

Copilot started reviewing on behalf of ganyi1996ppo March 30, 2026 07:34

Copilot AI reviewed Mar 30, 2026

Comment thread atom/plugin/vllm/model_wrapper.py

update accuracy threshold for qwen3.5 35B fp8

4e77c86

Signed-off-by: ganyi <ygan@amd.com>

valarLip previously approved these changes Mar 30, 2026

thpereir previously requested changes Mar 30, 2026

Comment thread atom/plugin/vllm/model_wrapper.py

Comment thread atom/config.py

thpereir mentioned this pull request Mar 30, 2026

[QUARK-403] Add MiniMax-2.1 support #237

Merged

1 task

Merge branch 'main' into ganyi/fix_qwen3.5

34b8475

Copilot AI review requested due to automatic review settings March 31, 2026 09:39

ganyi1996ppo dismissed valarLip’s stale review via 34b8475 March 31, 2026 09:40

Copilot started reviewing on behalf of ganyi1996ppo March 31, 2026 09:41

Copilot AI reviewed Mar 31, 2026

Comment thread atom/models/qwen3_5.py

Comment thread atom/models/qwen3_5.py

Comment thread .github/workflows/atom-vllm-oot-test.yaml

Merge branch 'main' into ganyi/fix_qwen3.5

5281017

wuhuikx approved these changes Apr 1, 2026

ganyi1996ppo merged commit 3bb7684 into main Apr 1, 2026
38 of 42 checks passed

ganyi1996ppo deleted the ganyi/fix_qwen3.5 branch April 1, 2026 07:40

zufayu mentioned this pull request May 8, 2026

[Bug]: Qwen3.5-35B-A3B / 27B BF16 accuracy regression at TP4 / TP8 #719

Closed

9 tasks

ganyi1996ppo merged 8 commits into main from ganyi/fix_qwen3.5

5 participants
