[OOT Plugin] Fix qwen3.5 fp8 functionality and accuracy issue#448

Merged
ganyi1996ppo merged 8 commits into
mainfrom
ganyi/fix_qwen3.5
Apr 1, 2026
@ganyi1996ppo ganyi1996ppo commented Mar 30, 2026

Contributor

Motivation

This PR fixes both the FP8 and BF16 Qwen3.5 MoE models.

Technical Details

Test Plan

Test Result

Submission Checklist

Copilot AI review requested due to automatic review settings March 30, 2026 07:07
Signed-off-by: ganyi <ygan@amd.com>
Signed-off-by: ganyi <ygan@amd.com>
Comment thread atom/models/qwen3_5.py Outdated
Signed-off-by: ganyi <ygan@amd.com>

Copilot AI left a comment

Pull request overview

This PR updates the ATOM vLLM plugin path to better handle Qwen3.5 quantized checkpoints by remapping quantization layer patterns/exclusions using model-specific weight-name mappings, and extends Qwen3.5 MoE packed-module remapping to support fused BF16 checkpoints.

Changes:

  • Remap QuantizationConfig layer patterns / exclude_layers in vLLM plugin mode using model-provided packed_modules_mapping and hf_to_atom_mapper.
  • Add gate_up_proj fused→split mapping for Qwen3.5 MoE packed weights (BF16 fused format).
  • Extend QuantizationConfig.remap_layer_name() to optionally rewrite exclude_layers via a weights mapper.
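
The layer-name remapping described in the list above can be pictured with a small sketch. This is not the code from the PR: the function `remap_exclude_layers`, the helper `hf_to_atom_name`, the `model.language_model.` prefix, and the shard-to-packed direction of the mapping are all assumptions made for illustration. The idea is only that names listed in `exclude_layers` are first passed through a weights-name mapper and then rewritten according to a packed-module mapping, so the quantization config and the runtime modules agree on names.

```python
# Hypothetical packed-module mapping: packed runtime module -> checkpoint shards.
PACKED_MODULES_MAPPING = {
    "qkv_proj": ["q_proj", "k_proj", "v_proj"],
    "gate_up_proj": ["gate_proj", "up_proj"],
}


def hf_to_atom_name(name: str) -> str:
    """Hypothetical stand-in for a model-provided name mapper (prefix rewrite)."""
    prefix = "model.language_model."
    if name.startswith(prefix):
        return "model." + name[len(prefix):]
    return name


def remap_exclude_layers(exclude_layers, packed_mapping, name_mapper):
    """Rewrite excluded layer names so they match the runtime module names."""
    # Invert the mapping so each shard name points at its packed module.
    shard_to_packed = {
        shard: packed for packed, shards in packed_mapping.items() for shard in shards
    }
    remapped = []
    for name in exclude_layers:
        name = name_mapper(name)
        head, _, leaf = name.rpartition(".")
        leaf = shard_to_packed.get(leaf, leaf)
        new_name = f"{head}.{leaf}" if head else leaf
        # Several shards can collapse into one packed name; keep it once.
        if new_name not in remapped:
            remapped.append(new_name)
    return remapped
```

With this sketch, excluded entries for `q_proj` and `k_proj` of the same attention block collapse into a single `qkv_proj` entry, and names without a dot (such as `lm_head`) pass through unchanged.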

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| atom/plugin/vllm/model_wrapper.py | Calls quant-config remapping before constructing the ATOM model in vLLM plugin mode, using model-provided mappings. |
| atom/models/qwen3_5.py | Adds a packed-module mapping entry intended to split fused gate_up_proj weights (BF16) into gate_proj/up_proj. |
| atom/config.py | Adds an optional weights-mapper step to rewrite exclude_layers during layer-name remapping. |
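
As a rough illustration of what splitting a fused gate_up_proj weight means, the sketch below cuts a fused matrix into two halves. It is a minimal sketch under stated assumptions, not the loader logic from this PR: the function name `split_fused_gate_up` is hypothetical, the weight is modeled as a plain list of rows, and the layout (gate rows first, up rows second) is assumed rather than taken from the checkpoint format.

```python
def split_fused_gate_up(fused_rows):
    """Split a fused [2 * intermediate, hidden] weight into gate and up halves.

    Assumption: gate rows come first and up rows second. A real checkpoint
    may use a different layout, so check the format before relying on this.
    """
    if len(fused_rows) % 2 != 0:
        raise ValueError("fused gate_up_proj must have an even number of rows")
    half = len(fused_rows) // 2
    return fused_rows[:half], fused_rows[half:]


# Toy weight with 4 rows (2 gate + 2 up) and hidden size 2.
fused = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
gate, up = split_fused_gate_up(fused)
```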

Comment thread atom/models/qwen3_5.py
Comment thread atom/config.py
Comment thread atom/config.py
Comment thread atom/plugin/vllm/model_wrapper.py Outdated
ganyi1996ppo and others added 2 commits March 30, 2026 07:24
Signed-off-by: ganyi <ygan@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 30, 2026 07:33

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

atom/config.py:413

  • remap_layer_name assigns self.packed_modules_mapping directly to the passed-in packed_modules_mapping dict. In plugin mode this dict is often a class attribute (e.g., model_cls.packed_modules_mapping), and later logic may mutate it (e.g., adding gate_up_proj for some model_types). This can unintentionally mutate shared class-level state across model instances/process lifetime. Consider copying the mapping (e.g., dict(packed_modules_mapping) / .copy()) before storing/mutating it.
        self,
        hf_config: PretrainedConfig,
        packed_modules_mapping: dict | None = None,
        weights_mapper={},
    ):
        model_type = hf_config.model_type
        self.packed_modules_mapping = (
            packed_modules_mapping if packed_modules_mapping is not None else {}
        )
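
The concern raised in this comment can be reproduced and avoided with a small sketch. The classes `FakeModel` and `QuantConfigSketch` are hypothetical stand-ins, not the ATOM classes, and the `setdefault` call only imitates the kind of later mutation the review describes. The sketch copies the incoming mapping before storing it and also replaces the mutable `{}` default with `None`.

```python
class FakeModel:
    # Class-level attribute shared by every instance (hypothetical example).
    packed_modules_mapping = {"qkv_proj": ["q_proj", "k_proj", "v_proj"]}


class QuantConfigSketch:
    def remap_layer_name(self, packed_modules_mapping=None, weights_mapper=None):
        # Copy the dict and its lists so later edits stay local to this config.
        self.packed_modules_mapping = {
            key: list(value)
            for key, value in (packed_modules_mapping or {}).items()
        }
        # Use None as the default instead of a shared mutable {}.
        self.weights_mapper = weights_mapper if weights_mapper is not None else {}
        # Stand-in for a model-type-specific addition made after the assignment.
        self.packed_modules_mapping.setdefault(
            "gate_up_proj", ["gate_proj", "up_proj"]
        )


cfg = QuantConfigSketch()
cfg.remap_layer_name(FakeModel.packed_modules_mapping)
```

After the call, the config holds the extra `gate_up_proj` entry while the class attribute on `FakeModel` is left as it was, which is the behavior the review comment asks for.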

Comment thread atom/plugin/vllm/model_wrapper.py
Signed-off-by: ganyi <ygan@amd.com>
valarLip previously approved these changes Mar 30, 2026
Comment thread atom/plugin/vllm/model_wrapper.py
Comment thread atom/config.py
Copilot AI review requested due to automatic review settings March 31, 2026 09:39

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.


Comment thread atom/models/qwen3_5.py
Comment thread atom/models/qwen3_5.py
Comment thread .github/workflows/atom-vllm-oot-test.yaml
@ganyi1996ppo ganyi1996ppo dismissed thpereir’s stale review April 1, 2026 06:54

Need to merge this for an urgent task; will refine the code later.

@ganyi1996ppo ganyi1996ppo merged commit 3bb7684 into main Apr 1, 2026
38 of 42 checks passed
@ganyi1996ppo ganyi1996ppo deleted the ganyi/fix_qwen3.5 branch April 1, 2026 07:40
Jasen2201 pushed a commit to Jasen2201/ATOM that referenced this pull request Apr 10, 2026
* fix qwen3.5 fp8 load and accuracy issue

Signed-off-by: ganyi <ygan@amd.com>

* black

Signed-off-by: ganyi <ygan@amd.com>

* ci fix

Signed-off-by: ganyi <ygan@amd.com>

* fix multimodal

Signed-off-by: ganyi <ygan@amd.com>

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* update accuracy threshold for qwen3.5 35B fp8

Signed-off-by: ganyi <ygan@amd.com>

---------

Signed-off-by: ganyi <ygan@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
zejunchen-zejun added a commit that referenced this pull request Jun 4, 2026
…nt SGLang V3.2

Follow-up to the citation audit, two more verified corrections in the
plugin sections:

- vLLM-ATOM / Qwen3.5: the prior pass dropped Qwen3.5 along with the
  misattributed #532, but Qwen3.5 does have real vLLM-plugin support.
  Restore it with the correct PRs: #448 (fp8 functionality/accuracy,
  touches atom/plugin/vllm/model_wrapper.py) and #593 (Qwen3.5 FP4
  nightly + benchmark, recipes/atom_vllm/Qwen3.5.md), keeping #772
  (Qwen3-Next MTP).
- SGLang-ATOM: dropped "V3.2" from the DeepSeek model list. No SGLang
  DeepSeek V3.2 PR landed in v0.1.2..v0.1.3 (V3 MTP=#643, R1 FP4=#614,
  FP4 MTP=#834/#846); the cited PRs only cover V3 and R1 FP4.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
zejunchen-zejun added a commit that referenced this pull request Jun 4, 2026
…itations in v0.1.3 (#1061)

* docs(release-notes): fix misattributed plugin PR citations in v0.1.3

Four citations in the vLLM-ATOM sections referenced PRs that actually
belong to SGLang-ATOM or the native ATOM path. Verified each PR title
against GitHub before correcting.

- Model Support / DeepSeek V4 / R1 FP4: dropped the bullet. #650 is the
  native DeepSeek V4 triton-MoE path (already cited under ATOM Server)
  and #614 is a SGLang-ATOM R1 FP4 PR (already cited under SGLang-ATOM);
  neither supports a vLLM-ATOM V4 / R1 FP4 claim.
- Model Support / Qwen3.5 / Qwen3-Next: dropped #532 (it adds Qwen3.5 /
  Qwen3-Next to SGLang, not vLLM); keep #772 (Qwen3-Next MTP for vLLM).
- H&P / vLLM-ATOM: dropped #528 + the "Q/K norm-quant fusion" claim;
  #528 is the SGLang+ATOM qk-norm fusion PR (already cited under SGLang).
- H&P / vLLM-ATOM: dropped #614 from the DeepSeek FP4 validation bullet
  (SGLang-ATOM PR), leaving the genuine #639 TP8/EP8 case.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(release-notes): correct Qwen3.5 vLLM citation and drop nonexistent SGLang V3.2

Follow-up to the citation audit, two more verified corrections in the
plugin sections:

- vLLM-ATOM / Qwen3.5: the prior pass dropped Qwen3.5 along with the
  misattributed #532, but Qwen3.5 does have real vLLM-plugin support.
  Restore it with the correct PRs: #448 (fp8 functionality/accuracy,
  touches atom/plugin/vllm/model_wrapper.py) and #593 (Qwen3.5 FP4
  nightly + benchmark, recipes/atom_vllm/Qwen3.5.md), keeping #772
  (Qwen3-Next MTP).
- SGLang-ATOM: dropped "V3.2" from the DeepSeek model list. No SGLang
  DeepSeek V3.2 PR landed in v0.1.2..v0.1.3 (V3 MTP=#643, R1 FP4=#614,
  FP4 MTP=#834/#846); the cited PRs only cover V3 and R1 FP4.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(release-notes): fix 3 more cross-section PR misattributions

Verified each PR's changed files to confirm which engine path it belongs
to:

- vLLM-ATOM Engine Core: drop #793 + "handles scalar KV scales". #793
  only touches atom/model_ops/{attention_mha,base_attention}.py (native,
  no plugin files) and is already cited correctly in the native section.
- vLLM-ATOM H&P: drop the DeepSeek FP4 TP8/EP8 bullet and move #639 to
  SGLang-ATOM H&P. #639 only touches sglang_benchmark_models.json and
  atom-sglang-benchmark.yaml -> it is a SGLang benchmark PR, not vLLM.
- vLLM-ATOM H&P: drop "V4 DP benchmark coverage (#949)". #949 touches the
  native benchmark (.github/benchmark/models.json, atom-benchmark.yaml),
  already cited under ATOM Server; it is not a plugin PR.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
5 participants