[OOT Plugin] Fix qwen3.5 fp8 functionality and accuracy issue #448
Signed-off-by: ganyi <ygan@amd.com>
1906428 to f7143da
Pull request overview
This PR updates the ATOM vLLM plugin path to better handle Qwen3.5 quantized checkpoints by remapping quantization layer patterns/exclusions using model-specific weight-name mappings, and extends Qwen3.5 MoE packed-module remapping to support fused BF16 checkpoints.
Changes:
- Remap `QuantizationConfig` layer patterns / `exclude_layers` in vLLM plugin mode using model-provided `packed_modules_mapping` and `hf_to_atom_mapper`.
- Add `gate_up_proj` fused→split mapping for Qwen3.5 MoE packed weights (BF16 fused format).
- Extend `QuantizationConfig.remap_layer_name()` to optionally rewrite `exclude_layers` via a weights mapper.
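The remapping described in the list above can be pictured with a minimal sketch. It is not the PR's code: the function `remap_exclude_layers`, the `prefix_map` dict standing in for a weights mapper, the `packed_to_fused` dict standing in for a packed-modules mapping, and the example layer names are all hypothetical, chosen only to show the two steps (prefix rewrite, then split-to-fused rename).

```python
def remap_exclude_layers(exclude_layers, prefix_map, packed_to_fused):
    """Rewrite checkpoint-style layer names into runtime-style names.

    All three arguments are illustrative stand-ins, not ATOM or vLLM types.
    """
    remapped = []
    for name in exclude_layers:
        # Step 1: apply the first matching prefix substitution, if any.
        for old, new in prefix_map.items():
            if name.startswith(old):
                name = new + name[len(old):]
                break
        # Step 2: replace a split projection name with its fused module name.
        parts = name.split(".")
        parts[-1] = packed_to_fused.get(parts[-1], parts[-1])
        name = ".".join(parts)
        # Several split names can collapse to one fused name; keep it once.
        if name not in remapped:
            remapped.append(name)
    return remapped


# Hypothetical inputs for illustration only.
exclude = [
    "model.language_model.layers.0.self_attn.q_proj",
    "model.language_model.layers.0.self_attn.k_proj",
    "lm_head",
]
prefix_map = {"model.language_model.": "model."}
packed_to_fused = {"q_proj": "qkv_proj", "k_proj": "qkv_proj", "v_proj": "qkv_proj"}

result = remap_exclude_layers(exclude, prefix_map, packed_to_fused)
print(result)
```

The point of the sketch is that an exclusion list written against checkpoint names stops matching once the runtime model renames or fuses modules, so the list has to go through the same renaming as the weights.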
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| atom/plugin/vllm/model_wrapper.py | Calls quant-config remapping before constructing the ATOM model in vLLM plugin mode, using model-provided mappings. |
| atom/models/qwen3_5.py | Adds a packed-module mapping entry intended to split fused gate_up_proj weights (BF16) into gate_proj/up_proj. |
| atom/config.py | Adds an optional weights-mapper step to rewrite exclude_layers during layer-name remapping. |
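As a rough illustration of the fused-to-split idea mentioned for `atom/models/qwen3_5.py`, here is a minimal sketch using plain nested lists in place of tensors. It assumes the fused weight stacks the gate rows first and the up rows second along the output dimension; the source does not state the actual layout, and real checkpoints may use a different axis or ordering. The helper `split_gate_up` is hypothetical.

```python
def split_gate_up(fused_rows):
    """Split a fused [gate; up] weight into (gate_proj, up_proj) halves.

    Assumes gate rows come first and up rows second; this ordering is an
    assumption made for illustration, not taken from the PR.
    """
    if len(fused_rows) % 2 != 0:
        raise ValueError("fused gate_up_proj must have an even number of rows")
    half = len(fused_rows) // 2
    return fused_rows[:half], fused_rows[half:]


# A toy 4x2 "weight": the first two rows play the role of gate_proj,
# the last two rows play the role of up_proj.
fused = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
gate, up = split_gate_up(fused)
print(gate, up)
```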
Signed-off-by: ganyi <ygan@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
atom/config.py:413
`remap_layer_name` assigns `self.packed_modules_mapping` directly to the passed-in `packed_modules_mapping` dict. In plugin mode this dict is often a class attribute (e.g., `model_cls.packed_modules_mapping`), and later logic may mutate it (e.g., adding `gate_up_proj` for some model_types). This can unintentionally mutate shared class-level state across model instances/process lifetime. Consider copying the mapping (e.g., `dict(packed_modules_mapping)` / `.copy()`) before storing/mutating it.
```python
        self,
        hf_config: PretrainedConfig,
        packed_modules_mapping: dict | None = None,
        weights_mapper={},
    ):
        model_type = hf_config.model_type
        self.packed_modules_mapping = (
            packed_modules_mapping if packed_modules_mapping is not None else {}
        )
```
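The review comment's concern can be reproduced in a few lines. The sketch below is not the ATOM implementation: `ModelStub`, `QuantConfigSketch`, and the mapping values are hypothetical stand-ins. It shows the suggested defensive copy, and it also swaps the mutable `{}` default for a `None` sentinel, which is the usual Python idiom for optional dict arguments.

```python
class ModelStub:
    # Class-level mapping shared by every instance (hypothetical stand-in
    # for something like model_cls.packed_modules_mapping).
    packed_modules_mapping = {"q_proj": ("qkv_proj", "q")}


class QuantConfigSketch:
    def remap_layer_name(self, packed_modules_mapping=None, weights_mapper=None):
        # Copy the caller's dict so later mutation cannot leak back into it.
        self.packed_modules_mapping = dict(packed_modules_mapping or {})
        # A None sentinel avoids sharing one default dict across calls.
        self.weights_mapper = dict(weights_mapper or {})
        # Stand-in for the "later logic" that adds entries for some models.
        self.packed_modules_mapping["gate_proj"] = ("gate_up_proj", 0)


cfg = QuantConfigSketch()
cfg.remap_layer_name(ModelStub.packed_modules_mapping)

# The config sees the extra entry; the class attribute is left untouched.
print(cfg.packed_modules_mapping)
print(ModelStub.packed_modules_mapping)
```

Without the `dict(...)` copy, the assignment inside `remap_layer_name` would add `gate_proj` to `ModelStub.packed_modules_mapping` itself, which is the cross-instance leak the comment warns about.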
Signed-off-by: ganyi <ygan@amd.com>
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
This needs to be merged for an urgent task; I will refine the code later.
* fix qwen3.5 fp8 load and accuracy issue

  Signed-off-by: ganyi <ygan@amd.com>

* black

  Signed-off-by: ganyi <ygan@amd.com>

* ci fix

  Signed-off-by: ganyi <ygan@amd.com>

* fix mutimodal

  Signed-off-by: ganyi <ygan@amd.com>

* Apply suggestions from code review

  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* update accuracy threshold for qwen3.5 35B fp8

  Signed-off-by: ganyi <ygan@amd.com>

---------

Signed-off-by: ganyi <ygan@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…nt SGLang V3.2

Follow-up to the citation audit, two more verified corrections in the plugin sections:

- vLLM-ATOM / Qwen3.5: the prior pass dropped Qwen3.5 along with the misattributed #532, but Qwen3.5 does have real vLLM-plugin support. Restore it with the correct PRs: #448 (fp8 functionality/accuracy, touches atom/plugin/vllm/model_wrapper.py) and #593 (Qwen3.5 FP4 nightly + benchmark, recipes/atom_vllm/Qwen3.5.md), keeping #772 (Qwen3-Next MTP).
- SGLang-ATOM: dropped "V3.2" from the DeepSeek model list. No SGLang DeepSeek V3.2 PR landed in v0.1.2..v0.1.3 (V3 MTP=#643, R1 FP4=#614, FP4 MTP=#834/#846); the cited PRs only cover V3 and R1 FP4.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…itations in v0.1.3 (#1061)

* docs(release-notes): fix misattributed plugin PR citations in v0.1.3

  Four citations in the vLLM-ATOM sections referenced PRs that actually belong to SGLang-ATOM or the native ATOM path. Verified each PR title against GitHub before correcting.

  - Model Support / DeepSeek V4 / R1 FP4: dropped the bullet. #650 is the native DeepSeek V4 triton-MoE path (already cited under ATOM Server) and #614 is a SGLang-ATOM R1 FP4 PR (already cited under SGLang-ATOM); neither supports a vLLM-ATOM V4 / R1 FP4 claim.
  - Model Support / Qwen3.5 / Qwen3-Next: dropped #532 (it adds Qwen3.5 / Qwen3-Next to SGLang, not vLLM); keep #772 (Qwen3-Next MTP for vLLM).
  - H&P / vLLM-ATOM: dropped #528 + the "Q/K norm-quant fusion" claim; #528 is the SGLang+ATOM qk-norm fusion PR (already cited under SGLang).
  - H&P / vLLM-ATOM: dropped #614 from the DeepSeek FP4 validation bullet (SGLang-ATOM PR), leaving the genuine #639 TP8/EP8 case.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(release-notes): correct Qwen3.5 vLLM citation and drop nonexistent SGLang V3.2

  Follow-up to the citation audit, two more verified corrections in the plugin sections:

  - vLLM-ATOM / Qwen3.5: the prior pass dropped Qwen3.5 along with the misattributed #532, but Qwen3.5 does have real vLLM-plugin support. Restore it with the correct PRs: #448 (fp8 functionality/accuracy, touches atom/plugin/vllm/model_wrapper.py) and #593 (Qwen3.5 FP4 nightly + benchmark, recipes/atom_vllm/Qwen3.5.md), keeping #772 (Qwen3-Next MTP).
  - SGLang-ATOM: dropped "V3.2" from the DeepSeek model list. No SGLang DeepSeek V3.2 PR landed in v0.1.2..v0.1.3 (V3 MTP=#643, R1 FP4=#614, FP4 MTP=#834/#846); the cited PRs only cover V3 and R1 FP4.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(release-notes): fix 3 more cross-section PR misattributions

  Verified each PR's changed files to confirm which engine path it belongs to:

  - vLLM-ATOM Engine Core: drop #793 + "handles scalar KV scales". #793 only touches atom/model_ops/{attention_mha,base_attention}.py (native, no plugin files) and is already cited correctly in the native section.
  - vLLM-ATOM H&P: drop the DeepSeek FP4 TP8/EP8 bullet and move #639 to SGLang-ATOM H&P. #639 only touches sglang_benchmark_models.json and atom-sglang-benchmark.yaml -> it is a SGLang benchmark PR, not vLLM.
  - vLLM-ATOM H&P: drop "V4 DP benchmark coverage (#949)". #949 touches the native benchmark (.github/benchmark/models.json, atom-benchmark.yaml), already cited under ATOM Server; it is not a plugin PR.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Motivation
This PR fixes both the FP8 and BF16 Qwen3.5 MoE models.
Technical Details
Test Plan
Test Result
Submission Checklist