Qwen3Next MTP for vLLM plugin mode by ganyi1996ppo · Pull Request #772 · ROCm/ATOM

ganyi1996ppo · 2026-05-13T08:26:01Z

Motivation

server script:

export VLLM_TORCH_PROFILER_DIR=./vllm_profile
export ATOM_ENABLE_QK_NORM_ROPE_CACHE_QUANT_FUSION=1
export AITER_QUICK_REDUCE_QUANTIZATION=INT4
export HIP_VISIBLE_DEVICES=0,1,2,3
export ATOM_DISABLE_VLLM_PLUGIN_ATTENTION=0
export ATOM_USE_CUSTOM_ALL_GATHER=0
export ATOM_DISABLE_VLLM_PLUGIN=0
MODEL=/mnt/data/pretrained_model/Qwen/Qwen3-Next-80B-A3B-Instruct-FP8


vllm serve $MODEL \
  --port 8200 \
  --no-enable-prefix-caching \
  --tensor-parallel-size 1 \
  --gpu_memory_utilization 0.8 \
  --max-num-batched-tokens 32768 \
  --kv-cache-dtype fp8 \
  --compilation-config '{ "cudagraph_mode": "FULL_AND_PIECEWISE"}' \
  --profiler-config '{"profiler": "torch", "torch_profiler_dir": "./vllm_profile", "torch_profiler_with_stack": "True"}' \
  --speculative-config '{"num_speculative_tokens":1, "method": "mtp"}'
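
The JSON-valued flags in the server script are easy to misquote in a shell. As a minimal sketch (not part of this PR), the same command line can be assembled in Python so that `json.dumps` produces the JSON and `shlex.join` handles quoting. Only a subset of the flags is shown, and the `argv` layout is an illustrative assumption.

```python
import json
import shlex

# Model path and flag values copied from the server script above.
MODEL = "/mnt/data/pretrained_model/Qwen/Qwen3-Next-80B-A3B-Instruct-FP8"

speculative_config = {"num_speculative_tokens": 1, "method": "mtp"}
compilation_config = {"cudagraph_mode": "FULL_AND_PIECEWISE"}

# Build the argument vector; json.dumps guarantees well-formed JSON values.
argv = [
    "vllm", "serve", MODEL,
    "--port", "8200",
    "--no-enable-prefix-caching",
    "--tensor-parallel-size", "1",
    "--kv-cache-dtype", "fp8",
    "--compilation-config", json.dumps(compilation_config),
    "--speculative-config", json.dumps(speculative_config),
]

# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(argv))
```

The printed line can be pasted into a shell, or `argv` can be passed to `subprocess.run` directly.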

verify script

MODEL_ID=/mnt/data/pretrained_model/Qwen/Qwen3-Next-80B-A3B-Instruct-FP8
lm_eval \
  --model local-completions \
  --model_args model=$MODEL_ID,base_url=http://localhost:8200/v1/completions,num_concurrent=256,max_retries=10,timeout=3000,seed=1234,max_gen_toks=2048,temperature=0,tokenized_requests=False,trust_remote_code=True \
  --batch_size auto \
  --tasks gsm8k \
  --num_fewshot 5

result

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9212|±  |0.0074|
|     |       |strict-match    |     5|exact_match|↑  |0.9143|±  |0.0077|

SpecDecoding metrics: Mean acceptance length: 1.91, Accepted throughput: 371.69 tokens/s, Drafted throughput: 410.38 tokens/s, Accepted: 3717 tokens, Drafted: 4104 tokens, Per-position acceptance rate: 0.906, Avg Draft acceptance rate: 90.6%
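
The reported metrics are mutually consistent, which can be checked with a few lines of arithmetic. This sketch assumes that each verification step drafts `num_speculative_tokens` tokens and that mean acceptance length counts one token from the target model plus the accepted draft tokens per step. That definition is an assumption about how the log line is computed, not something stated in this PR.

```python
# Values copied from the SpecDecoding log line above.
accepted_tokens = 3717
drafted_tokens = 4104
num_speculative_tokens = 1  # from --speculative-config in the server script

# Number of draft/verify steps (assumed: one draft of k tokens per step).
num_drafts = drafted_tokens / num_speculative_tokens

# Fraction of drafted tokens that the target model accepted.
acceptance_rate = accepted_tokens / drafted_tokens

# Assumed definition: one target token per step plus the accepted draft tokens.
mean_acceptance_length = 1 + accepted_tokens / num_drafts

print(f"{acceptance_rate:.1%}")         # 90.6%
print(f"{mean_acceptance_length:.2f}")  # 1.91
```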

Technical Details

Test Plan

Test Result

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Copilot

Pull request overview

This PR adds support for running Qwen3Next MTP (multi-token prediction / EAGLE-style speculative decoding) under vLLM plugin mode, including draft-model construction, KV-cache indexing fixes, and attention/metadata handling for multi-token verification.

Changes:

- Register Qwen3NextMTP for vLLM plugin mode and add model-class routing to the ATOM vLLM wrapper.
- Teach the vLLM wrapper to detect draft-model construction, load draft weights correctly (spec_decode=True), and swap the global atom_config during forward() to keep layer lookups consistent across target/draft alternation.
- Update plugin attention metadata + paged attention implementations to correctly handle multi-token decode layouts used by MTP/EAGLE.
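
The "swap the global config during forward()" idea from the overview can be illustrated with a context manager. This is a generic sketch under stated assumptions, not the wrapper's actual code: `_current_config`, `use_config`, and `Wrapper` are hypothetical names, and plain dicts stand in for the real config objects.

```python
from contextlib import contextmanager

# Hypothetical stand-in for a process-wide "current config" slot.
_current_config = None


def get_current_config():
    return _current_config


@contextmanager
def use_config(config):
    """Temporarily install `config` as the global one, restoring it on exit."""
    global _current_config
    previous = _current_config
    _current_config = config
    try:
        yield config
    finally:
        # Restore even if forward() raises, so the next model sees a clean state.
        _current_config = previous


class Wrapper:
    """Toy model wrapper that owns its own config (hypothetical)."""

    def __init__(self, config):
        self.config = config

    def forward(self):
        # Code that reads the global sees this wrapper's config while it runs.
        with use_config(self.config):
            return get_current_config()["role"]


target = Wrapper({"role": "target"})
draft = Wrapper({"role": "draft"})

# Alternate target and draft forwards, as speculative decoding does.
roles = [target.forward(), draft.forward(), target.forward()]
print(roles)  # ['target', 'draft', 'target']
```

Restoring the previous value in `finally` is what keeps lookups consistent when target and draft forwards alternate.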

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.

Show a summary per file

| File | Description |
|------|-------------|
| `atom/plugin/vllm/register.py` | Registers `Qwen3NextMTP` architecture override for vLLM plugin mode. |
| `atom/plugin/vllm/model_wrapper.py` | Detects draft vs target, routes draft architecture, swaps/restores global `atom_config` for forwards, and passes `spec_decode` into weight loading. |
| `atom/plugin/vllm/attention_backend/attention_gdn.py` | Fixes GDN attention output writeback for speculative decode and adjusts imports/code paths. |
| `atom/plugin/attention.py` | Adjusts attention metadata builder thresholds/logic for MTP/EAGLE multi-token verification and async spec-decode metadata. |
| `atom/plugin/attention_mha.py` | Updates paged-attention decode kernels and buffer sizing to support MTP multi-token decode layout; fixes extend block-table slicing. |
| `atom/models/qwen3_next.py` | Adds explicit `layer_num` for attention KV slot isolation in MTP, fixes speculative_config fallback for vLLM, and exposes `embed_tokens` for sharing. |
| `atom/models/qwen3_next_mtp.py` | Implements Qwen3Next MTP draft model with correct layer indexing, quant prefixing, and expert mapping for shared-expert fusion. |
| `atom/model_loader/loader.py` | Plumbs `spec_decode` through plugin-mode loading so draft models can load `mtp.*` weights and apply MTP remapping. |

Copilot

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Signed-off-by: ganyi <ygan@amd.com>

Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

atom/plugin/vllm/model_wrapper.py:421

The draft-model detection in load_weights checks for "Qwen3NextMTP", but other parts of the repo still use the architecture key "Qwen3NextMTPModel" for Qwen3-Next MTP. If the draft model’s HF config reports "Qwen3NextMTPModel", spec_decode-specific loading (hf_config_override / weight filtering) won’t activate. Please align this set with the actual HF architecture string used for the draft model.

        is_mtp_draft_model = self.model_arch in {
            "DeepSeekMTPModel",
            "Qwen3NextMTP",
        }

Signed-off-by: ganyi <ygan@amd.com>

zejunchen-zejun · 2026-05-14T11:20:48Z

Hi @ganyi1996ppo
Could you help add qwen3next MTP into atom-vllm nightly and benchmark workflow, so that the acc and perf can be tracked when merged?

Signed-off-by: ganyi <ygan@amd.com>

Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (2)

atom/plugin/vllm/model_wrapper.py:196

_expose_spec_decode_attrs() is now only executed when model_arch in _MTP_MASK_INPUT_ARCH (currently only DeepSeekMTPModel). The new Qwen3NextMTP model has the same extra .model nesting and does not expose embed_tokens/layers on the outer module, so vLLM speculative decoding weight/embedding sharing is likely to fail. Suggest calling _expose_spec_decode_attrs() for all MTP draft models that wrap an inner .model (and keep _adapt_mtp_layers_for_vllm() gated separately if it’s DeepSeek-specific), or add Qwen3NextMTP to the relevant allowlist.

        logger.info(f"Construct ATOM model {model_arch} for vLLM plugin mode")
        self.model = model_cls(self.atom_config)

        if model_arch in _MTP_MASK_INPUT_ARCH:
            self._adapt_mtp_layers_for_vllm()
            # Mirror nested attributes required by vLLM speculative decoding.
            self._expose_spec_decode_attrs()
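
To make the suggestion concrete, here is a minimal sketch of mirroring nested attributes onto an outer wrapper. It uses plain Python objects rather than `torch.nn.Module`, and `expose_nested_attrs`, `Inner`, and `DraftModel` are hypothetical names; the real `_expose_spec_decode_attrs()` may behave differently.

```python
class Inner:
    """Stand-in for the inner `.model` that actually owns the submodules."""

    def __init__(self):
        self.embed_tokens = "embedding-module"
        self.layers = ["layer-0"]


class DraftModel:
    """Stand-in for an MTP draft model with an extra `.model` nesting level."""

    def __init__(self):
        self.model = Inner()


def expose_nested_attrs(outer, names=("embed_tokens", "layers")):
    """Mirror selected attributes of `outer.model` onto `outer` itself."""
    inner = getattr(outer, "model", None)
    if inner is None:
        return
    for name in names:
        # Do not overwrite attributes the outer object already defines.
        if not hasattr(outer, name) and hasattr(inner, name):
            setattr(outer, name, getattr(inner, name))


draft = DraftModel()
expose_nested_attrs(draft)
print(draft.embed_tokens is draft.model.embed_tokens)  # True
```

Because the outer attribute refers to the same object as the inner one, code that looks up `embed_tokens` on the outer module shares it rather than copying it.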

atom/plugin/vllm/model_wrapper.py:422

Draft-model detection only checks self.model_arch against { "DeepSeekMTPModel", "Qwen3NextMTP" }. If the HF draft config still reports Qwen3NextMTPModel (as referenced elsewhere in the repo), this branch won’t treat it as spec-decode, and hf_config_override won’t be applied. Consider accepting both Qwen3NextMTP and Qwen3NextMTPModel here (and in _ATOM_MODEL_CLASSES) so both draft-arch spellings work.

        is_mtp_draft_model = self.model_arch in {
            "DeepSeekMTPModel",
            "Qwen3NextMTP",
        }
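
A sketch of the "accept both spellings" suggestion follows. The constant and helper names are hypothetical; only the three architecture strings come from this thread.

```python
# Architecture strings mentioned in this thread, including both
# Qwen3-Next MTP spellings.
MTP_DRAFT_ARCHS = frozenset(
    {
        "DeepSeekMTPModel",
        "Qwen3NextMTP",
        "Qwen3NextMTPModel",
    }
)


def is_mtp_draft_arch(model_arch: str) -> bool:
    """Return True if `model_arch` names a known MTP draft architecture."""
    return model_arch in MTP_DRAFT_ARCHS


print(is_mtp_draft_arch("Qwen3NextMTP"))       # True
print(is_mtp_draft_arch("Qwen3NextMTPModel"))  # True
print(is_mtp_draft_arch("Qwen3NextForCausalLM"))  # False
```

Keeping the set in one module-level constant would let the detection in `load_weights` and any registry lookup share the same list.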

Signed-off-by: ganyi <ygan@amd.com>

Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

recipes/atom_vllm/Qwen3.5.md:137

The "Key Environment Variables" list no longer includes ATOM_DISABLE_VLLM_PLUGIN_ATTENTION=1, but the earlier text still refers to three required variables. Please ensure this section stays consistent with the intended required/optional env var set for Qwen3.5.


## Key Environment Variables

- `ATOM_USE_CUSTOM_ALL_GATHER=0`: **Required** - disables custom all-gather for compatibility with Qwen3.5 model architecture
- `AITER_QUICK_REDUCE_QUANTIZATION=INT4`: **Performance optimization** - enables INT4 quantization for quick reduce operations
  - **Benefit**: Significantly improves TTFT (Time To First Token) performance by reducing communication overhead during tensor parallelism all-reduce operations

Signed-off-by: ganyi <ygan@amd.com>

Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

atom/plugin/vllm/model_wrapper.py:432

atom.config.SpeculativeConfig does not expose draft_model_config, so draft_model_config = getattr(self.atom_config.speculative_config, "draft_model_config", None) will always be None and hf_config_override will not be applied for MTP draft-model weight loading. This can cause the draft model to load with the target model's HF config. Use self.atom_config.speculative_config.draft_model_hf_config (or fall back to self.vllm_config.speculative_config.draft_model_config.hf_config) when building draft_hf_config.

        is_mtp_draft_model = self.model_arch in {
            "DeepSeekMTPModel",
            "Qwen3NextMTP",
        }
        draft_hf_config = None
        if is_mtp_draft_model:
            draft_model_config = getattr(
                getattr(self.atom_config, "speculative_config", None),
                "draft_model_config",
                None,
            )
            if draft_model_config is not None:
                draft_hf_config = getattr(
                    draft_model_config, "hf_config", draft_model_config
                )
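
The lookup order the review asks for can be sketched as below. The attribute names `draft_model_hf_config` and `draft_model_config.hf_config` are taken from the review comment; `resolve_draft_hf_config` is a hypothetical helper, and `SimpleNamespace` objects stand in for the real config classes.

```python
from types import SimpleNamespace


def resolve_draft_hf_config(atom_config, vllm_config):
    """Prefer ATOM's draft HF config, falling back to vLLM's."""
    atom_spec = getattr(atom_config, "speculative_config", None)
    draft_hf_config = getattr(atom_spec, "draft_model_hf_config", None)
    if draft_hf_config is not None:
        return draft_hf_config

    # Fallback: vLLM keeps the HF config under draft_model_config.hf_config.
    vllm_spec = getattr(vllm_config, "speculative_config", None)
    draft_model_config = getattr(vllm_spec, "draft_model_config", None)
    return getattr(draft_model_config, "hf_config", None)


# Toy configs: ATOM has no draft config yet, vLLM does.
atom_cfg = SimpleNamespace(
    speculative_config=SimpleNamespace(draft_model_hf_config=None)
)
vllm_cfg = SimpleNamespace(
    speculative_config=SimpleNamespace(
        draft_model_config=SimpleNamespace(hf_config={"source": "vllm"})
    )
)

print(resolve_draft_hf_config(atom_cfg, vllm_cfg))  # {'source': 'vllm'}
```

Each `getattr(..., None)` tolerates a missing level, so the helper returns `None` instead of raising when speculative decoding is not configured.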

Signed-off-by: ganyi <ygan@amd.com>

Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

atom/plugin/vllm/model_wrapper.py:433

atom.config.SpeculativeConfig does not have a draft_model_config attribute (it exposes draft_model_hf_config). As written, draft_hf_config will always stay None, so load_model_in_plugin_mode(..., hf_config_override=...) won’t apply the draft HF-config overrides needed for MTP (e.g., architecture rewrite / expert backfill). Please fetch the draft config from self.atom_config.speculative_config.draft_model_hf_config (or keep reading it from self.vllm_config.speculative_config.draft_model_config.hf_config).

            draft_model_config = getattr(
                getattr(self.atom_config, "speculative_config", None),
                "draft_model_config",
                None,
            )
            if draft_model_config is not None:
                draft_hf_config = getattr(
                    draft_model_config, "hf_config", draft_model_config
                )

Signed-off-by: ganyi <ygan@amd.com>

Copilot

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

atom/plugin/vllm/model_wrapper.py:433

In load_weights, draft MTP hf_config override is fetched via self.atom_config.speculative_config.draft_model_config, but ATOM's SpeculativeConfig stores the draft config as draft_model_hf_config (see atom/config.py). As written, draft_hf_config will stay None, so the draft model will be loaded using the target hf_config, which can break MTP weight name/architecture overrides.

Update this to read the draft config from self.atom_config.speculative_config.draft_model_hf_config (or fall back to vLLM's vllm_config.speculative_config.draft_model_config.hf_config), and pass that object as hf_config_override.

        is_mtp_draft_model = self.model_arch in {
            "DeepSeekMTPModel",
            "Qwen3NextMTP",
        }
        draft_hf_config = None
        if is_mtp_draft_model:
            draft_model_config = getattr(
                getattr(self.atom_config, "speculative_config", None),
                "draft_model_config",
                None,
            )
            if draft_model_config is not None:
                draft_hf_config = getattr(
                    draft_model_config, "hf_config", draft_model_config
                )

 **Important**: The following three environment variables are required for Qwen3.5:

- `ATOM_DISABLE_VLLM_PLUGIN_ATTENTION=1`: Disables ATOM attention plugin to use vLLM's implementation for full attention layers (required because Qwen3.5 uses a hybrid architecture with both linear attention (GatedDeltaNet) and full attention layers)
 - `ATOM_USE_CUSTOM_ALL_GATHER=0`: Disables custom all-gather for compatibility with Qwen3.5 model architecture
 - `AITER_QUICK_REDUCE_QUANTIZATION=INT4`: **Performance optimization** - enables INT4 quantization for quick reduce operations, which can significantly improve TTFT (Time To First Token) performance. **Note**: This optimization may introduce a risk of accuracy degradation. For accuracy-critical workloads, consider validating with your specific use case.



Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

        is_mtp_draft_model = self.model_arch in {
            "DeepSeekMTPModel",
-            "Qwen3NextMTPModel",
+            "Qwen3NextMTP",
        }
        draft_hf_config = None
        if is_mtp_draft_model:
            draft_model_config = getattr(
-                getattr(self.vllm_config, "speculative_config", None),
+                getattr(self.atom_config, "speculative_config", None),
                "draft_model_config",
                None,
            )


+def _build_atom_speculative_config_from_vllm(vllm_spec_config: Any):
+    """Translate vLLM's SpeculativeConfig into ATOM's SpeculativeConfig.
+
+    Reuses vLLM's already-loaded draft hf_config (skips a second disk fetch
+    in ATOM SpeculativeConfig.__post_init__) but still runs ATOM's
+    hf_config_override on it — so MTP model_type remap, n_routed_experts
+    backfill (Qwen families), and architecture rewrite all land on the
+    draft config in one place. Mirrors how standalone ATOM MTP exposes
+    the draft hf_config via atom_config.speculative_config.
+
+    The draft hf_config is deepcopied first because hf_config_override
+    mutates `architectures` to ATOM's standalone naming (e.g.
+    "Qwen3NextMTPModel"), which differs from vLLM's registry name
+    ("Qwen3NextMTP"). Mutating in place would make vLLM's later draft
+    architecture lookup fail.
+    """
+    if vllm_spec_config is None:
+        return None
+
+    from atom.config import SpeculativeConfig
+
+    draft_model_config = getattr(vllm_spec_config, "draft_model_config", None)
+    draft_hf_config = getattr(draft_model_config, "hf_config", None)
+    if draft_hf_config is not None:
+        draft_hf_config = copy.deepcopy(draft_hf_config)
+    model_path = getattr(draft_model_config, "model", None) or getattr(
+        vllm_spec_config, "model", None
+    )
+
+    return SpeculativeConfig(
+        method=getattr(vllm_spec_config, "method", "") or "",
+        model=model_path,
+        num_speculative_tokens=getattr(
+            vllm_spec_config, "num_speculative_tokens", None
+        ),
+        draft_model_hf_config=draft_hf_config,
+    )
+
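
The docstring's reason for the deepcopy can be demonstrated in isolation. In this sketch `rewrite_architecture` is a hypothetical stand-in for an override that mutates its argument in place, and `SimpleNamespace` stands in for the HF config object; the two architecture strings are the ones named in the docstring.

```python
import copy
from types import SimpleNamespace


def rewrite_architecture(hf_config):
    """Stand-in for an override that mutates the config in place."""
    hf_config.architectures = ["Qwen3NextMTPModel"]
    return hf_config


# Config object as vLLM would hold it, using vLLM's registry name.
vllm_draft_hf_config = SimpleNamespace(architectures=["Qwen3NextMTP"])

# Copy first, then apply the override to the copy only.
atom_draft_hf_config = rewrite_architecture(copy.deepcopy(vllm_draft_hf_config))

print(vllm_draft_hf_config.architectures)  # ['Qwen3NextMTP']
print(atom_draft_hf_config.architectures)  # ['Qwen3NextMTPModel']
```

Without the `copy.deepcopy` call both names would refer to the same object, and the vLLM-side config would also report `Qwen3NextMTPModel`.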


* mtp 1 acc right
  Signed-off-by: ganyi <ygan@amd.com>
* add recipe for qwen3-next-mtp
  Signed-off-by: ganyi <ygan@amd.com>
* modify some qwen3.5 recipe
  Signed-off-by: ganyi <ygan@amd.com>
* black
  Signed-off-by: ganyi <ygan@amd.com>
* remove redundant code
  Signed-off-by: ganyi <ygan@amd.com>
* remove redundant code
  Signed-off-by: ganyi <ygan@amd.com>
* add spec decode convert for vllm plugin
  Signed-off-by: ganyi <ygan@amd.com>
* remove vllm related branch
  Signed-off-by: ganyi <ygan@amd.com>
* use atom spec decode config for plugin loading
  Signed-off-by: ganyi <ygan@amd.com>
* remove unnecessary changes in modeling
  Signed-off-by: ganyi <ygan@amd.com>
* format
  Signed-off-by: ganyi <ygan@amd.com>
* add qwen3next mtp into benchmark
  Signed-off-by: ganyi <ygan@amd.com>
* [ci] disable FP8 blockscale weight preshuffle for Qwen3.5/Qwen3-Next
  Add ATOM_FP8_BLOCKSCALE_WEIGHT_PRESHUFFLE=0 to all Qwen3.5 and Qwen3-Next model configs across benchmark, nightly accuracy, and recipe files.
  Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
* [ci] fix Qwen3-Next MTP benchmark label from MET to AW
  Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
* [docs] fix Qwen3.5 recipe: update env var count and add preshuffle doc
  Remove stale "three" count (now variable list), add ATOM_FP8_BLOCKSCALE_WEIGHT_PRESHUFFLE=0 to both the Important section and Key Environment Variables section.
  Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>

---------

Signed-off-by: ganyi <ygan@amd.com>
Co-authored-by: zejunchen-zejun <zejun.chen@amd.com>
Co-authored-by: Claude Opus 4 <noreply@anthropic.com>

…itations in v0.1.3 (#1061)

* docs(release-notes): fix misattributed plugin PR citations in v0.1.3

  Four citations in the vLLM-ATOM sections referenced PRs that actually belong to SGLang-ATOM or the native ATOM path. Verified each PR title against GitHub before correcting.

  - Model Support / DeepSeek V4 / R1 FP4: dropped the bullet. #650 is the native DeepSeek V4 triton-MoE path (already cited under ATOM Server) and #614 is a SGLang-ATOM R1 FP4 PR (already cited under SGLang-ATOM); neither supports a vLLM-ATOM V4 / R1 FP4 claim.
  - Model Support / Qwen3.5 / Qwen3-Next: dropped #532 (it adds Qwen3.5 / Qwen3-Next to SGLang, not vLLM); keep #772 (Qwen3-Next MTP for vLLM).
  - H&P / vLLM-ATOM: dropped #528 + the "Q/K norm-quant fusion" claim; #528 is the SGLang+ATOM qk-norm fusion PR (already cited under SGLang).
  - H&P / vLLM-ATOM: dropped #614 from the DeepSeek FP4 validation bullet (SGLang-ATOM PR), leaving the genuine #639 TP8/EP8 case.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(release-notes): correct Qwen3.5 vLLM citation and drop nonexistent SGLang V3.2

  Follow-up to the citation audit, two more verified corrections in the plugin sections:

  - vLLM-ATOM / Qwen3.5: the prior pass dropped Qwen3.5 along with the misattributed #532, but Qwen3.5 does have real vLLM-plugin support. Restore it with the correct PRs: #448 (fp8 functionality/accuracy, touches atom/plugin/vllm/model_wrapper.py) and #593 (Qwen3.5 FP4 nightly + benchmark, recipes/atom_vllm/Qwen3.5.md), keeping #772 (Qwen3-Next MTP).
  - SGLang-ATOM: dropped "V3.2" from the DeepSeek model list. No SGLang DeepSeek V3.2 PR landed in v0.1.2..v0.1.3 (V3 MTP=#643, R1 FP4=#614, FP4 MTP=#834/#846); the cited PRs only cover V3 and R1 FP4.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(release-notes): fix 3 more cross-section PR misattributions

  Verified each PR's changed files to confirm which engine path it belongs to:

  - vLLM-ATOM Engine Core: drop #793 + "handles scalar KV scales". #793 only touches atom/model_ops/{attention_mha,base_attention}.py (native, no plugin files) and is already cited correctly in the native section.
  - vLLM-ATOM H&P: drop the DeepSeek FP4 TP8/EP8 bullet and move #639 to SGLang-ATOM H&P. #639 only touches sglang_benchmark_models.json and atom-sglang-benchmark.yaml -> it is a SGLang benchmark PR, not vLLM.
  - vLLM-ATOM H&P: drop "V4 DP benchmark coverage (#949)". #949 touches the native benchmark (.github/benchmark/models.json, atom-benchmark.yaml), already cited under ATOM Server; it is not a plugin PR.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings May 13, 2026 08:26

Copilot started reviewing on behalf of ganyi1996ppo May 13, 2026 08:27 View session

Copilot AI reviewed May 13, 2026

Copilot AI review requested due to automatic review settings May 13, 2026 09:25

Copilot started reviewing on behalf of ganyi1996ppo May 13, 2026 09:26 View session

Copilot AI reviewed May 13, 2026

Comment thread atom/models/qwen3_next.py Outdated

mtp 1 acc right

3af7ccb

Signed-off-by: ganyi <ygan@amd.com>

ganyi1996ppo force-pushed the ganyi/qwen3next_mtp branch from f38481f to 3af7ccb Compare May 14, 2026 08:01

add recipe for qwen3-next-mtp

580f0fd

Signed-off-by: ganyi <ygan@amd.com>

Copilot AI review requested due to automatic review settings May 14, 2026 08:05

zejunchen-zejun requested a review from whx-sjtu May 14, 2026 08:05

Copilot started reviewing on behalf of ganyi1996ppo May 14, 2026 08:06 View session

modify some qwen3.5 recipe

598be9a

Signed-off-by: ganyi <ygan@amd.com>

Copilot AI reviewed May 14, 2026

Comment thread atom/plugin/vllm/register.py

Comment thread atom/plugin/vllm/model_wrapper.py

Comment thread atom/plugin/vllm/model_wrapper.py

black

9a6381e

Signed-off-by: ganyi <ygan@amd.com>

remove redundant code

ce84444

Signed-off-by: ganyi <ygan@amd.com>

Copilot AI review requested due to automatic review settings May 14, 2026 13:55

Copilot started reviewing on behalf of ganyi1996ppo May 14, 2026 13:56 View session

Copilot AI reviewed May 14, 2026

Comment thread atom/plugin/vllm/model_wrapper.py

Comment thread atom/plugin/vllm/register.py

ganyi1996ppo added 2 commits May 14, 2026 14:16

remove redundant code

8f83eb7

Signed-off-by: ganyi <ygan@amd.com>

add spec decode convert for vllm plugin

8b53857

Signed-off-by: ganyi <ygan@amd.com>

Copilot AI review requested due to automatic review settings May 14, 2026 14:57

Copilot started reviewing on behalf of ganyi1996ppo May 14, 2026 14:59 View session

Copilot AI reviewed May 14, 2026

Comment thread atom/plugin/vllm/model_wrapper.py

Comment thread recipes/atom_vllm/Qwen3.5.md Outdated

ganyi1996ppo added 2 commits May 14, 2026 15:09

remove vllm related branch

885c329

Signed-off-by: ganyi <ygan@amd.com>

use atom spec decode config for plugin loading

a706309

Signed-off-by: ganyi <ygan@amd.com>

Copilot AI review requested due to automatic review settings May 14, 2026 15:22

Copilot started reviewing on behalf of ganyi1996ppo May 14, 2026 15:23 View session

Copilot AI reviewed May 14, 2026

Comment thread atom/plugin/vllm/model_wrapper.py

Comment thread recipes/atom_vllm/Qwen3.5.md

Comment thread atom/plugin/config.py

ganyi1996ppo added 2 commits May 15, 2026 01:59

remove unnecessary changes in modeling

4dca135

Signed-off-by: ganyi <ygan@amd.com>

format

809e931

Signed-off-by: ganyi <ygan@amd.com>

Copilot AI review requested due to automatic review settings May 15, 2026 02:00

Copilot started reviewing on behalf of ganyi1996ppo May 15, 2026 02:02 View session

Copilot AI reviewed May 15, 2026

Comment thread atom/plugin/vllm/model_wrapper.py

Comment thread atom/plugin/config.py

add qwen3next mtp into benchmark

d104a46

Signed-off-by: ganyi <ygan@amd.com>

valarLip previously approved these changes May 15, 2026

zejunchen-zejun previously approved these changes May 15, 2026

[ci] disable FP8 blockscale weight preshuffle for Qwen3.5/Qwen3-Next

e262d7e

Add ATOM_FP8_BLOCKSCALE_WEIGHT_PRESHUFFLE=0 to all Qwen3.5 and Qwen3-Next model configs across benchmark, nightly accuracy, and recipe files. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings May 15, 2026 06:01

zejunchen-zejun dismissed stale reviews from valarLip and themself via e262d7e May 15, 2026 06:01

Copilot started reviewing on behalf of zejunchen-zejun May 15, 2026 06:03 View session

Copilot AI reviewed May 15, 2026

zejunchen-zejun and others added 2 commits May 15, 2026 14:42

[ci] fix Qwen3-Next MTP benchmark label from MET to AW

f3d17e7

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>

[docs] fix Qwen3.5 recipe: update env var count and add preshuffle doc

9b9a77c

Remove stale "three" count (now variable list), add ATOM_FP8_BLOCKSCALE_WEIGHT_PRESHUFFLE=0 to both the Important section and Key Environment Variables section. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings May 15, 2026 06:47

Copilot started reviewing on behalf of zejunchen-zejun May 15, 2026 06:49 View session

Copilot AI reviewed May 15, 2026

zejunchen-zejun merged commit cc3539e into main May 15, 2026
27 of 33 checks passed

zejunchen-zejun deleted the ganyi/qwen3next_mtp branch May 15, 2026 07:32

zejunchen-zejun mentioned this pull request Jun 4, 2026

[to hattie branch] docs(release-notes): fix misattributed plugin PR citations in v0.1.3 #1061

Merged
