
Bump vLLM version for DSV4 B300 disagg #1952

Merged
adibarra merged 2 commits into main from dsv4-fp4-b300-dynamo-vllm-image-bump
Jun 29, 2026

Conversation

@xinli-sw

Collaborator

Bumps the DeepSeek-V4-Pro B300 disaggregated Dynamo-vLLM recipes from vllm/vllm-openai:v0.20.1 to v0.23.0.

Mirrors the B200 image bump from #1899 for the B300 equivalents. Updates model.container, identity.container.image, and identity.frameworks.vllm across all five B300 recipes:

  • disagg-b300-low-latency.yaml
  • disagg-b300-low-middle-curve.yaml
  • disagg-b300-high-tpt-megamoe.yaml
  • disagg-b300-mid-curve-megamoe.yaml
  • disagg-b300-max-tpt-megamoe.yaml

Also updates the dsv4-fp4-b300-dynamo-vllm image in nvidia-master.yaml.
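As an illustration of the image bump described above, here is a minimal sketch of a tag replacement over recipe text. The `bump_image` helper and the sample fragment are hypothetical; the real recipe files may be laid out differently, and the format of identity.frameworks.vllm is not shown in this PR, so that field is not covered here.

```python
from typing import Tuple

OLD_IMAGE = "vllm/vllm-openai:v0.20.1"
NEW_IMAGE = "vllm/vllm-openai:v0.23.0"


def bump_image(text: str, old: str = OLD_IMAGE, new: str = NEW_IMAGE) -> Tuple[str, int]:
    """Replace every occurrence of `old` with `new`; return the new text and the count."""
    count = text.count(old)
    return text.replace(old, new), count


# Hypothetical recipe fragment (assumed layout, not copied from the repository).
sample = (
    "model:\n"
    "  container: vllm/vllm-openai:v0.20.1\n"
    "identity:\n"
    "  container:\n"
    "    image: vllm/vllm-openai:v0.20.1\n"
)

updated, replaced = bump_image(sample)
print(f"replaced {replaced} image reference(s)")
```

Returning the replacement count makes it easy to confirm that each recipe file was actually touched, which helps when the same bump has to land in several files at once.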

The max-num-batched-tokens and gpu-memory-utilization reductions applied to B200 in #1899 are not mirrored here — B300 recipes already carry more conservative values (0.85/0.8 vs B200's pre-bump 0.95) reflecting the different hardware.

Updates the DeepSeek-V4-Pro B300 disaggregated Dynamo-vLLM benchmark
from vllm/vllm-openai:v0.20.1 to v0.23.0 across all five recipes
(low-latency, low-middle-curve, high-tpt-megamoe, mid-curve-megamoe,
max-tpt-megamoe) and the nvidia-master.yaml image entry.
@github-actions

Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please create a PR there first before we can merge your single-node PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.




@github-actions

Contributor

Comment thread perf-changelog.yaml Outdated
- dsv4-fp4-b300-dynamo-vllm
description:
- "Update the DeepSeek-V4-Pro B300 disaggregated Dynamo-vLLM benchmark to the vllm/vllm-openai:v0.23.0 image"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1899


🟡 The new perf-changelog.yaml entry for dsv4-fp4-b300-dynamo-vllm (lines 4306-4310) sets pr-link to PR #1899, but that is the B200 PR being mirrored — the link should point to this PR (#1952). Every adjacent entry in the file links to the PR that introduced it (#1921/#1939/#1941/#1942 on lines 4282/4290/4297/4304), so this looks like a copy-paste artifact from the description ("Mirrors the B200 image bump from #1899"). Fix by changing the pr-link to https://github.com/SemiAnalysisAI/InferenceX/pull/1952.

Extended reasoning...

What the bug is

The new entry appended to perf-changelog.yaml at lines 4306-4310 records the B300 vLLM image bump from v0.20.1 to v0.23.0 for dsv4-fp4-b300-dynamo-vllm. Its pr-link is set to https://github.com/SemiAnalysisAI/InferenceX/pull/1899, but this PR is #1952 — "Bump vLLM version for DSV4 B300 disagg." PR #1899 is the prior B200 image bump that this PR is mirroring, as the PR description itself states ("Mirrors the B200 image bump from #1899 for the B300 equivalents").

How it manifests / what the impact is

perf-changelog.yaml is a curated record of configuration changes, where each entry's pr-link is meant to point reviewers to the PR that introduced that specific change. The convention is unambiguous across the surrounding entries, whose pr-link values on lines 4282, 4290, 4297, and 4304 point to #1921, #1939, #1941, and #1942 respectively.

Each of these links to the PR that actually added the entry. The new B300 entry breaks the convention: a future reader tracing the B300 image bump via the changelog would be redirected to PR #1899 (the B200 bump for a different image and recipes), not the PR that performed the B300 bump. This defeats the entire point of the pr-link field for this entry.

Why existing code doesn't prevent it

perf-changelog.yaml is plain YAML metadata — there is no schema validation or automated check that the pr-link value matches the PR adding the entry. The mistake therefore slips through anything short of human review, which is exactly what this comment is for.

Step-by-step proof

  1. PR metadata shows this is PR #1952, titled "Bump vLLM version for DSV4 B300 disagg."
  2. The PR description explicitly says: "Mirrors the B200 image bump from #1899 for the B300 equivalents."
  3. The diff against perf-changelog.yaml appends one new entry whose pr-link field is https://github.com/SemiAnalysisAI/InferenceX/pull/1899.
  4. The four immediately preceding entries (still visible in the same file, lines ~4279-4304) each link to their own introducing PR, which establishes the convention:
     • #1921: [NV]Add Qwen3.5-397B-A17B-NVFP4 GB300 disagg multinode SGLang via Dynamo
     • #1939: [codex] add MiniMax M3 FP4 MI355X vLLM MTP benchmark
     • #1941: [codex] update MiniMax M3 FP8 MI355X vLLM MTP image
     • #1942: [codex] update MiniMax M3 FP8 MI355X vLLM image
  5. Conclusion: the value #1899 is a copy-paste from the description's source PR, not the PR introducing this entry. It should be #1952.

How to fix

Change the last line of the new entry from

  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1899

to

  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1952

No runtime behavior is affected — this is purely a changelog-traceability correction, but worth catching before merge while the fix is trivial.
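
The comment above notes that nothing automated compares a new entry's pr-link with the PR that adds it. A minimal sketch of such a check follows, assuming the added changelog lines and the PR number are already available to the caller. The `mismatched_pr_links` helper is hypothetical and is not part of the InferenceX repository.

```python
import re
from typing import List

# Matches a changelog line of the form "pr-link: <url>" with optional indentation.
PR_LINK_RE = re.compile(r"^\s*pr-link:\s*(\S+)\s*$")


def mismatched_pr_links(added_lines: List[str], pr_number: int) -> List[str]:
    """Return pr-link URLs from the added lines that do not end in /pull/<pr_number>."""
    expected_suffix = f"/pull/{pr_number}"
    bad = []
    for line in added_lines:
        match = PR_LINK_RE.match(line)
        if match and not match.group(1).rstrip("/").endswith(expected_suffix):
            bad.append(match.group(1))
    return bad


# The entry as originally submitted in this PR, checked against PR number 1952.
added = [
    "- dsv4-fp4-b300-dynamo-vllm",
    "  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1899",
]
problems = mismatched_pr_links(added, 1952)
print(problems)
```

The sketch only inspects lines added by the diff, so existing entries that legitimately point at older PRs are never flagged.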


@Ankur-singh

Collaborator

As a PR reviewer and CODEOWNER, I have reviewed this and have:

  • Verified that as of the moment of typing this, this is the latest version of PR_REVIEW_CHECKLIST.md
  • Verified that the general code quality meets the InferenceX standard and does not make the code quality any worse.
  • Verified that this PR has passed PR validation. Please link to GitHub Action workflow that shows this. Link
  • Verified that this PR passes evals. Please link to GitHub Action workflow that shows this. Link
  • Verified that speculative decoding PRs use chat templates to align the AL distribution to the real world
  • If a company claims that they support vLLM/SGLang as first class LLM inference engines on their hardware, I have verified that the respective vLLM/SGLang submission has been made before additional frameworks (TRT-LLM, ATOM, etc.). The only exceptions are for new hardware, such as MI455X UALoE72, Vera Rubin NVL72, Rubin NVL8, etc., and for new model architectures where there is an actual reason why vLLM/SGLang does not fundamentally support them yet.
  • Verified that the single-node recipes are similar to the official vLLM recipes and/or the SGLang cookbook:
    • If they are not, I have verified that a PR has been opened in vLLM recipe repo or SGLang repo and linked it below in the additional detail section:
  • If any of the above criteria cannot reasonably be satisfied, I have provided additional reasoning below.

Additional detail section:

  • This is a Dis-agg submission, no recipe update required.

Signed: ankur-singh

@Klaud-Cold

Collaborator

@Ankur-singh Not ready to merge: Check 3 fails — the sign-off provides no recipe link.

  • Check 0 — PASS: signer is a named owner of .github/configs/nvidia-master.yaml (@ankur-singh @kedarpotdar-nv @jgangani); the benchmarks/multi_node/** and perf-changelog.yaml paths fall to catch-all * @InferenceX/core, covered.
  • Check 1 — PASS: pinned head 127b5303 has green executed multi-node 8k1k / and multi-node eval / check-runs (single-node jobs correctly skipped; this is a multi-node disagg PR). https://github.com/SemiAnalysisAI/InferenceX/actions/runs/28380189316
  • Check 2 — PASS: gsm8k em_strict 0.964 / 0.968 / 0.964 across the three disagg configs, on the matching vllm/vllm-openai:v0.23.0 image.
  • Check 3 — FAIL: no recipe link in the Additional detail section — only "This is a Dis-agg submission, no recipe update required." A link is required (vllm-project/recipes, sgl-project/sglang cookbook, or the published recipe page); a bare waiver does not satisfy the standard. Major server args are otherwise unchanged (image-tag-only bump v0.20.1 -> v0.23.0), so 3(b) is not the blocker — only the missing link is.

@adibarra

Collaborator

/reuse-sweep-run

@adibarra adibarra merged commit 36c0f66 into main Jun 29, 2026
89 of 91 checks passed
@adibarra adibarra deleted the dsv4-fp4-b300-dynamo-vllm-image-bump branch June 29, 2026 20:38
5 participants