Bump vLLM version for DSV4 B300 disagg #1952
Updates the DeepSeek-V4-Pro B300 disaggregated Dynamo-vLLM benchmark from vllm/vllm-openai:v0.20.1 to v0.23.0 across all five recipes (low-latency, low-middle-curve, high-tpt-megamoe, mid-curve-megamoe, max-tpt-megamoe) and the nvidia-master.yaml image entry.
Thanks for the contribution! For vLLM and SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR first before we can merge your single-node PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work. Thank you!
PR authors are responsible for ensuring that, after merging, all GitHub Action jobs fully pass. Failures are often just flakes, and re-running the failed jobs will fix them. If you re-run failed jobs, you are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow
As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=28380082309
    - dsv4-fp4-b300-dynamo-vllm
  description:
    - "Update the DeepSeek-V4-Pro B300 disaggregated Dynamo-vLLM benchmark to the vllm/vllm-openai:v0.23.0 image"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1899
🟡 The new perf-changelog.yaml entry for dsv4-fp4-b300-dynamo-vllm (lines 4306-4310) sets pr-link to PR #1899, but that is the B200 PR being mirrored — the link should point to this PR (#1952). Every adjacent entry in the file links to the PR that introduced it (#1921/#1939/#1941/#1942 on lines 4282/4290/4297/4304), so this looks like a copy-paste artifact from the description ("Mirrors the B200 image bump from #1899"). Fix by changing the pr-link to https://github.com/SemiAnalysisAI/InferenceX/pull/1952.
Extended reasoning...
What the bug is
The new entry appended to perf-changelog.yaml at lines 4306-4310 records the B300 vLLM image bump from v0.20.1 to v0.23.0 for dsv4-fp4-b300-dynamo-vllm. Its pr-link is set to https://github.com/SemiAnalysisAI/InferenceX/pull/1899, but this PR is #1952 — "Bump vLLM version for DSV4 B300 disagg." PR #1899 is the prior B200 image bump that this PR is mirroring, as the PR description itself states ("Mirrors the B200 image bump from #1899 for the B300 equivalents").
How it manifests / what the impact is
perf-changelog.yaml is a curated record of configuration changes, where each entry's pr-link is meant to point reviewers to the PR that introduced that specific change. The convention is unambiguous across the surrounding entries:
- line 4282 → [NV]Add Qwen3.5-397B-A17B-NVFP4 GB300 disagg multinode SGLang via Dynamo #1921
- line 4290 → [codex] add MiniMax M3 FP4 MI355X vLLM MTP benchmark #1939
- line 4297 → [codex] update MiniMax M3 FP8 MI355X vLLM MTP image #1941
- line 4304 → [codex] update MiniMax M3 FP8 MI355X vLLM image #1942
Each of these links to the PR that actually added the entry. The new B300 entry breaks the convention: a future reader tracing the B300 image bump via the changelog would be redirected to PR #1899 (the B200 bump for a different image and recipes), not the PR that performed the B300 bump. This defeats the entire point of the pr-link field for this entry.
Why existing code doesn't prevent it
perf-changelog.yaml is plain YAML metadata — there is no schema validation or automated check that the pr-link value matches the PR adding the entry. The mistake therefore slips through anything short of human review, which is exactly what this comment is for.
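A check of this kind could be fairly small. The following is only a sketch under stated assumptions, not an existing tool in this repository: the function name `mismatched_pr_links`, the idea of feeding it the lines a PR adds to perf-changelog.yaml, and the way the PR number reaches it are all hypothetical.

```python
import re

# Matches a line such as "  pr-link: https://github.com/<org>/<repo>/pull/<n>".
PR_LINK_RE = re.compile(r"^\s*pr-link:\s*(\S+)\s*$")


def mismatched_pr_links(added_lines, pr_number, repo="SemiAnalysisAI/InferenceX"):
    """Return the pr-link values in added_lines that do not point at pr_number.

    added_lines is assumed to hold only the lines a PR adds to
    perf-changelog.yaml (hypothetical input; how they are extracted from
    the diff is out of scope for this sketch).
    """
    expected = f"https://github.com/{repo}/pull/{pr_number}"
    bad = []
    for line in added_lines:
        match = PR_LINK_RE.match(line)
        if match and match.group(1).rstrip("/") != expected:
            bad.append(match.group(1))
    return bad


# The entry added by this PR, as quoted in the review comment.
added = [
    "    - dsv4-fp4-b300-dynamo-vllm",
    "  description:",
    '    - "Update the DeepSeek-V4-Pro B300 disaggregated Dynamo-vLLM '
    'benchmark to the vllm/vllm-openai:v0.23.0 image"',
    "  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1899",
]

# Run against PR #1952, the stale link to #1899 is reported.
print(mismatched_pr_links(added, 1952))
```

Working on added lines only, rather than the whole file, keeps older entries (which legitimately link to other PRs) out of the comparison.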
Step-by-step proof
- PR metadata shows this is PR #1952, titled "Bump vLLM version for DSV4 B300 disagg."
- The PR description explicitly says: "Mirrors the B200 image bump from #1899 for the B300 equivalents."
- The diff against perf-changelog.yaml appends one new entry whose pr-link field is https://github.com/SemiAnalysisAI/InferenceX/pull/1899.
- The four immediately preceding entries (still visible in the same file, lines ~4279-4304) each link to their own introducing PR (#1921, #1939, #1941, #1942), establishing the convention.
- Conclusion: the value #1899 is a copy-paste from the description's source PR, not the PR introducing this entry. It should be #1952.
How to fix
Change the last line of the new entry from
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1899
to
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1952
No runtime behavior is affected; this is purely a changelog-traceability correction, but worth catching before merge while the fix is trivial.
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=28380189316
As a PR reviewer and CODEOWNER, I have reviewed this and have:
Additional detail section:
Signed:
@Ankur-singh Not ready to merge: Check 3 fails — the sign-off provides no recipe link.
/reuse-sweep-run
Bumps the DeepSeek-V4-Pro B300 disaggregated Dynamo-vLLM recipes from vllm/vllm-openai:v0.20.1 to v0.23.0.
Mirrors the B200 image bump from #1899 for the B300 equivalents. Updates model.container, identity.container.image, and identity.frameworks.vllm across all five B300 recipes:
- disagg-b300-low-latency.yaml
- disagg-b300-low-middle-curve.yaml
- disagg-b300-high-tpt-megamoe.yaml
- disagg-b300-mid-curve-megamoe.yaml
- disagg-b300-max-tpt-megamoe.yaml
Also updates the dsv4-fp4-b300-dynamo-vllm image in nvidia-master.yaml.
The max-num-batched-tokens and gpu-memory-utilization reductions applied to B200 in #1899 are not mirrored here: B300 recipes already carry more conservative values (0.85/0.8 vs B200's pre-bump 0.95), reflecting the different hardware.
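For illustration, the image bump described above amounts to a tag substitution in each recipe file. The sketch below is not how the PR was produced: `bump_image` is a hypothetical helper, and the sample recipe layout is an assumption inferred only from the field names model.container and identity.container.image. The identity.frameworks.vllm field is left out because its value format is not shown in this PR.

```python
OLD_IMAGE = "vllm/vllm-openai:v0.20.1"
NEW_IMAGE = "vllm/vllm-openai:v0.23.0"


def bump_image(recipe_text, old=OLD_IMAGE, new=NEW_IMAGE):
    """Replace every occurrence of the old image tag in a recipe's text.

    Returns the updated text and the number of replacements, so a caller
    can confirm that each expected field was actually touched.
    """
    count = recipe_text.count(old)
    return recipe_text.replace(old, new), count


# Hypothetical recipe fragment; the real files may be structured differently.
sample = (
    "model:\n"
    "  container: vllm/vllm-openai:v0.20.1\n"
    "identity:\n"
    "  container:\n"
    "    image: vllm/vllm-openai:v0.20.1\n"
)

updated, replaced = bump_image(sample)
print(replaced)              # number of image references rewritten
print(OLD_IMAGE in updated)  # whether any stale reference remains
```

Returning the replacement count makes it easy to spot a recipe where one of the image fields was missed or already carried a different tag.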