Add DSV4 B200 TRT MTP benchmark #1294
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook. If they are not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you. PR authors are responsible for ensuring that, after merging, all GitHub Actions jobs fully pass. Many failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow. As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
Branch updated from d510ac7 to a4488d9.
- "Add DeepSeek-V4-Pro FP4 B200 TensorRT-LLM MTP coverage using ghcr.io/semianalysisai/trtllm-deepseek-v4:feat-deepseek_v4-9aa3715"
- "Mirror the B200 TRT STP search space with spec-decoding: mtp and TensorRT-LLM MTP num_nextn_predict_layers=2"
- "Benchmark serving uses the DeepSeek-V4 chat template for MTP acceptance-rate correctness"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/TODO
🔴 The new dsv4-fp4-b200-trt-mtp entry sets pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/TODO (perf-changelog.yaml:2282), an unfilled placeholder that 404s. Every other entry in this file uses a real numeric PR id (e.g. /pull/1276, /pull/1291, /pull/1292) — please replace TODO with this PR's number (1294) before merging.
Extended reasoning
What the bug is
perf-changelog.yaml line 2282 contains:
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/TODO
The literal token TODO is a placeholder that was never replaced. After merge, this URL resolves to https://github.com/SemiAnalysisAI/InferenceX/pull/TODO, which does not contain a valid PR number and returns 404, breaking the changelog's traceability link back to the PR that introduced this benchmark.
Why it matters / repo convention
Every other pr-link entry in perf-changelog.yaml (lines 2118–2274, 19 prior entries) uses a real numeric PR id (e.g. /pull/1276, /pull/1291, /pull/1292, /pull/1293). AGENTS.md documents a placeholder convention (XXX) that must be filled in before merge. The pattern in this repo is unambiguous: changelog entries always carry a working PR link.
This is also a known recurring foot-gun: the sister PR #1291 ("Add DSV4 B300 TRT MTP benchmark") had the same placeholder issue, and the author fixed it before merge. The correct fix is to do the same here.
Step-by-step proof
- Open the diff for perf-changelog.yaml and locate the new entry appended at the bottom.
- Read line 2282: pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/TODO.
- Compare to line 2274 (the immediately preceding entry, from PR #1276, "Tune MiniMax MI355X vLLM scheduling thresholds"): pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1276, a real numeric id.
- Compare to line 2257 (the sister B300 MTP entry from PR #1291, "Add DSV4 B300 TRT MTP benchmark"): pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1291, also a real numeric id.
- Visit https://github.com/SemiAnalysisAI/InferenceX/pull/TODO: GitHub returns 404 (no such PR).
- This PR is #1294 ("Add DSV4 B200 TRT MTP benchmark"), so the link should be https://github.com/SemiAnalysisAI/InferenceX/pull/1294.
Impact
The impact is low because this is metadata only, but the changelog is the canonical historical record for perf-relevant changes. A 404 link is a permanent papercut for anyone bisecting a regression to find the PR that introduced this config.
How to fix
A one-token edit on line 2282:
- pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/TODO
+ pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1294
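This kind of placeholder can also be caught mechanically before merge. The following is a minimal sketch, not a tool that exists in the InferenceX repo: it assumes every changelog entry keeps its pr-link on a single line of the form pr-link: https://github.com/<org>/<repo>/pull/<id>, and it flags any entry whose id is not all digits, which covers both TODO and the XXX convention from AGENTS.md. The helper name find_placeholder_pr_links and the sample text are hypothetical.

```python
import re

# Assumption: each changelog entry keeps its link on one line, in the form
#   pr-link: https://github.com/<org>/<repo>/pull/<id>
# Group 1 captures the URL; group 2 captures the PR id after the last "/pull/".
PR_LINK_RE = re.compile(r"^\s*pr-link:\s*(\S+/pull/(\S+))\s*$")


def find_placeholder_pr_links(text):
    """Return (line_number, url) for each pr-link whose PR id is not all digits.

    Hypothetical helper for illustration: a line-based scan, not a YAML parser,
    and not part of the InferenceX repository.
    """
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = PR_LINK_RE.match(line)
        # Only ASCII digits count as a real PR number; TODO, XXX, etc. are flagged.
        if match and not re.fullmatch(r"[0-9]+", match.group(2)):
            problems.append((lineno, match.group(1)))
    return problems


sample = """\
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1291
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/TODO
"""
for lineno, url in find_placeholder_pr_links(sample):
    # prints: line 2: placeholder pr-link https://github.com/SemiAnalysisAI/InferenceX/pull/TODO
    print(f"line {lineno}: placeholder pr-link {url}")
```

To check the real file, the helper could be given the contents of perf-changelog.yaml, e.g. find_placeholder_pr_links(open("perf-changelog.yaml").read()). A line-based regex is used instead of a YAML parser so the check needs only the standard library and can report line numbers directly; the trade-off is that it would miss a link split across lines.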
See the unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25468838278
See the unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25472669948
See the unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25472688699
Description
Adds DeepSeek-V4-Pro FP4 B200 TensorRT-LLM MTP coverage. The new config mirrors the existing B200 TRT STP search space with spec-decoding: mtp and adds a matching dsv4_fp4_b200_trt_mtp.sh launcher.

Related Issue
N/A
Type of Change
Checklist
- perf-changelog.yaml
- perf-changelog.yaml entries are appended to the end of the file (the file is chronological: oldest at top, newest at bottom)

Validation
- bash -n benchmarks/single_node/dsv4_fp4_b200_trt_mtp.sh
- python utils/matrix_logic/generate_sweep_configs.py test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b200-trt-mtp
- .github/configs/nvidia-master.yaml and perf-changelog.yaml