Add DSV4 B200 TRT MTP benchmark by Oseltamivir · Pull Request #1294 · SemiAnalysisAI/InferenceX

Oseltamivir · 2026-05-07T00:09:14Z

Description

Adds DeepSeek-V4-Pro FP4 B200 TensorRT-LLM MTP coverage. The new config mirrors the existing B200 TRT STP search space with spec-decoding: mtp and adds a matching dsv4_fp4_b200_trt_mtp.sh launcher.

Related Issue

N/A

Type of Change

Checklist

I have tested my changes locally
I have updated documentation if necessary
If I changed a container image or config, I have already updated perf-changelog.yaml
- New perf-changelog.yaml entries are appended to the end of the file (the file is chronological: oldest at top, newest at bottom)

Validation

bash -n benchmarks/single_node/dsv4_fp4_b200_trt_mtp.sh
python utils/matrix_logic/generate_sweep_configs.py test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b200-trt-mtp
YAML parse check for .github/configs/nvidia-master.yaml and perf-changelog.yaml

github-actions · 2026-05-07T00:09:22Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

claude · 2026-05-07T00:18:34Z

+    - "Add DeepSeek-V4-Pro FP4 B200 TensorRT-LLM MTP coverage using ghcr.io/semianalysisai/trtllm-deepseek-v4:feat-deepseek_v4-9aa3715"
+    - "Mirror the B200 TRT STP search space with spec-decoding: mtp and TensorRT-LLM MTP num_nextn_predict_layers=2"
+    - "Benchmark serving uses the DeepSeek-V4 chat template for MTP acceptance-rate correctness"
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/TODO


🔴 The new dsv4-fp4-b200-trt-mtp entry sets pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/TODO (perf-changelog.yaml:2282), an unfilled placeholder that 404s. Every other entry in this file uses a real numeric PR id (e.g. /pull/1276, /pull/1291, /pull/1292) — please replace TODO with this PR's number (1294) before merging.

Extended reasoning...

What the bug is

perf-changelog.yaml line 2282 contains:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/TODO

The literal token TODO is a placeholder that was never replaced. After merge this URL resolves to https://github.com/SemiAnalysisAI/InferenceX/pull/TODO, which is not a valid PR number and returns 404 — breaking the changelog's traceability link back to the PR that introduced this benchmark.

Why it matters / repo convention

Every other pr-link entry in perf-changelog.yaml (lines 2118–2274 — 19 prior entries) uses a real numeric PR id (e.g. /pull/1276, /pull/1291, /pull/1292, /pull/1293). AGENTS.md documents the placeholder convention with XXX that must be filled in before merge. The pattern in this repo is unambiguous: changelog entries always carry a working PR link.

This is also a known recurring foot-gun: the sister PR #1291 ("Add DSV4 B300 TRT MTP benchmark") had the same placeholder issue and the author fixed it before merge — so the correct fix is to do the same here.

Step-by-step proof

1. Open the diff for perf-changelog.yaml and locate the new entry appended at the bottom.

2. Read line 2282: pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/TODO.

3. Compare to line 2274 (the immediately preceding entry, from PR #1276, "Tune MiniMax MI355X vLLM scheduling thresholds"): pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1276, a real numeric id.

4. Compare to line 2257 (the sister B300 MTP entry, from PR #1291, "Add DSV4 B300 TRT MTP benchmark"): pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1291, also a real numeric id.

5. Visit https://github.com/SemiAnalysisAI/InferenceX/pull/TODO → GitHub returns 404 (no such PR).

6. This PR is #1294, "Add DSV4 B200 TRT MTP benchmark", so the link should be https://github.com/SemiAnalysisAI/InferenceX/pull/1294.

Impact

Low impact (metadata only, nothing changes at runtime), but the changelog is the canonical historical record for perf-relevant changes. A 404 link is a permanent papercut for anyone bisecting a regression to find the PR that introduced this config.

How to fix

One-line edit on line 2282:

- pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/TODO
+ pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1294
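Since this placeholder has slipped through more than once, a small pre-merge check could catch it automatically. The following is only a sketch and is not part of this repo: the function name find_placeholder_pr_links is hypothetical, and it assumes each entry carries a line of the form "pr-link: <url>" whose final path segment should be a numeric PR id. It scans the file as plain text rather than parsing YAML, so it needs nothing beyond the standard library.

```python
import re

# Matches a "pr-link: <url>" line and captures the URL.
# Assumption: entries look like the ones quoted in this review.
PR_LINK = re.compile(r"^\s*pr-link:\s*(\S+)\s*$")


def find_placeholder_pr_links(text):
    """Return (line_number, url) pairs whose final path segment is not a numeric PR id."""
    bad = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = PR_LINK.match(line)
        if not match:
            continue
        url = match.group(1)
        # Take whatever follows the last "/" (e.g. "1294", "TODO", "XXX").
        pr_id = url.rstrip("/").rsplit("/", 1)[-1]
        if not (pr_id.isascii() and pr_id.isdigit()):
            bad.append((lineno, url))
    return bad


# Illustrative input only: two entries modelled on the lines quoted above.
sample = """\
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1291
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/TODO
"""

for lineno, url in find_placeholder_pr_links(sample):
    print(f"line {lineno}: non-numeric pr-link: {url}")
```

Run against the real perf-changelog.yaml contents, a non-empty result could fail a CI step; whether that belongs in an existing workflow is a call for the maintainers.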

github-actions · 2026-05-07T02:08:13Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25468838278
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25468838278

github-actions · 2026-05-07T02:30:05Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25472669948
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25472669948

github-actions · 2026-05-07T02:33:53Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25472688699
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25472688699

Oseltamivir requested a review from a team May 7, 2026 00:09

Oseltamivir requested review from jgangani and kedarpotdar-nv as code owners May 7, 2026 00:09

github-project-automation Bot added this to InferenceMAX Board May 7, 2026

Add DSV4 B200 TRT MTP benchmark

a4488d9

Oseltamivir force-pushed the dsv4-b200-trt-mtp branch from d510ac7 to a4488d9 Compare May 7, 2026 00:09

claude Bot reviewed May 7, 2026

View reviewed changes

Oseltamivir added the full-sweep-enabled label May 7, 2026

Oseltamivir added 2 commits May 6, 2026 19:29

Update nvidia-master.yaml

48ce311

Merge branch 'main' into dsv4-b200-trt-mtp

332d0a3

Oseltamivir merged commit f205e52 into main May 7, 2026
9 of 32 checks passed

Oseltamivir deleted the dsv4-b200-trt-mtp branch May 7, 2026 02:30

github-project-automation Bot moved this to Done in InferenceMAX Board May 7, 2026

Oseltamivir merged 3 commits into main from dsv4-b200-trt-mtp