
Add DSV4 B200 TRT MTP benchmark #1294

Merged
Oseltamivir merged 3 commits into main from dsv4-b200-trt-mtp
May 7, 2026


@Oseltamivir

Collaborator

Description

Adds DeepSeek-V4-Pro FP4 B200 TensorRT-LLM MTP coverage. The new config mirrors the existing B200 TRT STP search space with spec-decoding: mtp and adds a matching dsv4_fp4_b200_trt_mtp.sh launcher.
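
The "mirror the STP search space" idea can be illustrated with a minimal Python sketch. Only the `spec-decoding: mtp` key and value come from this PR; the helper name `mirror_with_mtp`, the `search-space` field, its contents, and the `"none"` value are hypothetical placeholders, not the actual schema of nvidia-master.yaml.

```python
import copy


def mirror_with_mtp(stp_entry: dict) -> dict:
    """Return a copy of an STP config entry with speculative decoding set to MTP."""
    mtp_entry = copy.deepcopy(stp_entry)  # deep copy so the STP entry is not mutated
    mtp_entry["spec-decoding"] = "mtp"    # the one key/value named in the PR description
    return mtp_entry


# Hypothetical STP entry: every field except "spec-decoding" is made up for illustration.
stp = {"spec-decoding": "none", "search-space": [{"tp": 8, "conc": [4, 8]}]}
mtp = mirror_with_mtp(stp)
```

The deep copy keeps the two entries independent, so later tuning of one search space would not silently change the other.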

Related Issue

N/A

Type of Change

  • Bug fix
  • New feature
  • Configuration change
  • Documentation update
  • Other (please describe)

Checklist

  • I have tested my changes locally
  • I have updated documentation if necessary
  • If I changed a container image or config, I have already updated perf-changelog.yaml
    • New perf-changelog.yaml entries are appended to the end of the file (the file is chronological: oldest at top, newest at bottom)
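
The append-only rule in the last checklist item could be checked mechanically. The following is a rough sketch, assuming the old and new file contents are available as strings; `is_append_only` is a hypothetical helper, not something that exists in this repository.

```python
def is_append_only(old_text: str, new_text: str) -> bool:
    """True if new_text is exactly old_text followed by additional content."""
    return len(new_text) > len(old_text) and new_text.startswith(old_text)


# Illustrative contents only; real entries carry more fields.
old = "- entry: a\n- entry: b\n"
appended = old + "- entry: c\n"
inserted_at_top = "- entry: c\n" + old
```

In practice the old contents might come from something like `git show main:perf-changelog.yaml`, compared against the working-tree file. This strict prefix check would also flag whitespace-only edits to older entries, which may or may not be desirable.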

Validation

  • bash -n benchmarks/single_node/dsv4_fp4_b200_trt_mtp.sh
  • python utils/matrix_logic/generate_sweep_configs.py test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b200-trt-mtp
  • YAML parse check for .github/configs/nvidia-master.yaml and perf-changelog.yaml

@github-actions

github-actions Bot commented May 7, 2026

Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from their respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.


@Oseltamivir Oseltamivir force-pushed the dsv4-b200-trt-mtp branch from d510ac7 to a4488d9 Compare May 7, 2026 00:09
Comment thread perf-changelog.yaml Outdated
- "Add DeepSeek-V4-Pro FP4 B200 TensorRT-LLM MTP coverage using ghcr.io/semianalysisai/trtllm-deepseek-v4:feat-deepseek_v4-9aa3715"
- "Mirror the B200 TRT STP search space with spec-decoding: mtp and TensorRT-LLM MTP num_nextn_predict_layers=2"
- "Benchmark serving uses the DeepSeek-V4 chat template for MTP acceptance-rate correctness"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/TODO

Contributor


🔴 The new dsv4-fp4-b200-trt-mtp entry sets pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/TODO (perf-changelog.yaml:2282), an unfilled placeholder that 404s. Every other entry in this file uses a real numeric PR id (e.g. /pull/1276, /pull/1291, /pull/1292) — please replace TODO with this PR's number (1294) before merging.

Extended reasoning...

What the bug is

perf-changelog.yaml line 2282 contains:

  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/TODO

The literal token TODO is a placeholder that was never replaced. After merge this URL resolves to https://github.com/SemiAnalysisAI/InferenceX/pull/TODO, which is not a valid PR number and returns 404 — breaking the changelog's traceability link back to the PR that introduced this benchmark.

Why it matters / repo convention

Every other pr-link entry in perf-changelog.yaml (lines 2118–2274 — 19 prior entries) uses a real numeric PR id (e.g. /pull/1276, /pull/1291, /pull/1292, /pull/1293). AGENTS.md documents the placeholder convention with XXX that must be filled in before merge. The pattern in this repo is unambiguous: changelog entries always carry a working PR link.

This is also a known recurring foot-gun: the sister PR #1291 ("Add DSV4 B300 TRT MTP benchmark") had the same placeholder issue and the author fixed it before merge — so the correct fix is to do the same here.

Step-by-step proof

  1. Open the diff for perf-changelog.yaml and locate the new entry appended at the bottom.
  2. Read line 2282: pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/TODO.
  3. Compare to line 2274 (the immediately preceding entry, from PR #1276, "Tune MiniMax MI355X vLLM scheduling thresholds"): pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1276, a real numeric id.
  4. Compare to line 2257 (the sister B300 MTP entry from PR #1291, "Add DSV4 B300 TRT MTP benchmark"): pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1291, also a real numeric id.
  5. Visit https://github.com/SemiAnalysisAI/InferenceX/pull/TODO → GitHub returns 404 (no such PR).
  6. This PR is #1294 ("Add DSV4 B200 TRT MTP benchmark"), so the link should be https://github.com/SemiAnalysisAI/InferenceX/pull/1294.

Impact

No runtime impact (metadata only), but the changelog is the canonical historical record for perf-relevant changes. A 404 link is a permanent papercut for anyone bisecting a regression to find the PR that introduced this config.

How to fix

One-line edit on line 2282:

-  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/TODO
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1294
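
Because this placeholder has slipped into more than one PR, a small pre-merge check could catch it. The sketch below is only illustrative: it assumes each link sits on its own `pr-link:` line as in the snippet above, and `find_placeholder_pr_links` is a hypothetical helper, not an existing script in this repo. It uses only the standard library and does not parse the YAML.

```python
import re

# Matches a line of the form "  pr-link: <url>" and captures the URL.
PR_LINK = re.compile(r"^\s*pr-link:\s*(\S+)\s*$")
# A well-formed link ends in /pull/<digits>.
NUMERIC_PR = re.compile(r"/pull/\d+$")


def find_placeholder_pr_links(text: str) -> list[tuple[int, str]]:
    """Return (line number, url) for every pr-link that lacks a numeric PR id."""
    bad = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = PR_LINK.match(line)
        if match and not NUMERIC_PR.search(match.group(1)):
            bad.append((lineno, match.group(1)))
    return bad


sample = """\
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1291
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/TODO
"""
```

Calling `find_placeholder_pr_links(sample)` flags only the second line. The same pattern would also flag an `XXX` placeholder, since any non-numeric suffix fails the `/pull/<digits>` test.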


@Oseltamivir Oseltamivir merged commit f205e52 into main May 7, 2026
9 of 32 checks passed
@Oseltamivir Oseltamivir deleted the dsv4-b200-trt-mtp branch May 7, 2026 02:30
