[NVIDIA][GB300] update DSR1 FP8 GB300 TRTLLM image to latest#1767
….1-cuda13, fix gsm8k accuracy
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook. If they are not, please create a PR first before we can merge your single-node PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you
PR authors are responsible for ensuring that, after merging, all GitHub Actions jobs fully pass. Often, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
See unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27523259199
See unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27523630547
Might need to do a quick update to the 1k1k config; please do not enable the sweep yet.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 9031353.
Force-pushed from 68615eb to 3a9d26c.
See unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27590368246
Updated description for gsm8k accuracy fix to include config updates.
See unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27595478969
functionstackx left a comment
Overall LGTM, except please make the perf-changelog entry more detailed, explaining the one-sided all-to-all error vs. two-sided all-to-all.
- config-keys:
    - dsr1-fp8-gb300-dynamo-trt
  description:
@xinli-sw can you add a more detailed description of what was broken and what the fix was?
Updated description for gsm8k accuracy fix and config updates.
@functionstackx Done.
/reuse-sweep-run
Add a CPU-only artifact recovery path for PR #1767 using successful source run 27595478969.

The release of nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.3.0-dev.1-cuda13 contains a fix for NVIDIA/srt-slurm#51, where accuracy for one MTP config drops to 88% on GSM8K. While that point still passes evals on InferenceX, ~95% is usually expected.

Dropped these configs

In the latest container, NVLinkOneSided is the preferred backend and also the default, so these configs were removed to recover perf.
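For reference, the perf-changelog.yaml entry recording this change could look roughly like the sketch below. Only the `config-keys` / `description` shape is taken from the diff under review; the block-scalar style and the description wording are assumptions paraphrased from this PR, not the merged text.

```yaml
# Hypothetical perf-changelog.yaml entry; wording paraphrased from this PR.
- config-keys:
    - dsr1-fp8-gb300-dynamo-trt
  description: |
    Update image to nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.3.0-dev.1-cuda13
    (from 0.8.1.post2) and switch the runner from gb300 to gb300-nv.
    The new image fixes NVIDIA/srt-slurm#51: GSM8K accuracy on one MTP point
    recovers from 88% to the expected ~95%.
    NVLinkOneSided is now the preferred and default backend, so the affected
    configs were dropped to recover perf.
    DSR1 TRTLLM FP8 configs should use the new image.
```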
Note

Medium Risk: changes the published inference benchmark image and hardware runner for a flagship multinode TRT config. This directly affects accuracy and throughput numbers but does not alter application runtime code paths.
Overview

Updates dsr1-fp8-gb300-dynamo-trt to use tensorrtllm-runtime:1.3.0-dev.1-cuda13 (from 0.8.1.post2) and switches the runner from gb300 to gb300-nv, aligning the GB300 DeepSeek-R1 FP8 Dynamo/TRT benchmark with the latest container, which fixes GSM8K accuracy and MTP concurrency issues.

Documents the change in perf-changelog.yaml: restored ~95% GSM8K accuracy (vs. 88% on one point), resolution of numeric issues affecting some MTP points, and a note that DSR1 TRTLLM FP8 configs should use the new image.

Reviewed by Cursor Bugbot for commit 4b8d282.