[NVIDIA] Add DSR1 TensorRT Support and Enhanced Plotting #7
Merged
Collaborator
Thank you for the PR. Can you revert the
Collaborator
Author
Done.
Oseltamivir added a commit that referenced this pull request on Jun 23, 2026
Add summarize.py (compact NCCL/DeepEP results table, printed at end of every job) and make it the result gate. Fix review findings: benchmark failures/skipped-deepep now fail the job instead of reporting green (#1); DeepEP nodes from SLURM_NNODES not world_size//8 (#3); apply Buffer.set_num_sms so num_comm_sms is real (#8); nccl-tests -c 1 with a missing check footer is now invalid (#7); use context managers for file reads (#4,#5); launchers export COLLECTIVEX_IMAGE/_DIGEST for provenance (#9); trim workflow_dispatch sku options to launcher-backed pools (#2). Artifact-path finding (#6) already fixed via cx_collect_results.
Oseltamivir added a commit that referenced this pull request on Jun 25, 2026
… rate, run links Addresses review #3 frontend critiques (backward-compatible with v2 docs): - Percentile selector p50/p90/p99 (p99 default); reads pooled-trial percentiles. - Suite selector backend-default vs resource-constrained — kept distinct, never read as one fair contest (#5). dtype/mode/resource/contract are all in the per-line label + hover; lines are uniquely colored (SKU family) + dashed-fp8 (#10). - Bandwidth axis renamed "Logical routed payload rate" using SEPARATE dispatch/combine bytes; serial bandwidth removed; serial relabeled "Σ isolated medians" (#6,#7). - Hover shows p50/p90/p99, contract, suite, and the WORKFLOW RUN (run id + sha) that produced the point (#1). Provenance text no longer claims a single dtype (the "bf16 while fp8 shown" bug); states routing-identity-proven, pooled-sample count, logical-rate caveat, suite-separation, and correctness-is-smoke (#9 fix).
Summary
This PR adds TensorRT-LLM FP4 support for the DSR1 model and enhances the plotting system so that it handles multiple model variants and precision types.
Changes Made
1. DSR1 Template Enhancements
✅ Added precision parameter to DSR1 template with fp8 default
✅ Added b200-trt job for DSR1 with TensorRT support
✅ Set b200-trt to use fp4 precision specifically
✅ Updated collect-results to include b200-trt job
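The precision behavior described above (an fp8 default, with b200-trt pinned to fp4) can be sketched in Python. This is only an illustration of the described rule, not the template itself, which is not shown here; `DEFAULT_PRECISION`, `JOB_PRECISION_OVERRIDES`, `resolve_precision`, and the `h100` job name are hypothetical names introduced for the example.

```python
# Default precision stated in the PR description.
DEFAULT_PRECISION = "fp8"

# Hypothetical override table; the PR states only that b200-trt uses fp4.
JOB_PRECISION_OVERRIDES = {"b200-trt": "fp4"}


def resolve_precision(job_name, overrides=None):
    """Return the precision for a job, falling back to the fp8 default."""
    if overrides is None:
        overrides = JOB_PRECISION_OVERRIDES
    return overrides.get(job_name, DEFAULT_PRECISION)


if __name__ == "__main__":
    # "h100" is a made-up job name used only to show the default path.
    for job in ("b200-trt", "h100"):
        print(job, resolve_precision(job))
```

Keeping the default in one place means a new job needs an entry only when it deviates from fp8.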
2. TRT-LLM Configuration
✅ Created dsr1_b200_trt_slurm.sh benchmark script
✅ Uses TensorRT-LLM with trtllm-serve command
✅ Configured for DSR1 FP4 model (nvidia/DeepSeek-R1-0528-FP4)
✅ MTP support
3. Plotting System Improvements
✅ Enhanced model grouping - groups by model family (70b, dsr1) instead of full model names
✅ Added precision distinction - different markers for fp8 (circles) vs fp4 (squares)
✅ Improved legend labels - shows precision in labels (e.g., "B200-TRT (fp4)")
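A minimal sketch of the grouping and labeling rules listed above, written as plain Python helpers. The function names, the `FAMILY_ALIASES` table, the substring-matching approach, and the example 70b model name are assumptions made for illustration; the PR's actual plotting code may differ. The marker codes `"o"` and `"s"` are the Matplotlib codes for circles and squares.

```python
from collections import defaultdict

# Assumed substrings that identify each model family named in the PR.
FAMILY_ALIASES = {
    "70b": ("70b",),
    "dsr1": ("dsr1", "deepseek-r1"),
}

# Matplotlib marker codes: "o" draws circles, "s" draws squares.
PRECISION_MARKERS = {"fp8": "o", "fp4": "s"}


def model_family(model_name):
    """Map a full model name to its family key, or return the name lowercased."""
    lowered = model_name.lower()
    for family, aliases in FAMILY_ALIASES.items():
        if any(alias in lowered for alias in aliases):
            return family
    return lowered


def legend_label(hardware, precision):
    """Build a legend label such as 'B200-TRT (fp4)'."""
    return f"{hardware.upper()} ({precision})"


def group_results(rows):
    """Group result rows (dicts with a 'model' key) by model family."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[model_family(row["model"])].append(row)
    return dict(grouped)


if __name__ == "__main__":
    rows = [
        {"model": "nvidia/DeepSeek-R1-0528-FP4", "hw": "b200-trt", "precision": "fp4"},
        # Hypothetical 70b model name, used only for the example.
        {"model": "example/llama-70b-instruct", "hw": "b200", "precision": "fp8"},
    ]
    for family, members in group_results(rows).items():
        for row in members:
            marker = PRECISION_MARKERS[row["precision"]]
            print(family, legend_label(row["hw"], row["precision"]), marker)
```

In a plotting loop, the marker value could be passed as the `marker` argument of Matplotlib's `Axes.scatter` or `Axes.plot`, with the label passed as `label`.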
New Features