[NVIDIA] Add DSR1 TensorRT Support and Enhanced Plotting #7

Merged
kimbochen merged 7 commits into main from kepotdar-fix-chart-add-dsr1-trt on Sep 8, 2025

@kedarpotdar-nv

Collaborator

Summary

This PR adds TensorRT-LLM FP4 support for the DSR1 model and enhances the plotting system to better handle multiple model variants and precision types.

Changes Made

1. DSR1 Template Enhancements

✅ Added a precision parameter to the DSR1 template, defaulting to fp8
✅ Added b200-trt job for DSR1 with TensorRT support
✅ Set b200-trt to use fp4 precision specifically
✅ Updated collect-results to include b200-trt job
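
For illustration, the default-plus-override behavior described above could be sketched as follows. This is a minimal sketch, not the actual template logic: `DEFAULT_PRECISION`, `JOB_PRECISION_OVERRIDES`, and `resolve_precision` are hypothetical names, and only the `b200-trt` job and the fp8/fp4 values come from this PR.

```python
# Hypothetical sketch: the template default is fp8, and individual jobs
# may override it. Only the b200-trt override is described in this PR.
DEFAULT_PRECISION = "fp8"

JOB_PRECISION_OVERRIDES = {
    "b200-trt": "fp4",
}


def resolve_precision(job_name, overrides=None, default=DEFAULT_PRECISION):
    """Return the precision for a job, falling back to the template default."""
    if overrides is None:
        overrides = JOB_PRECISION_OVERRIDES
    return overrides.get(job_name, default)


if __name__ == "__main__":
    for job in ("b200", "b200-trt"):
        print(job, resolve_precision(job))
```

Keeping the override in a single mapping means a new job picks up fp8 unless it is listed explicitly.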

2. TRT-LLM Configuration

✅ Created dsr1_b200_trt_slurm.sh benchmark script
✅ Uses TensorRT-LLM with the trtllm-serve command
✅ Configured for DSR1 FP4 model (nvidia/DeepSeek-R1-0528-FP4)
✅ Added MTP support

3. Plotting System Improvements

✅ Enhanced model grouping - groups by model family (70b, dsr1) instead of full model names
✅ Added precision distinction - different markers for fp8 (circles) vs fp4 (squares)
✅ Improved legend labels - shows precision in labels (e.g., "B200-TRT (fp4)")
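
As a rough illustration of the grouping, marker, and label rules listed above, here is a minimal sketch. The function names and the substring-matching rule are assumptions made for this example, not the code in this PR; the marker codes `"o"` and `"s"` are the standard matplotlib codes for circles and squares.

```python
# Hypothetical helpers illustrating the plotting rules described above.

# matplotlib marker codes: "o" is a circle, "s" is a square.
PRECISION_MARKERS = {"fp8": "o", "fp4": "s"}


def model_family(model_name):
    """Map a full model name to a family key (assumed substring rule)."""
    name = model_name.lower()
    if "deepseek-r1" in name:
        return "dsr1"
    if "70b" in name:
        return "70b"
    # Unknown models fall back to their full name as their own group.
    return model_name


def marker_for(precision):
    """Pick a marker per precision; unknown precisions default to a circle."""
    return PRECISION_MARKERS.get(precision.lower(), "o")


def legend_label(hardware, precision):
    """Build a legend label that includes the precision."""
    return f"{hardware} ({precision})"


if __name__ == "__main__":
    print(model_family("nvidia/DeepSeek-R1-0528-FP4"))
    print(marker_for("fp4"))
    print(legend_label("B200-TRT", "fp4"))
```

The returned marker can be passed directly as the `marker=` argument of a matplotlib plotting call, with one chart per `model_family` key.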

New Features

  • DSR1 TensorRT benchmarking with fp4 precision
  • Visual precision distinction in plots (circles vs squares)
  • Improved model grouping for better chart organization

@kimbochen

Collaborator

Thank you for the PR. Can you revert the tp-list to full sweep?

@kedarpotdar-nv

Collaborator Author

Done.

@kimbochen kimbochen merged commit e4e60be into main Sep 8, 2025
@kedarpotdar-nv kedarpotdar-nv deleted the kepotdar-fix-chart-add-dsr1-trt branch September 18, 2025 00:57
@cquil11 cquil11 added the NVIDIA label Apr 8, 2026
@cquil11 cquil11 changed the title Add DSR1 TensorRT Support and Enhanced Plotting [NVIDIA] Add DSR1 TensorRT Support and Enhanced Plotting Apr 8, 2026
Oseltamivir added a commit that referenced this pull request Jun 23, 2026
Add summarize.py (compact NCCL/DeepEP results table, printed at end of every job) and make it the result gate. Fix review findings: benchmark failures/skipped-deepep now fail the job instead of reporting green (#1); DeepEP nodes from SLURM_NNODES not world_size//8 (#3); apply Buffer.set_num_sms so num_comm_sms is real (#8); nccl-tests -c 1 with a missing check footer is now invalid (#7); use context managers for file reads (#4,#5); launchers export COLLECTIVEX_IMAGE/_DIGEST for provenance (#9); trim workflow_dispatch sku options to launcher-backed pools (#2). Artifact-path finding (#6) already fixed via cx_collect_results.
Oseltamivir added a commit that referenced this pull request Jun 25, 2026
… rate, run links

Addresses review #3 frontend critiques (backward-compatible with v2 docs):
- Percentile selector p50/p90/p99 (p99 default); reads pooled-trial percentiles.
- Suite selector backend-default vs resource-constrained — kept distinct, never read as
  one fair contest (#5). dtype/mode/resource/contract are all in the per-line label +
  hover; lines are uniquely colored (SKU family) + dashed-fp8 (#10).
- Bandwidth axis renamed "Logical routed payload rate" using SEPARATE dispatch/combine
  bytes; serial bandwidth removed; serial relabeled "Σ isolated medians" (#6,#7).
- Hover shows p50/p90/p99, contract, suite, and the WORKFLOW RUN (run id + sha) that
  produced the point (#1). Provenance text no longer claims a single dtype (the
  "bf16 while fp8 shown" bug); states routing-identity-proven, pooled-sample count,
  logical-rate caveat, suite-separation, and correctness-is-smoke (#9 fix).

3 participants