Add new FW (TRT) and precision support by kedarpotdar-nv · Pull Request #5 · SemiAnalysisAI/InferenceX

kedarpotdar-nv · 2025-09-05T02:18:33Z

Overview

This PR adds TensorRT-LLM (TRT-LLM) as a new inference framework for LLaMA 70B benchmarking on NVIDIA H200 and B200 GPUs, alongside the existing vLLM framework. This enables direct performance comparison between vLLM and TRT-LLM on the same hardware.

Key Features

Multi-framework support: vLLM and TRT-LLM for LLaMA 70B
Precision support: FP8 (default) and FP4 (future-ready)
Unified plotting: All frameworks and hardware on single performance plots
Framework-specific Docker images: Prevents image conflicts between frameworks
Clean job naming: Clear identification in GitHub Actions UI

🔧 Core Workflow Updates

.github/workflows/benchmark-tmpl.yml

Added framework and precision as required inputs
Updated RESULT_FILENAME to include framework and precision
Modified result processing to pass framework and precision to process_result.py
Fixed job naming to prevent duplication
Added hardware extraction logic for proper result processing
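
As a rough illustration of the naming change above, the sketch below builds a result filename that carries framework and precision, and pulls a hardware tag out of a runner label. The field order, the `_` separator, and the helper names `result_filename` and `hardware_from_runner` are assumptions for illustration only; the PR description does not show the actual `RESULT_FILENAME` format or the extraction logic.

```python
def result_filename(model: str, hardware: str, framework: str, precision: str) -> str:
    """Join the identifying fields into one name (hypothetical layout)."""
    # An empty precision stands for the default, FP8, as stated in this PR.
    precision = precision or "fp8"
    return "_".join([model, hardware, framework, precision])


def hardware_from_runner(runner: str) -> str:
    """Take the hardware tag as the text before the first '-' (assumed convention)."""
    return runner.split("-", 1)[0]


# Example: a TRT-LLM run on a runner labelled "h200-nv" with default precision.
print(result_filename("70b", hardware_from_runner("h200-nv"), "trt", ""))
```

Keeping framework and precision in the filename means results from vLLM and TRT-LLM runs on the same hardware do not overwrite each other.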

.github/workflows/70b-tmpl.yml

Added bmk-h200-trt and bmk-b200-trt jobs for TRT-LLM
Configured TRT-LLM jobs with nvidia/tensorrt-llm Docker image
Set precision to empty string for FP8 (default)
Updated all jobs to include framework and precision inputs

🚀 New Benchmark Scripts

benchmarks/70b_h200_trt_slurm.sh and benchmarks/70b_b200_trt_slurm.sh

TRT-LLM server setup using mpirun trtllm-serve (one script each for H200 and B200)
Inline llama-config.yml configuration
Client benchmarking with benchmark_serving.py

🔄 Launcher Script Updates

Updated SLURM Launchers

runners/launch_h200-nv.sh
runners/launch_h200-cw.sh
runners/launch_h200-nb.sh
runners/launch_b200-nv.sh

Key improvements:

Framework-specific SQSH file naming to prevent Docker image conflicts
Dynamic script selection based on framework (vLLM/SGLang use base scripts, TRT uses _trt scripts)
Proper MODEL_CODE environment variable passing to containers
Framework reset logic for vLLM and SGLang to use default script names
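
The launchers themselves are shell scripts; the selection and naming rules listed above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the base script name pattern (`<model>_<hardware>_slurm.sh`), the sanitizing of the image name, and the helper names `benchmark_script` and `sqsh_path` are hypothetical. Only the `_trt` script names and the rule that vLLM and SGLang use the base scripts come from the PR description.

```python
from pathlib import Path

# Frameworks that keep the unsuffixed ("base") benchmark script, per the PR description.
BASE_SCRIPT_FRAMEWORKS = {"vllm", "sglang"}


def benchmark_script(model_code: str, hardware: str, framework: str) -> str:
    """Pick the benchmark script for a framework (base script name is assumed)."""
    fw = framework.lower()
    suffix = "" if fw in BASE_SCRIPT_FRAMEWORKS else f"_{fw}"
    return f"benchmarks/{model_code}_{hardware}{suffix}_slurm.sh"


def sqsh_path(cache_dir: str, image: str, framework: str) -> Path:
    """Build a per-framework squash file path (hypothetical naming scheme)."""
    # Replace characters that are awkward in filenames, then append the framework
    # so images imported for different frameworks never share a file.
    safe_image = image.replace("/", "_").replace(":", "_")
    return Path(cache_dir) / f"{safe_image}_{framework.lower()}.sqsh"


print(benchmark_script("70b", "h200", "trt"))
print(benchmark_script("70b", "h200", "vllm"))
print(sqsh_path("/tmp/sqsh", "nvidia/tensorrt-llm", "trt"))
```

Including the framework in the squash file name is what keeps one framework's imported image from being reused by another.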

📊 Result Processing & Visualization

utils/process_result.py

Added framework and precision command-line arguments
Updated output data structure to include framework and precision
Default precision handling (empty string → 'fp8')
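
A minimal sketch of how the arguments and the default could fit together is shown below. The flag spellings (`--framework`, `--precision`), the function names, and the placeholder metric are assumptions; the actual interface of `utils/process_result.py` is not shown in this PR description. Only the empty string → 'fp8' rule is taken from the text above.

```python
import argparse
import json


def parse_args(argv=None):
    """Parse the two new arguments (flag names are hypothetical)."""
    parser = argparse.ArgumentParser(description="Attach run metadata to a benchmark result")
    parser.add_argument("--framework", required=True, help="e.g. vllm or trt")
    parser.add_argument("--precision", default="", help="empty string means the default")
    return parser.parse_args(argv)


def build_record(args, metrics):
    """Copy the metrics and add framework and precision to the output record."""
    record = dict(metrics)
    record["framework"] = args.framework
    # Empty precision falls back to 'fp8', matching the rule described above.
    record["precision"] = args.precision or "fp8"
    return record


# Example with an explicit empty precision, as the workflow passes for FP8 jobs.
args = parse_args(["--framework", "trt", "--precision", ""])
print(json.dumps(build_record(args, {"example_metric": 1.0})))
```

Resolving the default in one place means downstream consumers such as the plotting script always see a concrete precision value.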

utils/plot_perf.py

Added distinct colors for TRT-LLM results:
h200-trt: dark green
b200-trt: gray
Unified plotting: all frameworks and hardware on single plots
Updated plot titles and legend handling
Model-specific plot generation
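
The color assignment and the single-plot grouping described above might look roughly like this. It is a sketch under assumptions: the `hardware-framework` series key, the fallback color, the record field names, and the function names are hypothetical, and "dark green" is mapped to the named color `darkgreen`. The real `utils/plot_perf.py` may be organized differently.

```python
from collections import defaultdict

# Colors named in the PR description for the new TRT-LLM series.
SERIES_COLORS = {
    "h200-trt": "darkgreen",
    "b200-trt": "gray",
}
FALLBACK_COLOR = "black"  # assumption: used for any series without an explicit color


def series_key(hardware: str, framework: str) -> str:
    """Build the legend key for one hardware/framework pair (assumed format)."""
    return f"{hardware}-{framework}".lower()


def color_for(hardware: str, framework: str) -> str:
    """Look up the plot color for a series, falling back to a default."""
    return SERIES_COLORS.get(series_key(hardware, framework), FALLBACK_COLOR)


def group_by_series(records):
    """Group result records so each series becomes one line on a shared plot."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[series_key(rec["hardware"], rec["framework"])].append(rec)
    return dict(grouped)


print(color_for("h200", "trt"), color_for("b200", "trt"))
```

Each group returned by `group_by_series` can then be drawn on the same axes with its own color and legend entry.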

🧪 Testing Configuration

.github/workflows/workflow-scheduler.yml

Commented out concurrency and schedule blocks for manual testing
Disabled DSR1 jobs as requested

kimbochen

Thank you for the PR. lgtm

Add summarize.py (compact NCCL/DeepEP results table, printed at end of every job) and make it the result gate. Fix review findings: benchmark failures/skipped-deepep now fail the job instead of reporting green (#1); DeepEP nodes from SLURM_NNODES not world_size//8 (#3); apply Buffer.set_num_sms so num_comm_sms is real (#8); nccl-tests -c 1 with a missing check footer is now invalid (#7); use context managers for file reads (#4,#5); launchers export COLLECTIVEX_IMAGE/_DIGEST for provenance (#9); trim workflow_dispatch sku options to launcher-backed pools (#2). Artifact-path finding (#6) already fixed via cx_collect_results.

… rate, run links

Addresses review #3 frontend critiques (backward-compatible with v2 docs):
- Percentile selector p50/p90/p99 (p99 default); reads pooled-trial percentiles.
- Suite selector backend-default vs resource-constrained: kept distinct, never read as one fair contest (#5). dtype/mode/resource/contract are all in the per-line label + hover; lines are uniquely colored (SKU family) + dashed-fp8 (#10).
- Bandwidth axis renamed "Logical routed payload rate" using SEPARATE dispatch/combine bytes; serial bandwidth removed; serial relabeled "Σ isolated medians" (#6,#7).
- Hover shows p50/p90/p99, contract, suite, and the WORKFLOW RUN (run id + sha) that produced the point (#1).

Provenance text no longer claims a single dtype (the "bf16 while fp8 shown" bug); states routing-identity-proven, pooled-sample count, logical-rate caveat, suite-separation, and correctness-is-smoke (#9 fix).

kedarpotdar-nv added 13 commits September 4, 2025 16:59

added framework and precision vars. added initial support for trtllm 70b

accd2da

only run tp2

2859243

disable dsr1

b6939da

fix runner names

44b2c65

update enroot paths

04a764d

update runner script to account for model name

19e6dde

if framework==vllm, run base script. else run trt script

948b8d8

fix cw and nb

b8be39e

fix b200 runner name.

d0cdbcb

re-add dsr1 and h100

bc7af6b

fix sglang launch

9c06800

temp remove b200 vllm

3a635d0

re-add all configs

0a23d95

kedarpotdar-nv requested a review from kimbochen September 5, 2025 02:19

kimbochen reviewed Sep 5, 2025

kimbochen merged commit 0ef8128 into main Sep 5, 2025

kimbochen deleted the kepotdar-trt-init branch September 5, 2025 04:51

claude-code-infmax Bot mentioned this pull request Jan 21, 2026

[NV] Update DSR1 GB200 FP4 Disagg Submission #510

Merged
