[NVIDIA] Added gpt-oss for H200 TRT. #10

Merged: kimbochen merged 2 commits into main from nv-gptoss-h200-trt on Sep 15, 2025

Conversation

@kimbochen

Collaborator

This PR with conflicts resolved

@kimbochen

Collaborator Author

Verified that the script works (Run link).
Thank you @kedarpotdar-nv

kimbochen merged commit e631e3f into main on Sep 15, 2025
kimbochen deleted the nv-gptoss-h200-trt branch on September 15, 2025 18:56
cquil11 added the NVIDIA label on Apr 8, 2026
cquil11 changed the title from "Added gpt-oss for H200 TRT." to "[NVIDIA] Added gpt-oss for H200 TRT." on Apr 8, 2026
Oseltamivir added a commit that referenced this pull request Jun 25, 2026
… rate, run links

Addresses review #3 frontend critiques (backward-compatible with v2 docs):
- Percentile selector p50/p90/p99 (p99 default); reads pooled-trial percentiles.
- Suite selector backend-default vs resource-constrained — kept distinct, never read as
  one fair contest (#5). dtype/mode/resource/contract are all in the per-line label +
  hover; lines are uniquely colored (SKU family) + dashed-fp8 (#10).
- Bandwidth axis renamed "Logical routed payload rate" using SEPARATE dispatch/combine
  bytes; serial bandwidth removed; serial relabeled "Σ isolated medians" (#6,#7).
- Hover shows p50/p90/p99, contract, suite, and the WORKFLOW RUN (run id + sha) that
  produced the point (#1). Provenance text no longer claims a single dtype (the
  "bf16 while fp8 shown" bug); states routing-identity-proven, pooled-sample count,
  logical-rate caveat, suite-separation, and correctness-is-smoke (#9 fix).