[AMD][MI300X] Expand GPT-OSS FP4 TP=1 concurrency from 64 to 256#1053

Merged: seungrokj merged 5 commits into SemiAnalysisAI:main from ramineroane:gptoss-fp4-mi300x-expand-conc on Apr 17, 2026

Conversation

@ramineroane

Contributor

Summary

Expand the search space for GPT-OSS 120B FP4 on MI300X TP=1 from conc=64 to conc=256 for the 1k1k configuration.

Motivation

With 128 experts and top-4 routing, larger batch sizes significantly improve the amortization of MoE weight loads from HBM. At batch=64, about 111 of the 128 experts are already loaded per decode step, so increasing concurrency spreads this weight-loading cost across more tokens.
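The 111/128 figure can be sanity-checked with a small estimate. The sketch below is not from the PR: it assumes, purely for illustration, that each token picks its top-4 experts uniformly at random and independently of other tokens, which real routers do not guarantee. The function name `expected_unique_experts` is hypothetical.

```python
def expected_unique_experts(num_experts: int, top_k: int, batch_size: int) -> float:
    """Expected number of distinct experts touched in one decode step.

    Assumption (not from the PR): every token selects top_k distinct
    experts uniformly at random, independently of the other tokens.
    """
    # Probability that one given expert is not selected by a single token.
    p_miss_one_token = 1.0 - top_k / num_experts
    # Probability that the expert is missed by every token in the batch.
    p_miss_batch = p_miss_one_token ** batch_size
    return num_experts * (1.0 - p_miss_batch)


if __name__ == "__main__":
    for batch in (64, 96, 128, 256):
        unique = expected_unique_experts(128, 4, batch)
        # Token-to-expert assignments that share each loaded expert's weights.
        reuse = batch * 4 / unique
        print(f"batch={batch:3d}: ~{unique:.1f} experts loaded, "
              f"~{reuse:.1f} routed tokens per loaded expert")
```

Under this assumption the estimate at batch=64 comes out at about 111 experts, and the count saturates near 128 as the batch grows, so additional tokens mostly reuse weights that are already being read.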

Results (single MI300X, vllm v0.17.0, ISL/OSL=1024)

| Concurrency | Output TPS | Total TPS | Median TPOT | vs Baseline |
|---|---|---|---|---|
| 64 (current) | 2,008 | 4,016 | 31.4 ms | baseline |
| 96 | 2,561 | 5,105 | 36.6 ms | +27% |
| 128 | 2,990 | 5,981 | 41.3 ms | +49% |
| 256 | 4,271 | 8,552 | 58.3 ms | +113% |
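As a quick check on the "vs Baseline" column, the following sketch recomputes the throughput gain and the corresponding TPOT increase from the numbers reported above. The dictionary layout and function names are illustrative choices, not part of the benchmark tooling.

```python
# Rows copied from the results above:
# concurrency -> (output TPS, total TPS, median TPOT in ms)
RESULTS = {
    64: (2008, 4016, 31.4),
    96: (2561, 5105, 36.6),
    128: (2990, 5981, 41.3),
    256: (4271, 8552, 58.3),
}
BASELINE_CONC = 64


def total_tps_gain_pct(conc: int) -> float:
    """Total-TPS gain over the baseline concurrency, in percent."""
    base_total = RESULTS[BASELINE_CONC][1]
    return (RESULTS[conc][1] / base_total - 1.0) * 100.0


def tpot_increase_pct(conc: int) -> float:
    """Median-TPOT increase over the baseline concurrency, in percent."""
    base_tpot = RESULTS[BASELINE_CONC][2]
    return (RESULTS[conc][2] / base_tpot - 1.0) * 100.0


if __name__ == "__main__":
    for conc in sorted(RESULTS):
        print(f"conc={conc:3d}: total TPS {total_tps_gain_pct(conc):+.0f}%, "
              f"median TPOT {tpot_increase_pct(conc):+.0f}%")
```

The throughput gain comes with a per-token latency cost: median TPOT also rises with concurrency, so the higher points trade interactivity for aggregate throughput.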

Changes

  • .github/configs/amd-master.yaml: TP=1 conc-end from 64 → 256
  • perf-changelog.yaml: Added changelog entry

No benchmark script changes required.

@claude claude Bot left a comment

Contributor

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

Expand the search space for GPT-OSS 120B FP4 on MI300X TP=1 from
conc=64 to conc=256 for the 1k1k configuration.

With 128 experts (top-4 routing), larger batch sizes significantly
improve MoE weight amortization across HBM. Measured results on a
single MI300X:

  conc=64:  4,016 total TPS (baseline)
  conc=96:  5,105 total TPS (+27%)
  conc=128: 5,981 total TPS (+49%)
  conc=256: 8,552 total TPS (+113%)

The existing benchmark script requires no changes — only the search
space upper bound is adjusted.
@ramineroane ramineroane force-pushed the gptoss-fp4-mi300x-expand-conc branch from c3013cd to c1fdc98 Compare April 17, 2026 07:34
@seungrokj seungrokj added the AMD label Apr 17, 2026
@functionstackx

Collaborator

hi @ramineroane thanks for the PR!

@seungrokj, @chunfangamd, etc. are the CODEOWNERS of the AMD InferenceX code. They can review and run validation on your PR and merge it.

@seungrokj

Collaborator

> hi @ramineroane thanks for the PR!
>
> @seungrokj, @chunfangamd, etc. are the CODEOWNERS of the AMD InferenceX code. They can review and run validation on your PR and merge it.

Sure, working on it.

@seungrokj

Collaborator

/sweep test-config --config-files .github/configs/amd-master.yaml --config-keys gptoss-fp4-mi300x-vllm

@github-actions

Contributor

@seungrokj Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24554407956
Command: test-config --config-files .github/configs/amd-master.yaml --config-keys gptoss-fp4-mi300x-vllm
Pinned ref: 9b1b3a4
Approval: not required (trusted collaborator).

@SemiAnalysisAI SemiAnalysisAI deleted a comment from github-actions Bot Apr 17, 2026
@SemiAnalysisAI SemiAnalysisAI deleted a comment from github-actions Bot Apr 17, 2026
@seungrokj

Collaborator

/sweep test-config --config-files .github/configs/amd-master.yaml --config-keys gptoss-fp4-mi300x-vllm

@github-actions

Contributor

@seungrokj Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24557535782
Command: test-config --config-files .github/configs/amd-master.yaml --config-keys gptoss-fp4-mi300x-vllm
Pinned ref: ee0acc1
Approval: not required (trusted collaborator).

@seungrokj seungrokj left a comment

Collaborator

lgtm

@seungrokj

Collaborator

Hi @functionstackx @cquil11,
can you please merge this?

@functionstackx functionstackx left a comment

Collaborator

@seungrokj approved. feel free to merge

@seungrokj seungrokj merged commit 31f066c into SemiAnalysisAI:main Apr 17, 2026
13 checks passed
cquil11 added a commit that referenced this pull request Apr 17, 2026