[AMD][MI300X] Expand GPT-OSS FP4 TP=1 concurrency from 64 to 256 by ramineroane · Pull Request #1053 · SemiAnalysisAI/InferenceX

ramineroane · 2026-04-17T07:34:07Z

Summary

Expand the concurrency search space for GPT-OSS 120B FP4 on MI300X at TP=1, raising the upper bound from conc=64 to conc=256 for the 1k1k configuration.

Motivation

With 128 experts and top-4 routing, larger batch sizes significantly improve the amortization of MoE weight loads from HBM. At batch=64, about 111 of the 128 experts are already loaded per decode step, so increasing concurrency spreads this weight-loading cost across more tokens.
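
The figure of 111 unique experts out of 128 at batch=64 can be reproduced with a simple occupancy estimate. The sketch below is not taken from the PR: it assumes every token picks its top-4 experts uniformly at random and independently of the other tokens, which real routers do not do exactly, and the function name `expected_unique_experts` is hypothetical.

```python
def expected_unique_experts(batch_tokens: int, num_experts: int = 128, top_k: int = 4) -> float:
    """Expected number of distinct experts hit in one MoE layer per decode step.

    Assumption (not from the PR): each token selects top_k distinct experts
    uniformly at random, independently of every other token in the batch.
    """
    # Probability that one given expert is skipped by a single token.
    p_skip_one_token = 1.0 - top_k / num_experts
    # Probability that the same expert is skipped by every token in the batch.
    p_skip_all = p_skip_one_token ** batch_tokens
    # Linearity of expectation over all experts.
    return num_experts * (1.0 - p_skip_all)


for conc in (64, 96, 128, 256):
    print(conc, round(expected_unique_experts(conc), 1))
```

Under this model batch=64 touches about 111 experts, and larger batches approach all 128, so additional tokens add little extra weight traffic per step.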

Results (single MI300X, vLLM v0.17.0, ISL/OSL=1024)

Concurrency	Output TPS (tok/s)	Total TPS (tok/s)	Median TPOT	Total TPS vs baseline
64 (current)	2,008	4,016	31.4 ms	—
96	2,561	5,105	36.6 ms	+27%
128	2,990	5,981	41.3 ms	+49%
256	4,271	8,552	58.3 ms	+113%
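
The last column of the table appears to be computed from Total TPS relative to the conc=64 row. The sketch below recomputes it from the table values and adds a rough sanity check that is not part of the PR: under a steady-state assumption, each of the `conc` streams emits one token per TPOT, so output TPS should sit slightly below `conc / TPOT`. The function names are hypothetical.

```python
# Rows copied from the results table:
# (concurrency, output TPS, total TPS, median TPOT in ms).
ROWS = [
    (64, 2008, 4016, 31.4),
    (96, 2561, 5105, 36.6),
    (128, 2990, 5981, 41.3),
    (256, 4271, 8552, 58.3),
]


def gain_vs_baseline(rows):
    """Percent change in total TPS relative to the first (baseline) row."""
    baseline_total = rows[0][2]
    return [round(100 * (total / baseline_total - 1)) for _, _, total, _ in rows]


def decode_estimate(rows):
    """Rough output-TPS estimate: every stream emits one token per TPOT."""
    return [conc / (tpot_ms / 1000) for conc, _, _, tpot_ms in rows]


print(gain_vs_baseline(ROWS))
for (conc, out_tps, _, _), est in zip(ROWS, decode_estimate(ROWS)):
    print(conc, out_tps, round(est))
```

The measured output TPS lands within a few percent of the estimate at every concurrency level, which suggests the table is internally consistent. It also makes the trade-off visible: throughput rises with concurrency while per-stream latency (TPOT) grows from 31.4 ms to 58.3 ms.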

Changes

.github/configs/amd-master.yaml: TP=1 conc-end from 64 → 256
perf-changelog.yaml: Added changelog entry

No benchmark script changes required.

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

Expand the search space for GPT-OSS 120B FP4 on MI300X TP=1 from conc=64 to conc=256 for the 1k1k configuration. With 128 experts (top-4 routing), larger batch sizes significantly improve MoE weight amortization across HBM.

Measured results on a single MI300X:
conc=64: 4,016 total TPS (baseline)
conc=96: 5,105 total TPS (+27%)
conc=128: 5,981 total TPS (+49%)
conc=256: 8,552 total TPS (+113%)

The existing benchmark script requires no changes; only the search space upper bound is adjusted.

functionstackx · 2026-04-17T07:46:39Z

Hi @ramineroane, thanks for the PR!

@seungrokj, @chunfangamd, and others are the CODEOWNERS of the AMD InferenceX code. They can review and run validation on your PR and merge it.

seungrokj · 2026-04-17T07:49:18Z

Sure, working on it.

seungrokj · 2026-04-17T07:56:41Z

/sweep test-config --config-files .github/configs/amd-master.yaml --config-keys gptoss-fp4-mi300x-vllm

github-actions · 2026-04-17T07:56:51Z

@seungrokj Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24554407956
Command: test-config --config-files .github/configs/amd-master.yaml --config-keys gptoss-fp4-mi300x-vllm
Pinned ref: 9b1b3a4
Approval: not required (trusted collaborator).

seungrokj · 2026-04-17T09:15:26Z

/sweep test-config --config-files .github/configs/amd-master.yaml --config-keys gptoss-fp4-mi300x-vllm

github-actions · 2026-04-17T09:15:37Z

@seungrokj Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24557535782
Command: test-config --config-files .github/configs/amd-master.yaml --config-keys gptoss-fp4-mi300x-vllm
Pinned ref: ee0acc1
Approval: not required (trusted collaborator).

seungrokj

lgtm

seungrokj · 2026-04-17T10:34:31Z

Hi @functionstackx @cquil11,
can you please merge this?

seungrokj · 2026-04-17T10:36:11Z

@andyluo7 @ramineroane @chunfangamd

functionstackx

@seungrokj Approved. Feel free to merge.

ramineroane requested a review from a team April 17, 2026 07:34

ramineroane requested review from 1am9trash, billishyahao, chunfangamd, seungrokj and yctseng0211 as code owners April 17, 2026 07:34

github-project-automation Bot added this to InferenceMAX Board Apr 17, 2026

claude Bot reviewed Apr 17, 2026

ramineroane force-pushed the gptoss-fp4-mi300x-expand-conc branch from c3013cd to c1fdc98 Compare April 17, 2026 07:34

seungrokj added the AMD label Apr 17, 2026

Update amd-master.yaml

9b1b3a4

seungrokj added 2 commits April 17, 2026 18:12

Update amd-master.yaml

9462acb

Merge branch 'main' into gptoss-fp4-mi300x-expand-conc

ee0acc1

SemiAnalysisAI deleted a comment from github-actions Bot Apr 17, 2026

seungrokj approved these changes Apr 17, 2026

Merge branch 'main' into gptoss-fp4-mi300x-expand-conc

f11449e

functionstackx approved these changes Apr 17, 2026

seungrokj merged commit 31f066c into SemiAnalysisAI:main Apr 17, 2026
13 checks passed

github-project-automation Bot moved this to Done in InferenceMAX Board Apr 17, 2026

cquil11 added a commit that referenced this pull request Apr 17, 2026

Revert "[AMD][MI300X] Expand GPT-OSS FP4 TP=1 concurrency from 64 to …

695b0cc

…256 (#1053)" [slip-sweep] This reverts commit 31f066c.

cquil11 mentioned this pull request Apr 17, 2026

Revert "[AMD][MI300X] Expand GPT-OSS FP4 TP=1 concurrency from 64 to 256 (#1053)" [slip-sweep] #1060

Merged

cquil11 added a commit that referenced this pull request Apr 17, 2026

Revert "[AMD][MI300X] Expand GPT-OSS FP4 TP=1 concurrency from 64 to …

ee05824

…256 (#1053)" [slip-sweep] (#1060) This reverts commit 31f066c.

cquil11 mentioned this pull request Apr 17, 2026

[AMD][MI300X] Expand GPT-OSS FP4 TP=1 concurrency from 64 to 256 #1061

Merged

claude Bot mentioned this pull request Apr 17, 2026

Add B300 config: kimi-k2.5-int4-vllm (vLLM 0.20.0 + TP=4/EP=1 sweep) #1071

Closed

seungrokj merged 5 commits into SemiAnalysisAI:main from ramineroane:gptoss-fp4-mi300x-expand-conc

4 participants
