Merged
Commits
36 commits
31b4fbe
[AMD] dsv4-fp4-mi355x-atom: enable DPA TBO at high concurrency, updat…
seungrokj Jun 12, 2026
c566e28
[AMD] perf-changelog: dsv4-fp4-mi355x-atom DPA TBO + image atom0.1.4
seungrokj Jun 12, 2026
7e1aa06
[AMD] perf-changelog: add PR link #1717
seungrokj Jun 12, 2026
65e0fa3
[AMD] dsv4_fp4_mi355x_atom.sh: disable prefix caching
seungrokj Jun 12, 2026
3f3560b
[AMD] dsv4-fp4-mi355x-atom: add max-model-len, eval context, extend c…
seungrokj Jun 12, 2026
c3b3289
[AMD] dsv4-fp4-mi355x-atom: narrow eval to single conc=1024 point, di…
seungrokj Jun 13, 2026
7ffa976
[AMD] dsv4_fp4_mi355x_atom.sh: add cudagraph-capture-sizes and max-nu…
seungrokj Jun 13, 2026
f2677b2
[AMD] dsv4-fp4-mi355x-atom: bump to nightly image, expand search spac…
seungrokj Jun 15, 2026
f5f0d66
[AMD] set GPU_MAX_HW_QUEUES=5 in dsv4_fp4_mi355x_atom.sh
seungrokj Jun 15, 2026
dc5b239
[AMD] dsv4-fp4-mi355x-atom: disable TBO, add TP4 rows for isl=8192, c…
seungrokj Jun 15, 2026
1dbf259
Merge branch 'main' into amd/dsv4_atom_0612
seungrokj Jun 15, 2026
9e18052
[AMD] dsv4_fp4_mi355x_atom.sh: quote SERVER_LOG variable
seungrokj Jun 16, 2026
c1812ed
[AMD] dsv4_fp4_mi355x_atom.sh: comment out dense cudagraph sizes
seungrokj Jun 16, 2026
28bdc6a
[AMD] dsv4_fp4_mi355x_atom.sh: fix --hf-overrides JSON escaping
seungrokj Jun 16, 2026
b36218e
[AMD] dsv4_fp4_mi355x_atom.sh: comment out dense cudagraph sizes
seungrokj Jun 16, 2026
fa47caf
[AMD] dsv4-fp4-mi355x-atom: expand search space, restore isl=1024 rows
seungrokj Jun 16, 2026
1022e0b
Merge branch 'main' into amd/dsv4_atom_0612
seungrokj Jun 16, 2026
af82c27
[AMD] perf-changelog: update dsv4-fp4-mi355x-atom image and search-sp…
seungrokj Jun 16, 2026
1300012
[AMD] dsv4_fp4_mi355x_atom.sh: restore sparse cudagraph capture sizes
seungrokj Jun 16, 2026
f56f877
[AMD] perf-changelog: revert dsv4-fp4-mi355x-atom image/search-space,…
seungrokj Jun 16, 2026
f7c9de8
Merge branch 'main' into amd/dsv4_atom_0612
seungrokj Jun 16, 2026
a4828cb
[AMD] perf-changelog: add dsv4-fp4-mi355x-sglang entry for PR #1762
seungrokj Jun 16, 2026
19b8757
update dsv4-fp4-mi355x-atom: bump image, enable TBO conditionally, fi…
seungrokj Jun 17, 2026
03aaa6b
expand dsv4-fp4-mi355x-atom search space: restore ISL1024 scenarios, …
seungrokj Jun 17, 2026
cf3962f
Merge branch 'main' into amd/dsv4_atom_0612
seungrokj Jun 17, 2026
421313c
Update perf-changelog.yaml
seungrokj Jun 17, 2026
ae77233
Update perf-changelog.yaml
seungrokj Jun 17, 2026
a8f6bd0
Update perf-changelog.yaml
seungrokj Jun 17, 2026
5fbd068
Update perf-changelog.yaml
seungrokj Jun 17, 2026
d080faa
update perf-changelog: move dsv4-fp4-mi355x-atom entry to end
seungrokj Jun 17, 2026
91f6277
narrow dsv4-fp4-mi355x-atom to DPA conc=256-2048 ISL8192, fix TBO bra…
seungrokj Jun 17, 2026
4364ef9
restore full dsv4-fp4-mi355x-atom search space: ISL1024 + ISL8192 TP4…
seungrokj Jun 17, 2026
6644109
Update perf-changelog.yaml
seungrokj Jun 18, 2026
471aff2
Update perf-changelog.yaml
seungrokj Jun 18, 2026
67b052f
fix: resolve PR 1717 changelog conflict
Oseltamivir Jun 18, 2026
bcf0d1f
Merge remote-tracking branch 'origin/main' into amd/dsv4_atom_0612
Oseltamivir Jun 18, 2026
22 changes: 11 additions & 11 deletions .github/configs/amd-master.yaml
@@ -2261,15 +2261,8 @@ dsv4-fp4-mi355x-vllm-mtp:
search-space:
- { tp: 8, conc-start: 4, conc-end: 512, spec-decoding: mtp }

# Day-0 single-sequence marker for DeepSeek-V4 on ATOM (ROCm/ATOM#650).
# PR1 of the ATOM DSv4 series still uses torch sparse-attention fallbacks
# that OOM once warmup/prefill batches multiple requests; keep CONC=1 until
# the AITER sparse-attention kernel / multi-request path lands upstream.
# --enforce-eager and ATOM_USE_TRITON_MOE=1 are required on gfx950. Image is
# the standard atom0.1.2.post MI355X base (matching qwen3.5-fp8-mi355x-atom);
# the DSv4 PR is overlaid at runtime by dsv4_fp4_mi355x_atom.sh at a pinned SHA.
dsv4-fp4-mi355x-atom:
image: rocm/atom:rocm7.2.4_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom0.1.3
image: rocm/atom-dev:nightly_202606161823
model: deepseek-ai/DeepSeek-V4-Pro
model-prefix: dsv4
runner: mi355x
@@ -2281,13 +2274,20 @@ dsv4-fp4-mi355x-atom:
- isl: 1024
osl: 1024
search-space:
# conc4-64, TP8
# conc128-512, DPA
# conc1024-2048, DPA TBO
- { tp: 8, ep: 1, conc-start: 1, conc-end: 64 }
- { tp: 8, ep: 1, dp-attn: true, conc-start: 64, conc-end: 1024 }
- { tp: 8, ep: 1, dp-attn: true, conc-start: 64, conc-end: 2048 }
- isl: 8192
osl: 1024
search-space:
- { tp: 8, ep: 1, conc-start: 1, conc-end: 64 }
- { tp: 8, ep: 1, dp-attn: true, conc-start: 64, conc-end: 512 }
# conc4-64, TP8
# conc128, DPA
# conc256-2048, DPA TBO
- { tp: 4, ep: 1, conc-list: [8, 16, 32, 64] }
- { tp: 8, ep: 1, conc-list: [1, 2, 4, 8, 16, 32, 64] }
- { tp: 8, ep: 1, dp-attn: true, conc-start: 128, conc-end: 2048 }

dsv4-fp4-mi355x-atom-mtp:
image: rocm/atom:rocm7.2.4_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom0.1.3
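The search-space rows above give each sweep as a `conc-start`/`conc-end` pair, and the inline comments (`conc128-512`, `conc1024-2048`) suggest power-of-two concurrency points. The following is a minimal sketch of how such a range might expand, assuming the sweep doubles at each step; the diff does not show the actual expansion rule, and `expand_conc` is a hypothetical helper that is not part of this PR.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the PR: expand a conc-start/conc-end pair
# into concurrency points, ASSUMING the sweep doubles at every step.
expand_conc() {
    local conc="$1" end="$2" points=()
    # Guard against a non-positive start, which would never reach the end.
    if [ "$conc" -lt 1 ]; then
        echo "conc-start must be >= 1" >&2
        return 1
    fi
    while [ "$conc" -le "$end" ]; do
        points+=("$conc")
        conc=$((conc * 2))
    done
    echo "${points[*]}"
}

# The ISL=8192 DPA row: conc-start: 128, conc-end: 2048
expand_conc 128 2048
```

Under that assumption, the ISL=8192 DPA row would cover five points from 128 to 2048, and the explicit `conc-list` rows would need no expansion at all.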
32 changes: 26 additions & 6 deletions benchmarks/single_node/fixed_seq_len/dsv4_fp4_mi355x_atom.sh
@@ -22,31 +22,51 @@ echo "TP: $TP, CONC: $CONC, ISL: $ISL, OSL: $OSL, EP_SIZE: $EP_SIZE, DP_ATTENTIO
SERVER_LOG=/workspace/server.log

PARALLEL_ARGS=(-tp "$TP") #TP
CUDAGRAPH_SIZES='[1, 2, 4, 8, 16, 32, 48, 64, 128, 256, 512]'
if [ "$DP_ATTENTION" = "true" ]; then
if [ "$EP_SIZE" -gt 1 ]; then #DP+EP
PARALLEL_ARGS=(-tp "$TP" --enable-expert-parallel --enable-dp-attention )
else #DP+TP
PARALLEL_ARGS=(-tp "$TP" --enable-dp-attention )
else #DPA+TP
#DPA+TP+TBO
if [ "$ISL" -eq 1024 ] && [ "$OSL" -eq 1024 ] && [ "$CONC" -ge 1024 ]; then
PARALLEL_ARGS=(-tp "$TP" --enable-dp-attention --enable-tbo)
export GPU_MAX_HW_QUEUES=5
elif [ "$ISL" -eq 8192 ] && [ "$OSL" -eq 1024 ] && [ "$CONC" -ge 256 ]; then
PARALLEL_ARGS=(-tp "$TP" --enable-dp-attention --enable-tbo)
export GPU_MAX_HW_QUEUES=5
else
PARALLEL_ARGS=(-tp "$TP" --enable-dp-attention )
fi
fi
fi

BENCHMARK_MAX_MODEL_LEN="$MAX_MODEL_LEN"

if [ "${EVAL_ONLY}" = "true" ]; then
EVAL_MAX_MODEL_LEN=$(compute_eval_context_length "$MODEL" "$BENCHMARK_MAX_MODEL_LEN")
export EVAL_MAX_MODEL_LEN
fi
# Start GPU monitoring (power, temperature, clocks every second)
start_gpu_monitor

set -x
export ATOM_DISABLE_MMAP=true
export AITER_BF16_FP8_MOE_BOUND=0
export ATOM_MOE_GU_ITLV=1
# TODO: add --no-enable_chunked_prefill, when dsv4 prefix caching is supported
#https://github.com/ROCm/ATOM/commit/7df93a181da4d3c3250c2441c7d5e2745a03d0cd#diff-61b1ba0b8b74523530d2d5cdc739d4f3a23a43bedf69015a5235844d46e9373bL1127
MEM_FRAC_STATIC=0.9
OPT_ARGS=(--hf-overrides '{"use_index_cache": true, "index_topk_freq": 4}')

python3 -m atom.entrypoints.openai_server \
--model $MODEL \
--server-port $PORT \
"${PARALLEL_ARGS[@]}" \
--kv_cache_dtype fp8 \
--trust-remote-code \
--gpu-memory-utilization 0.85 \
> $SERVER_LOG 2>&1 &
--gpu-memory-utilization $MEM_FRAC_STATIC \
--no-enable_prefix_caching \
--cudagraph-capture-sizes "${CUDAGRAPH_SIZES}" \
"${OPT_ARGS[@]}" \
> "$SERVER_LOG" 2>&1 &

SERVER_PID=$!

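The DPA+TP branch in the hunk above repeats the same `PARALLEL_ARGS` assignment and `GPU_MAX_HW_QUEUES` export for two ISL/OSL cases. As a sketch only, the thresholds could be gathered into a single predicate. `should_enable_tbo` is a hypothetical name that does not appear in the PR, and the `TP`/`ISL`/`OSL`/`CONC` values below are example inputs, not values taken from a real run.

```bash
#!/usr/bin/env bash
# Hypothetical refactor, not part of the PR: the TBO thresholds from the
# DPA+TP branch collected into one predicate. Returns 0 when TBO applies.
should_enable_tbo() {
    local isl="$1" osl="$2" conc="$3"
    if [ "$isl" -eq 1024 ] && [ "$osl" -eq 1024 ] && [ "$conc" -ge 1024 ]; then
        return 0
    elif [ "$isl" -eq 8192 ] && [ "$osl" -eq 1024 ] && [ "$conc" -ge 256 ]; then
        return 0
    fi
    return 1
}

# Example inputs; the benchmark script receives these from its environment.
TP=8 ISL=8192 OSL=1024 CONC=256

PARALLEL_ARGS=(-tp "$TP" --enable-dp-attention)
if should_enable_tbo "$ISL" "$OSL" "$CONC"; then
    # Same two side effects as the TBO branches in the diff.
    PARALLEL_ARGS+=(--enable-tbo)
    export GPU_MAX_HW_QUEUES=5
fi
printf '%s\n' "${PARALLEL_ARGS[*]}"
```

Keeping the thresholds in one place would mean a later change to the concurrency cut-offs touches a single function rather than two near-identical branches. The behavior is intended to match the diff for the DPA+TP case only; the DP+EP branch is left out of this sketch.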
10 changes: 9 additions & 1 deletion perf-changelog.yaml
@@ -3926,4 +3926,12 @@
- "Container: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.3.0-dev.1-cuda13"
- "Recipes sourced from NVIDIA/srt-slurm branch sa-submission-q2-2026 (gb300_nvfp4 STP recipes)"
- "Runner script launch_gb300-nv.sh: added dynamo-trt-specific glm5-fp4 case with SERVED_MODEL_NAME and SRT_SLURM_MODEL_PREFIX=nvidia/GLM-5-NVFP4"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1798
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1798

- config-keys:
- dsv4-fp4-mi355x-atom
description:
- "Update image to rocm/atom:rocm7.2.4_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom0.1.4_20260612"
- "Update ISL=8192 search-space: TP8-only from conc=4-64, DPA from conc=128-1024 (previously conc=1-64 and DPA conc=64-512)"
- "Update Applied TBO on high concurrencies"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1717