Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions .github/configs/nvidia-master.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -2953,7 +2953,7 @@ dsv4-fp8-h200-vllm-agentic:
# --speculative-config '{"method":"mtp","num_speculative_tokens":2}'.

dsv4-fp8-h200-sglang:
image: lmsysorg/sglang:deepseek-v4-hopper@sha256:7f19c6dc092e47a10fac2e41f47eab78970280d06648b8e50d312a82f0ae722f
image: lmsysorg/sglang:v0.5.12-cu130

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 The comment at .github/configs/nvidia-master.yaml:2891-2893 justifies the h200-dgxc-slurm runner pinning on the grounds that 'the deepseek-v4-hopper image needs the /ix mount layout that only launch_h200-dgxc-slurm.sh sets up' — but this PR removes the only references to that image, making the rationale stale and misleading. Please either update/remove the comment, or explicitly note the runner pinning is now retained for a different reason (and consider whether it can be relaxed, since the conditional in runners/launch_h200-dgxc-slurm.sh:308 that gates /ix vs /workspace on the deepseek-v4-hopper substring is now dead code).

Extended reasoning...

What the bug is

The PR bumps both dsv4-fp8-h200-sglang and dsv4-fp8-h200-sglang-mtp from lmsysorg/sglang:deepseek-v4-hopper@sha256:7f19c6dc... to the generic lmsysorg/sglang:v0.5.12-cu130. Sitting just above the first updated entry, at .github/configs/nvidia-master.yaml:2891-2893, is an explanatory comment:

Pinned to the h200-dgxc-slurm runner pool because the deepseek-v4-hopper image needs the /ix mount layout that only launch_h200-dgxc-slurm.sh sets up.

After this PR, no config in the repo references a deepseek-v4-hopper image, so the documented rationale no longer applies. The comment now describes a constraint that does not exist in the current state of the repo.

Why this matters

A future maintainer reading this block will be told the runner pin exists because of a /ix-mount requirement tied to the deepseek-v4-hopper image. That is no longer true. They may either:

  1. Believe the constraint still applies and propagate it to other configs, or
  2. Be unable to safely relax the runner: h200-dgxc pin because they cannot trust the now-incorrect rationale.

The companion runtime code at runners/launch_h200-dgxc-slurm.sh:308 (if [[ "$IMAGE" == *deepseek-v4-hopper* ]]; then CONTAINER_MOUNT_DIR=/ix; else CONTAINER_MOUNT_DIR=/workspace; fi) is also affected: after this PR the *deepseek-v4-hopper* substring will never match any image in the configs, so the /ix branch is dead. These two recipes will now be mounted at /workspace rather than /ix at runtime.

Step-by-step proof

  1. Before this PR: the comment at lines 2891-2893 explains why the next two recipes (dsv4-fp8-h200-sglang and dsv4-fp8-h200-sglang-mtp at lines 2917 and 2941) use runner: h200-dgxc, referencing the deepseek-v4-hopper image's /ix-mount requirement.
  2. This PR changes both image: fields from lmsysorg/sglang:deepseek-v4-hopper@sha256:7f19c6dc... to lmsysorg/sglang:v0.5.12-cu130.
  3. After this PR, grep for deepseek-v4-hopper in .github/configs/nvidia-master.yaml finds only line 2892 — the comment text itself. No config still uses that image.
  4. At runtime, launch_h200-dgxc-slurm.sh:308 does IMAGE=$IMAGE; if [[ "$IMAGE" == *deepseek-v4-hopper* ]]; then CONTAINER_MOUNT_DIR=/ix; else CONTAINER_MOUNT_DIR=/workspace; fi. With IMAGE=lmsysorg/sglang:v0.5.12-cu130, the substring match fails and the else branch runs, so the mount point is now /workspace rather than /ix. The documented /ix behavior no longer occurs for these recipes.

Impact

No functional impact on this sweep — the new generic image works fine with the /workspace mount the else branch provides. This is a documentation-rot / dead-code issue introduced by this PR.

How to fix

Pick one:

  • Remove the comment at lines 2891-2893 (and consider whether runner: h200-dgxc can be relaxed to a less restrictive pool now that the /ix constraint is gone).
  • Rewrite the comment to capture the actual current reason for h200-dgxc pinning (if there is one — e.g. Slurm/enroot infrastructure availability rather than mount layout).
  • Optionally remove the now-dead *deepseek-v4-hopper* branch in runners/launch_h200-dgxc-slurm.sh:308 as a drive-by cleanup.

model: deepseek-ai/DeepSeek-V4-Pro
model-prefix: dsv4
runner: h200-dgxc
Expand All @@ -2977,7 +2977,7 @@ dsv4-fp8-h200-sglang:
# runner pool, search space) and adds EAGLE speculative decoding via
# --speculative-algorithm EAGLE with the (3,1,4) chain matching dsv4-fp4-b300-sglang-mtp.
dsv4-fp8-h200-sglang-mtp:
image: lmsysorg/sglang:deepseek-v4-hopper@sha256:7f19c6dc092e47a10fac2e41f47eab78970280d06648b8e50d312a82f0ae722f
image: lmsysorg/sglang:v0.5.12-cu130
model: deepseek-ai/DeepSeek-V4-Pro
model-prefix: dsv4
runner: h200-dgxc
Expand Down
2 changes: 1 addition & 1 deletion benchmarks/single_node/dsv4_fp8_h200_sglang.sh
Original file line number Diff line number Diff line change
Expand Up @@ -43,7 +43,7 @@ PYTHONNOUSERSITE=1 sglang serve \
--chunked-prefill-size 4096 \
--disable-flashinfer-autotune \
--disable-radix-cache \
--mem-fraction-static 0.88 \
--mem-fraction-static 0.92 \
--max-running-requests "$(( CONC * 3 / 2 > 8 ? CONC * 3 / 2 : 8 ))" \
$EVAL_CONTEXT_ARGS >> $SERVER_LOG 2>&1 &

Expand Down
2 changes: 1 addition & 1 deletion benchmarks/single_node/dsv4_fp8_h200_sglang_mtp.sh
Original file line number Diff line number Diff line change
Expand Up @@ -43,7 +43,7 @@ PYTHONNOUSERSITE=1 sglang serve \
--chunked-prefill-size 4096 \
--disable-flashinfer-autotune \
--disable-radix-cache \
--mem-fraction-static 0.88 \
--mem-fraction-static 0.92 \
--max-running-requests "$(( CONC * 3 / 2 > 8 ? CONC * 3 / 2 : 8 ))" \
--speculative-algorithm EAGLE \
--speculative-num-steps 3 \
Expand Down
7 changes: 7 additions & 0 deletions perf-changelog.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -3083,3 +3083,10 @@
description:
- "Update SGLang image from v0.5.10.post1-cu130 / v0.5.11-cu130 (30d old) to v0.5.12-cu130"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1451

- config-keys:
- dsv4-fp8-h200-sglang
- dsv4-fp8-h200-sglang-mtp
description:
- "Update SGLang image from SHA-pinned deepseek-v4-hopper custom build (15/14d old) to v0.5.12-cu130"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1460
Loading