From 23a6399c3982b50e061394c3de0e7d2dcf551f6b Mon Sep 17 00:00:00 2001
From: functionstackx <47992694+functionstackx@users.noreply.github.com>
Date: Mon, 18 May 2026 00:16:00 -0400
Subject: [PATCH] KLAUD_DEBUG: B300 is sm_103 (not sm_120) + cross-link upstream issue
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two corrections to §4 (B300 sglang v0.5.12 regressions):

1. **Arch fix.** B300 (Blackwell Ultra datacenter) is compute capability 10.3 / `sm_103`, NOT `sm_120`. sm_120 is for consumer Blackwell (RTX 50 series / GB20x dies). This had propagated through agent diagnoses and into upstream issue sgl-project/sglang#25563 (already corrected there).

2. **§4c reframe.** sm_103 is *nominally inside* the asserted range `sm_100 <= arch <= sm_110f` (since 100 <= 103 <= 110), so the assertion failure is more interesting than "outside the range" — best guess is the cute kernel's `Arch.sm_110f` set only matches the architecture-specific feature-flag variants it was compiled for (sm_100, sm_100f, sm_110, sm_110f) and sm_103/sm_103a isn't in that list.

Also cross-linked sgl-project/sglang#25563 under §4b (filed earlier this session for the EAGLE draft graph capture crash on GLM-5-NVFP4 at bs=128 — same B300 v0.5.12 regression family).

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 KLAUD_DEBUG.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/KLAUD_DEBUG.md b/KLAUD_DEBUG.md
index 1f3d00876..92eb76bfc 100644
--- a/KLAUD_DEBUG.md
+++ b/KLAUD_DEBUG.md
@@ -66,7 +66,7 @@ Seen on: #1460 (dsv4-fp8-h200-sglang+mtp).
 
 ## 4. Upstream sglang v0.5.12 B300 regressions
 
-Two distinct upstream regressions on NVIDIA B300 (Blackwell, `sm_120`) shipped in `lmsysorg/sglang:v0.5.12-cu130`:
+Three distinct upstream regressions on NVIDIA B300 (Blackwell Ultra, `sm_103` — compute capability 10.3) shipped in `lmsysorg/sglang:v0.5.12-cu130`. (sm_120 is for *consumer* Blackwell / RTX 50 series, not B300 — don't propagate that.)
 
 ### 4a. DeepGemm TMA-descriptor crash (GLM-5-FP8)
 **Symptom:** CUDA graph capture aborts with `CUDA_ERROR_ILLEGAL_ADDRESS (700)` at `/deepgemm/csrc/.../runtime_utils.hpp:143` on the **first batch size** for **every TP rank**. Server never serves a prompt.
@@ -86,7 +86,7 @@ Filed upstream: sgl-project/sglang#25551. Seen on #1421.
 2. Comment out the MTP/EAGLE scenarios on B300 in the recipe.
 3. Pin to v0.5.11-cu130.
 
-Seen on #1420.
+Filed upstream: sgl-project/sglang#25563. Seen on #1420.
 
 ### 4c. flash_attn SM-arch assertion (qwen3.5-bf16)
 **Symptom:** All 4 TP workers AssertionError on first forward pass:
@@ -94,9 +94,9 @@ Seen on #1420.
 File "/opt/venv/.../sglang/srt/layers/attention/flashattention_backend.py:..."
 assert sm_100 <= arch <= sm_110f
 ```
-B300 is `sm_120`, outside the asserted range. Server never becomes healthy; warmup times out at 600s.
+B300 is `sm_103` (compute capability 10.3, Blackwell Ultra) — which is *nominally inside* the asserted `sm_100..sm_110f` range, yet the assertion still fires. Best guess is the cute kernel's `Arch.sm_110f` set only matches the architecture-specific feature-flag variants it was compiled for (e.g. `sm_100`, `sm_100f`, `sm_110`, `sm_110f`) and `sm_103` / `sm_103a` isn't in that explicit list. Server never becomes healthy; warmup times out at 600s.
 
-**Fix:** Needs sglang image with flash_attn supporting `sm_120` — no local workaround. Pin to v0.5.11-cu130 in the meantime.
+**Fix:** Needs an sglang image with `flash_attn` that recognises `sm_103` / `sm_103a` — no local workaround. Pin to `v0.5.11-cu130` in the meantime.
 
 Seen on #1422.
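
Reviewer note on the §4c hypothesis: the gap between "numerically inside the range" and "in the compiled target list" can be shown with a minimal Python sketch. Everything in it is an assumption made for illustration. `SUPPORTED_TARGETS`, `numeric_range_check` and `explicit_set_check` are hypothetical names, and the target list is copied from the best guess in this patch. It does not reproduce how flash_attn or the cute DSL actually evaluates the assertion.

```python
# Hypothetical model of the two readings of `sm_100 <= arch <= sm_110f`.
# The names and the target list are illustrative assumptions taken from the
# patch's best guess; they are not the real flash_attn / cute definitions.
SUPPORTED_TARGETS = frozenset({"sm_100", "sm_100f", "sm_110", "sm_110f"})


def numeric_range_check(arch: str) -> bool:
    """Naive reading: compare only the numeric part (sm_103a -> 103)."""
    suffix = arch.split("_", 1)[1]
    number = int("".join(ch for ch in suffix if ch.isdigit()))
    return 100 <= number <= 110


def explicit_set_check(arch: str) -> bool:
    """Guessed reading: the arch must be one of the listed targets."""
    return arch in SUPPORTED_TARGETS


for arch in ("sm_100", "sm_103", "sm_103a", "sm_110f", "sm_120"):
    print(f"{arch}: range={numeric_range_check(arch)} set={explicit_set_check(arch)}")
```

Under these assumptions `sm_103` and `sm_103a` pass the numeric reading but fail the set reading, which would match the observed assertion failure. `sm_120` fails both.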