[NVIDIA] Fix vllm & sglang b200 updated containers by kedarpotdar-nv · Pull Request #4 · SemiAnalysisAI/InferenceX

kedarpotdar-nv · 2025-09-03T23:17:16Z

No description provided.

Modify GB200 runs to use test partition

…DSA state-index path

amd-master.yaml
- Image: rocm/sgl-dev:sglang-0.5.9-rocm720-mi35x-mori-0402 -> lmsysorg/sglang-rocm:v0.5.12.post1-rocm720-mi35x-20260523 (matches qwen3.5-fp8-mi355x-sglang-disagg; the older 0.5.9 image is no longer the reference build for hybrid-attention disagg models on MI355X.)
- Scenarios: collapse the four legacy "top/middle/bottom/small-scale" search-spaces per ISL into a single 1P+1D TP=8 EP=1 dp-attn=false entry with the standard conc-list [8, 16, 32, 64, 128, 256, 512] for both 1k1k and 8k1k. dp-attn=false avoids the fused_moe_triton/layer.py:209 shared-slot assertion that --enable-dp-attention + --moe-a2a-backend mori triggers for GLM-5 (256 routed + 1 shared expert; (256-1) % 8 = 7 != 0). The collapsed layout mirrors the qwen3.5-fp8-mi355x-sglang-disagg shape so the same CI matrix-expansion logic applies to both.

patches/mori_conn.py
- Add patch #4: rank + length normalization in MoriKVReceiver._send_swa_dsa_state, immediately before the group_concurrent_contiguous call. For GLM-5 (single DSA component), upstream hands dst_state_indices as a 2-D (1, N) array while src_state_indices is 1-D length 1; the existing [:common_len] slice operates only on the outer axis, leaving the rank mismatched. np.diff then produces (1, N-1) vs (0,), which can't broadcast and crashes with "operands could not be broadcast together with shapes (1,12) (0,)". The fix ravels both indices to 1-D and re-truncates to common length so np.diff outputs compatible 1-D arrays. One-shot log gates the warning to once per receiver class.
- Verified end-to-end:
    glm5-fp8-mi355x-sglang-disagg gsm8k flexible-extract = 0.9704 +/- 0.0047
    glm5-fp8-mi355x-sglang-disagg gsm8k strict-match = 0.9712 +/- 0.0046
    qwen3.5-fp8-mi355x-sglang-disagg gsm8k (regression) = 0.9780 +/- 0.004
  Patch #4 fires zero times on the Qwen3.5 Mamba path (it lives inside _send_swa_dsa_state, never called for Mamba); patches #1-#3 behavior is unchanged.

patches/README.md
- Document patch #4 alongside the existing three. Cross-link the full bug analysis at scripts/sglang_disagg/docs_glm5/01-bug-analysis.md and the gsm8k verification at scripts/sglang_disagg/docs_glm5/02-fix-and-verification.md.
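The rank mismatch described for patch #4 can be reproduced in isolation with NumPy. The sketch below is illustrative only and is not the code in patches/mori_conn.py: `normalize_state_indices` is a hypothetical helper name, the index values are made up, and the `|` expression is an assumed stand-in for whatever elementwise operation the downstream call performs on the two `np.diff` results. N = 13 is chosen so the shapes match the quoted error.

```python
import numpy as np


def normalize_state_indices(src, dst):
    """Hypothetical helper: flatten both index arrays to 1-D, then
    truncate them to their shared length."""
    src = np.asarray(src).ravel()
    dst = np.asarray(dst).ravel()
    common_len = min(len(src), len(dst))
    return src[:common_len], dst[:common_len]


# Shapes as described in the commit message: src is 1-D with length 1,
# dst is 2-D with shape (1, N). The values themselves are invented.
src_state_indices = np.array([7])
dst_state_indices = np.arange(100, 113).reshape(1, 13)

# Before the fix: slicing by common_len only touches the outer axis,
# so dst stays 2-D and the two diffs have incompatible shapes.
common_len = min(len(src_state_indices), len(dst_state_indices))
src_diff = np.diff(src_state_indices[:common_len])
dst_diff = np.diff(dst_state_indices[:common_len])
broadcast_failed = False
try:
    # Assumed stand-in for the elementwise step that combines the diffs.
    _ = (dst_diff != 1) | (src_diff != 1)
except ValueError:
    broadcast_failed = True

# After the fix: both arrays are 1-D and equally long.
src_fixed, dst_fixed = normalize_state_indices(src_state_indices, dst_state_indices)
fixed_mask = (np.diff(dst_fixed) != 1) | (np.diff(src_fixed) != 1)
```

With the inputs flattened first, both diffs are 1-D arrays of the same length, so the elementwise step no longer raises.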

Add summarize.py (compact NCCL/DeepEP results table, printed at end of every job) and make it the result gate. Fix review findings: benchmark failures/skipped-deepep now fail the job instead of reporting green (#1); DeepEP nodes from SLURM_NNODES not world_size//8 (#3); apply Buffer.set_num_sms so num_comm_sms is real (#8); nccl-tests -c 1 with a missing check footer is now invalid (#7); use context managers for file reads (#4,#5); launchers export COLLECTIVEX_IMAGE/_DIGEST for provenance (#9); trim workflow_dispatch sku options to launcher-backed pools (#2). Artifact-path finding (#6) already fixed via cx_collect_results.
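The "result gate" idea above, where failed or skipped benchmarks fail the job instead of reporting green, can be sketched as follows. This is a minimal illustration and not the contents of summarize.py: the record layout (`name`, `status`), the status strings, and the rule that an empty result set also fails are all assumptions made for the example.

```python
def gate_exit_code(records):
    """Return (exit_code, offending_records).

    Assumed convention: each record is a dict with a "status" key and
    only the string "ok" counts as a pass. Failed, skipped, or invalid
    entries, or an empty result list, produce a non-zero exit code.
    """
    offending = [r for r in records if r.get("status") != "ok"]
    exit_code = 1 if (offending or not records) else 0
    return exit_code, offending


# Hypothetical results: one passing NCCL run and one skipped DeepEP run.
example_results = [
    {"name": "nccl-all-reduce", "status": "ok"},
    {"name": "deepep-dispatch", "status": "skipped"},
]
example_code, example_bad = gate_exit_code(example_results)
```

A job script could pass the returned code to `sys.exit` so that the CI step turns red whenever any entry is not a clean pass.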

…p99, routing identity

Addresses review #3 methodology critiques (schema_version 3):
- Explicit measurement contracts (#4): adapters declare SUPPORTED_CONTRACTS and conform, rather than each choosing its own timing boundary. layout-and-dispatch-v1 times get_dispatch_layout INSIDE dispatch (the only contract MoRI can honor; its layout is computed in-kernel); cached-layout-comm-only-v1 hoists layout out (DeepEP normal) so dispatch is pure comm. run_ep.py rejects unsupported contract / ll+cached-layout. The misleading "comm-only-v1" label is gone.
- Pooled-trial percentiles (#9, #2): N trials (default 3) x iters, token-order randomized per trial (seeded => identical across ranks; MoRI keeps ascending to avoid cold-jump wedge), per-iteration cross-rank-MAX samples POOLED, then p50/p90/p99 (p99 headline). p99 from ~50 samples was just the max. (#2 aggregation was already Q_p(max_r); verified.)
- Routing identity proof (#3): routing_hash now SHA-256 of topk_idx AND gate weights; cross-rank trace-signature MIN==MAX check proves every rank (NVIDIA + AMD) built the identical trace, else status=invalid. Added per-dest-rank send histogram.
- Separated logical bytes (#6): dispatch_logical_bytes + combine_logical_bytes recorded at their real dtypes with byte_contract; serial bandwidth removed. serial relabeled "sum of isolated medians". Correctness scope tagged roundtrip-reconstruction-smoke-v1 (#8 honesty).
- Run linkage (#1): artifacts record GHA run_id/attempt/source SHA when present.
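The pooled-trial percentile scheme described above (take the cross-rank maximum per iteration, pool those samples over all trials, then read off p50/p90/p99) can be sketched with NumPy. This is an illustration under stated assumptions, not the code in run_ep.py: the function names, the `(trials, iters, ranks)` array layout, and the way the seed and trial index are combined are hypothetical, and the timings are synthetic.

```python
import numpy as np


def pooled_percentiles(samples, percentiles=(50, 90, 99)):
    """samples: array-like of shape (trials, iters, ranks), assumed layout.

    Reduces each iteration to its slowest rank, pools the per-iteration
    values across all trials, and computes percentiles on the pool.
    """
    arr = np.asarray(samples, dtype=float)
    per_iter_max = arr.max(axis=-1)   # cross-rank MAX, shape (trials, iters)
    pooled = per_iter_max.ravel()     # trials * iters samples
    values = np.percentile(pooled, percentiles)
    return {f"p{p}": float(v) for p, v in zip(percentiles, values)}


def trial_token_order(num_tokens, seed, trial):
    """Seeded permutation: every rank that uses the same (seed, trial)
    pair obtains the same token order."""
    rng = np.random.default_rng([seed, trial])
    return rng.permutation(num_tokens)


# Synthetic timings: 3 trials x 50 iterations x 8 ranks.
synthetic = np.random.default_rng(0).uniform(100.0, 200.0, size=(3, 50, 8))
stats = pooled_percentiles(synthetic)
order_rank_a = trial_token_order(16, seed=1234, trial=0)
order_rank_b = trial_token_order(16, seed=1234, trial=0)
```

Pooling three trials of 50 iterations gives 150 samples, so the p99 estimate rests on more than the single largest value of one short run.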

kedarpotdar-nv added 6 commits September 3, 2025 09:21

fix vllm launch

4f9ee5e

re-enable dsr1 and update image ID to re-fetch

7e6577b

rollback dsr1

22c9710

fix dsr1, remove 70b

492de4c

readd 70b

2e21fe9

re-add other tests

594bc88

kimbochen merged commit 75ec29c into main Sep 4, 2025

kimbochen deleted the fix-vllm-b200 branch September 4, 2025 00:42

claude-code-infmax Bot mentioned this pull request Jan 17, 2026

[NVIDIA] fix: update ep metadata in gb200 dynamo sglang configs to match comments #486

Merged

jthomson04 pushed a commit to jthomson04/InferenceMAX that referenced this pull request Jan 21, 2026

Merge pull request SemiAnalysisAI#4 from NVIDIA/test-runner-gb200

853761f

Modify GB200 runs to use test partition

claude-code-infmax Bot mentioned this pull request Jan 21, 2026

[NV] Update DSR1 GB200 FP4 Disagg Submission #510

Merged

cquil11 added the NVIDIA label Apr 8, 2026

cquil11 changed the title ~~Fix vllm & sglang b200 updated containers~~ [NVIDIA] Fix vllm & sglang b200 updated containers Apr 8, 2026

claude Bot mentioned this pull request May 18, 2026

[Klaud Cold] Add qwen3.5-fp8-mi325x-sglang-mtp recipe #1484

Merged

2 tasks

Oseltamivir added a commit that referenced this pull request May 26, 2026

T4 retrigger #4: runner pool freed

2f6aa0c

claude Bot mentioned this pull request May 28, 2026

short term patch: GLM-5 disagg: port MoRI conn.py overlay to fix PD startup crash #1578

Merged

4 tasks

cursor Bot mentioned this pull request May 28, 2026

[MoRI short term temp patch] GLM-5 FP8 MI355X SGLang disaggregated #1572

Merged

4 tasks
