p0: Update Image and Enable TileLang Attn/Indexer+CUDA Graph for DSv4 FP8 SGLang#1255

Closed
chunfangamd wants to merge 8 commits into main from chun/dsv4_pro_fp8

@chunfangamd

Collaborator
  • Upgrade Image to rocm/sgl-dev:rocm720-mi35x-c924543-20260430-DSv4
  • Enable TileLang Attn/Indexer + CUDA Graph

- bump to c924543 daily image
- enable TileLang attn/indexer + cuda graph
@github-actions

github-actions Bot commented May 1, 2026

Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from their respective company's CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

@chunfangamd

Collaborator Author

/sweep test-config --config-files .github/configs/amd-master.yaml --config-keys dsv4-fp8-mi355x-sglang

@github-actions

github-actions Bot commented May 1, 2026

Contributor

@chunfangamd Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25219898864
Command: test-config --config-files .github/configs/amd-master.yaml --config-keys dsv4-fp8-mi355x-sglang
Pinned ref: 1730de5
Approval: not required (trusted collaborator).

Comment thread perf-changelog.yaml Outdated
- "Keep SGLANG_TOPK_TRANSFORM_512_TORCH=1 for now: sgl-project/sglang#24143 (topk512 native ROCm kernel) merged 4-30 21:31 UTC, after the c924543 image was built (4-30 08:26 UTC); will flip to 0 once a newer daily image lands"
- "Keep SGLANG_DSV4_FP4_EXPERTS=false and SGLANG_FORCE_TRITON_MOE_FP8=1: required for sgl-project/DeepSeek-V4-Pro-FP8 (FP4 path asserts intermediate_size_per_partition==2048 in fp8.py; swiglu_limit clamp lives in fused_moe_triton)"
- "Expected speedup over the previous PR #23608 day-0 torch-fallback recipe: ~5.4-5.8x at conc 1-8 (matches the '+ indexer tilelang attn' tier in the AMD DSv4-Flash-FP8 reference table)"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/Placeholder
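As a rough illustration of the three overrides named in these changelog notes, the sketch below collects them into one environment mapping. The variable names and values are copied from the notes; the helper name `build_launch_env` and the idea of passing the result to a subprocess are assumptions, since the thread does not show how the recipe actually exports them.

```python
import os

# Overrides copied from the changelog notes above. How the real recipe
# sets them (shell script, YAML, Dockerfile) is not shown in this thread.
DSV4_FP8_ENV = {
    "SGLANG_TOPK_TRANSFORM_512_TORCH": "1",  # keep the torch path until a newer daily image lands
    "SGLANG_DSV4_FP4_EXPERTS": "false",      # FP4 expert path stays off for the FP8 checkpoint
    "SGLANG_FORCE_TRITON_MOE_FP8": "1",      # force the Triton FP8 MoE path
}


def build_launch_env(base=None):
    """Hypothetical helper: return a copy of `base` (default: os.environ)
    with the overrides applied, suitable for subprocess.run(..., env=...)."""
    env = dict(os.environ if base is None else base)
    env.update(DSV4_FP8_ENV)
    return env


if __name__ == "__main__":
    for name, value in sorted(DSV4_FP8_ENV.items()):
        print(f"{name}={build_launch_env()[name]}")
```

Keeping the overrides in one mapping means the planned flip of `SGLANG_TOPK_TRANSFORM_512_TORCH` to `0` would be a single-value change.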

🟡 The new perf-changelog.yaml entry for dsv4-fp8-mi355x-sglang has pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/Placeholder — the literal token Placeholder was never substituted with this PR's number (#1255). That URL 404s and breaks the file's universal convention of using a real numeric PR id. Replace Placeholder with 1255 before merge.

Extended reasoning...

Bug

The new perf-changelog.yaml entry added by this PR ends with:

  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/Placeholder

The trailing Placeholder is a literal string — not a numeric PR id. Every other one of the 240+ entries in this file uses a real numeric PR number (the immediately preceding entry, for example, uses /pull/1242). The current PR is #1255, so this should read /pull/1255.

Why this matters

  • The URL https://github.com/SemiAnalysisAI/InferenceX/pull/Placeholder resolves to a 404, so any human (or doc tool) clicking through from the changelog gets a broken link.
  • The pr-link field is the documented mechanism that ties a config-keys change to the PR that introduced it. Any internal tooling that scrapes pr-link to attribute config-key changes to PRs (release notes, blame-style audit, regression triage) will either fail or attribute this entry to a non-existent PR.
  • It breaks the file's universal convention — this is the only entry in 2000+ lines of perf-changelog.yaml that does not point to a real PR.

Why nothing caught it

The Pydantic validator at utils/matrix_logic/validation.py declares pr_link as a plain str with no regex/numeric constraint, so a literal Placeholder passes schema validation. The CI is therefore green even though the metadata is wrong.
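A stricter check could reject such values up front. The following is a minimal, stdlib-only sketch of the kind of constraint that is missing; the function name `validate_pr_link` and the exact pattern are assumptions, not the repository's code, and in the real Pydantic model an equivalent pattern would need to be attached to the `pr_link` field.

```python
import re

# Assumed convention: every pr-link points at a numeric PR in this repo.
PR_LINK_RE = re.compile(
    r"https://github\.com/SemiAnalysisAI/InferenceX/pull/\d+"
)


def validate_pr_link(value: str) -> str:
    """Return `value` unchanged if it ends in a numeric PR id, else raise."""
    if PR_LINK_RE.fullmatch(value) is None:
        raise ValueError(f"pr-link must end in a numeric PR id, got: {value!r}")
    return value


if __name__ == "__main__":
    for candidate in (
        "https://github.com/SemiAnalysisAI/InferenceX/pull/1255",
        "https://github.com/SemiAnalysisAI/InferenceX/pull/Placeholder",
    ):
        try:
            validate_pr_link(candidate)
            print("ok      ", candidate)
        except ValueError as exc:
            print("rejected", exc)
```

If some existing entries legitimately link elsewhere, the pattern would have to be loosened accordingly.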

Fix

One-line change. Replace Placeholder with 1255 on the last line of perf-changelog.yaml:

  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1255

Step-by-step proof

  1. Open perf-changelog.yaml and look at line 2083 (the last line of the diff). It reads: pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/Placeholder.
  2. Take that URL and resolve it: GitHub does not have a PR named Placeholder in this repo, so the page 404s.
  3. Look at the immediately preceding pr-link entry (the entry for PR #1242, "Add GB200 DSV4 Dynamo vLLM MTP2 recipes") and every other pr-link in the file. All of them use a numeric id; this entry is the lone exception.
  4. The PR description / metadata identifies this PR as p0: Update Image and Enable TileLang Attn/Indexer+CUDA Graph for DSv4 FP8 SGLang #1255, so the intended substitution is unambiguous.
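The manual check in steps 1 to 3 can also be approximated with a short scan. This is a sketch under stated assumptions: it treats the changelog as plain text and looks only at lines of the form `pr-link: <url>`; the function name `find_bad_pr_links` and the two-line sample are illustrative, not taken from the repository.

```python
import re

# A line such as "  pr-link: https://github.com/.../pull/1242"
PR_LINK_LINE = re.compile(r"^\s*pr-link:\s*(\S+)\s*$")
# A GitHub pull-request URL that ends in a numeric id
NUMERIC_PR = re.compile(r"https://github\.com/[^/\s]+/[^/\s]+/pull/\d+")


def find_bad_pr_links(text):
    """Return (line_number, url) for every pr-link lacking a numeric PR id."""
    bad = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = PR_LINK_LINE.match(line)
        if match and NUMERIC_PR.fullmatch(match.group(1)) is None:
            bad.append((lineno, match.group(1)))
    return bad


if __name__ == "__main__":
    sample = (
        "  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1242\n"
        "  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/Placeholder\n"
    )
    for lineno, url in find_bad_pr_links(sample):
        print(f"line {lineno}: {url}")
```

Run against the full file contents, a scan like this would be expected to report only the Placeholder entry if the review's claim holds.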

Severity: nit — this is metadata-only, doesn't affect benchmark correctness or execution, but should be fixed before merge to preserve the changelog convention and keep pr-link-scraping tooling working.

@functionstackx changed the title from "Update Image and Enable TileLang Attn/Indexer+CUDA Graph for DSv4 FP8 SGLang" to "p0: Update Image and Enable TileLang Attn/Indexer+CUDA Graph for DSv4 FP8 SGLang" on May 1, 2026
Comment thread .github/configs/amd-master.yaml
@SemiAnalysisAI SemiAnalysisAI deleted a comment from github-actions Bot May 1, 2026
@github-actions

github-actions Bot commented May 3, 2026

Contributor

@github-actions

github-actions Bot commented May 3, 2026

Contributor

@functionstackx

Collaborator

This one is superseded by @Oseltamivir's fp4+fp8 hybrid pto update, right? Now that AMD SGLang supports that.
