forked from vllm-project/vllm
Fork 49
Pull requests: ROCm/vllm
Pull requests list
[ROCm][Kernel] W4A16 skinny GEMM: hand-asm kernel for gfx1151
#1045
opened Jul 1, 2026 by
mgehre-amd
•
Draft
[bench] vit_attention/bench.py: add --backend, default to running all
#1044
opened Jul 1, 2026 by
mgehre-amd
•
Draft
feat(mi450): switch to nightlies index, update version pins, bake gfx1250 env vars
#1043
opened Jun 30, 2026 by
kiran-thumma
Collaborator
3 tasks
[CI] Upload staging wheels to S3 for PR/dispatch builds
#1040
opened Jun 30, 2026 by
marcusr-amd
5 tasks
gpt-oss gfx1250 ATOM-parity perf patches
#1035
opened Jun 29, 2026 by
dllehr-amd
Collaborator
[ROCm] Split skinny_gemms_int8.cu into per-N translation units
#1028
opened Jun 26, 2026 by
marcusr-amd
•
Draft
3 of 5 tasks
tune Qwen3-VL-4B prefill unified-attention on gfx1150
#1024
opened Jun 26, 2026 by
qingxuamd
[ROCm][MoE] W4A16 MoE routing-distribution benchmark suite for the gfx11 prefill GEMM
#1020
opened Jun 25, 2026 by
roberteg16
•
Draft
[ROCm][MoE] Custom W4A16 MoE prefill WMMA GEMM for gfx11 (default-on)
#1015
opened Jun 22, 2026 by
roberteg16
[ROCm][MoE] Modular MoE: alias fused_out with output to skip finalize copy
#940
opened May 19, 2026 by
mgehre-amd
2 tasks done
feat: Add NPU+GPU async pipelining for vision-language models
#936
opened May 14, 2026 by
liangliangchang
•
Draft
4 of 5 tasks
Annotate VLM/audio tower nn.Linear calls in PyTorch profiles
#934
opened May 13, 2026 by
mgehre-amd
[bench] wvSplitK skinny GEMM: capture timed iters into a CUDA graph
#928
opened May 8, 2026 by
mgehre-amd
•
Draft
Auto-build flash-attn wheels on push, upload to S3
#910
opened Apr 30, 2026 by
mgehre-amd
•
Draft
1 task
[ROCm][DSv4] Share AITER decode dequant + fp8-cast buffers across layers (rebased, stacked on #902)
#903
opened Apr 27, 2026 by
ChuanLi1101
•
Draft
2 of 4 tasks
[ROCm][DSv4] Make AITER sparse decode cudagraph-clean (rebased, stacked on #901)
#902
opened Apr 27, 2026 by
ChuanLi1101
•
Draft
2 of 5 tasks
[ROCm][DSv4] AITER-accelerated MLA decode for DeepSeek V4 on MI355X (rebased on tj/dsv4prrebase)
#901
opened Apr 27, 2026 by
ChuanLi1101
•
Draft
1 of 4 tasks