[CUDA] Allow dynamic shmem of size > 48K in runtime #11478
if (fcache_[device_id] == nullptr) {
  fcache_[device_id] = m_->GetFunc(device_id, func_name_);
  if (wl.dyn_shmem_size >= (48 << 10)) {
If the dynamic shared memory is too large, will it pass the VerifyGPUCode check?
Haven't tested, but yes: it seems VerifyGPUCode checks the static allocation size against max_shared_memory_per_block, so it would fail if dyn_shmem_size >= (48 << 10).
See tvm/src/tir/analysis/verify_gpu_code.cc, lines 70 to 71 at commit 534205b.
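The following sketch makes the concern concrete by modeling the kind of limit check described above. It is hypothetical and is not the code at those lines. `ShmemCheckPasses` is an illustrative name. The sketch also assumes that all shared allocations with sizes known at compile time, including the buffer behind dynamic shmem, are summed and compared against the target limit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a VerifyGPUCode-style limit check; NOT the actual
// code in verify_gpu_code.cc, only an illustration of the review concern.
// Assumption: every shared-memory allocation whose size is known at compile
// time (including the one backing dynamic shmem) counts toward the limit.
bool ShmemCheckPasses(const std::vector<std::size_t>& shared_alloc_bytes,
                      std::size_t max_shared_memory_per_block) {
  std::size_t total = 0;
  for (std::size_t bytes : shared_alloc_bytes) {
    total += bytes;
  }
  return total <= max_shared_memory_per_block;
}
```

With an assumed 48 KiB limit, `ShmemCheckPasses({32 << 10}, 48 << 10)` returns true and `ShmemCheckPasses({64 << 10}, 48 << 10)` returns false. The 64 KiB kernel is rejected even though the runtime change in this PR could launch it.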
Can we defer this issue until later? I need this to demonstrate that a multi-stage pipeline with depth > 2 works on a semi-realistic CUDA schedule.
Yeah, let's defer this particular issue.
fcache_[device_id] = m_->GetFunc(device_id, func_name_);
if (wl.dyn_shmem_size >= (48 << 10)) {
  // Assumption: dyn_shmem_size doesn't change across different invocations of
  // fcache_[device_id]
This assumption could be controversial, but it should mostly be OK in practice. To support a kernel that uses different large shmem sizes depending on its input, we would need to call cuFuncSetAttribute on every invocation.
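The trade-off can be illustrated with a minimal, self-contained sketch of a per-device function cache. Everything here is hypothetical so that it runs without a GPU:

- `FuncCacheSketch` and `FakeFunc` are illustrative names.
- The `set_attr` callback stands in for `cuFuncSetAttribute` with `CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES`.
- `set_on_every_call = false` follows the diff, and `true` models the per-invocation alternative.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical stand-in for a CUfunction handle.
using FakeFunc = int;

// CUDA's default per-launch limit on dynamic shared memory (48 KiB);
// the comparison below mirrors the `>= (48 << 10)` check in the diff.
constexpr std::size_t kDynShmemOptInThreshold = std::size_t{48} << 10;

// Hypothetical sketch of the per-device cache; not TVM's actual code.
// `set_attr` stands in for cuFuncSetAttribute(func,
// CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES, size).
class FuncCacheSketch {
 public:
  using SetAttrFn = std::function<void(FakeFunc, std::size_t)>;

  FuncCacheSketch(SetAttrFn set_attr, bool set_on_every_call)
      : set_attr_(std::move(set_attr)), set_on_every_call_(set_on_every_call) {}

  FakeFunc Get(int device_id, std::size_t dyn_shmem_size) {
    bool first_load = !loaded_[device_id];
    if (first_load) {
      fcache_[device_id] = 100 + device_id;  // pretend m_->GetFunc(device_id, name)
      loaded_[device_id] = true;
    }
    // With set_on_every_call == false (as in the PR), the attribute is set
    // only on first load, so a later, larger request on the same device is
    // ignored. With true, it is set on every call that crosses the threshold.
    if ((first_load || set_on_every_call_) &&
        dyn_shmem_size >= kDynShmemOptInThreshold) {
      set_attr_(fcache_[device_id], dyn_shmem_size);
    }
    return fcache_[device_id];
  }

 private:
  static constexpr std::size_t kMaxDevices = 8;  // assumed device count
  std::array<FakeFunc, kMaxDevices> fcache_{};
  std::array<bool, kMaxDevices> loaded_{};
  SetAttrFn set_attr_;
  bool set_on_every_call_;
};
```

Consider the first policy: if a device's function was first loaded with a 64 KiB request, a later 96 KiB request never raises the limit. This is exactly the case the assumption rules out. The second policy handles varying sizes at the cost of one extra driver call per large launch.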
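Currently, we have working dynamic shared memory support on CUDA, but we haven't explored allocating more than 48KB of dynamic shmem. CUDA only permits this after a kernel explicitly opts in by raising its maximum dynamic shared memory size attribute.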
This PR updates the CUDA runtime to support launching a kernel that uses more than 48KB of dynamic shmem. This is already useful for manually rewritten schedules, but integrating this feature into tuning requires more work (see the VerifyGPUCode discussion in the review). I'll add a test that actually uses a large dynamic shmem allocation in the next PR (one bug in the software pipelining transform needs to be fixed first).
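The sketch below illustrates why deeper pipelines cross the 48KB line by computing the shared memory a multi-stage GEMM pipeline would need. The tile shape (128x128x32) and the fp16 element size are assumptions chosen for round numbers. They are not taken from this PR or its follow-up. `PipelinedShmemBytes` and `NeedsOptIn` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical worked example: shared memory needed by a software-pipelined
// GEMM that keeps `stages` copies of its A and B tiles in shared memory.
// Tile sizes and element width are illustrative assumptions.
constexpr std::size_t PipelinedShmemBytes(std::size_t tile_m, std::size_t tile_n,
                                          std::size_t tile_k, std::size_t elem_bytes,
                                          std::size_t stages) {
  // One stage holds an A tile (tile_m x tile_k) and a B tile (tile_k x tile_n).
  return stages * (tile_m * tile_k + tile_k * tile_n) * elem_bytes;
}

// Mirrors the condition in the diff: take the opt-in path at or above 48 KiB.
constexpr bool NeedsOptIn(std::size_t dyn_shmem_bytes) {
  return dyn_shmem_bytes >= (std::size_t{48} << 10);
}

// 128x128x32 fp16 tiles (2 bytes per element): 16 KiB per stage.
static_assert(PipelinedShmemBytes(128, 128, 32, 2, 1) == 16384, "one stage");
static_assert(!NeedsOptIn(PipelinedShmemBytes(128, 128, 32, 2, 2)), "two stages: 32 KiB");
static_assert(NeedsOptIn(PipelinedShmemBytes(128, 128, 32, 2, 4)), "four stages: 64 KiB");
```

Under these assumptions each stage needs 16 KiB, so two stages fit in 32 KiB. Three stages reach exactly 48 KiB (49152 bytes), which the `>=` check in the diff routes through the opt-in path. Four stages need 64 KiB.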
Reference in the CUTLASS code:
https://github.com/NVIDIA/cutlass/blob/master/include/cutlass/gemm/device/gemm.h#L479-L482
@vinx13 @junrushao1994 @tqchen @yzh119 @Hzfengsy