[CUDA] [Codegen] Ensuring at least one thread block to handle empty tensor #7273
Branch updated from 779e437 to 8c20809.
@kevinthesun @masahi @mbrookhart @zhiics @trevor-m Please review.
Hmm, I think I've already added a fix for such cases here: tvm/python/tvm/tir/ir_builder.py, lines 205 to 206 (at 82942fb). Do you know why it is not working? cc @mbrookhart
Is this because the lines you referenced are specific to the IR builder, while the failure I see comes from an injective schedule?
Yeah, I think this change catches it at a lower level. We might not need the ir_builder change after this. |
TopK was failing on CUDA when k is a var whose value is 0 at runtime. On closer inspection, I found that the kernel was launched with 0 thread blocks at runtime. This PR ensures that there is at least 1 thread block.
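The arithmetic behind the failure and the fix can be sketched in plain Python. This is a hypothetical illustration, not TVM's codegen: `ceil_div`, `num_thread_blocks`, and `simulate_launch` are names introduced here, assuming a 1-D launch where the block count is ceil(extent / threads_per_block) and every thread guards its work with `idx < extent`. With extent 0 the unclamped count is 0, and CUDA rejects a grid dimension of 0 as an invalid launch configuration; clamping to 1 makes the launch valid while the bounds check keeps the extra block a no-op.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return (a + b - 1) // b


def num_thread_blocks(extent: int, threads_per_block: int) -> int:
    """Grid size for a hypothetical 1-D launch, clamped so it is never zero.

    An empty tensor (extent == 0) would otherwise yield 0 blocks,
    which is not a valid CUDA grid dimension.
    """
    if threads_per_block <= 0:
        raise ValueError("threads_per_block must be positive")
    return max(1, ceil_div(extent, threads_per_block))


def simulate_launch(extent: int, threads_per_block: int) -> tuple:
    """Return (blocks launched, threads that pass the idx < extent guard)."""
    blocks = num_thread_blocks(extent, threads_per_block)
    active = 0
    for bx in range(blocks):
        for tx in range(threads_per_block):
            idx = bx * threads_per_block + tx
            if idx < extent:  # bounds check: extra threads do nothing
                active += 1
    return blocks, active


if __name__ == "__main__":
    print(simulate_launch(0, 256))    # empty tensor: one block, no active threads
    print(simulate_launch(300, 256))  # two blocks, 300 active threads
```

The clamp is only safe because each thread still checks its index against the extent; without that guard, the extra block would read and write out of bounds.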