[AutoTVM] [TOPI] Support AutoTVM for int4 tensorcore #7831
Hzfengsy
left a comment
The code LGTM. But would you like to show some performance results for int4?
Yes, I'm testing some combinations of the removed knobs and will show perf results once they reach parity with the results from #6121.
LGTM. Thanks @hypercubestart!
* initial * int4 asnumpy * remove * random test * format * random * remove unused import * change dist range * add fuse_pack in * random engine * reformat * remove import * add cuda context * refactor code
@hypercubestart Hello, I'd like to reproduce your table on T4+int4, but it seems that the current AutoTVM tutorial code (link) cannot utilize the int4+tensorcore template. It would be appreciated if you could share some sample code. Thank you.
Hi @jlimmm! Unfortunately I don't have the code anymore, but the PR has an example of creating a network consisting of a single int4 conv2d:
tvm/tests/python/topi/python/test_topi_conv2d_hwnc_tensorcore.py, lines 149 to 167 in f8b1df4
AutoTVM will then be able to automatically infer the int4+tensorcore template.
Hi @hypercubestart, great work! I'm working with 4-bit in TVM now, and I found two points that could be improved in asnumpy.
@liubowen520 good points! Makes sense to me, feel free to create a PR and cc me and some other people to review.
Adds support for int4 in AutoTVM and fixes bugs; done with @ZihengJiang.
The current schedule for conv2d hwnc tensorcore is unsearchable by AutoTVM because of the error
ValueError: could not broadcast input array from shape (1850) into shape (1748)
due to feature length mismatch between different instantiated templates.
Narrowing the search space fixes the problem, and we ran a few experiments over different schedule fixes on T4:
The results for int4 HWNC in #6121 are not reproducible in AutoTVM because of the feature length mismatch.
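For anyone unfamiliar with this class of error, here is a minimal NumPy-only sketch of how a feature length mismatch produces the broadcast ValueError. It is not the AutoTVM feature extraction code: `stack_features` is a hypothetical helper, and the lengths 1748 and 1850 are simply borrowed from the error message above. The assumption is that per-config feature vectors get copied into one fixed-width array, so any config whose template instantiation yields a different feature length fails on assignment.

```python
import numpy as np


def stack_features(feature_vectors):
    """Copy per-config feature vectors into a 2-D array sized by the first vector.

    Hypothetical helper for illustration only; not part of TVM.
    """
    width = len(feature_vectors[0])
    out = np.empty((len(feature_vectors), width), dtype="float32")
    for i, vec in enumerate(feature_vectors):
        # Raises ValueError when len(vec) != width (row and vector shapes differ).
        out[i] = vec
    return out


# Two configs whose instantiated templates yield different feature lengths.
feats = [np.zeros(1748, dtype="float32"), np.zeros(1850, dtype="float32")]

try:
    stack_features(feats)
    error_message = None
except ValueError as err:
    error_message = str(err)

print(error_message)

# With equal-length vectors the same helper succeeds.
ok = stack_features([np.zeros(1748, dtype="float32")] * 2)
print(ok.shape)
```

In this sketch, making every vector the same length removes the error. That is only a loose analogy to narrowing the search space, which this PR reports as fixing the mismatch between instantiated templates.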
cc: @Laurawly @Hzfengsy @anijain2305 @tqchen @masahi