
[AutoTVM] [TOPI] Support AutoTVM for int4 tensorcore #7831

Merged: ZihengJiang merged 14 commits into apache:main from hypercubestart:tc-fix on May 1, 2021.

hypercubestart (Contributor) commented Apr 12, 2021:

Adds support for int4 in AutoTVM and fixes several bugs. Done together with @ZihengJiang.

The current schedule for conv2d HWNC tensorcore cannot be searched by AutoTVM. Tuning fails with `ValueError: could not broadcast input array from shape (1850) into shape (1748)` because different instantiated templates produce feature vectors of different lengths.
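A minimal NumPy sketch of how such a mismatch can surface (an illustration, not AutoTVM's actual cost-model code): if per-configuration feature vectors are stacked into one matrix whose width is taken from the first vector, a longer vector from another instantiated template cannot be copied in. The helper `stack_features` and the zero-filled vectors are hypothetical; only the two lengths come from the error message.

```python
import numpy as np


def stack_features(feature_list):
    """Stack per-config feature vectors into one float32 matrix (hypothetical helper).

    The matrix width comes from the first vector, so every later vector
    must have exactly the same length.
    """
    width = feature_list[0].shape[-1]
    out = np.empty((len(feature_list), width), dtype=np.float32)
    for i, fea in enumerate(feature_list):
        out[i, :] = fea  # raises ValueError if fea has a different length
    return out


# Two configs whose feature vectors differ in length (1748 vs. 1850 entries).
features = [np.zeros(1748, dtype=np.float32), np.zeros(1850, dtype=np.float32)]
try:
    stack_features(features)
except ValueError as err:
    print("tuning would fail:", err)
```

Narrowing the search space so that every instantiated template yields the same feature length avoids this failure mode.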

Narrowing the search space fixes the problem. We ran a few experiments comparing different schedule fixes on a T4:

| Workload (batch_size, in_channels, in_size, out_channels, kernel_size, stride, padding) | HWNC int4 time (#6121) | AS-ko, WS-kw | AS-ki, WS-kw | AS-kh, WS-ko | AS-ko, WS-kh |
|---|---|---|---|---|---|
| (8, 64, 56, 64, 3, 1, 1) | 0.1723 | 0.17988 | 0.19138 | 0.18075 | 0.18399 |
| (8, 64, 56, 128, 3, 2, 1) | 0.10278 | 0.10783 | 0.11104 | 0.13839 | 0.10446 |
| (8, 64, 56, 64, 1, 2, 0) | 0.0333 | 0.0187 | 0.01997 | 0.01933 | 0.0183 |
| (8, 128, 28, 128, 3, 1, 1) | 0.15088 | 0.1784 | 0.2296 | 0.21108 | 0.20623 |
| (8, 128, 28, 256, 3, 2, 1) | 0.11548 | 0.11616 | 0.1305 | 0.12947 | 0.15934 |
| (8, 128, 28, 256, 1, 2, 0) | 0.04219 | 0.02374 | 0.02575 | 0.02332 | 0.0223 |
| (8, 256, 14, 256, 3, 1, 1) | 0.05695 | 0.21981 | 0.24931 | 0.24194 | 0.27055 |
| (8, 256, 14, 512, 3, 2, 1) | 0.14456 | 0.14939 | 0.15589 | 0.14812 | 0.20209 |
| (8, 256, 14, 512, 1, 2, 0) | 0.0475 | 0.02659 | 0.02778 | 0.02531 | 0.02641 |
| (8, 512, 7, 512, 3, 1, 1) | 0.147156 | 0.245 | 0.27005 | 0.25863 | 0.25255 |
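To compare the columns at a glance, a small script can pick the fastest narrowed configuration per workload and its ratio to the #6121 time. The numbers are copied from the table (units as reported in the PR); the names `CONFIGS`, `RESULTS` and `best_config` are illustrative only.

```python
from collections import Counter

# Narrowed-search-space configurations, in table column order.
CONFIGS = ["AS-ko, WS-kw", "AS-ki, WS-kw", "AS-kh, WS-ko", "AS-ko, WS-kh"]

# workload: (#6121 time, [time for each config in CONFIGS order])
RESULTS = {
    (8, 64, 56, 64, 3, 1, 1): (0.1723, [0.17988, 0.19138, 0.18075, 0.18399]),
    (8, 64, 56, 128, 3, 2, 1): (0.10278, [0.10783, 0.11104, 0.13839, 0.10446]),
    (8, 64, 56, 64, 1, 2, 0): (0.0333, [0.0187, 0.01997, 0.01933, 0.0183]),
    (8, 128, 28, 128, 3, 1, 1): (0.15088, [0.1784, 0.2296, 0.21108, 0.20623]),
    (8, 128, 28, 256, 3, 2, 1): (0.11548, [0.11616, 0.1305, 0.12947, 0.15934]),
    (8, 128, 28, 256, 1, 2, 0): (0.04219, [0.02374, 0.02575, 0.02332, 0.0223]),
    (8, 256, 14, 256, 3, 1, 1): (0.05695, [0.21981, 0.24931, 0.24194, 0.27055]),
    (8, 256, 14, 512, 3, 2, 1): (0.14456, [0.14939, 0.15589, 0.14812, 0.20209]),
    (8, 256, 14, 512, 1, 2, 0): (0.0475, [0.02659, 0.02778, 0.02531, 0.02641]),
    (8, 512, 7, 512, 3, 1, 1): (0.147156, [0.245, 0.27005, 0.25863, 0.25255]),
}


def best_config(times):
    """Name of the narrowed configuration with the lowest time."""
    return CONFIGS[times.index(min(times))]


for workload, (baseline, times) in RESULTS.items():
    # Ratio > 1 means the best narrowed config ran faster than the #6121 time.
    print(workload, best_config(times), round(baseline / min(times), 2))

wins = Counter(best_config(times) for _, times in RESULTS.values())
print(wins.most_common())
```

With these numbers, "AS-ko, WS-kw" is the fastest narrowed configuration on five of the ten workloads. The narrowed configurations beat the #6121 column only on the three workloads with kernel_size 1.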

The int4 HWNC results reported in #6121 cannot be reproduced with AutoTVM because of the feature length mismatch.

cc: @Laurawly @Hzfengsy @anijain2305 @tqchen @masahi

Review threads: python/tvm/topi/cuda/conv2d_hwnc_tensorcore.py, src/runtime/contrib/random/mt_random_engine.cc (outdated)

Hzfengsy (Member) left a comment:

The code LGTM. But could you show some performance results for int4?

hypercubestart (Contributor, Author) commented Apr 14, 2021:


Yes, I'm testing some combinations of the removed knobs and will post performance results once they reach parity with the results from #6121.

ZihengJiang (Contributor) commented:

LGTM. Thanks @hypercubestart!

ZihengJiang merged commit dc1f189 into apache:main on May 1, 2021.
hypercubestart deleted the tc-fix branch on May 2, 2021 00:21.
umangyadav pushed a commit to umangyadav/tvm that referenced this pull request May 5, 2021
* initial

* int4 asnumpy

* remove

* random test

* format

* random

* remove unused import

* change dist range

* add fuse_pack in

* random engine

* reformat

* remove import

* add cuda context

* refactor code
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request three times on May 6, 2021, each with the same commit message as above.
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request on May 11, 2021, with the same commit message as above.
jlimmm commented Sep 8, 2021:

@hypercubestart Hello, I'd like to reproduce your table on a T4 with int4, but it seems that the current AutoTVM tutorial code (link) cannot use the int4 tensorcore template. I'd appreciate it if you could share some sample code. Thank you.

hypercubestart (Contributor, Author) commented Sep 8, 2021:


Hi @jlimmm! Unfortunately I don't have the code anymore, but the PR has an example of creating a network consisting of a single int4 conv2d:

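```python
import tvm
from tvm import relay

# Assumed example shapes (not given in the original snippet): NCHW input and
# OIHW kernel for the (8, 512, 7, 512, 3, 1, 1) workload from the table above.
input_shape = (8, 512, 7, 7)
kernel_shape = (512, 512, 3, 3)


def get_mod():
    # A network consisting of a single float32 conv2d in the default NCHW layout.
    x = relay.var("x", relay.TensorType(input_shape, "float32"))
    y = relay.var("y", relay.TensorType(kernel_shape, "float32"))
    f = relay.Function(
        [x, y], relay.nn.conv2d(x, y, padding=[1, 1, 1, 1], channels=512, kernel_size=[3, 3])
    )
    mod = tvm.IRModule()
    mod["main"] = f
    mod = relay.transform.InferType()(mod)
    return mod, {}


mod, params = get_mod()

# Convert conv2d to the HWNC data layout used by the HWNC tensorcore schedule.
layout_config = relay.transform.LayoutConfig()
desired_layouts = {"nn.conv2d": ["HWNC", "default"]}
with layout_config:
    seq = tvm.transform.Sequential([relay.transform.ConvertLayout(desired_layouts)])
    with tvm.transform.PassContext(opt_level=3):
        mod = seq(mod)

# Recast conv2d inputs to int4 with int32 accumulation (utility from #6748).
mod = relay.transform.recast(mod, "int4", "int32")
```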
This uses the utilities from #6748, so you can reuse most of the AutoTVM tutorial code and simply replace the network with the one shown above.

AutoTVM will then be able to automatically select the int4 tensorcore template.

liubowen520 commented:

Hi @hypercubestart, great work! I'm working with 4-bit data in TVM now, and I found two points in asnumpy that could be improved:

  1. asnumpy does not support converting negative numbers.
  2. asnumpy loses a 4-bit value when the shape is odd.

Am I right? I have modified these parts locally. Could you review the changes?

hypercubestart (Contributor, Author) commented:


@liubowen520 Good points! That makes sense to me. Feel free to create a PR and cc me and some other people to review it.
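The two asnumpy issues raised above can be illustrated with a NumPy sketch of packing signed int4 values two per byte. This is not TVM's asnumpy implementation: the low-nibble-first layout and the helper names `pack_int4` and `unpack_int4` are assumptions for illustration. It shows the two steps a correct decoder needs: sign-extending nibbles 8..15 to -8..-1, and keeping the padding nibble's byte so an odd element count does not lose its last value.

```python
import numpy as np


def pack_int4(values):
    """Pack signed int4 values (-8..7) two per byte, low nibble first (assumed layout).

    Odd-length inputs get one zero padding nibble.
    """
    vals = np.asarray(values, dtype=np.int8).ravel()
    if vals.size % 2:
        vals = np.append(vals, np.int8(0))
    nibbles = (vals & 0x0F).astype(np.uint8)  # two's-complement low 4 bits
    return (nibbles[0::2] | (nibbles[1::2] << 4)).astype(np.uint8)


def unpack_int4(packed, count):
    """Unpack `count` signed int4 values from bytes produced by pack_int4."""
    packed = np.asarray(packed, dtype=np.uint8)
    out = np.empty(packed.size * 2, dtype=np.int8)
    out[0::2] = packed & 0x0F
    out[1::2] = packed >> 4
    # Sign-extend: nibbles 8..15 encode -8..-1.
    out[out >= 8] -= 16
    # Drop only the padding nibble, so an odd count keeps all `count` values.
    return out[:count]


values = [-8, -1, 0, 3, 7]  # odd length, includes negatives
packed = pack_int4(values)  # needs (5 + 1) // 2 = 3 bytes
print(unpack_int4(packed, len(values)).tolist())
```

A decoder that skips the sign-extension step would return 15 instead of -1, and one that reads only `len(values) // 2` bytes would drop the last value of an odd-length array.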


5 participants