[ONNX] Collect quant params of pre-quantized ONNX and generate qnn op by huochaitiantang · Pull Request #7937 · apache/tvm

huochaitiantang · 2021-04-28T03:43:09Z

For the import of pre-quantized ONNX models, nodes (like Conv, Gemm, Add, Mul, Sub) between QuantizeLinear and DequantizeLinear should be quantized.

This PR is summarized as follows:
1, Collect quantize params for nodes to be quantized, which locate between QuantizeLinear and DequantizeLinear.
2, Generate corresponding qnn ops (like qnn.conv2d, qnn.dense, qnn.add, qnn.mul, qnn.subtract) for nodes that can be quantized.

mbrookhart · 2021-04-28T16:30:52Z

This is awesome, thanks for the submission!

It looks like the ONNX and ORT we have in CI might be too out of date to support this? Could you point me at the documentation for this behavior? I wasn't aware ONNX did this, I thought they primarily used ops like ConvInteger? I'm hoping to add support for importing those ops in the coming weeks.

Is there a more recent version of ORT that supports this? If so, could you add the test line and comment it out with a TODO for renabling once we update to some minimum ORT? I'm hoping to update ORT and ONNX versions in CI in the next month or two.

Thanks!

huochaitiantang · 2021-04-29T09:45:12Z

@mbrookhart Thanks for your comment!
I have checked the supported input types of operators in ORT:
https://github.com/microsoft/onnxruntime/blob/master/docs/OperatorKernels.md

Conv: tensor(float)
Gemm: tensor(double), tensor(float)
Add, Mul, Sub: tensor(double), tensor(float), tensor(int32), tensor(int64)

It shows that the above operators cannot accept int8 tensor in ORT, so these operators between QuantizeLinear and DequantizeLinear cannot run in ORT, even if they should be quantized.

My question is that, even if ORT cannot run these models successfully, should TVM support the import of them and generate correct qnn ops? It will determine whether this PR is necessary.

In addition, ORT quantization operators like ConvInteger and QLinearConv can be run successfully with uint8 tensors as input. It may be natural to import these operators later.

mbrookhart · 2021-04-29T14:39:31Z

Hi @huochaitiantang ,

That all makes sense, I'm really just wondering if we've ever seen an example of this in the wild, i.e. q->conv->dq instead of q->convinteger->dq. Do you have an example model defined this way somewhere? I mostly just want to make sure we don't implement a feature that's out of spec.

Thanks,
Matthew

jwfromm · 2021-04-29T16:26:52Z

Yeah i just wanted to echo @mbrookhart that it would be great to include a test that operates on a real quantized model. Ideally you could start at a prequantized tf or pytorch model, export it to onnx, then import that using the changes in this PR.

huochaitiantang · 2021-05-08T07:59:15Z

Hi, @mbrookhart @jwfromm Thanks for your advice.

We have tried to export a real pre-quantized ONNX model from popular frameworks. But it seems difficult.

The pytorch fails to export quantized model to ONNX because of the error: AttributeError: 'torch.dtype' object has no attribute 'detach'. Please see https://discuss.pytorch.org/t/onnx-export-failed-int8-model/59754/17
The pre-quantized ONNX model exported by tflite is weird, which is not the real quantized model.
The pre-quantized ONNX model by onnxruntime looks good, it contains QuantizeLinear -> QLinearConv -> DequantizeLinear, so we submit a new PR to support the QLinearConv: [ONNX] QLinearConv Support #8007.

The pattern QuantizeLinear -> Conv -> DequantizeLinear may not appear in pre-quantized ONNX models exported by pytorch, tflite, or onnxruntime. So we can close this PR.

mbrookhart · 2021-05-10T17:53:19Z

Sounds good. We found the "fake quantization" in the tflite exports a couple of weeks ago, I'm currently working on a pass to convert that into QNN after import. I'll close this, but thanks for the QLinearConv PR!

Collect quant params of pre-quantized ONNX and generate qnn op

9a88ea9

jwfromm requested review from jwfromm and mbrookhart April 28, 2021 15:57

mbrookhart closed this May 10, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[ONNX] Collect quant params of pre-quantized ONNX and generate qnn op#7937

[ONNX] Collect quant params of pre-quantized ONNX and generate qnn op#7937
huochaitiantang wants to merge 1 commit into
apache:mainfrom
huochaitiantang:onnx_prequantize

huochaitiantang commented Apr 28, 2021

Uh oh!

mbrookhart commented Apr 28, 2021

Uh oh!

huochaitiantang commented Apr 29, 2021

Uh oh!

mbrookhart commented Apr 29, 2021

Uh oh!

jwfromm commented Apr 29, 2021

Uh oh!

huochaitiantang commented May 8, 2021

Uh oh!

mbrookhart commented May 10, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

huochaitiantang commented Apr 28, 2021

Uh oh!

mbrookhart commented Apr 28, 2021

Uh oh!

huochaitiantang commented Apr 29, 2021

Uh oh!

mbrookhart commented Apr 29, 2021

Uh oh!

jwfromm commented Apr 29, 2021

Uh oh!

huochaitiantang commented May 8, 2021

Uh oh!

mbrookhart commented May 10, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants