Skip to content

Add a conversion of individual operations in FQ2I pass.#10239

Merged
AndrewZhaoLuo merged 3 commits into
apache:mainfrom
Deelvin:avoronov/update_fq2i_individual_ops
Feb 17, 2022
Merged

Add a conversion of individual operations in FQ2I pass.#10239
AndrewZhaoLuo merged 3 commits into
apache:mainfrom
Deelvin:avoronov/update_fq2i_individual_ops

Conversation

@Icemist

@Icemist Icemist commented Feb 14, 2022

Copy link
Copy Markdown
Contributor

This conversion happens after the basic part of FQ2I pass. The main idea of the conversion is to find and transform operations with dequantized inputs one by one individually. Only operations from the allowed list are allowed.

For example, if on the above general pattern op2 is not registered with the FTVMFakeQuantizationToInteger attribute, op1 operation can still be converted. Converted pattern below:

   x    w       x   w
   |    |       |   |
   dq   dq      \   /
    \   /        op1
     op1          |
      |     =>   dq
     op2          |
      |          op2
      |           |
      q           q

Measured accuracy on a BERT model bert_large_v1_1_fake_quant.onnx:

Model type Accuracy
without old fq2i {"exact_match": 82.41248817407758, "f1": 90.06966006174216}
with old fq2i {"exact_match": 82.41248817407758, "f1": 90.06966006174216}
with new fq2i {"exact_match": 82.73415326395458, "f1": 90.17821060674615}

@masahi

masahi commented Feb 14, 2022

Copy link
Copy Markdown
Member

Can you show some accuracy results on QAT models (BERT etc)?

@masahi

masahi commented Feb 14, 2022

Copy link
Copy Markdown
Member

Please also show the reference result (accuracy without fq2i).

@Icemist

Icemist commented Feb 15, 2022

Copy link
Copy Markdown
Contributor Author

Please also show the reference result (accuracy without fq2i).

Model type Accuracy
without old fq2i {"exact_match": 82.41248817407758, "f1": 90.06966006174216}
with old fq2i {"exact_match": 82.41248817407758, "f1": 90.06966006174216}
with new fq2i {"exact_match": 82.73415326395458, "f1": 90.17821060674615}

@mbrookhart mbrookhart left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall very nice, only comments are around improved documentation and an optional flag.

@masahi and @elvin-n and I talked a couple of months ago and agreed that the QAT version of the pass (which you're adding here) should probably be a separate pass from the explicit fake quantized model ala tflite and TensorRT. I go back and forth on what the right final form is (I've found edge cases for both versions), but I'd love some improved comments on why this particular design

const bool hard_fail_;
};

bool is_op_enabled_for_optional_fq2i(const CallNode* call_node) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add some comments and advice about what ops should be included in this list?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment of how I selected these operations.

Comment on lines 518 to 538
Expr FakeQuantizationToInteger(const Expr& expr, const IRModule& mod, bool hard_fail) {
return FakeQuantizationRewriter(hard_fail).Mutate(expr);
auto fq_expr = FakeQuantizationRewriter(hard_fail).Mutate(expr);
auto fq_inferred_expr = tvm::relay::InferType(fq_expr);
auto ofq_expr = OptionalFakeQuantizationRewriter(hard_fail).Mutate(fq_inferred_expr);
return ofq_expr;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how problematic this will be in non-QAT models, but would it make sense to add another bool to make the "Optional" part of the pass actually optional?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add parameter enableQAT to the FakeQuantizationToInteger pass with default value not to call QAT transformation

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the parameter, but called it use_qat. I'm not sure if CamelCase fits here, and I've seen qat in other libraries shortened to lower case.

@AndrewZhaoLuo AndrewZhaoLuo left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to read pass a little more carefully to understand difference between this and existing fq2i passes

Comment thread src/relay/transforms/fake_quantization_to_integer.cc Outdated
Comment thread include/tvm/relay/qnn/op/dequantize.h Outdated
Comment thread src/relay/transforms/type_infer.cc
Comment thread tests/python/relay/test_pass_fake_quantization_to_integer.py Outdated
Comment thread tests/python/relay/test_pass_fake_quantization_to_integer.py Outdated
Comment thread tests/python/relay/test_pass_fake_quantization_to_integer.py Outdated
Comment thread tests/python/relay/test_pass_fake_quantization_to_integer.py Outdated
@masahi

masahi commented Feb 15, 2022

Copy link
Copy Markdown
Member

Need to read pass a little more carefully to understand difference between this and existing fq2i passes

The key is that in the models that existing pass expects, there is always matching dq and q. This PR handles cases such assumption doesnt hold.

@AndrewZhaoLuo

Copy link
Copy Markdown
Contributor

That is understood, my main concern is a lot of code looks similar and I'm wondering if this "optional" pass can supersede the original pass.

@elvin-n

elvin-n commented Feb 16, 2022

Copy link
Copy Markdown
Contributor

That is understood, my main concern is a lot of code looks similar

Ideologically transformations look similar, but it is unable to extend current transformation exactly due to its expectation of DQ->subgraph->Q pattern. And it is not a lot of duplication - all op conversion defined through FTVMFakeQuantizationToInteger mechanism is reused, that is great

I'm wondering if this "optional" pass can supersede the original pass.

Interesting question. Probably it can work in that way as well, but should not be so important.

@elvin-n

elvin-n commented Feb 16, 2022

Copy link
Copy Markdown
Contributor

@mbrookhart

agreed that the QAT version of the pass (which you're adding here) should probably be a separate pass from the explicit fake quantized model ala tflite and TensorRT

As I see we did exactly what we agreed - the transformation itself is independent from the current one in opposite to the previous my PR (FakeQuantizationRewriter and OptionalFakeQuantizationRewriter). Then we agreed, if I am not mistaken, to have the only pass to simplify user's life. User will be aware about only one function - FakeQuantizeToInteger that he have to call in his python code. On the other hand inside the pass, there will be two transformations - current one and new one. If we want to have QAT transformation optional, ok, let's add parameter to the FakeQuantizeToInteger with default value not to call QAT

@Icemist Icemist force-pushed the avoronov/update_fq2i_individual_ops branch 3 times, most recently from a92696c to 469888c Compare February 17, 2022 00:45
@Icemist Icemist force-pushed the avoronov/update_fq2i_individual_ops branch from 469888c to aa378d2 Compare February 17, 2022 00:58
@Icemist

Icemist commented Feb 17, 2022

Copy link
Copy Markdown
Contributor Author

That is understood, my main concern is a lot of code looks similar and I'm wondering if this "optional" pass can supersede the original pass.

I guess this is a slightly different approach. Not all operations are suitable for the new pass.
Some operations like "add" require an explicit out_scale, which in the existing pass is taken from the quantization operation.
This is possible for the existing pass because it goes from the end to the beginning of the selected subgraph.
In the new pass we go from the beginning to the end. Starting from the dequantization operations and have no access to the quantization operation info at the moments of converting. Or maybe even stop halfway before we get the info.

@AndrewZhaoLuo AndrewZhaoLuo left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, one more small q

Comment thread src/relay/transforms/fake_quantization_to_integer.cc Outdated
@AndrewZhaoLuo AndrewZhaoLuo merged commit 7f24954 into apache:main Feb 17, 2022
pfk-beta pushed a commit to pfk-beta/tvm that referenced this pull request Apr 11, 2022
* Add a conversion of individual operations in FQ2I pass.

* apply review comments

* apply review comments 2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants