When the microTVM team was working on our submission for MLPerf Tiny, we noticed the `kws` model had slightly worse accuracy than expected (we classified `897/1000` samples correctly, while folks are expected to classify at least `900/1000` correctly).
The bug has been remarkably hard to track down. We have a bunch of end-to-end operator tests, and our results exactly match those from other parts of TVM. I originally thought this was a padding issue, but that didn't pan out.
It turns out the issue is caused by a buggy optimization for re-quantization. The issue only occurs during quantization (which is why our tests didn't catch it - those don't include quantization steps), and only on the edges of padded tensors (which explains why it hurt accuracy slightly, but not a ton).
I've written up an explanation of the issue below.
cc @alanmacd @areusch @gromero @mehrdadh @mkatanbaf