[tosa] : Add option to enable/disable patterns selectively. #4485
@sahas3 I'm curious as to whether https://github.com/llvm/llvm-project/blob/main/mlir/lib/Dialect/Tosa/Transforms/TosaReduceTransposes.cpp can help you here. The original PR lists significant performance gains from this pass, which has since been further optimized by @Hanumanth04 in llvm/llvm-project#148755.
Hi @sjarus, I think it's not possible to optimize this particular scenario in TOSA. Running the full Torch->TOSA pipeline produces: Running the One possibility is to optimize
Yeah, your fundamental problem is that the source and target dialects -
Yes, exactly. This is only a problem for small models targeting small hardware where we want to minimize buffers needed as much as possible. We may have to write some optimizations at the linalg level in the long term but the change in this PR enables us to target such small hardware without many pattern specific
This is also hardware dependent. Not all hardware can handle both NHWC and NCHW; some of it makes a call one way or the other. The TOSA baking in of NHWC was such a call. The idea was that if the underlying hardware was NCHW, the backend ought to identify the transpose->conv/pool pair and swap dims. Of course, in the future the hard restriction on NHWC may go away, but that's not currently the case.
Consider the source IR:
When lowered through the TOSA path we get:
When lowered through the linalg path we get:
Because of the layout mismatch between PyTorch (NCHW) and TOSA (NHWC), the TOSA path contains two additional transpose operations. These require two additional buffers, which is a problem for resource-constrained embedded HW that doesn't have enough memory.
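To make the cost of the layout round trip concrete, here is a minimal shape-bookkeeping sketch in plain Python. It is not torch-mlir or TOSA code: the tensor shape and the assumption that the op preserves its input shape are hypothetical, chosen only to show where the two extra buffers come from.

```python
# Illustrative only: shape bookkeeping for the NCHW <-> NHWC round trip.
# The shape and the shape-preserving op are assumptions for this sketch.
NCHW_TO_NHWC = (0, 2, 3, 1)  # permutation applied before the NHWC op
NHWC_TO_NCHW = (0, 3, 1, 2)  # inverse permutation applied after it


def permute(shape, perm):
    """Return the shape produced by a transpose with the given permutation."""
    return tuple(shape[axis] for axis in perm)


def numel(shape):
    """Number of elements in a tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count


nchw_input = (1, 8, 16, 16)  # hypothetical N, C, H, W

# Transpose 1: materializes an NHWC copy of the input.
nhwc_input = permute(nchw_input, NCHW_TO_NHWC)
# Assume the op keeps the shape unchanged (e.g. an element-wise op).
nhwc_output = nhwc_input
# Transpose 2: materializes an NCHW copy of the result.
nchw_output = permute(nhwc_output, NHWC_TO_NCHW)

# Elements held by the two buffers that exist only because of the transposes.
extra_elements = numel(nhwc_input) + numel(nchw_output)
```

A direct NCHW lowering needs neither transposed copy, which is the saving the linalg path offers in this scenario.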
This change adds an option to selectively enable/disable legalizations through the TOSA path, so that for the `tosa_linalg` path we can choose not to lower some ops (depending on the target HW) through TOSA and instead let them lower through the linalg path that runs after the TOSA path.
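The routing idea can be sketched roughly as follows, in plain Python. This is not the option added by this PR: the function name `split_by_path`, the `tosa_disabled` parameter, and the op names are hypothetical and serve only to illustrate partitioning ops between the two lowering paths.

```python
# Illustrative only: hypothetical names, not the actual torch-mlir option.
def split_by_path(ops, tosa_disabled):
    """Partition ops into those legalized via TOSA and those left for linalg.

    Ops listed in `tosa_disabled` are skipped by the TOSA legalization and
    are therefore picked up by the linalg lowering that runs afterwards.
    """
    via_tosa = [op for op in ops if op not in tosa_disabled]
    via_linalg = [op for op in ops if op in tosa_disabled]
    return via_tosa, via_linalg


# Hypothetical model ops; the pooling op is assumed to be layout sensitive.
ops = ["aten.relu", "aten.max_pool2d", "aten.add"]
via_tosa, via_linalg = split_by_path(ops, tosa_disabled={"aten.max_pool2d"})
```

Which ops to disable would depend on the target HW, as noted above.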