[Docs] Refactor BYOC example NPU tutorial #19439
Code Review
This pull request enhances the 'Bring Your Own Codegen' (BYOC) tutorial and the example NPU backend. Key changes include adding a fused MatMul+ReLU pattern, updating the runtime dispatch logic to handle fused operations, and reordering dispatch checks to prevent incorrect substring matches (e.g., ensuring 'depthwise_conv2d' is checked before 'conv2d'). The tutorial is also expanded with execution examples and clearer explanations of the partitioning process. Feedback suggests improving consistency in the runtime by adding the 'is_fused' parameter to all convolution dispatch functions, even if fused patterns for them are not yet registered.
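The substring-ordering issue mentioned in the summary can be illustrated with a minimal sketch. The `Kernel` enum, the `Classify*` helpers, and the op name string `example_npu.depthwise_conv2d` are hypothetical stand-ins chosen for illustration; the actual runtime calls its `Execute*` functions directly and may name its ops differently.

```cpp
#include <cassert>
#include <string>

// Hypothetical kernel tags used only for this illustration.
enum class Kernel { kDepthwiseConv2D, kConv2D, kMatMul, kUnknown };

// Returns true when `name` contains `needle` as a substring.
static bool Contains(const std::string& name, const std::string& needle) {
  return name.find(needle) != std::string::npos;
}

// Generic check first: "conv2d" is a substring of "depthwise_conv2d",
// so the depthwise branch can never be reached.
Kernel ClassifyGenericFirst(const std::string& op_name) {
  if (Contains(op_name, "conv2d")) return Kernel::kConv2D;
  if (Contains(op_name, "depthwise")) return Kernel::kDepthwiseConv2D;
  if (Contains(op_name, "matmul")) return Kernel::kMatMul;
  return Kernel::kUnknown;
}

// Specific check first: the depthwise op is matched before the
// more general "conv2d" test gets a chance to claim it.
Kernel ClassifySpecificFirst(const std::string& op_name) {
  if (Contains(op_name, "depthwise")) return Kernel::kDepthwiseConv2D;
  if (Contains(op_name, "conv2d")) return Kernel::kConv2D;
  if (Contains(op_name, "matmul")) return Kernel::kMatMul;
  return Kernel::kUnknown;
}

// Small self-check showing the difference between the two orderings.
void DemonstrateOrdering() {
  const std::string op = "example_npu.depthwise_conv2d";  // assumed name
  assert(ClassifyGenericFirst(op) == Kernel::kConv2D);            // misrouted
  assert(ClassifySpecificFirst(op) == Kernel::kDepthwiseConv2D);  // correct
}
```

Substring matching keeps the dispatch code short, but it makes branch order part of the correctness contract. Matching on exact op names would remove that dependency at the cost of a longer lookup table.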
    } else if (op_name.find("depthwise") != std::string::npos) {
      ExecuteDepthwiseConv2D(node, engine);
While ExecuteMatMul and ExecuteConv2D have been updated to accept the is_fused flag, ExecuteDepthwiseConv2D (and ExecuteConv1D below) still lack this parameter. Although no fused patterns for depthwise or 1D convolution are currently registered in patterns.py, adding the parameter here would improve consistency across the runtime's dispatch logic and make it more robust for future extensions.
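A rough sketch of what a uniform signature could look like follows. It is not the PR's code: the `Node` and `Engine` structs, the `Dispatch` function, the op name strings, and the rule that a fused op is detected by `relu` appearing in its name are all assumptions made so the example is self-contained. Only the `Execute*` function names and the `is_fused` flag come from the review comment.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the runtime's node and engine types.
struct Node { std::string op_name; };
struct Engine { std::vector<std::string> log; };  // records which kernels ran

// Shared helper: run the base kernel, then the fused activation if requested.
static void Run(Engine* engine, const std::string& kernel, bool is_fused) {
  engine->log.push_back(kernel);
  if (is_fused) engine->log.push_back("relu");
}

// Every Execute* function takes the same (node, engine, is_fused) triple,
// even where no fused pattern is registered yet.
void ExecuteMatMul(const Node&, Engine* e, bool is_fused) { Run(e, "matmul", is_fused); }
void ExecuteConv2D(const Node&, Engine* e, bool is_fused) { Run(e, "conv2d", is_fused); }
void ExecuteConv1D(const Node&, Engine* e, bool is_fused) { Run(e, "conv1d", is_fused); }
void ExecuteDepthwiseConv2D(const Node&, Engine* e, bool is_fused) {
  Run(e, "depthwise_conv2d", is_fused);
}

void Dispatch(const Node& node, Engine* engine) {
  const std::string& op = node.op_name;
  // Assumption: fused ops carry "relu" in their name.
  const bool is_fused = op.find("relu") != std::string::npos;
  if (op.find("depthwise") != std::string::npos) {
    ExecuteDepthwiseConv2D(node, engine, is_fused);
  } else if (op.find("conv1d") != std::string::npos) {
    ExecuteConv1D(node, engine, is_fused);
  } else if (op.find("conv2d") != std::string::npos) {
    ExecuteConv2D(node, engine, is_fused);
  } else if (op.find("matmul") != std::string::npos) {
    ExecuteMatMul(node, engine, is_fused);
  }
}

// Small self-check: a fused matmul runs two steps, a plain depthwise op one.
void DemonstrateFusedDispatch() {
  Engine engine;
  Dispatch(Node{"example_npu.matmul_relu"}, &engine);
  Dispatch(Node{"example_npu.depthwise_conv2d"}, &engine);
  assert(engine.log.size() == 3);
}
```

With one signature shared by every kernel, registering a fused depthwise or 1D convolution pattern later would not require touching the dispatch call sites.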
This PR refactors the BYOC tutorial for the example NPU backend so that the full pipeline (register → partition → codegen → VM execute) actually runs and visibly demonstrates fusion.
It also fixes several latent bugs in the example backend that the original tutorial was implicitly papering over.