[microNPU][2c] Add performance modelling to cascader#9778
Merged
mbaret reviewed Jan 12, 2022
Comment on lines 71 to 121
  for (size_t i = 0; i < block_shape.size(); i++) {
-   if (!is_rolling) {
-     num_blocks *= output_stripe_config->GetShape()[i] * output_stripe_config->GetStripes()[i] /
+   if (buffer_mode == BufferMode::RECOMPUTE) {
+     num_blocks *= static_cast<float>(output_stripe_config->GetShape()[i] *
+                                      output_stripe_config->GetStripes()[i]) /
                    block_shape[i];
    } else {
-     num_blocks *= output_stripe_config->GetExtent()[i] / block_shape[i];
+     num_blocks *= static_cast<float>(output_stripe_config->GetExtent()[i]) / block_shape[i];
    }
  }
Contributor
Just to mention that this logic is a placeholder and will be replaced in a later patch.
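The `static_cast<float>` added in the hunk above matters when the shape, stripe, and extent values are integers, because the division then truncates before the result is accumulated. Here is a minimal sketch of that effect, assuming plain `std::vector<int>` inputs in place of the real `StripeConfig` accessors. The function names `NumBlocksTruncating` and `NumBlocksFractional` are hypothetical and are not part of the cascader.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the pre-change loop: the quotient is computed
// in integer arithmetic, so any partial block is silently dropped.
float NumBlocksTruncating(const std::vector<int>& extent,
                          const std::vector<int>& block_shape) {
  assert(extent.size() == block_shape.size());
  float num_blocks = 1.0f;
  for (std::size_t i = 0; i < block_shape.size(); i++) {
    assert(block_shape[i] > 0);
    // int / int truncates toward zero before the conversion to float.
    num_blocks *= static_cast<float>(extent[i] / block_shape[i]);
  }
  return num_blocks;
}

// Hypothetical stand-in for the post-change loop: the numerator is cast
// first, so the division is carried out in floating point.
float NumBlocksFractional(const std::vector<int>& extent,
                          const std::vector<int>& block_shape) {
  assert(extent.size() == block_shape.size());
  float num_blocks = 1.0f;
  for (std::size_t i = 0; i < block_shape.size(); i++) {
    assert(block_shape[i] > 0);
    num_blocks *= static_cast<float>(extent[i]) / block_shape[i];
  }
  return num_blocks;
}
```

With an extent of 10 and a block size of 4 along one axis, the truncating version counts 2 blocks while the fractional version counts 2.5, which is the difference the cast is there to preserve.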
Contributor
Just a quick note on the test coverage of this feature. The results of the performance model are not explicitly tested against the FVP because we don’t have performance instrumentation available in CI. We will however be testing this component downstream where such instrumentation is available.
* Added the pre-computed performance modelling per block.
* Added the aggregation of cycles given a stripe config.
* Implemented the op-specific performance code for conv2d.
* Created a DeviceConfig class to hold constant performance related data that is dependent on the accelerator configuration.
* Added generation of all valid block configs. This is pre-computed and given as an argument when constructing EthosuParts.
* Implemented selection of the block config that gives the least amount of data read given a StripeConfig.
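The last item in the list above, picking the block config that reads the least data for a given stripe, can be illustrated with a toy cost model. This is only a sketch under stated assumptions: `BlockConfig`, `DataRead`, and `SelectBlockConfig` are hypothetical names, and the cost used here (the area covered when a 2D stripe is tiled with whole blocks) is an assumption for illustration, not the cost function used in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical 2D block config; the real cascader type carries more fields.
struct BlockConfig {
  int height;
  int width;
};

// Toy cost (an assumption): elements touched when the stripe is covered by
// whole blocks, so partially filled edge blocks still cost a full block.
long long DataRead(const BlockConfig& block, int stripe_height, int stripe_width) {
  assert(block.height > 0 && block.width > 0);
  long long blocks_h = (stripe_height + block.height - 1) / block.height;  // ceil
  long long blocks_w = (stripe_width + block.width - 1) / block.width;     // ceil
  return blocks_h * blocks_w * block.height * block.width;
}

// Returns the index of the candidate with the smallest toy cost.
// Ties are resolved in favour of the earliest candidate.
std::size_t SelectBlockConfig(const std::vector<BlockConfig>& candidates,
                              int stripe_height, int stripe_width) {
  assert(!candidates.empty());
  std::size_t best = 0;
  long long best_cost = std::numeric_limits<long long>::max();
  for (std::size_t i = 0; i < candidates.size(); i++) {
    long long cost = DataRead(candidates[i], stripe_height, stripe_width);
    if (cost < best_cost) {
      best_cost = cost;
      best = i;
    }
  }
  return best;
}
```

Because the candidate list is pre-computed once, as the commit message describes, a selection step like this only has to scan it per stripe config rather than regenerate it.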
mbaret approved these changes Jan 17, 2022
Contributor
cc @manupa-arm could you take a look and merge if everything's OK? Thanks
Contributor
Thanks! @jacobbohlin @mbaret
yuanfz98 pushed a commit to yuanfz98/tvm that referenced this pull request Jan 24, 2022
* [microNPU][2c] Initial Performance Model
* Added the pre-computed performance modelling per block.
* Added the aggregation of cycles given a stripe config.
* Implemented the op-specific performance code for conv2d.
* Created a DeviceConfig class to hold constant performance related data that is dependent on the accelerator configuration.
* Added generation of all valid block configs. This is pre-computed and given as an argument when constructing EthosuParts.
* Implemented selection of the block config that gives the least amount of data read given a StripeConfig.
* Add test guards
* Extended block config testing
crazydemo pushed a commit to crazydemo/tvm that referenced this pull request Jan 27, 2022
ylc pushed a commit to ylc/tvm that referenced this pull request Feb 16, 2022
RFC: apache/tvm-rfcs#37
Issue: #9429
NOTE: This PR builds on top of #9469 and #9471 and therefore includes those changes. This PR will remain as 'draft' until both dependencies are merged.
The algorithm described in the RFC uses two metrics for Pareto culling: performance and memory usage. This commit addresses the former and introduces the basis of performance estimation for the Parts. It also includes performance estimation code that is specific to ethosu_conv2d.
The output of the performance model is only meant to be consumed by the cascader.
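To make the role of the two metrics concrete, here is a minimal sketch of Pareto culling over (cycles, memory usage) pairs. It assumes both metrics are plain integers where lower is better. `Candidate`, `Dominates`, and `ParetoCull` are hypothetical names for illustration, and the quadratic scan is a simple textbook approach, not necessarily how the cascader implements it.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical candidate: one performance estimate and one memory estimate.
struct Candidate {
  int cycles;
  int memory_usage;
};

// a dominates b if a is no worse on both metrics and strictly better on one.
bool Dominates(const Candidate& a, const Candidate& b) {
  return a.cycles <= b.cycles && a.memory_usage <= b.memory_usage &&
         (a.cycles < b.cycles || a.memory_usage < b.memory_usage);
}

// Keeps only the candidates that no other candidate dominates,
// preserving their input order. Runs in O(n^2) comparisons.
std::vector<Candidate> ParetoCull(const std::vector<Candidate>& candidates) {
  std::vector<Candidate> frontier;
  for (const Candidate& c : candidates) {
    bool dominated =
        std::any_of(candidates.begin(), candidates.end(),
                    [&c](const Candidate& other) { return Dominates(other, c); });
    if (!dominated) {
      frontier.push_back(c);
    }
  }
  return frontier;
}
```

In this sketch the performance model would supply the `cycles` field, which is why its output only needs to be meaningful for comparing candidates inside the cascader.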