non_max_suppression runs much faster on GraphExecutor (GE) than on VirtualMachine (VM).
Expected behavior
I would expect the performance to be roughly the same on VM and GE.
Actual behavior
On my CPU (Intel Core i7-7700K), non_max_suppression runs about 3 times slower on VM (1066.29 ms) than on GE (359.79 ms).
I analyzed the problem with VTune Amplifier and saw that about 70% of the execution time is spent doing some work in lib.so (the name of the compiled model).

With GE there is no such overhead.
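For reference, the slowdown can be worked out from the two timings reported above. This is only a small arithmetic sketch; the variable names are made up for illustration and are not taken from the reproducer.

```python
# Timings reported in this issue, in milliseconds.
vm_ms = 1066.29  # non_max_suppression on VirtualMachine
ge_ms = 359.79   # non_max_suppression on GraphExecutor

# How many times slower VM is than GE.
slowdown = vm_ms / ge_ms

# Extra time spent on VM compared with GE, in milliseconds.
extra_ms = vm_ms - ge_ms

print(f"VM is {slowdown:.2f}x slower than GE ({extra_ms:.2f} ms extra)")
```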
Environment
Linux OS, latest mainline.
Steps to reproduce
You can use the following script to reproduce the problem. I changed its extension to .txt because .py files cannot be uploaded to GitHub.
reproducer.txt
At the top of the script, you can set the variable USE_VM to choose whether the layer is run on VM or on GE.
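The general shape of such a switch plus a timing loop might look like the sketch below. It is not the contents of reproducer.txt: `run_on_vm`, `run_on_ge`, and `benchmark` are hypothetical placeholders, and the dummy workloads stand in for the real inference calls. Only the `USE_VM` name comes from the script described above.

```python
import statistics
import time

USE_VM = True  # True: time the VM path, False: time the GE path


def run_on_vm():
    # Hypothetical placeholder for running the layer on VirtualMachine.
    return sum(range(1000))


def run_on_ge():
    # Hypothetical placeholder for running the layer on GraphExecutor.
    return sum(range(1000))


def benchmark(fn, repeats=10, warmup=2):
    """Return the mean wall-clock time of fn() in milliseconds."""
    # Warm-up runs are not measured, so one-time setup cost is excluded.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples)


runner = run_on_vm if USE_VM else run_on_ge
mean_ms = benchmark(runner)
print(f"{'VM' if USE_VM else 'GE'}: {mean_ms:.3f} ms per run")
```

Measuring both paths with the same warm-up and repeat counts keeps the comparison fair, since the only thing that changes between runs is the executor.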
Triage
Please refer to the list of label tags here to find the relevant tags and add them below in a bullet format (example below).