[Bug][VM] Performance degradation of non_max_suppression layer on VM

`non_max_suppression` runs much faster on GraphExecutor (GE) than on VirtualMachine (VM).

### Expected behavior
I expect the performance to be the same on VM and GE.

### Actual behavior
On my CPU (Intel Core i7-7700K), `non_max_suppression` runs about 3 times slower on VM (1066.29 ms) than on GE (359.79 ms).
I profiled the VM run with VTune Amplifier and saw that about 70% of the execution time is spent in `lib.so` (the name of the compiled model).
![image](https://github.com/apache/tvm/assets/5525113/b919c6ab-2ea2-4a89-aa9f-08697a11a491)

GE does not show this overhead.
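For reference, the arithmetic behind the "3 times slower" figure, using only the two timings quoted above. The variable names are mine and purely illustrative.

```python
# Timings reported above, in milliseconds.
vm_ms = 1066.29  # VirtualMachine
ge_ms = 359.79   # GraphExecutor

slowdown = vm_ms / ge_ms       # how many times slower VM is than GE
extra_ms = vm_ms - ge_ms       # additional time spent when running on VM
extra_share = extra_ms / vm_ms # fraction of the VM run that is "extra"

print(f"VM is {slowdown:.2f}x slower than GE")
print(f"Extra time on VM: {extra_ms:.2f} ms ({extra_share:.0%} of the VM run)")
# prints:
# VM is 2.96x slower than GE
# Extra time on VM: 706.50 ms (66% of the VM run)
```

The extra time is about two thirds of the VM run, which is in the same range as the share VTune attributes to `lib.so`. That similarity is only suggestive and does not by itself show where the time goes.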

### Environment
Linux OS, latest mainline.

### Steps to reproduce
You can use the following script to reproduce this problem. I changed its extension to `.txt` because GitHub does not accept `.py` uploads.
[reproducer.txt](https://github.com/apache/tvm/files/12162179/reproducer.txt)

At the top of the script, set the `USE_VM` variable to choose whether the layer runs on VM or on GE.
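If you want to time the two paths yourself, a minimal harness could look like the sketch below. It is not taken from the attached reproducer: `benchmark`, `run_vm`, and `run_ge` are hypothetical names, and the two `run_*` functions are empty placeholders where the real VM or GE inference call would go. Only `USE_VM` mirrors the switch in the script.

```python
import statistics
import time


def benchmark(run_once, warmup=3, repeat=10):
    """Return the median wall-clock time of run_once() in milliseconds."""
    # Warm-up runs are executed but not measured.
    for _ in range(warmup):
        run_once()
    samples = []
    for _ in range(repeat):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


USE_VM = True  # same meaning as the switch in the reproducer


def run_vm():
    # Placeholder (assumption): invoke the VirtualMachine inference here.
    pass


def run_ge():
    # Placeholder (assumption): invoke the GraphExecutor inference here.
    pass


run_once = run_vm if USE_VM else run_ge
elapsed_ms = benchmark(run_once)
print(f"{'VM' if USE_VM else 'GE'}: {elapsed_ms:.2f} ms (median of 10 runs)")
```

Taking the median over several runs after a warm-up keeps one-off costs such as first-call initialization out of the comparison.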

### Triage

Please refer to the list of label tags [here](https://github.com/apache/tvm/wiki/Issue-Triage-Labels) to find the relevant tags and add them below in a bullet format (example below).

* flow:vm
