hertasecurity edited this page Aug 21, 2020 · 3 revisions

GPU-NMS Benchmarking Framework

This repository holds the CUDA source code implementing the algorithm described in the paper "Work-Efficient Parallel Non-Maximum Suppression Kernels". The proposed NMS CUDA kernels are designed for GPU-only video processing pipelines and should run after the inference step of a convolutional neural network that returns the coordinates of the localized objects.
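To make the role of the kernels concrete, here is a minimal CPU-only C++ sketch of a map/reduce style NMS, loosely mirroring the NMS-MAP and NMS-REDUCE phases that the test application reports. It is not the repository's CUDA code. The `Box` layout, the intersection-over-union test, the 0.5 threshold, and all function names are assumptions made for illustration, and the paper's kernels may use a different overlap criterion.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical detection layout: top-left corner, size, and confidence score.
struct Box { float x, y, w, h, score; };

// Intersection over union of two axis-aligned boxes.
static float iou(const Box& a, const Box& b) {
    float ix = std::max(0.0f, std::min(a.x + a.w, b.x + b.w) - std::max(a.x, b.x));
    float iy = std::max(0.0f, std::min(a.y + a.h, b.y + b.h) - std::max(a.y, b.y));
    float inter = ix * iy;
    float uni = a.w * a.h + b.w * b.h - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Map phase: keep[i * n + j] becomes 0 when box j overlaps box i and outranks it.
// Every (i, j) cell is independent, so on a GPU one thread could compute one cell.
static std::vector<char> nms_map(const std::vector<Box>& b, float thr) {
    const std::size_t n = b.size();
    std::vector<char> keep(n * n, 1);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            if (i == j) continue;
            // Ties on score are broken by index so that exactly one box wins.
            bool outranked = b[j].score > b[i].score ||
                             (b[j].score == b[i].score && j < i);
            if (outranked && iou(b[i], b[j]) > thr) keep[i * n + j] = 0;
        }
    }
    return keep;
}

// Reduce phase: box i survives only if every cell of row i is still set (logical AND).
static std::vector<Box> nms_reduce(const std::vector<Box>& b,
                                   const std::vector<char>& keep) {
    const std::size_t n = b.size();
    std::vector<Box> out;
    for (std::size_t i = 0; i < n; ++i) {
        bool alive = true;
        for (std::size_t j = 0; j < n && alive; ++j) alive = keep[i * n + j] != 0;
        if (alive) out.push_back(b[i]);
    }
    return out;
}

// Convenience wrapper running both phases with an assumed default threshold.
std::vector<Box> nms(const std::vector<Box>& boxes, float thr = 0.5f) {
    return nms_reduce(boxes, nms_map(boxes, thr));
}
```

In this sketch every box is compared against every other box, whether or not that other box is itself suppressed. That differs from sequential greedy NMS, but it is what makes the map phase free of data dependencies.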

Build Instructions

Edit the Makefile and set the SM_ARCH and GPU_ARCH variables to the architecture of your NVIDIA GPU platform. By default, the Makefile targets an NVIDIA Tesla T4 (sm_75 / compute_75), but the code has been extensively tested on other platforms, such as the NVIDIA Jetson TK1 (sm_32 / compute_32), TX1 (sm_53 / compute_53), and TX2 (sm_62 / compute_62), and the NVIDIA GeForce GTX 1060 (sm_61 / compute_61).
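The sm_XX and compute_XX names are formed from the device's compute capability (major and minor revision), so a Tesla T4 with capability 7.5 maps to sm_75 / compute_75. The sketch below shows that mapping by building the same -gencode option that appears in the nvcc command line of the build output. The helper name `gencode_flag` is hypothetical. In a real program the two numbers could be read from the `major` and `minor` fields of `cudaDeviceProp` after calling `cudaGetDeviceProperties`, but here they are plain arguments so the example needs no GPU.

```cpp
#include <string>

// Builds an nvcc -gencode option for a device of compute capability
// major_rev.minor_rev, e.g. (7, 5) for a Tesla T4.
// Hypothetical helper for illustration; it is not part of the repository.
std::string gencode_flag(int major_rev, int minor_rev) {
    const std::string cc = std::to_string(major_rev) + std::to_string(minor_rev);
    return "-gencode=arch=compute_" + cc + ",code=sm_" + cc;
}
```

For example, `gencode_flag(6, 1)` yields the option matching a GeForce GTX 1060.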

After updating the Makefile, compile and run the test application, which executes the NMS kernels on the GPU to merge the candidate windows of all detected objects.

$ make

nvcc -o nms.o -c nms.cu -O3 -gencode=arch=compute_75,code=sm_75
gcc -I/usr/local/cuda/include -onmstest nmstest.o nms.o -L/usr/local/cuda/lib64 -lcudart -lcuda

$ ./nmstest detections.txt output.txt

CUDA Runtime Version 11000
Device 0# Tesla T4	 [1.59 GHz - 40 Multiprocessors - Core sm_75 - 15109 MB]
Device 0# has been selected for CUDA computation

Detections read from input file (detections.txt): 2997
NMS-MAP elapsed time: 0.653 ms
NMS-REDUCE elapsed time: 0.139 ms
Detections after NMS: 145

Finally, execute the drawrectangles script to generate a PNG file (oscarsdets.png) containing the merged candidate windows:

$ ./drawrectangles output.txt

$ eog oscarsdets.png

Troubleshooting / FAQ

Q: The test application execution displays the message CUDA Error: no kernel image is available for execution on the device. What am I doing wrong?

A: This issue arises when the CUDA kernels have been compiled for an architecture that does not match the GPU on which the test application is executed. Update the SM_ARCH and GPU_ARCH variables in the Makefile with the sm_XX and compute_XX architecture matching your GPU platform, then recompile the code and run the test application again.
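The mismatch can be expressed as a small check. The sketch assumes the binary embeds only machine code for one sm_XX target and no PTX, as the `-gencode=arch=compute_75,code=sm_75` line in the build output suggests. Under that assumption, the general CUDA rule is that such code runs only on devices with the same major revision and an equal or higher minor revision. The type and function names are hypothetical, and further platform-specific restrictions (for example between desktop and Jetson parts) are not modeled.

```cpp
// Compute capability as a pair of revisions, e.g. {7, 5} for sm_75.
struct ComputeCap {
    int major_rev;
    int minor_rev;
};

// True when machine code built for `built` (code=sm_XY) can execute on `device`:
// same major revision, and a device minor revision at least as high.
// Simplified rule for illustration; embedded PTX would relax it.
bool cubin_runs_on(ComputeCap built, ComputeCap device) {
    return built.major_rev == device.major_rev &&
           built.minor_rev <= device.minor_rev;
}
```

With the default build, `cubin_runs_on({7, 5}, {6, 1})` is false. This corresponds to running the unmodified Makefile output on a GeForce GTX 1060, the situation in which the error above would be expected.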
