This repository holds the CUDA source code implementing the algorithm described in the paper "Work-Efficient Parallel Non-Maximum Suppression Kernels". The proposed NMS CUDA kernels are designed for GPU-only video processing pipelines. They are meant to run after a convolutional neural network has performed inference and returned the coordinates of the localized objects.
Edit the Makefile and set the SM_ARCH and GPU_ARCH variables to the architecture of your NVIDIA GPU. By default, the Makefile targets an NVIDIA Tesla T4 (sm_75 / compute_75), but the code has been extensively tested on other platforms, such as the NVIDIA Jetson TK1 (sm_32 / compute_32), TX1 (sm_53 / compute_53) and TX2 (sm_62 / compute_62), and the NVIDIA GeForce GTX 1060 (sm_61 / compute_61).
After updating the Makefile, compile the code and run the test application, which executes the NMS kernels on the GPU to merge the candidate windows of all detected objects:
$ make
nvcc -o nms.o -c nms.cu -O3 -gencode=arch=compute_75,code=sm_75
gcc -I/usr/local/cuda/include -onmstest nmstest.o nms.o -L/usr/local/cuda/lib64 -lcudart -lcuda
$ ./nmstest detections.txt output.txt
CUDA Runtime Version 11000
Device 0# Tesla T4 [1.59 GHz - 40 Multiprocessors - Core sm_75 - 15109 MB]
Device 0# has been selected for CUDA computation
Detections read from input file (detections.txt): 2997
NMS-MAP elapsed time: 0.653 ms
NMS-REDUCE elapsed time: 0.139 ms
Detections after NMS: 145
Finally, execute the drawrectangles script to generate a PNG file (oscarsdets.png) containing the merged candidate windows:
$ ./drawrectangles output.txt
$ eog oscarsdets.png

Q: When I run the test application, it prints the message "CUDA Error: no kernel image is available for execution on the device". What am I doing wrong?
A: This error occurs when the CUDA kernels have been compiled for an architecture that does not match the GPU on which the test application runs. Update the SM_ARCH and GPU_ARCH variables in the Makefile with the sm_XX and compute_XX architecture matching your GPU platform, then recompile the code and run the test application again.