This repository is forked from the original serial version.
We improve a standalone serial SIFT implementation with parallel techniques, including SIMD, OpenMP, and CUDA. We focus on 1) comparing the parallel characteristics of the CPU and the GPU, 2) exploring I/O optimization for CUDA kernel functions, and 3) discussing the trade-off between power and latency on an embedded platform. The experiments show that on the NVIDIA TX2, the GPU is more power-efficient than the CPU with a suitable setting. Our fastest version achieves a 3X speedup when processing a 1920 x 1080 image. The report can be found here.
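The power-efficiency claim rests on energy rather than power alone: a processor that draws more power can still use less energy per image if it finishes sooner. The sketch below shows the two metrics involved, speedup (baseline latency divided by new latency) and energy per image (average power times latency). The function names and units (watts, milliseconds, millijoules) are illustrative assumptions, not code from this repository.

```c
/*
 * Hypothetical helpers for comparing parallel versions. Names and units are
 * assumptions for illustration; they are not part of this repository.
 */

/* Speedup relative to a baseline (e.g. the serial build):
 * baseline latency divided by the new latency. */
double speedup(double baseline_latency_ms, double latency_ms)
{
    return baseline_latency_ms / latency_ms;
}

/* Energy spent on one image: average power draw (W) times latency (ms).
 * Since 1 W * 1 ms = 1 mJ, the result is in millijoules. */
double energy_per_image_mj(double avg_power_w, double latency_ms)
{
    return avg_power_w * latency_ms;
}

/* Returns 1 if configuration A uses less energy per image than B, else 0.
 * A configuration can draw more power and still win if it finishes sooner. */
int is_more_energy_efficient(double power_a_w, double latency_a_ms,
                             double power_b_w, double latency_b_ms)
{
    return energy_per_image_mj(power_a_w, latency_a_ms) <
           energy_per_image_mj(power_b_w, latency_b_ms);
}
```

Under these assumptions, a "suitable setting" is one where the GPU's latency drops by more than its power draw rises relative to the CPU, so its power-latency product is smaller.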
[Figure: Power and latency comparison between CPU-only and heterogeneous computing]

To reproduce our results, try the examples below.

Requirements:
- CMake: 3.9
- GCC: 8.4.0
- NVCC: 10.2
To build, run the following commands:
cd platforms/desktop/
mkdir build
cd build
cmake ..
make

The built binary is then located in the build/bin directory.
Each version below is a separate commit. Remember to rebuild after each checkout.
git checkout c6bc9b7
./image_match img1.pgm img2.pgm 4 # 4 for 4 threads

git checkout 1a5e01e
./image_match img1.pgm img2.pgm 4 # 4 for 4 threads

git checkout ddff0ab
./image_match img1.pgm img2.pgm

Copyright 2013 Guohui Wang
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

