Optimized CUDA Implementation to Improve the Performance of Bundle Adjustment Algorithm on GPUs

DOI: 10.4236/jsea.2024.174010, PP. 172-201

Keywords: Scene Reconstruction, Bundle Adjustment, Levenberg-Marquardt, Non-Linear Least Squares, Memory Throughput, Computational Throughput, Contiguous Memory Access, CUDA Optimization

Abstract:

The 3D reconstruction pipeline uses the Bundle Adjustment algorithm to refine the camera and point parameters. Bundle Adjustment is compute-intensive, and many researchers have improved its performance by implementing it on GPUs. In the previous research work, “Improving Accuracy and Computational Burden of Bundle Adjustment Algorithm Using GPUs,” the authors first demonstrated an improvement in the algorithmic performance of Bundle Adjustment, reducing the mean square error by using an additional radial distortion parameter and explicitly computed analytical derivatives, and then reduced the computational burden of the algorithm using GPUs. With the naïve CUDA implementation, a speedup of 10× was achieved for the largest dataset of 13,678 cameras, 4,455,747 points, and 28,975,571 projections. In this paper, we present the optimization of the Bundle Adjustment CUDA code on GPUs to achieve a higher speedup. We propose a new memory layout for the parameters of the Bundle Adjustment algorithm that results in contiguous memory access, and we demonstrate that it improves memory throughput on the GPUs, thereby improving overall performance. We also demonstrate an increase in the computational throughput of the algorithm by optimizing the CUDA kernels to utilize the GPU resources effectively. We present a comparative performance study of explicitly computing an algorithm parameter versus using the Jacobians instead. In the previous work, the Bundle Adjustment algorithm failed to converge for certain datasets because several camera block matrices in the augmented normal equation were rank-deficient. In this work, we identify the cameras that cause rank-deficient matrices and preprocess the datasets to ensure the convergence of the Bundle Adjustment algorithm.
For the largest dataset, our optimized CUDA implementation converges in around 22 seconds, compared to 654 seconds for the sequential implementation, a speedup of 30×. Compared to the previous naïve CUDA implementation, the optimized implementation achieves a 3× speedup on the same dataset.
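
The abstract does not describe the proposed memory layout itself. As a generic illustration of why a parameter layout determines whether accesses are contiguous, the following host-side C++ sketch contrasts an array-of-structures layout with a structure-of-arrays layout for camera parameters. Everything in it is an assumption made for illustration and is not taken from the paper: the parameter count `kParamsPerCamera`, the helper names `aos_index`, `soa_index`, and `aos_to_soa`, and the choice of these two layouts.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical number of parameters per camera; the abstract does not state it.
constexpr std::size_t kParamsPerCamera = 9;

// Array-of-structures: all parameters of one camera are stored together.
// Neighbouring cameras are kParamsPerCamera elements apart for a fixed parameter.
constexpr std::size_t aos_index(std::size_t camera, std::size_t param) {
    return camera * kParamsPerCamera + param;
}

// Structure-of-arrays: the same parameter of all cameras is stored together.
// Neighbouring cameras are exactly one element apart for a fixed parameter.
constexpr std::size_t soa_index(std::size_t camera, std::size_t param,
                                std::size_t num_cameras) {
    return param * num_cameras + camera;
}

// Copy an array-of-structures buffer into a structure-of-arrays buffer.
std::vector<double> aos_to_soa(const std::vector<double>& aos,
                               std::size_t num_cameras) {
    std::vector<double> soa(aos.size());
    for (std::size_t c = 0; c < num_cameras; ++c) {
        for (std::size_t p = 0; p < kParamsPerCamera; ++p) {
            soa[soa_index(c, p, num_cameras)] = aos[aos_index(c, p)];
        }
    }
    return soa;
}
```

If thread i of a CUDA kernel reads parameter p of camera i, adjacent threads touch adjacent addresses under `soa_index` but addresses `kParamsPerCamera` elements apart under `aos_index`. Contiguous per-thread addresses are what allow the hardware to combine the loads of a warp into fewer memory transactions.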

References

[1]  Kommera, P.R., Muknahallipatna, S.S. and McInroy, J.E. (2023) Improving Accuracy and Computational Burden of Bundle Adjustment Algorithm Using GPUs. Engineering, 15, 663-690.
https://doi.org/10.4236/eng.2023.1510046
[2]  Agarwal, S., Snavely, N., Seitz, S.M. and Szeliski, R. (2010) Bundle Adjustment in the Large. In European Conference on Computer Vision, Springer, Berlin, Heidelberg, 29-42.
https://doi.org/10.1007/978-3-642-15552-9_3
[3]  Choudhary, S., Gupta, S. and Narayanan, P.J. (2010) Practical Time Bundle Adjustment for 3D Reconstruction on the GPU. In European Conference on Computer Vision, Springer, Berlin, Heidelberg, 423-435.
https://doi.org/10.1007/978-3-642-35740-4_33
[4]  Lourakis, M.I.A. and Argyros, A.A. (2009) SBA: A Software Package for Generic Sparse Bundle Adjustment. ACM Transactions on Mathematical Software (TOMS), 36, 2.
https://doi.org/10.1145/1486525.1486527
[5]  Tomov, S., Dongarra, J., Volkov, V. and Demmel, J. (2009) MAGMA Library. University of Tennessee and University of California, Knoxville, TN, and Berkeley, CA.
https://icl.utk.edu/magma/
[6]  Agarwal, S. and Mierle, K. (2012) Ceres Solver.
http://ceres-solver.org/
[7]  https://developer.apple.com/documentation/accelerate
[8]  https://eigen.tuxfamily.org/dox/group__TopicSparseSystems.html
[9]  Byröd, M. and Åström, K. (2009) Bundle Adjustment Using Conjugate Gradients with Multiscale Preconditioning. 7-10 September 2009, British Machine Vision Conference, BMVC 2009, London, 1-10.
[10]  Byröd, M. and Åström, K. (2010) Conjugate Gradient Bundle Adjustment. European Conference on Computer Vision, Springer, Berlin, Heidelberg, 114-127.
https://doi.org/10.1007/978-3-642-15552-9_9
[11]  Wu, C.C., Agarwal, S., Curless, B. and Seitz, S.M. (2011) Multicore Bundle Adjustment. In Computer Vision and Pattern Recognition (CVPR), 20-25 June 2011, Colorado Springs, 3057-3064.
https://doi.org/10.1109/CVPR.2011.5995552
[12]  Zheng, M.T., Zhou, S.P., Xiong, X.D. and Zhu, J.F. (2017) A New GPU Bundle Adjustment Method for Large-Scale Data. Photogrammetric Engineering & Remote Sensing, 83, 633-641.
https://doi.org/10.14358/PERS.83.9.633
[13]  https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TEXREF__DEPRECATED.html
[14]  MathWorks, Inc. (2022) Symbolic Math Toolbox. Massachusetts.
https://www.mathworks.com/help/symbolic/
[15]  https://technical.city/en/video/A30-PCIe-vs-H100-PCIe
[16]  https://www.nvidia.com/en-us/data-center/h100/
[17]  https://docs.nvidia.com/cuda/cublas/index.html
[18]  Blelloch, G.E. (1990) Prefix Sums and Their Applications.
[19]  https://docs.nvidia.com/nsight-compute/NsightComputeCli/index.html
