A Visual Indoor Localization Method Based on Efficient Image Retrieval

DOI: 10.4236/jcc.2024.122004, PP. 47-66

Keywords: Visual Indoor Positioning, Feature Point Matching, Image Retrieval, Position Calculation, Five-Point Method


Abstract:

Indoor visual localization, which computes a user's pose from camera images, is a core component of Augmented Reality (AR) and Simultaneous Localization and Mapping (SLAM). Existing indoor localization methods generally rely on scene-specific 3D representations or are trained on specific datasets, which makes it difficult to balance accuracy and cost when they are applied to new scenes. To address this issue, this paper proposes a general indoor visual localization method based on efficient image retrieval. First, a Multi-Layer Perceptron (MLP) aggregates features from intermediate layers of a convolutional neural network into a global image representation, enabling accurate and fast retrieval of reference images. Next, a new mechanism based on Random Sample Consensus (RANSAC) resolves the relative pose ambiguity that arises when the essential matrix estimated with the five-point method is decomposed. Finally, the absolute pose of the query image is computed, which yields the user's indoor pose. The proposed method is simple, flexible, and generalizes well across scenes. Experiments show a positioning error of 0.09 m and 2.14° on the 7Scenes dataset, and 0.15 m and 6.37° on the 12Scenes dataset. These results demonstrate the effectiveness of the proposed method.
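
The retrieval step in the abstract can be illustrated with a minimal sketch: given one global descriptor per image, rank the reference images by similarity to the query. The abstract does not state the similarity measure, so cosine similarity is an assumption here. The function names (`l2_normalize`, `retrieve_top_k`), the reference names, and the 4-dimensional toy descriptors are hypothetical; in the paper's pipeline the descriptors would come from the MLP-aggregated CNN features.

```python
import math


def l2_normalize(vec):
    """Scale a descriptor to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def retrieve_top_k(query, references, k=3):
    """Return the k (name, cosine similarity) pairs closest to the query, best first."""
    q = l2_normalize(query)
    scored = []
    for name, descriptor in references.items():
        r = l2_normalize(descriptor)
        scored.append((name, sum(a * b for a, b in zip(q, r))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy global descriptors (made up for illustration, not taken from the paper).
references = {
    "ref_kitchen": [0.9, 0.1, 0.0, 0.2],
    "ref_office": [0.1, 0.8, 0.3, 0.0],
    "ref_stairs": [0.0, 0.2, 0.9, 0.1],
}
query = [0.8, 0.2, 0.1, 0.2]

top = retrieve_top_k(query, references, k=2)
for name, score in top:
    print(f"{name}: {score:.3f}")
```

In a full pipeline, the retrieved reference images and their known poses would then feed the relative pose estimation stage. Brute-force ranking as shown is adequate for small reference sets; a larger database would likely need an approximate nearest-neighbor index.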

