Research on Three-Dimensional Reconstruction Method of Surgical Scene Based on Spatiotemporal Analysis

DOI: 10.12677/JISP.2024.132010, pp. 107-116

Keywords: Endoscopic Image, Depth Estimation, Spatiotemporal Analysis, Pose Optimization, Three-Dimensional Reconstruction


Abstract:

Depth estimation from endoscopic images and 3D reconstruction of the surgical scene are key to improving surgeons' efficiency in minimally invasive surgery. This paper proposes a 3D reconstruction method for surgical scenes based on spatiotemporal analysis. The depth estimation network is designed as an encoder-decoder structure: the encoder uses a ResNet34 module, an improved SAB attention mechanism, an improved FPN module, and a feature enhancement module; the decoder recovers the depth and pose information of the image through upsampling, achieving accurate depth estimation for endoscopic images. For tracking and reconstruction, the camera pose is optimized through spatiotemporal tracking, depth information from the spatial dimension is combined with the temporal dimension, and the three-dimensional structure of the surgical scene is restored through spatiotemporal analysis and fusion. The method is evaluated on the public Hamlyn dataset. Experimental results show that it effectively improves the accuracy of endoscopic depth estimation and, by fusing depth information across the temporal dimension, accurately restores the three-dimensional structure of the surgical scene, further assisting surgeons in precise intraoperative navigation.
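The abstract names the building blocks of the depth network but not their internal design, so the following is only a minimal sketch of the overall encoder-decoder shape it describes: a ResNet34 encoder feeding an upsampling decoder with skip connections that regresses a per-pixel depth map. The SAB attention, improved FPN, and feature enhancement modules, as well as the pose branch, are omitted because the abstract does not specify them; every layer size and name below is an illustrative assumption, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torchvision.models as models

def up_block(in_ch, out_ch):
    # Bilinear upsample followed by a 3x3 conv, as in common depth decoders.
    return nn.Sequential(
        nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
        nn.Conv2d(in_ch, out_ch, 3, padding=1),
        nn.ELU(),
    )

class DepthNet(nn.Module):
    """Hypothetical encoder-decoder skeleton: ResNet34 encoder, upsampling decoder."""
    def __init__(self):
        super().__init__()
        r = models.resnet34(weights=None)
        self.stem = nn.Sequential(r.conv1, r.bn1, r.relu)  # 1/2 res, 64 ch
        self.enc1 = nn.Sequential(r.maxpool, r.layer1)     # 1/4 res, 64 ch
        self.enc2 = r.layer2                               # 1/8 res, 128 ch
        self.enc3 = r.layer3                               # 1/16 res, 256 ch
        self.enc4 = r.layer4                               # 1/32 res, 512 ch
        # Decoder: each stage upsamples, then fuses the matching encoder feature.
        self.dec4 = up_block(512, 256)
        self.dec3 = up_block(256 + 256, 128)
        self.dec2 = up_block(128 + 128, 64)
        self.dec1 = up_block(64 + 64, 32)
        self.dec0 = up_block(32 + 64, 16)
        self.head = nn.Sequential(nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, x):  # x: (B, 3, H, W), H and W divisible by 32
        s = self.stem(x)
        e1, e2 = self.enc1(s), None
        e2 = self.enc2(e1)
        e3 = self.enc3(e2)
        e4 = self.enc4(e3)
        d4 = self.dec4(e4)
        d3 = self.dec3(torch.cat([d4, e3], 1))
        d2 = self.dec2(torch.cat([d3, e2], 1))
        d1 = self.dec1(torch.cat([d2, e1], 1))
        d0 = self.dec0(torch.cat([d1, s], 1))
        return self.head(d0)  # normalized per-pixel depth in (0, 1)
```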
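For the tracking-and-fusion stage, the core geometric step is lifting each depth map into 3D with the camera intrinsics and moving it into a common world frame with the optimized camera pose, so that depth from the spatial dimension can be accumulated over the temporal dimension. Below is a minimal NumPy sketch under pinhole-camera assumptions; `backproject`, `fuse`, and the `T_wc` camera-to-world pose convention are hypothetical names for illustration, not from the paper.

```python
import numpy as np

def backproject(depth, K):
    """Lift an (H, W) depth map to an (H*W, 3) point cloud in camera
    coordinates, using pinhole intrinsics K (3x3)."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))        # pixel grid
    pix = np.stack([u, v, np.ones_like(u)], -1).reshape(-1, 3)
    rays = pix @ np.linalg.inv(K).T                       # ray per pixel
    return rays * depth.reshape(-1, 1)                    # scale rays by depth

def fuse(frames):
    """Merge per-frame point clouds into one world-frame cloud.
    frames: iterable of (depth, K, T_wc), where T_wc is the 4x4
    camera-to-world pose from the spatiotemporal tracking stage."""
    clouds = []
    for depth, K, T_wc in frames:
        pts = backproject(depth, K)
        pts_h = np.concatenate([pts, np.ones((len(pts), 1))], axis=1)
        clouds.append((pts_h @ T_wc.T)[:, :3])            # move to world frame
    return np.concatenate(clouds, axis=0)
```

A full pipeline would more likely fuse frames into a truncated signed distance volume and extract a surface with marching cubes rather than concatenating raw points, but the back-projection geometry is the same.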

