Real-Time Semantic Segmentation Network Based on Multi-Scale Information

DOI: 10.12677/AIRR.2024.131003, PP. 19-29

Keywords: Real-Time Semantic Segmentation, Partial Convolution, Multi-Scale Features, Encoder-Decoder Structure


Abstract:

In tasks with limited processor resources, such as autonomous driving and unmanned aerial vehicles (UAVs), a model must keep its parameter count low and its inference speed high while maintaining good accuracy. Some semantic segmentation models that use parallel structures to extract multi-scale information replace standard convolutions with depthwise separable or grouped convolutions to reduce computation. However, these operations increase network latency and lower inference speed. To address this problem, a real-time semantic segmentation model based on an encoder-decoder structure is proposed. In the encoder, partial convolution is combined with dilated convolution to build different parallel modules that extract multi-scale information at different stages. In the decoder, upsampled features are fused. On the Cityscapes and CamVid datasets, the model achieves a mean intersection over union (mIoU) of 71.3% and 66.8%, at 97 frames/s and 98 frames/s, respectively. The results show that the model strikes a good balance between segmentation accuracy and running speed.
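The trade-off described in the abstract can be made concrete with a rough cost model. The sketch below counts multiply-accumulate operations (MACs) for a standard convolution, a depthwise separable convolution, and a partial convolution, and computes the window covered by a dilated kernel. The feature-map size, the channel count, and the 1/4 channel ratio for partial convolution are illustrative assumptions, not values taken from the paper, and the function names are hypothetical. MAC counts alone do not determine latency, which is the abstract's point about depthwise and grouped convolutions.

```python
# Illustrative cost model only. All layer sizes and the 1/4 channel ratio
# are assumptions for this sketch, not the paper's configuration.

def conv_macs(h, w, c_in, c_out, k):
    """MACs of a standard k x k convolution (stride 1, 'same' padding)."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_macs(h, w, c_in, c_out, k):
    """MACs of a depthwise k x k convolution plus a 1 x 1 pointwise convolution."""
    depthwise = h * w * c_in * k * k
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise


def partial_conv_macs(h, w, c, k, ratio=4):
    """MACs of a partial convolution: a standard k x k convolution applied to
    c // ratio channels, while the remaining channels pass through unchanged."""
    c_p = c // ratio
    return conv_macs(h, w, c_p, c_p, k)


def effective_kernel(k, dilation):
    """Side length of the window covered by a k x k kernel with the given dilation.
    Dilation widens the window without adding weights or MACs."""
    return k + (k - 1) * (dilation - 1)


if __name__ == "__main__":
    h, w, c, k = 64, 128, 128, 3  # hypothetical feature map and kernel size
    print("standard conv MACs:      ", conv_macs(h, w, c, c, k))
    print("depthwise separable MACs:", depthwise_separable_macs(h, w, c, c, k))
    print("partial conv MACs:       ", partial_conv_macs(h, w, c, k))
    for d in (1, 2, 4):  # hypothetical dilation rates for parallel branches
        print(f"dilation {d}: covers {effective_kernel(k, d)} x {effective_kernel(k, d)}")
```

With a 1/4 ratio, the partial convolution costs 1/16 of the standard convolution on the same feature map, and giving parallel branches different dilation rates lets them see different spatial extents at the same cost per branch.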

