
Deep Learning-Based Recognition of Blurred Ship Identification Text in Maritime Context

DOI: 10.12677/MOS.2024.131043, PP. 450-459

Keywords: Deep Learning, Text Recognition, GAN, CRNN


Abstract:

With the rapid growth of China's shipping industry, accurate recognition of the ship number, the unique identifier of a vessel, is essential for ship management. In practice, however, ship numbers are often blurred or occluded, which greatly reduces recognition accuracy. Traditional image processing and text recognition methods do not handle this problem well, and most existing approaches split it into two steps, first restoring the image and then recognizing text in the restored image, which ignores the correlation between image restoration and text recognition. This paper therefore proposes SRC-GAN, a dual-branch coupled text recognition framework for blurred and occluded ship number text that combines a Generative Adversarial Network (GAN) with a Convolutional Recurrent Neural Network (CRNN) and integrates text recognition and image restoration through adversarial learning. Jointly training the recognition model and the GAN model lets the framework learn more features common to the images, which yields better recognition performance on low-quality images. Recognition experiments on a ship dataset and the CTW dataset show that the method improves recognition accuracy over the original CRNN by an average of 11.98% and 10.68%, respectively, and that it also holds some advantage over two-stage recognition models. SRC-GAN thus recognizes blurred and occluded text images more effectively.

