Research on End-to-End Chinese Text Recognition Method in Complex Scenes

DOI: 10.12677/HJDM.2023.132015, pp. 154-164

Keywords: End-to-End, Text Recognition, Transformer, Deep Learning


Abstract:

In recent years, end-to-end scene text recognition has attracted great attention because it successfully exploits the inherent synergy between scene text detection and recognition. However, recent state-of-the-art methods usually combine detection and recognition only by sharing a backbone, and they cannot handle scene text well under extreme variations in scale and aspect ratio. In this paper, we propose a new end-to-end scene text recognition framework called ES-Transformer. Unlike previous methods that learn scene text in a holistic way, our approach performs scene text recognition based on several representative features, which avoids background interference and reduces computational cost. Specifically, we use a basic feature pyramid network for feature extraction and then employ a Swin-Transformer to model the relationships between the sampled features, effectively partitioning them into reasonable groups. ES-Transformer improves recognition accuracy while reducing computational complexity and no longer relies on complex post-processing modules. Qualitative and quantitative experiments on Chinese datasets show that ES-Transformer outperforms existing methods.
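
The abstract outlines the pipeline only at a high level: a shared backbone with a feature pyramid network extracts multi-scale features, a small set of representative features is sampled from the pyramid, and a Swin-Transformer models the relations between the sampled features. The following is a minimal PyTorch sketch of that pipeline, not the authors' ES-Transformer implementation: the top-k scoring rule used for sampling, the plain nn.TransformerEncoder standing in for the Swin-Transformer module, the torchvision resnet50/FeaturePyramidNetwork components, and all hyperparameters are assumptions made purely for illustration (a recent torch/torchvision is assumed).

# Illustrative sketch of the pipeline described in the abstract:
# backbone + FPN -> sample representative features -> transformer over the samples.
# NOT the authors' ES-Transformer; sampling rule, encoder choice and sizes are assumed.
from collections import OrderedDict

import torch
import torch.nn as nn
from torchvision.models import resnet50
from torchvision.models.feature_extraction import create_feature_extractor
from torchvision.ops import FeaturePyramidNetwork


class RepresentativeFeatureEncoder(nn.Module):
    def __init__(self, num_samples: int = 256, d_model: int = 256):
        super().__init__()
        # Tap the C2-C5 stages of a ResNet-50 backbone.
        self.body = create_feature_extractor(
            resnet50(weights=None),
            return_nodes={f"layer{i}": f"c{i + 1}" for i in range(1, 5)},
        )
        self.fpn = FeaturePyramidNetwork(
            in_channels_list=[256, 512, 1024, 2048], out_channels=d_model
        )
        # A 1x1 conv scores every pyramid location; the top-k locations are kept
        # as the "representative features" (an assumed sampling rule).
        self.score = nn.Conv2d(d_model, 1, kernel_size=1)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
        # Generic transformer encoder standing in for the Swin-Transformer module.
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.num_samples = num_samples

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        feats = self.fpn(OrderedDict(self.body(images)))  # multi-scale feature maps
        tokens, scores = [], []
        for f in feats.values():
            tokens.append(f.flatten(2).transpose(1, 2))               # (B, H*W, C)
            scores.append(self.score(f).flatten(2).transpose(1, 2))   # (B, H*W, 1)
        tokens = torch.cat(tokens, dim=1)
        scores = torch.cat(scores, dim=1).squeeze(-1)
        # Keep only the highest-scoring locations, discarding most of the background.
        idx = scores.topk(self.num_samples, dim=1).indices
        sampled = torch.gather(
            tokens, 1, idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1))
        )
        # Model relations between the sampled features; a grouping / recognition
        # head would operate on this representation.
        return self.encoder(sampled)


if __name__ == "__main__":
    model = RepresentativeFeatureEncoder()
    out = model(torch.randn(1, 3, 512, 512))
    print(out.shape)  # torch.Size([1, 256, 256])

Attending over 256 sampled tokens instead of every pyramid location (over 20,000 for a 512×512 input in this sketch) is where the reduction in attention cost comes from, which is roughly the saving the abstract attributes to recognizing text from a few representative features rather than from the whole feature map.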
