Indoor People Counting Model Based on an Improved YOLO V3 Algorithm
Abstract:
Object detection methods based on machine learning and deep learning are widely used for people counting. However, people counting remains challenging when the detection area is crowded and people occlude one another, or when illumination is uneven or too dark for people in the video to be seen easily. To address this, an improved YOLO V3 model is proposed that is better suited to counting people in indoor crowds, such as in classrooms. First, a self-built dataset was created and enriched to increase the diversity of the training data, and the anchor boxes were re-clustered with the K-means algorithm. Second, the YOLO V3 feature extraction network and its multi-scale detection were improved, yielding the F-YOLO V3 model. This model adds a 104 × 104 feature-map output and removes the 13 × 13 feature-map output; each upsampled feature map of the original network is upsampled once more, and the result is concatenated with the feature map of the corresponding size in the original network; and the five convolutions before the output layer are replaced with one convolution and two residual units, which extracts more feature information and improves the detection of blurry or small targets. Finally, an ADIOU Loss branch is added to measure the localization accuracy of the detection boxes, and the number of people in the output frame is counted in real time. Experimental results show that the F-YOLO V3 model achieves higher recall and average precision, and that people-counting performance in indoor scenes is significantly improved.
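The anchor re-clustering step can be sketched as follows. This is a minimal sketch in plain Python, assuming the common YOLO practice of clustering ground-truth box (width, height) pairs with the distance d = 1 − IoU instead of Euclidean distance. The abstract does not state which distance, initialization, or centroid update the authors used (the mean is used here), and the names `iou_wh` and `kmeans_anchors` are hypothetical.

```python
import random


def iou_wh(box, anchor):
    """IoU of two boxes given only (width, height), assuming both share the same centre."""
    w1, h1 = box
    w2, h2 = anchor
    inter = min(w1, w2) * min(h1, h2)
    return inter / (w1 * h1 + w2 * h2 - inter)


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) pairs into k anchors using the distance d = 1 - IoU (hypothetical helper)."""
    rng = random.Random(seed)
    anchors = rng.sample(boxes, k)  # initial centres: k randomly chosen label boxes
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            # Smallest 1 - IoU is the same as largest IoU.
            best = max(range(k), key=lambda i: iou_wh(b, anchors[i]))
            clusters[best].append(b)
        new_anchors = []
        for i, c in enumerate(clusters):
            if c:
                # Mean width and height of the boxes assigned to this anchor.
                new_anchors.append((sum(w for w, _ in c) / len(c),
                                    sum(h for _, h in c) / len(c)))
            else:
                new_anchors.append(anchors[i])  # keep an empty cluster's centre unchanged
        if new_anchors == anchors:  # assignments have stabilised
            break
        anchors = new_anchors
    # Sort from smallest to largest area, as anchors are usually listed per scale.
    return sorted(anchors, key=lambda a: a[0] * a[1])


# Toy labels: a group of small people and a group of large people (made-up values).
labels = [(10, 20), (12, 22), (11, 19), (100, 200), (110, 190), (95, 210)]
print(kmeans_anchors(labels, k=2))
```

In practice the (w, h) pairs would come from the dataset's annotations and k would typically be 9 (three anchors for each of three detection scales); both choices are assumptions here, not values taken from the paper.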
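The effect of replacing the 13 × 13 output with a 104 × 104 output can be checked with simple arithmetic. The sketch below assumes a 416 × 416 input (the usual YOLO V3 setting, not stated in the abstract), three anchors per scale and a single person class. Under those assumptions the original YOLO V3 heads use strides 32, 16 and 8, while F-YOLO V3 uses strides 16, 8 and 4. The constants and helper names are hypothetical.

```python
INPUT_SIZE = 416        # assumed network input resolution (common YOLO V3 default)
ANCHORS_PER_SCALE = 3   # assumed: 9 clustered anchors split over 3 scales
NUM_CLASSES = 1         # assumed: a single "person" class


def grid_sizes(strides, input_size=INPUT_SIZE):
    """Side length of each output feature map for the given downsampling strides."""
    return [input_size // s for s in strides]


def head_channels(num_classes=NUM_CLASSES, anchors=ANCHORS_PER_SCALE):
    """Channels per output map: each anchor predicts 4 box offsets, 1 objectness, class scores."""
    return anchors * (5 + num_classes)


def num_predictions(grids, anchors=ANCHORS_PER_SCALE):
    """Total candidate boxes per image summed over all output scales."""
    return anchors * sum(g * g for g in grids)


yolov3 = grid_sizes([32, 16, 8])    # [13, 26, 52]
f_yolov3 = grid_sizes([16, 8, 4])   # [26, 52, 104]

print(num_predictions(yolov3))      # 10647
print(num_predictions(f_yolov3))    # 42588
print(head_channels())              # 18
```

Under these assumptions each cell of the 104 × 104 map covers a 4 × 4 pixel patch of the input, which is why this scale helps with small or distant people, at the cost of exactly four times as many candidate boxes to score. Each grid is also twice the size of the previous one, matching the extra 2× upsampling step that produces the map concatenated at the new scale.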
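The abstract does not define ADIOU Loss, so its exact formula is not reproduced here. As a plainly generic stand-in, the sketch below shows the basic quantity that an IoU-style localization branch builds on: the IoU between a predicted box and a ground-truth box, turned into the loss 1 − IoU. Boxes are assumed to be in (x1, y1, x2, y2) corner format, and `box_iou` and `iou_loss` are hypothetical names.

```python
def box_iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) corner format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp to zero so that disjoint boxes have no intersection area.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred, target):
    """Plain IoU loss (not ADIOU): 0 for a perfect box, 1 when the boxes do not overlap."""
    return 1.0 - box_iou(pred, target)


# A prediction shifted by half a box width overlaps the target with IoU = 50 / 150 = 1/3.
print(iou_loss((0, 0, 10, 10), (5, 0, 15, 10)))
```

Plain IoU loss gives no gradient once boxes stop overlapping, which is one reason variants that add distance or shape terms exist; how ADIOU Loss addresses this is not specified in the abstract.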
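The final counting step amounts to keeping the confident, non-duplicated person detections in each frame and counting them. Below is a minimal sketch, assuming detections arrive as (score, box) pairs in (x1, y1, x2, y2) format and using greedy non-maximum suppression. The thresholds 0.5 and 0.45 are common illustrative defaults, not values reported by the authors, and all names are hypothetical.

```python
def box_iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) corner format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_thresh=0.45):
    """Greedy non-maximum suppression over (score, box) pairs, highest score first."""
    kept = []
    for score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        # Keep a box only if it does not overlap too much with any box already kept.
        if all(box_iou(box, k) < iou_thresh for _, k in kept):
            kept.append((score, box))
    return kept


def count_people(detections, conf_thresh=0.5, iou_thresh=0.45):
    """Number of people in one frame: confident detections that survive NMS."""
    confident = [d for d in detections if d[0] >= conf_thresh]
    return len(nms(confident, iou_thresh))


# Made-up detections for one frame: the first two boxes are the same person.
frame = [
    (0.9, (0, 0, 10, 20)),
    (0.8, (1, 0, 11, 20)),
    (0.7, (30, 0, 40, 20)),
    (0.3, (60, 0, 70, 20)),  # below the confidence threshold
]
print(count_people(frame))  # 2
```

Applied to every video frame, `count_people` yields a running head count. Lowering the confidence threshold trades missed people (lower recall) for more false positives, which is the balance the recall and average-precision figures in the abstract measure.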