Journal of Computer and Communications 2024

A Lightweight Convolutional Neural Network with Hierarchical Multi-Scale Feature Fusion for Image Classification

DOI: 10.4236/jcc.2024.122011, pp. 173-200

Adama Dembele, Ronald Waweru Mwangi, Ananda Omutokoh Kube

Keywords: MobileNet, Image Classification, Lightweight Convolutional Neural Network, Depthwise Dilated Separable Convolution, Hierarchical Multi-Scale Feature Fusion


Abstract:

Convolutional neural networks (CNNs) are widely used in image classification, but their growing model size and computational cost make them difficult to deploy on embedded systems with constrained hardware resources. MobileNetV1 was developed to address this issue; it uses depthwise separable convolution to reduce network complexity. MobileNetV1 also applies a stride of 2 in several convolutional layers to reduce the spatial resolution of feature maps, which lowers computational cost. However, this stride setting can lose spatial information, which particularly affects the detection and representation of small objects and fine details in images. To balance complexity against model performance, a lightweight convolutional neural network with hierarchical multi-scale feature fusion, based on MobileNetV1, is proposed. The network consists of two main subnetworks. The first uses a depthwise dilated separable convolution (DDSC) layer to learn image features with fewer parameters, which yields a lightweight and computationally inexpensive network. In addition, the depthwise dilated convolution in the DDSC layer expands the field of view of the filters, allowing them to incorporate a larger context. The second subnetwork is a hierarchical multi-scale feature fusion (HMFF) module that processes the input feature map through parallel multi-resolution branches to extract multi-scale feature information from the input image. Experimental results on the CIFAR-10, Malaria, and KvasirV1 datasets show that, relative to the MobileNetV1 baseline, the proposed method reduces the network parameters by 65.02% and the computational cost by 39.78% while maintaining network performance.
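
The parameter savings and the wider field of view attributed to the DDSC layer can be illustrated with a short calculation. The sketch below is not the authors' implementation: the function names and the layer sizes (a 3 x 3 kernel with 64 input and 128 output channels) are assumptions chosen only for illustration, and biases are ignored. It compares the weight count of a standard convolution with that of a depthwise convolution followed by a 1 x 1 pointwise convolution, and computes the spatial extent a kernel covers at a given dilation rate.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def effective_kernel_size(k, dilation):
    """Spatial extent covered by a k x k kernel at the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


if __name__ == "__main__":
    # Illustrative sizes only; these are not taken from the paper.
    k, c_in, c_out = 3, 64, 128
    std = standard_conv_params(k, c_in, c_out)
    sep = depthwise_separable_params(k, c_in, c_out)
    print(f"standard: {std}, separable: {sep}, ratio: {sep / std:.4f}")
    for d in (1, 2, 3):
        size = effective_kernel_size(k, d)
        print(f"dilation {d}: kernel covers {size} x {size}")
```

Dilation changes only the spacing between kernel taps, so under these assumptions the weight count of the depthwise step is the same at every dilation rate while the covered area grows.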

