New Fusion Approach of Spatial and Channel Attention for Semantic Segmentation of Very High Spatial Resolution Remote Sensing Images

DOI: 10.4236/ojapps.2024.142020, PP. 288-319

Keywords: Spatial-Channel Attention, Super-Token Segmentation, Self-Attention, Vision Transformer

Abstract:

Semantic segmentation of very high spatial resolution remote sensing images is difficult because the interactions between the objects in a scene are complex to interpret. Effective segmentation requires both local spatial context and long-range dependencies. To address this problem, the proposed approach draws on the MAC-UNet network, a densely connected extension of U-Net combined with channel attention. The contributions are as follows: 1) The model introduces a new attention, called propagate attention, which is used to build an attention-based encoder. 2) Multi-scale information is fused by a weighted linear combination of the attentions, whose coefficients are learned during training. 3) The decoder includes the Spatial-Channel-Global-Local block, an attention layer that uniquely combines channel attention and spatial attention, both locally and globally. The model is evaluated on two datasets, WHDLD and DLRSD. Compared with the most efficient attention-based algorithms such as MAU-Net and transformer-based networks such as TMNet, its mean intersection over union (mIoU) improves by between 1.54% and 10.47% on DLRSD and by between 1.04% and 4.37% on WHDLD.
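
To make two ideas from the abstract concrete, the following minimal Python sketch shows a weighted linear combination of attention outputs and the mIoU metric. It is an illustration only, not the authors' implementation: the function names `fuse_attentions` and `mean_iou`, the flat-list data layout, and the fixed coefficients are assumptions. In the paper the coefficients are learned during training, which this sketch does not do.

```python
from typing import List, Sequence


def fuse_attentions(maps: Sequence[Sequence[float]],
                    coeffs: Sequence[float]) -> List[float]:
    """Weighted linear combination of same-sized attention outputs.

    `maps` holds one flattened attention output per scale (hypothetical
    layout). `coeffs` holds one scalar per map; here they are plain
    numbers, whereas the paper learns them during training.
    """
    if not maps or len(maps) != len(coeffs):
        raise ValueError("one coefficient per attention map is required")
    length = len(maps[0])
    if any(len(m) != length for m in maps):
        raise ValueError("all attention maps must have the same size")
    return [sum(c * m[i] for c, m in zip(coeffs, maps)) for i in range(length)]


def mean_iou(pred: Sequence[int], target: Sequence[int],
             num_classes: int) -> float:
    """Mean intersection over union across classes, on flattened label maps.

    Classes absent from both prediction and ground truth are skipped,
    which is one common convention (an assumption here).
    """
    if len(pred) != len(target):
        raise ValueError("prediction and target must have the same size")
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Two toy "attention outputs" fused with illustrative coefficients.
    print(fuse_attentions([[1.0, 2.0], [3.0, 4.0]], [0.25, 0.75]))
    # Toy 6-pixel label maps with 3 classes.
    print(mean_iou([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], 3))
```

In the toy example the per-class IoU values are 1/3, 2/3 and 1/2, so the reported mIoU is their average.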

References

[1]  Chen, B., Xia, M. and Huang, J. (2021) MFANet: A Multi-Level Feature Aggregation Network for Semantic Segmentation of Land Cover. Remote Sensing, 13, Article 731.
https://doi.org/10.3390/rs13040731
[2]  Jensen, J.R., Qiu, F. and Patterson, K. (2001) A Neural Network Image Interpretation System to Extract Rural and Urban Land Use and Land Cover Information from Remote Sensor Data. Geocarto International, 16, 21-30.
https://doi.org/10.1080/10106040108542179
[3]  Wang, J., Zheng, Z., Ma, A., Lu, X. and Zhong, Y. (2021) LoveDA: A Remote Sensing Land-Cover Dataset for Domain Adaptive Semantic Segmentation. ArXiv: 2110.08733.
[4]  Wang, L., Li, R., Zhang, C., Fang, S., Duan, C., Meng, X. and Atkinson, P.M. (2022) UNetFormer: A UNet-Like Transformer for Efficient Semantic Segmentation of Remote Sensing Urban Scene Imagery. ISPRS Journal of Photogrammetry and Remote Sensing, 190, 196-214.
https://doi.org/10.1016/j.isprsjprs.2022.06.008
[5]  Zhang, T., Su, J., Liu, C. and Chen, W.-H. (2021) State and Parameter Estimation of the AquaCrop Model for Winter Wheat Using Sensitivity Informed Particle Filter. Computers and Electronics in Agriculture, 180, Article 105909.
https://doi.org/10.1016/j.compag.2020.105909
[6]  Witharana, C., Bhuiyan, M.A.E., Liljedahl, A.K., Kanevskiy, M., Epstein, H.E., Jones, B.M., Daanen, R., Griffin, C.G., Kent, K. and Jones, M.K.W. (2020) Understanding the Synergies of Deep Learning and Data Fusion of Multispectral and Panchromatic High Resolution Commercial Satellite Imagery for Automated Ice-Wedge Polygon Detection. ISPRS Journal of Photogrammetry and Remote Sensing, 170, 174-191.
https://doi.org/10.1016/j.isprsjprs.2020.10.010
[7]  Blake, A., Criminisi, A., Cross, G. and Kolmogorov, V. (2010) Image Segmentation of Foreground from Background Layers. US Patent 7,676,081.
[8]  Qi, S., Ma, J., Lin, J., Li, Y. and Tian, J. (2015) Unsupervised Ship Detection Based on Saliency and S-HOG Descriptor from Optical Satellite Images. IEEE Geoscience and Remote Sensing Letters, 12, 1451-1455.
https://doi.org/10.1109/LGRS.2015.2408355
[9]  Goncalves, H., Corte-Real, L. and Goncalves, J.A. (2011) Automatic Image Registration through Image Segmentation and SIFT. IEEE Transactions on Geoscience and Remote Sensing, 49, 2589-2600.
https://doi.org/10.1109/TGRS.2011.2109389
[10]  Simonyan, K., Vedaldi, A. and Zisserman, A. (2013) Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. ArXiv: 1312.6034.
[11]  Ding, L., Lin, D., Lin, S., Zhang, J., Cui, X., Wang, Y., Tang, H. and Bruzzone, L. (2022) Looking outside the Window: Wide-Context Transformer for the Semantic Segmentation of High-Resolution Remote Sensing Images. IEEE Transactions on Geoscience and Remote Sensing, 60, Article No. 4410313.
https://doi.org/10.1109/TGRS.2022.3168697
[12]  Yang, M.Y., Kumaar, S., Lyu, Y. and Nex, F. (2021) Real-Time Semantic Segmentation with Context Aggregation Network. ISPRS Journal of Photogrammetry and Remote Sensing, 178, 124-134.
https://doi.org/10.1016/j.isprsjprs.2021.06.006
[13]  Li, R., Zheng, S., Zhang, C., Duan, C., Wang, L. and Atkinson, P.M. (2021) ABCNet: Attentive Bilateral Contextual Network for Efficient Semantic Segmentation of Fine-Resolution Remotely Sensed Imagery. ISPRS Journal of Photogrammetry and Remote Sensing, 181, 84-98.
https://doi.org/10.1016/j.isprsjprs.2021.09.005
[14]  Bahdanau, D., Cho, K. and Bengio, Y. (2014) Neural Machine Translation by Jointly Learning to Align and Translate. ArXiv: 1409.0473.
[15]  Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł. and Polosukhin, I. (2017) Attention Is All You Need. In: Guyon, I., et al., Eds., Advances in Neural Information Processing Systems 30, Neural Information Processing Systems Foundation, Inc. (NeurIPS), Long Beach, 6000-6010.
[16]  Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al. (2020) An Image Is Worth 16 × 16 Words: Transformers for Image Recognition at Scale. ArXiv: 2010.11929.
[17]  Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F. and Adam, H. (2018) Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. Proceedings of the European Conference on Computer Vision (ECCV), Munich, 8-14 September 2018, 801-818.
https://doi.org/10.1007/978-3-030-01234-2_49
[18]  Sun, Y., Bi, F., Gao, Y., Chen, L. and Feng, S. (2022) A Multi-Attention UNet for Semantic Segmentation in Remote Sensing Images. Symmetry, 14, Article 906.
https://doi.org/10.3390/sym14050906
[19]  Wang, G., Zhai, Q. and Lin, J. (2022) Multi-Scale Network for Remote Sensing Segmentation. IET Image Processing, 16, 1742-1751.
https://doi.org/10.1049/ipr2.12444
[20]  Li, R., Duan, C., Zheng, S., Zhang, C. and Atkinson, P.M. (2022) MACU-Net for Semantic Segmentation of Fine-Resolution Remotely Sensed Images. IEEE Geoscience and Remote Sensing Letters, 19, Article No. 8007205.
https://doi.org/10.1109/LGRS.2021.3052886
[21]  Song, C.H., Han, H.J. and Avrithis, Y. (2022) All the Attention You Need: Global-Local, Spatial-Channel Attention for Image Retrieval. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, 3-8 January 2022, 2754-2763.
https://doi.org/10.1109/WACV51458.2022.00051
[22]  Yang, F., Sun, Q., Jin, H. and Zhou, Z. (2020) Superpixel Segmentation with Fully Convolutional Networks. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, 13-19 June 2020, 13964-13973.
https://doi.org/10.1109/CVPR42600.2020.01398
[23]  Shao, Z., Yang, K. and Zhou, W. (2018) Performance Evaluation of Single-Label and Multi-Label Remote Sensing Image Retrieval Using a Dense Labeling Dataset. Remote Sensing, 10, Article 964.
https://doi.org/10.3390/rs10060964
[24]  Thanh Noi, P. and Kappas, M. (2017) Comparison of Random Forest, K-Nearest Neighbor, and Support Vector Machine Classifiers for Land Cover Classification Using Sentinel-2 Imagery. Sensors, 18, Article 18.
https://doi.org/10.3390/s18010018
[25]  Han, B. (2015) Watershed Segmentation Algorithm Based on Morphological Gradient Reconstruction. 2015 2nd International Conference on Information Science and Control Engineering, Shanghai, 24-26 April 2015, 533-536.
https://doi.org/10.1109/ICISCE.2015.124
[26]  Radman, A., Zainal, N. and Suandi, S.A. (2017) Automated Segmentation of Iris Images Acquired in an Unconstrained Environment Using HOG-SVM and GrowCut. Digital Signal Processing, 64, 60-70.
https://doi.org/10.1016/j.dsp.2017.02.003
[27]  Blaschke, T. (2010) Object Based Image Analysis for Remote Sensing. ISPRS Journal of Photogrammetry and Remote Sensing, 65, 2-16.
https://doi.org/10.1016/j.isprsjprs.2009.06.004
[28]  Carleer, A., Debeir, O. and Wolff, E. (2005) Assessment of Very High Spatial Resolution Satellite Image Segmentations. Photogrammetric Engineering & Remote Sensing, 71, 1285-1294.
https://doi.org/10.14358/PERS.71.11.1285
[29]  Kirillov, A., Girshick, R., He, K. and Dollár, P. (2019) Panoptic Feature Pyramid Networks. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, 15-20 June 2019, 6399-6408.
https://doi.org/10.1109/CVPR.2019.00656
[30]  Kotaridis, I. and Lazaridou, M. (2021) Remote Sensing Image Segmentation Advances: A Meta-Analysis. ISPRS Journal of Photogrammetry and Remote Sensing, 173, 309-322.
https://doi.org/10.1016/j.isprsjprs.2021.01.020
[31]  Zhao, H., Shi, J., Qi, X., Wang, X. and Jia, J. (2017) Pyramid Scene Parsing Network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 21-26 July 2017, 2881-2890.
https://doi.org/10.1109/CVPR.2017.660
[32]  Ronneberger, O., Fischer, P. and Brox, T. (2015) U-Net: Convolutional Networks for Biomedical Image Segmentation. Medical Image Computing and Computer-Assisted Intervention-MICCAI 2015: 18th International Conference, Munich, 5-9 October 2015, 234-241.
https://doi.org/10.1007/978-3-319-24574-4_28
[33]  Diakogiannis, F.I., Waldner, F., Caccetta, P. and Wu, C. (2020) ResUNet-a: A Deep Learning Framework for Semantic Segmentation of Remotely Sensed Data. ISPRS Journal of Photogrammetry and Remote Sensing, 162, 94-114.
https://doi.org/10.1016/j.isprsjprs.2020.01.013
[34]  Zhou, Z., Rahman Siddiquee, M.M., Tajbakhsh, N. and Liang, J. (2018) UNet++: A Nested U-Net Architecture for Medical Image Segmentation. DLMIA 2018, ML-CDS 2018: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, Granada, 20 September 2018, 3-11.
https://doi.org/10.1007/978-3-030-00889-5_1
[35]  Huang, H., Lin, L., Tong, R., Hu, H., Zhang, Q., Iwamoto, Y., Han, X., Chen, Y.-W. and Wu, J. (2020) UNet 3+: A Full-Scale Connected UNet for Medical Image Segmentation. ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, 4-8 May 2020, 1055-1059.
https://doi.org/10.1109/ICASSP40776.2020.9053405
[36]  Dong, Z., Xu, K., Yang, Y., Bao, H., Xu, W. and Lau, R.W. (2021) Location-Aware Single Image Reflection Removal. Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, 10-17 October 2021, 5017-5026.
https://doi.org/10.1109/ICCV48922.2021.00497
[37]  Oktay, O., Schlemper, J., Folgoc, L.L., Lee, M., Heinrich, M., Misawa, K., Mori, K., McDonagh, S., Hammerla, N.Y., Kainz, B., et al. (2018) Attention U-Net: Learning Where to Look for the Pancreas. ArXiv: 1804.03999.
[38]  Shi, H., Fan, J., Wang, Y. and Chen, L. (2021) Dual Attention Feature Fusion and Adaptive Context for Accurate Segmentation of Very High-Resolution Remote Sensing Images. Remote Sensing, 13, Article 3715.
https://doi.org/10.3390/rs13183715
[39]  Ding, X., Guo, Y., Ding, G. and Han, J. (2019) ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks. Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, 27 October-2 November 2019, 1911-1920.
https://doi.org/10.1109/ICCV.2019.00200
[40]  Woo, S., Park, J., Lee, J.-Y. and Kweon, I.S. (2018) CBAM: Convolutional Block Attention Module. Proceedings of the European Conference on Computer Vision (ECCV), Munich, 8-14 September 2018, 3-19.
https://doi.org/10.1007/978-3-030-01234-2_1
[41]  Huang, Z., Wang, X., Huang, L., Huang, C., Wei, Y. and Liu, W. (2019) CCNet: Criss-Cross Attention for Semantic Segmentation. Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, 27 October-2 November 2019, 603-612.
https://doi.org/10.1109/ICCV.2019.00069
[42]  Yuan, Y., Huang, L., Guo, J., Zhang, C., Chen, X. and Wang, J. (2021) OCNet: Object Context for Semantic Segmentation. International Journal of Computer Vision, 129, 2375-2398.
https://doi.org/10.1007/s11263-021-01465-9
[43]  Strudel, R., Garcia, R., Laptev, I. and Schmid, C. (2021) Segmenter: Transformer for Semantic Segmentation. Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, 10-17 October 2021, 7262-7272.
https://doi.org/10.1109/ICCV48922.2021.00717
[44]  Xie, E., Wang, W., Yu, Z., Anandkumar, A., Alvarez, J.M. and Luo, P. (2021) SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. Advances in Neural Information Processing Systems, 34, 12077-12090.
[45]  Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S. and Guo, B. (2021) Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, 10-17 October 2021, 10012-10022.
https://doi.org/10.1109/ICCV48922.2021.00986
[46]  Panboonyuen, T., Jitkajornwanich, K., Lawawirojwong, S., Srestasathiern, P. and Vateekul, P. (2021) Transformer-Based Decoder Designs for Semantic Segmentation on Remotely Sensed Images. Remote Sensing, 13, Article 5100.
https://doi.org/10.3390/rs13245100
[47]  Sun, L., Zou, H., Wei, J., Cao, X., He, S., Li, M. and Liu, S. (2023) Semantic Segmentation of High-Resolution Remote Sensing Images Based on Sparse Self-Attention and Feature Alignment. Remote Sensing, 15, Article 1598.
https://doi.org/10.3390/rs15061598
[48]  Liu, T., Luo, R., Xu, L., Feng, D., Cao, L., Liu, S. and Guo, J. (2022) Spatial Channel Attention for Deep Convolutional Neural Networks. Mathematics, 10, Article 1750.
https://doi.org/10.3390/math10101750
[49]  Yu, F. and Koltun, V. (2015) Multi-Scale Context Aggregation by Dilated Convolutions. ArXiv: 1511.07122.
[50]  Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J. and Wojna, Z. (2016) Rethinking the Inception Architecture for Computer Vision. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 27-30 June 2016, 2818-2826.
https://doi.org/10.1109/CVPR.2016.308
[51]  Huang, H., Zhou, X., Cao, J., He, R. and Tan, T. (2023) Vision Transformer with Super Token Sampling. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, 17-24 June 2023, 22690-22699.
[52]  Shaw, P., Uszkoreit, J. and Vaswani, A. (2018) Self-Attention with Relative Position Representations. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2, 464-468.
https://doi.org/10.18653/v1/N18-2074
[53]  Ramachandran, P., Parmar, N., Vaswani, A., Bello, I., Levskaya, A. and Shlens, J. (2019) Stand-Alone Self-Attention in Vision Models. In: Wallach, H., et al., Eds., Advances in Neural Information Processing Systems 32, NeurIPS 2019, Vancouver, 68-80.
[54]  Lin, T.-Y., Goyal, P., Girshick, R., He, K. and Dollár, P. (2017) Focal Loss for Dense Object Detection. Proceedings of the IEEE International Conference on Computer Vision, Venice, 22-29 October 2017, 2980-2988.
https://doi.org/10.1109/ICCV.2017.324
[55]  Jadon, S. (2020) A Survey of Loss Functions for Semantic Segmentation. 2020 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), Viña del Mar, 27-29 October 2020, 1-7.
https://doi.org/10.1109/CIBCB48159.2020.9277638
[56]  Shao, Z., Zhou, W., Deng, X., Zhang, M. and Cheng, Q. (2020) Multilabel Remote Sensing Image Retrieval Based on Fully Convolutional Network. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 13, 318-328.
https://doi.org/10.1109/JSTARS.2019.2961634
[57]  Yang, Y. and Newsam, S. (2010) Bag-of-Visual-Words and Spatial Extensions for Land-Use Classification. Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Jose, 2-5 November 2010, 270-279.
https://doi.org/10.1145/1869790.1869829
[58]  Kingma, D.P. and Ba, J. (2014) Adam: A Method for Stochastic Optimization. ArXiv: 1412.6980.
[59]  Loshchilov, I. and Hutter, F. (2016) SGDR: Stochastic Gradient Descent with Warm Restarts. ArXiv: 1608.03983.
[60]  Sudre, C.H., Li, W., Vercauteren, T., Ourselin, S. and Jorge Cardoso, M. (2017) Generalised Dice Overlap as a Deep Learning Loss Function for Highly Unbalanced Segmentations. DLMIA 2017, ML-CDS 2017: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, Québec City, 14 September 2017, 240-248.
https://doi.org/10.1007/978-3-319-67558-9_28
[61]  Sokolova, M., Japkowicz, N. and Szpakowicz, S. (2006) Beyond Accuracy, F-Score and ROC: A Family of Discriminant Measures for Performance Evaluation. Australasian Joint Conference on Artificial Intelligence, Hobart, 4-8 December 2006, 1015-1021.
https://doi.org/10.1007/11941439_114
[62]  Sravya, N., Lal, S., Nalini, J., Reddy, C.S., Dell’Acqua, F., et al. (2022) DPPNet: An Efficient and Robust Deep Learning Network for Land Cover Segmentation from High-Resolution Satellite Images. IEEE Transactions on Emerging Topics in Computational Intelligence, 7, 128-139.
https://doi.org/10.1109/TETCI.2022.3182414
[63]  Qi, X., Wu, Y., Mao, Y., Zhang, W. and Zhang, Y. (2023) Self-Guided Few-Shot Semantic Segmentation for Remote Sensing Imagery Based on Large Vision Models. ArXiv: 2311.13200.
[64]  Jia, J., Song, J., Kong, Q., Yang, H., Teng, Y. and Song, X. (2023) Multi-Attention-Based Semantic Segmentation Network for Land Cover Remote Sensing Images. Electronics, 12, Article 1347.
https://doi.org/10.3390/electronics12061347
[65]  Gu, J., Kwon, H., Wang, D., Ye, W., Li, M., Chen, Y.-H., Lai, L., Chandra, V. and Pan, D.Z. (2022) Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, 18-24 June 2022, 12094-12103.
https://doi.org/10.1109/CVPR52688.2022.01178
[66]  Gao, Y., Zhang, S., Zuo, D., Yan, W. and Pan, X. (2023) TMNet: A Two-Branch Multi-Scale Semantic Segmentation Network for Remote Sensing Images. Sensors, 23, Article 5909.
https://doi.org/10.3390/s23135909
[67]  Zhang, Z., Liu, B. and Li, Y. (2023) FURSformer: Semantic Segmentation Network for Remote Sensing Images with Fused Heterogeneous Features. Electronics, 12, Article 3113.
https://doi.org/10.3390/electronics12143113