Image Classification Based on Vision Transformer

DOI: 10.4236/jcc.2024.124005, PP. 49-59

Keywords: Convolutional Neural Networks, ViT, CNN, Deep Learning, Architecture


Abstract:

This research introduces an approach to image classification based on the Vision Transformer (ViT) architecture. Vision Transformers have emerged as a promising alternative to convolutional neural networks (CNNs) for image analysis tasks, offering scalability and improved performance. By applying self-attention across all image patches, ViT models capture global dependencies among elements of an image, which enhances feature representation. When trained on different datasets, the ViT model demonstrates strong classification capabilities across image categories. The ViT's ability to process image patches directly, without relying on hand-designed spatial hierarchies, streamlines the classification process and improves computational efficiency. In this research, we present a Python implementation using TensorFlow that employs a ViT model for image classification. Images from four animal categories (cow, dog, horse and sheep) are used for classification. The ViT model extracts meaningful features from the images, and a classification head is added to predict the class labels. The model is trained on the CIFAR-10 dataset and evaluated for accuracy and performance. The findings of this study demonstrate not only the effectiveness of the Vision Transformer in image classification tasks but also its potential as a powerful tool for complex visual recognition problems. This research addresses existing gaps in knowledge by introducing an approach that challenges traditional CNNs in computer vision: while CNNs have been the dominant architecture for image classification, they are limited in capturing long-range dependencies in image data and rely on hand-designed hierarchical feature extraction.
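The pipeline the abstract describes — splitting an image into patches, linearly projecting them, letting every patch attend to every other patch, and classifying from a [CLS] token — can be sketched as follows. This is a minimal NumPy illustration, not the paper's TensorFlow implementation; all dimensions, weight initializations, and the four-class head are illustrative assumptions.

```python
import numpy as np

def extract_patches(img, patch):
    # Split an (H, W, C) image into flattened non-overlapping patches:
    # the direct patch processing the abstract refers to.
    H, W, C = img.shape
    rows = [img[i:i + patch, j:j + patch].reshape(-1)
            for i in range(0, H, patch)
            for j in range(0, W, patch)]
    return np.stack(rows)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    # Every patch token attends to every other token: this is how
    # ViT captures global dependencies across the whole image.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return scores @ v

rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))            # CIFAR-10-sized input
tokens = extract_patches(img, patch=8)   # 16 patches, each 8*8*3 = 192 values
d = 64                                   # illustrative embedding width
embed = tokens @ (rng.normal(size=(192, d)) * 0.01)   # linear patch projection
cls = np.zeros((1, d))                                # [CLS] token (zeros here)
x = np.vstack([cls, embed]) + rng.normal(size=(17, d)) * 0.01  # + positional enc.
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
x = x + self_attention(x, Wq, Wk, Wv)    # one encoder block with residual
logits = x[0] @ rng.normal(size=(d, 4))  # classification head, 4 animal classes
probs = softmax(logits)                  # class probabilities for the image
```

A real ViT stacks many such encoder blocks (with layer normalization, multi-head attention, and MLP sublayers) and learns all the weight matrices by gradient descent; the sketch above only shows the data flow from pixels to class probabilities.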

