Lexicon and Deep Learning-Based Approaches in Sentiment Analysis on Short Texts

DOI: 10.4236/jcc.2024.121002, PP. 11-34

Keywords: Opinion Mining, Lexicon Analysis, Twitter Data, LSTM, Machine Learning

Abstract:

Social media is an essential part of our personal and professional lives. We use it extensively to share opinions on daily topics and feelings about different subjects, and these posts provide insight into a person's current emotions. In artificial intelligence (AI) and deep learning (DL), researchers emphasize opinion mining and sentiment analysis, particularly on social media platforms such as Twitter (now known as X), which has a global user base. This research compares two popular approaches: lexicon-based and deep learning-based. The study uses a Twitter dataset called sentiment140, which contains over 1.5 million data points. The primary focus is the Long Short-Term Memory (LSTM) deep learning sequence model. We first preprocessed the data and divided the dataset into training and test sets. We evaluated the performance of our model on the test data, applied the lexicon-based approach to the same test data, and recorded the outputs. Finally, we compared the two approaches by creating confusion matrices from their respective outputs. This allows us to assess their precision, recall, and F1-score and to determine which approach yields better accuracy. The research achieved 98% model accuracy for the deep learning model and 95% model accuracy for the lexicon-based approach.
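The comparison step described in the abstract (confusion matrices, then precision, recall, and F1-score for each approach on the same test data) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the binary label encoding (1 = positive, 0 = negative), and the toy label lists are all assumptions made for illustration and do not reproduce any result from the paper.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Count the four confusion-matrix cells for a binary task."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def scores(y_true, y_pred, positive=1):
    """Derive accuracy, precision, recall, and F1 from the confusion counts."""
    c = confusion_counts(y_true, y_pred, positive)
    tp, fp, fn, tn = c["tp"], c["fp"], c["fn"], c["tn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(y_true) if y_true else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy labels, invented for this sketch only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
model_pred = [1, 1, 0, 0, 0, 0, 1, 0]    # stand-in for a trained model's output
lexicon_pred = [1, 0, 0, 0, 1, 0, 1, 0]  # stand-in for a lexicon scorer's output

for name, pred in [("model", model_pred), ("lexicon", lexicon_pred)]:
    print(name, scores(y_true, pred))
```

Scoring both prediction lists against the same `y_true` mirrors the abstract's setup, where both approaches are evaluated on one shared test set so that their metrics are directly comparable.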
