
State Space Models Based Efficient Long Documents Classification

DOI: 10.4236/jilsa.2024.163009, PP. 143-154

Keywords: Mamba, Transformer, NLP

Abstract:

Large language models such as the Generative Pretrained Transformer (GPT) have significantly advanced natural language processing (NLP) in recent years, excelling at tasks such as language translation, question answering, and text generation. However, their effectiveness is limited by the quadratic training complexity of Transformer models, O(L²), which makes complex tasks such as classifying long documents challenging. To overcome this limitation, researchers have explored architectures and techniques such as sparse attention mechanisms, hierarchical processing, and efficient attention modules. A recent innovation called Mamba, based on a state space model approach, offers fast inference and linear scalability in sequence length thanks to its unique selection mechanism. By incorporating this selection mechanism, Mamba enables content-based reasoning and targeted focus on particular inputs, reducing computational cost and enhancing performance. Despite these advantages, the application of Mamba to long document classification has not been thoroughly investigated. This study fills that gap by developing a Mamba-based model for long document classification and assessing its efficacy on four datasets: Hyperpartisan, 20 Newsgroups, EURLEX, and CMU Book Summary. Our experiments show that the Mamba model surpasses NLP models such as BERT and Longformer, demonstrating strong performance and highlighting Mamba's efficiency in handling lengthy document classification tasks. These results have implications for NLP applications, enabling advanced language models to address challenging tasks over extended sequences more effectively. This work opens the door to further exploration of Mamba's capabilities and its potential use across diverse NLP domains.
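To make the classification setup concrete, the sketch below shows one way a Mamba-based long-document classifier could be assembled with the open-source mamba_ssm package. The class name MambaDocumentClassifier, the layer count, the mean-pooling readout, and all hyperparameters are illustrative assumptions rather than the paper's configuration, and the package's selective-scan kernels require a CUDA device.

```python
# Minimal sketch of a Mamba-based long-document classifier.
# Assumptions: the open-source `mamba_ssm` package (pip install mamba-ssm),
# illustrative hyperparameters, and mean pooling over token states; this is
# NOT the paper's implementation. The library's kernels require a CUDA device.
import torch
import torch.nn as nn
from mamba_ssm import Mamba


class MambaDocumentClassifier(nn.Module):
    def __init__(self, vocab_size, num_classes, d_model=256, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Stack of selective state space blocks; the constructor arguments
        # follow the mamba_ssm README defaults.
        self.layers = nn.ModuleList(
            Mamba(d_model=d_model, d_state=16, d_conv=4, expand=2)
            for _ in range(n_layers)
        )
        self.norm = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, input_ids):             # input_ids: (batch, seq_len)
        x = self.embed(input_ids)              # (batch, seq_len, d_model)
        for layer in self.layers:
            x = layer(x) + x                   # residual around each Mamba block
        x = self.norm(x).mean(dim=1)           # pool token states into one vector
        return self.head(x)                    # (batch, num_classes) logits


# Usage: token-ID sequences of length 8192, well beyond BERT's 512-token limit.
model = MambaDocumentClassifier(vocab_size=50000, num_classes=2).to("cuda")
input_ids = torch.randint(0, 50000, (2, 8192), device="cuda")
logits = model(input_ids)                      # shape: (2, 2)
```

Mean pooling is only one possible readout; the key property is that each Mamba layer's cost grows linearly with document length, so sequences far longer than a standard Transformer's attention window remain tractable.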

References

[1]  Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L. and Polosukhin, I. (2017) Attention Is All You Need. Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, 4-9 December 2017, 6000-6010.
[2]  Sherstinsky, A. (2020) Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) Network. Physica D: Nonlinear Phenomena, 404, Article ID: 132306.
https://doi.org/10.1016/j.physd.2019.132306
[3]  Devlin, J., Chang, M.-W., Lee, K. and Toutanova, K. (2019) BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. Proceedings of NAACL-HLT 2019, Minneapolis, 2-7 June 2019, 4171-4186.
[4]  Pappagari, R., Zelasko, P., Villalba, J., Carmiel, Y. and Dehak, N. (2019) Hierarchical Transformers for Long Document Classification. 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Singapore, 14-18 December 2019, 838-844.
https://doi.org/10.1109/ASRU46091.2019.9003958
[5]  Beltagy, I., Peters, M.E. and Cohan, A. (2020) Longformer: The Long-Document Transformer.
[6]  Gu, A. and Dao, T. (2023) Mamba: Linear-Time Sequence Modeling with Selective State Spaces.
[7]  Gu, A., Goel, K., et al. (2021) Efficiently Modeling Long Sequences with Structured State Spaces.
[8]  Park, H.H., Vyas, Y. and Shah, K. (2022) Efficient Classification of Long Documents Using Transformers. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Volume 2, 702-709.
https://doi.org/10.18653/v1/2022.acl-short.79
[9]  Kiesel, J., Mestre, M., Shukla, R., Vincent, E., Adineh, P., Corney, D., Stein, B. and Potthast, M. (2019) SemEval-2019 Task 4: Hyperpartisan News Detection. Proceedings of the 13th International Workshop on Semantic Evaluation, Minneapolis, June 2019, 829-839.
https://doi.org/10.18653/v1/S19-2145
[10]  Lang, K. (1995) NewsWeeder: Learning to Filter Netnews. Proceedings of the Twelfth International Conference on Machine Learning, Tahoe City, 9-12 July 1995, 331-339.
https://doi.org/10.1016/B978-1-55860-377-6.50048-7
[11]  Chalkidis, I., Fergadiotis, E., Malakasiotis, P. and Androutsopoulos, I. (2019) Large-Scale Multi-Label Text Classification on EU Legislation. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, July 2019, 6314-6322.
https://doi.org/10.18653/v1/P19-1636
[12]  Bamman, D. and Smith, N.A. (2013) New Alignment Methods for Discriminative Book Summarization.
[13]  Liu, C.-Z., Sheng, Y.-X., Wei, Z.-Q. and Yang, Y.-Q. (2018) Research of Text Classification Based on Improved TF-IDF Algorithm. 2018 IEEE International Conference of Intelligent Robotic and Control Engineering (IRCE), Lanzhou, 24-27 August 2018, 218-222.
https://doi.org/10.1109/IRCE.2018.8492945
[14]  Kim, Y. (2014) Convolutional Neural Networks for Sentence Classification. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, October 2014, 1746-1751.
https://doi.org/10.3115/v1/D14-1181
[15]  Liu, Y.H., Ott, M., Goyal, N., Du, J.F., Joshi, M., Chen, D.Q., Levy, O., Lewis, M., Zettlemoyer, L. and Stoyanov, V. (2019) RoBERTa: A Robustly Optimized BERT Pretraining Approach.
[16]  Yang, Z.L., Dai, Z.H., Yang, Y.M., Carbonell, J., Salakhutdinov, R. and Le, Q.V. (2019) XLNet: Generalized Autoregressive Pretraining for Language Understanding.
[17]  Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., et al. (2020) Language Models Are Few-Shot Learners.
[18]  Yang, Z.C., Yang, D.Y., Dyer, C., He, X.D., Smola, A. and Hovy, E. (2016) Hierarchical Attention Networks for Document Classification. Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, June 2016, 1480-1489.
https://doi.org/10.18653/v1/N16-1174
[19]  Mihalcea, R. and Tarau, P. (2004) TextRank: Bringing Order into Text. Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, Barcelona, 25-26 July 2004, 404-411.
[20]  Ding, M., Zhou, C., Yang, H.X. and Tang, J. (2020) CogLTX: Applying BERT to Long Texts. Proceedings of the 34th International Conference on Neural Information Processing Systems, Vancouver, 6-12 December 2020, 12792-12804.
