Exploring Intelligent Entity Relationship Extraction in Chinese Text: A New Method Based on Data and Model Collaborative Optimization

DOI: 10.12677/airr.2024.132044, PP. 425-440

Keywords: Named Entity Recognition, Relation Extraction, Deep Learning, BiLSTM (Bidirectional Long Short-Term Memory), Attention Mechanism


Abstract:

This paper addresses the problem of extracting entities and relations from unstructured Chinese text, focusing on the challenges of Named Entity Recognition (NER) and Relation Extraction (RE). To strengthen recognition and extraction, we design a pipeline model with dedicated components for NER and RE, integrating external dictionary information and Chinese semantic information. We also introduce a novel NER model that combines Chinese pinyin, character, and word features. In addition, we exploit entity distance, sentence length, and part-of-speech information to improve relation extraction performance. We further study the interplay among the data, the model, and the inference algorithm to improve learning efficiency on this task. Experimental comparisons with several existing methods show that our model achieves significant improvements.
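As a rough illustration of the pipeline structure the abstract describes — NER first, then RE over the detected entity pairs using hand-crafted features such as entity distance, sentence length, and part-of-speech — the sketch below is a minimal, hypothetical rendering, not the paper's actual model; all function and variable names are assumptions for illustration only:

```python
# Hypothetical sketch of a NER -> RE pipeline. A real NER model would emit
# the BIO tags; here they are given directly so the stages are visible.

def extract_entities(tags):
    """Collapse BIO tags into (start, end, type) spans; end is exclusive."""
    entities, start = [], None
    for i, tag in enumerate(tags + ["O"]):       # sentinel "O" closes a trailing span
        if tag.startswith("B-"):
            if start is not None:
                entities.append((start, i, tags[start][2:]))
            start = i
        elif not tag.startswith("I-") and start is not None:
            entities.append((start, i, tags[start][2:]))
            start = None
    return entities

def relation_features(sent_len, e1, e2, pos_tags):
    """Features for one entity pair: distance, sentence length, head POS tags."""
    return {
        "entity_distance": e2[0] - e1[1],        # tokens between the two spans
        "sentence_length": sent_len,
        "pos_pair": (pos_tags[e1[0]], pos_tags[e2[0]]),
    }

tokens = ["张", "三", "出", "生", "于", "北", "京"]
tags   = ["B-PER", "I-PER", "O", "O", "O", "B-LOC", "I-LOC"]
pos    = ["nr", "nr", "v", "v", "p", "ns", "ns"]

ents = extract_entities(tags)                    # [(0, 2, 'PER'), (5, 7, 'LOC')]
feats = relation_features(len(tokens), ents[0], ents[1], pos)
```

In the paper's actual model, the feature dictionary would be vectorized and fed to a learned classifier (e.g. a BiLSTM with attention) rather than inspected directly; this sketch only shows how the two stages hand off.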

