Mine Engineering 2024
An Automatic Evaluation Model for Hazard Texts Based on N-gram Pruning
Abstract:
To automatically analyze the hazard response-level information contained in hazard texts from offshore drilling platforms and to quantify hazard severity, this paper proposes a quantitative evaluation model for hazard response levels based on N-gram bag-of-words vectors. First, 1565 on-site hazard records from drilling platforms are segmented into words and filtered. Second, N-grams are used as feature units to reshape the bag-of-words dimensions. Third, inverse TF-IDF values are proposed to strengthen the feature values. Finally, naive Bayes is used to build the hazard quantification model. The results show that the evaluation model built with this method achieves high precision, recall, and F1 scores.
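The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy records, the labels `high` and `low`, and the helper names `ngrams`, `NaiveBayes`, and `precision_recall_f1` are all hypothetical, the texts are assumed to be already segmented into tokens, and the inverse TF-IDF weighting step is left out because the abstract does not define it. The sketch only shows N-gram bag-of-words features feeding a multinomial naive Bayes classifier with add-one smoothing, scored by precision, recall, and F1.

```python
import math
from collections import Counter, defaultdict


def ngrams(tokens, n_max=2):
    """Return all 1..n_max-grams of a token list as tuples."""
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(tuple(tokens[i:i + n]))
    return feats


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, y in zip(docs, labels):
            self.feat_counts[y].update(feats)
            self.vocab.update(feats)
        # Total number of feature occurrences per class.
        self.totals = {y: sum(c.values()) for y, c in self.feat_counts.items()}
        return self

    def predict(self, feats):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for y in self.class_counts:
            score = math.log(self.class_counts[y] / n_docs)  # log prior
            for f in feats:
                if f in self.vocab:  # features never seen in training are ignored
                    score += math.log(
                        (self.feat_counts[y][f] + 1) / (self.totals[y] + v)
                    )
            if score > best_score:
                best, best_score = y, score
        return best


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical, pre-segmented hazard records with made-up response levels.
train = [
    (["gas", "leak", "near", "wellhead"], "high"),
    (["fire", "pump", "failed"], "high"),
    (["handrail", "paint", "peeling"], "low"),
    (["label", "missing", "on", "cabinet"], "low"),
]
model = NaiveBayes().fit(
    [ngrams(tokens) for tokens, _ in train],
    [label for _, label in train],
)
prediction = model.predict(ngrams(["gas", "leak", "on", "deck"]))
print(prediction)
```

In this toy setup the bigram `("gas", "leak")` adds evidence beyond the two unigrams, which is the motivation for using N-grams as feature units. A feature-weighting step such as the paper's inverse TF-IDF would sit between `ngrams` and `fit`.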