A Review of Machine Learning Classification Based on Random Forest Algorithm

DOI: 10.12677/AIRR.2024.131016, PP. 143-152

Keywords: Decision Trees, Random Forests, Machine Learning

Abstract:

Machine learning is a key technology for realizing artificial intelligence, and the random forest algorithm is one of its representative algorithms. Known in both industry and academia for its simplicity and effectiveness, random forest is a classifier built from decision trees: each tree casts a vote, and the class with the most votes is returned as the prediction. The algorithm also offers useful properties such as variable importance measures, out-of-bag error estimates, and proximity measures, which is why it is so widely used for classification. It is widely discussed in fields such as medicine, agriculture, and natural language processing, and widely applied to tasks such as spam classification, intrusion detection, content filtering, and sentiment analysis. This paper describes how a random forest is constructed and reviews the current state of research in terms of classification performance, application areas, and classification results. It also analyzes the strengths and weaknesses of the algorithm and the improvements researchers have proposed, with the aim of helping researchers who are new to random forests grasp the theoretical foundations.
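The ideas named in the abstract (bootstrap sampling, voting among trees, and the out-of-bag error) can be illustrated with a minimal sketch. It is not the paper's implementation. It is a simplified stand-in that assumes depth-one trees (stumps) and a small made-up dataset, and the names `fit_stump`, `fit_forest`, `predict`, and `oob_error` are hypothetical. Variable importance and proximity are not shown.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def fit_stump(X, y, features):
    """Fit a depth-one tree (a simplification) on the candidate features only."""
    default = Counter(y).most_common(1)[0][0]
    best = None  # (weighted impurity, feature, threshold, left label, right label)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # no valid split, e.g. the sample holds a single class
        return lambda row: default
    _, f, t, left_label, right_label = best
    return lambda row: left_label if row[f] <= t else right_label

def fit_forest(X, y, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample with a random feature subset."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    m = max(1, int(d ** 0.5))  # common heuristic: about sqrt(d) features per split
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        feats = rng.sample(range(d), m)
        tree = fit_stump([X[i] for i in idx], [y[i] for i in idx], feats)
        forest.append((tree, set(idx)))  # remember which rows were in the bag
    return forest

def predict(forest, row):
    """Return the class that receives the most votes across all trees."""
    return Counter(tree(row) for tree, _ in forest).most_common(1)[0][0]

def oob_error(forest, X, y):
    """Error rate when each row is judged only by trees that never saw it."""
    wrong = counted = 0
    for i, (row, label) in enumerate(zip(X, y)):
        votes = Counter(tree(row) for tree, in_bag in forest if i not in in_bag)
        if votes:
            counted += 1
            wrong += votes.most_common(1)[0][0] != label
    return wrong / counted if counted else float("nan")

# Made-up toy data: two well-separated classes with two features each.
X = [[1.0, 1.2], [1.5, 0.8], [2.0, 1.1], [1.2, 1.9],
     [6.0, 6.5], [6.5, 7.1], [7.0, 6.2], [7.5, 6.8]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(X, y)
print("prediction for [0.5, 0.5]:", predict(forest, [0.5, 0.5]))
print("out-of-bag error:", oob_error(forest, X, y))
```

Because each stump has a single split, drawing the feature subset once per tree here matches drawing it once per split, which is how full-depth random forests usually apply the randomization.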
