Machine Learning Approaches to Predict Loan Default

DOI: 10.4236/iim.2022.145011, PP. 157-164

Keywords: Machine Learning, Random Forest, Loan Default, Prediction Model


Abstract:

Loan lending plays an important role in everyday life and strongly promotes the growth of consumption and the economy. Loan defaults are unavoidable; they carry substantial risk and, in the extreme, can contribute to a financial crisis. It is therefore particularly important to identify whether an applicant is eligible for a loan. In this paper, we apply the Random Forest and XGBoost algorithms to train prediction models and compare their prediction accuracy. In the feature engineering stage, we use the variance threshold method and the Variance Inflation Factor (VIF) method to filter out uninformative features, and we then feed the selected features into the Random Forest and XGBoost models. The two models show little difference in predictive performance, with both reaching an accuracy of around 0.9 on the loan default cases.
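
The pipeline described above (variance-threshold filtering, VIF-based screening for multicollinearity, then training and comparing Random Forest and XGBoost classifiers) could be sketched as follows. This is a minimal illustration using scikit-learn, statsmodels, and xgboost; the file name loan_data.csv, the label column "default", the 70/30 split, the thresholds, and the hyperparameters are assumptions for demonstration, not the authors' actual settings.

# Minimal sketch of the pipeline in the abstract. Column names, thresholds,
# and hyperparameters below are illustrative assumptions.
import pandas as pd
from sklearn.feature_selection import VarianceThreshold
from statsmodels.stats.outliers_influence import variance_inflation_factor
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

# Hypothetical dataset: numeric features plus a binary 0/1 label "default".
df = pd.read_csv("loan_data.csv")                 # placeholder file name
X, y = df.drop(columns=["default"]), df["default"]

# Step 1: drop near-constant features with a variance threshold.
vt = VarianceThreshold(threshold=0.01)            # cutoff is an assumption
X_vt = pd.DataFrame(vt.fit_transform(X), columns=X.columns[vt.get_support()])

# Step 2: iteratively drop the feature with the highest Variance Inflation
# Factor until all remaining VIFs fall below a cutoff.
def drop_high_vif(features, cutoff=10.0):
    features = features.copy()
    while features.shape[1] > 1:
        vifs = pd.Series(
            [variance_inflation_factor(features.values, i)
             for i in range(features.shape[1])],
            index=features.columns,
        )
        if vifs.max() <= cutoff:
            break
        features = features.drop(columns=[vifs.idxmax()])
    return features

X_sel = drop_high_vif(X_vt)

# Step 3: train both classifiers on a 70/30 split and compare test accuracy.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_sel, y, test_size=0.3, random_state=42
)
for name, model in [
    ("Random Forest", RandomForestClassifier(n_estimators=200, random_state=42)),
    ("XGBoost", XGBClassifier(n_estimators=200, eval_metric="logloss")),
]:
    model.fit(X_tr, y_tr)
    acc = accuracy_score(y_te, model.predict(X_te))
    print(f"{name} accuracy: {acc:.3f}")

The VIF step is written here as an iterative drop of the most collinear feature; the abstract does not state the exact cutoff used, so the value of 10 is only a common rule of thumb.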

