Intelligent Information Management 2022

Machine Learning Approaches to Predict Loan Default

DOI: 10.4236/iim.2022.145011, PP. 157-164

Wanjun Wu

Keywords: Machine Learning, Random Forest, Loan Default, Prediction Model

Abstract:

Lending plays an important role in everyday life and strongly promotes the growth of consumption and the economy. Loan default is unavoidable, carries great risk, and may even contribute to a financial crisis. It is therefore particularly important to identify whether an applicant is eligible for a loan. In this paper, we train prediction models with the Random Forest and XGBoost algorithms and compare their prediction accuracy. For feature engineering, we use the variance threshold method and the Variance Inflation Factor (VIF) method to filter out unimportant features, and then feed the selected features into the Random Forest and XGBoost models. The two models show little difference in prediction accuracy: both reach an accuracy of around 0.9 on the loan default cases.
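
The feature-filtering step named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the toy columns, the variance threshold of 0.0, and the VIF limit of 10 (a common rule of thumb) are all assumptions made for illustration, and the helper names are hypothetical. The sketch relies on the standard identity that the VIF of feature j equals the j-th diagonal entry of the inverse of the features' correlation matrix.

```python
from math import sqrt
from statistics import mean, pvariance


def variance_threshold(columns, threshold=0.0):
    """Return names of columns whose population variance exceeds `threshold`."""
    return [name for name, values in columns.items()
            if pvariance(values) > threshold]


def correlation(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: features are exactly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif_scores(columns, names):
    """VIF of each feature: the diagonal of the inverse correlation matrix."""
    corr = [[correlation(columns[a], columns[b]) for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


def drop_high_vif(columns, names, limit=10.0):
    """Repeatedly drop the feature with the largest VIF until all are <= limit."""
    names = list(names)
    while len(names) > 1:
        scores = vif_scores(columns, names)
        worst = max(scores, key=scores.get)
        if scores[worst] <= limit:
            break
        names.remove(worst)
    return names


# Made-up toy data (assumption): `constant` has zero variance and
# `debt_scaled` is almost exactly twice `debt`, so the two are near-collinear.
data = {
    "income": [30, 45, 52, 61, 75, 88, 94, 120],
    "debt": [5, 9, 4, 12, 7, 15, 6, 11],
    "debt_scaled": [10.1, 17.9, 8.1, 23.9, 14.1, 29.9, 12.1, 21.9],
    "constant": [1, 1, 1, 1, 1, 1, 1, 1],
}

after_variance = variance_threshold(data)        # removes `constant`
selected = drop_high_vif(data, after_variance)   # removes one of the debt pair
print(selected)
```

The variance filter runs first because a constant column has no defined correlation with the others. In practice the surviving columns would then be passed to the classifiers; the abstract does not state which thresholds the authors used.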
