In recent years, the expansion of Fintech has accelerated the development of the online
peer-to-peer lending market, offering a huge opportunity for investment by
directly connecting borrowers to lenders, without traditional financial
intermediaries. This innovative approach is, however, accompanied by increasing
default risk, since information asymmetry tends to rise with online
businesses. This paper aimed to predict the probability of default of the
borrower, using data from LendingClub, the leading American online
peer-to-peer lending platform. For this purpose, three machine learning methods
were employed: logistic regression, random forest and neural network. Before
the scoring models were built, the LendingClub model was assessed using the
grades attributed to the borrowers in the dataset. The results indicated that
the LendingClub model showed low performance with an AUC of 0.67, whereas the
logistic regression (0.9), the random forest (0.9) and the neural network
(0.93) displayed better predictive power. The neural network classifier
outperformed the other models with the highest AUC, while all three models
had the same accuracy of 0.9. In addition, to improve their investment
decisions, investors might take into consideration the relationship between
some variables and the likelihood of default. For instance, the higher the
loan amount or the debt-to-income ratio, the higher the likelihood of default,
whereas the higher the annual income, the lower the probability of default.
The probability of default also tends to decline as the number of total open
accounts rises.
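
To make the contrast between AUC and accuracy concrete, the following minimal sketch computes both metrics for two hypothetical scoring models on a toy sample. The labels, the scores, the 0.5 cut-off and the helper names `auc` and `accuracy` are illustrative assumptions; none of them come from the paper or from LendingClub data.

```python
def auc(labels, scores):
    """Share of (defaulter, non-defaulter) pairs in which the defaulter
    receives the higher score; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, cutoff=0.5):
    """Share of correct predictions when score >= cutoff means 'default'."""
    correct = sum(int(s >= cutoff) == y for y, s in zip(labels, scores))
    return correct / len(labels)


# Toy sample (assumed): 1 = default, 0 = fully repaid.
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
# Hypothetical predicted default probabilities from two models.
scores_a = [0.90, 0.70, 0.40, 0.30, 0.20, 0.10, 0.15, 0.05, 0.25, 0.35]
scores_b = [0.90, 0.70, 0.10, 0.30, 0.20, 0.15, 0.25, 0.05, 0.35, 0.40]

for name, scores in [("model A", scores_a), ("model B", scores_b)]:
    print(name,
          "AUC = %.3f" % auc(labels, scores),
          "accuracy = %.1f" % accuracy(labels, scores))
```

In this toy sample both models miss the same defaulter at the 0.5 cut-off, so their accuracy is identical, but model A ranks every defaulter above every non-defaulter while model B does not. This is one way in which classifiers can share an accuracy value and still differ in AUC, since AUC depends on the ranking of scores rather than on a single threshold.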