A prediction-driven mixture cure model and its application in credit scoring. (English) Zbl 1431.91413

Summary: In the credit market, assessing a borrower’s default risk over time is essential for timely risk management, because borrowers’ exposure to risk and the losses that result from defaults depend strongly on when the default occurs. Mixture cure models, which predict not only whether borrowers will default but also when they are likely to default, have been applied to credit scoring. We propose a prediction-driven mixture cure model, which sacrifices interpretability for potentially better predictive performance, and apply it to credit scoring. In the incidence part of the mixture cure model, we replace the typical statistical incidence model (logistic regression) with a more flexible, and potentially more accurate, classification method (random forests). For the latency part, we propose a survival analysis model, named time-dependent hazards, which accommodates a direct relationship between failure times and covariates and can potentially predict the probability of default over time better than the standard Cox proportional hazards (PH) model. An empirical evaluation using real-world data from a major peer-to-peer (P2P) lending institution in China shows that both extensions improve performance in terms of both discrimination and calibration.
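
The way the two parts combine can be illustrated with the standard mixture cure decomposition, in which the population survival function is S_pop(t | x) = 1 - pi(x) + pi(x) * S_u(t | x), where pi(x) is the incidence (probability of eventual default) and S_u is the latency survival function among eventual defaulters. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, the incidence value is a fixed number standing in for a classifier output such as a random forest probability, and the exponential latency curve is only a placeholder, since the summary does not specify the form of the time-dependent hazards model.

```python
import math


def population_survival(pi_default, s_uncured):
    """Mixture cure survival: S_pop = (1 - pi) + pi * S_u.

    pi_default: incidence, probability the borrower ever defaults.
    s_uncured:  latency survival S_u(t | x) among eventual defaulters.
    """
    if not 0.0 <= pi_default <= 1.0:
        raise ValueError("pi_default must lie in [0, 1]")
    if not 0.0 <= s_uncured <= 1.0:
        raise ValueError("s_uncured must lie in [0, 1]")
    return (1.0 - pi_default) + pi_default * s_uncured


def default_probability_by(t, pi_default, latency_survival):
    """Probability of default by time t: 1 - S_pop(t) = pi * (1 - S_u(t))."""
    return 1.0 - population_survival(pi_default, latency_survival(t))


def exponential_survival(rate):
    """Placeholder latency model (an assumption, not the paper's model)."""
    return lambda t: math.exp(-rate * t)


# Hypothetical borrower: incidence 0.2 (e.g. a classifier's predicted
# probability of default) and an assumed monthly hazard rate of 0.1.
pi_x = 0.2
s_u = exponential_survival(0.1)
months = (0, 6, 12, 24)
curve = [default_probability_by(t, pi_x, s_u) for t in months]
```

In this sketch the default probability starts at zero, rises with time, and never exceeds the incidence pi(x), which is the "if and when" behaviour that distinguishes a mixture cure model from a plain classifier.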

MSC:

91G40 Credit risk
62P05 Applications of statistics to actuarial sciences and financial mathematics

Software:

Smcure
