On the relationship between multicollinearity and separation in logistic regression. (English) Zbl 1497.62202

Summary: Multicollinearity and separation are two major issues in logistic regression. In this paper, we study the relationship between them for the first time. We prove analytically that multicollinearity implies quasi-complete separation. Through counterexamples, we show that multicollinearity does not always imply complete separation and that separation does not always imply multicollinearity. We also present the consequences of both conditions: we prove analytically that, under either multicollinearity or separation, the maximum likelihood estimate has no finite solution.
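The statement that separation leaves the maximum likelihood estimate without a finite solution can be illustrated numerically. The sketch below is not taken from the paper: the toy data, the helper `log_likelihood`, and the chosen `direction` are all assumptions made for illustration. It builds a completely separated sample (the response is 1 exactly when the predictor is positive) and evaluates the logistic log-likelihood along a separating direction at increasing scales.

```python
import numpy as np


def log_likelihood(beta, X, y):
    """Bernoulli log-likelihood of a logistic model with linear predictor X @ beta."""
    z = X @ beta
    # log p(z) = -log(1 + exp(-z)) and log(1 - p(z)) = -log(1 + exp(z));
    # np.logaddexp(0, t) computes log(1 + exp(t)) in a numerically stable way.
    return float(np.sum(-y * np.logaddexp(0.0, -z) - (1 - y) * np.logaddexp(0.0, z)))


# Hypothetical toy data: y = 1 exactly when x > 0, so the classes are completely separated.
x = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])
y = np.array([0, 0, 0, 1, 1, 1])
X = np.column_stack([np.ones_like(x), x])  # intercept column plus the predictor

# (intercept, slope) = (0, 1) separates the classes: X @ direction > 0 iff y == 1.
direction = np.array([0.0, 1.0])

# Scale the separating direction and record the log-likelihood at each scale.
scales = [1.0, 10.0, 100.0]
values = [log_likelihood(c * direction, X, y) for c in scales]

for c, v in zip(scales, values):
    print(f"scale={c:>6.1f}  log-likelihood={v:.3e}")
```

In this example the log-likelihood is negative at every scale and moves closer to zero as the scale grows, so its supremum is approached only as the coefficient norm tends to infinity. No finite coefficient vector attains the maximum, which is the behaviour the summary describes for separated data.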

MSC:

62J12 Generalized linear models (logistic models)
62F10 Point estimation
62P05 Applications of statistics to actuarial sciences and financial mathematics

Software:

aplore3
