Robustness against separation and outliers in logistic regression. (English) Zbl 1429.62325

Summary: The logistic regression model is commonly used to describe the effect of one or several explanatory variables on a binary response variable. It suffers from the problem that its parameters are not identifiable when there is separation in the space of the explanatory variables. In that case, existing fitting techniques fail to converge or give the wrong answer. To remedy this, a slightly more general model is proposed under which the observed response is strongly related but not equal to the unobservable true response. This model will be called the hidden logistic regression model because the unobservable true responses are comparable to a hidden layer in a feedforward neural net. The maximum estimated likelihood estimator is proposed in this model. It is robust against separation, always exists, and is easy to compute. Outlier-robust estimation is also studied in this setting, yielding the weighted maximum estimated likelihood estimator.
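The separation problem and the proposed remedy can be illustrated with a minimal sketch. The summary does not state the estimator's formula, so the construction below is an assumption made for illustration: each observed response y is replaced by a pseudo-response (1 - y)*delta0 + y*delta1 strictly between 0 and 1, and the ordinary logistic likelihood is then maximized with these pseudo-responses. The values delta0 = 0.01 and delta1 = 0.99, the toy data, and the names `fit_logistic` and `y_tilde` are hypothetical choices, not taken from the paper.

```python
import math

def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def softplus(t):
    """Stable evaluation of log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))

def log_lik(a, b, xs, ys):
    """Bernoulli log-likelihood; ys may be fractional pseudo-responses."""
    return sum(-y * softplus(-(a + b * x)) - (1.0 - y) * softplus(a + b * x)
               for x, y in zip(xs, ys))

def fit_logistic(xs, ys, max_iter=25, tol=1e-10):
    """Newton iterations with step halving for intercept a and slope b.

    Returns (a, b, converged). This is a generic fitting routine written
    for this sketch, not the algorithm of the paper.
    """
    a = b = 0.0
    for _ in range(max_iter):
        ga = gb = haa = hab = hbb = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            ga += y - p              # gradient w.r.t. intercept
            gb += (y - p) * x        # gradient w.r.t. slope
            haa += w                 # entries of the 2x2 matrix X'WX
            hab += w * x
            hbb += w * x * x
        det = haa * hbb - hab * hab
        if det <= 1e-12 * haa * hbb:  # X'WX numerically singular: give up
            break
        da = (hbb * ga - hab * gb) / det
        db = (haa * gb - hab * ga) / det
        step, current = 1.0, log_lik(a, b, xs, ys)
        # Halve the step while it would decrease the log-likelihood.
        while step > 1e-8 and log_lik(a + step * da, b + step * db, xs, ys) < current:
            step /= 2.0
        a += step * da
        b += step * db
        if abs(step * da) + abs(step * db) < tol:
            return a, b, True
    return a, b, False

# Toy data with complete separation: every x < 0 has y = 0, every x > 0 has y = 1.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

# Ordinary maximum likelihood: the slope keeps growing and never settles.
a_ml, b_ml, ok_ml = fit_logistic(xs, ys)

# Assumed pseudo-responses (delta0 and delta1 are illustrative values).
delta0, delta1 = 0.01, 0.99
y_tilde = [(1 - y) * delta0 + y * delta1 for y in ys]
a_mel, b_mel, ok_mel = fit_logistic(xs, y_tilde)

print("ML :", a_ml, b_ml, "converged:", ok_ml)
print("MEL:", a_mel, b_mel, "converged:", ok_mel)
```

On the separated data the plain fit exhausts its iteration budget with an ever-growing slope, whereas the fit on the pseudo-responses stops at a finite slope, because no fitted probability is pushed all the way to 0 or 1.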

MSC:

62J12 Generalized linear models (logistic models)
62F35 Robustness and adaptive procedures (parametric inference)
Full Text: DOI

References:

[1] Albert, A.; Anderson, J. A., On the existence of maximum likelihood estimates in logistic regression models, Biometrika, 71, 1-10 (1984) · Zbl 0543.62020
[2] Christmann, A., Least median of weighted squares in logistic regression with large strata, Biometrika, 81, 413-417 (1994) · Zbl 0807.62029
[3] Christmann, A., On positive breakdown point estimators in regression models with discrete response variables, Habilitation Thesis, Department of Statistics, University of Dortmund (1998)
[4] Christmann, A.; Rousseeuw, P. J., Measuring overlap in logistic regression, Comput. Statist. Data Anal., 37, 65-75 (2001) · Zbl 1051.62065
[5] Christmann, A.; Fischer, P.; Joachims, T., Comparison between various regression depth methods and the support vector machine to approximate the minimum number of misclassifications, Comput. Statist., 17, 273-287 (2002) · Zbl 1010.62054
[6] Copas, J. B., Binary regression models for contaminated data, J. Roy. Statist. Soc. B, 50, 225-265 (1988), With discussion
[7] Efron, B., Double exponential families and their use in generalized linear regression, J. Amer. Statist. Assoc., 81, 709-721 (1986) · Zbl 0611.62072
[8] Ekholm, A.; Palmgren, J., A model for binary response with misclassification, in: Gilchrist, R. (Ed.), GLIM-82, Proceedings of the International Conference on Generalized Linear Models, Springer, Heidelberg, 128-143 (1982)
[9] Finney, D. J., The estimation from individual records of the relationship between dose and quantal response, Biometrika, 34, 320-334 (1947) · Zbl 0036.09701
[10] Firth, D., Bias reduction of maximum likelihood estimates, Biometrika, 80, 27-38 (1993) · Zbl 0769.62021
[11] He, X.; Simpson, D. G.; Portnoy, S. L., Breakdown robustness of tests, J. Amer. Statist. Assoc., 85, 446-452 (1990) · Zbl 0713.62039
[12] Huang, Y., Interval estimation of the ED50 when a logistic dose-response curve is incorrectly assumed, Comput. Statist. Data Anal., 36, 525-537 (2001) · Zbl 1030.62085
[13] Hubert, M.; Rousseeuw, P. J.; Verboven, S., A fast method for robust principal components with applications to chemometrics, Chemometrics Intell. Lab. Systems, 60, 101-111 (2002)
[14] Intrator, O.; Intrator, N., Interpreting neural-network results: a simulation study, Comput. Statist. Data Anal., 37, 373-393 (2001) · Zbl 1061.62559
[15] Künsch, H. R.; Stefanski, L. A.; Carroll, R. J., Conditionally unbiased bounded influence estimation in general regression models, with applications to generalized linear models, J. Amer. Statist. Assoc., 84, 460-466 (1989) · Zbl 0679.62024
[16] Müller, C.; Neykov, C., Breakdown points of trimmed likelihood estimators and related estimators in generalized linear models, J. Statist. Plann. Inference, to appear (2002)
[17] Pregibon, D., Logistic regression diagnostics, Ann. Statist., 9, 705-724 (1981) · Zbl 0478.62053
[18] Pregibon, D., Resistant fits for some commonly used logistic models with medical applications, Biometrics, 38, 485-498 (1982)
[19] Riedwyl, H., Lineare Regression und Verwandtes (1997), Birkhäuser, Basel · Zbl 0873.62002
[20] Rousseeuw, P. J., Least median of squares regression, J. Amer. Statist. Assoc., 79, 871-880 (1984) · Zbl 0547.62046
[21] Rousseeuw, P. J.; Hubert, M., Regression depth, J. Amer. Statist. Assoc., 94, 388-433 (1999) · Zbl 1007.62060
[22] Rousseeuw, P. J.; Van Driessen, K., A fast algorithm for the minimum covariance determinant estimator, Technometrics, 41, 212-223 (1999)
[23] Rousseeuw, P. J.; Van Zomeren, B. C., Unmasking multivariate outliers and leverage points, J. Amer. Statist. Assoc., 85, 651-663 (1990)
[24] Santner, T. J.; Duffy, D. E., A note on A. Albert and J.A. Anderson’s conditions for the existence of maximum likelihood estimates in logistic regression models, Biometrika, 73, 755-758 (1986) · Zbl 0655.62022