Regularized proportional odds models. (English) Zbl 1457.62223

Summary: The proportional odds model (POM) is commonly used in regression analysis to predict the outcome of an ordinal response variable. The parameter estimates are typically obtained by maximum likelihood estimation (MLE). The maximum likelihood estimates do not exist when the number of parameters, \(p\), is greater than the number of observations, \(n\). They also do not exist if there are no overlapping observations in the data (complete separation). When the number of parameters is less than the sample size but \(p\) approaches \(n\), the likelihood estimates may not exist, and if they do exist they may have quite large standard errors. An estimation method is proposed to address the last two issues, i.e. complete separation and the case when \(p\) approaches \(n\), but not the case when \(p>n\). The proposed method does not use a penalty term; instead it uses pseudo-observations to regularize the observed responses, downgrading their effect so that they become close to the underlying probabilities. The estimates can be computed easily with any commonly used statistical package that supports fitting POMs with weights. The estimates are compared with the MLE in a simulation study and in an application to real data.
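
The summary does not spell out how the pseudo-observations are built, so the following is only a minimal sketch of one plausible weighting scheme, not the authors' construction. It assumes that each observation is replicated once per response category, that the observed category keeps weight \(1-\varepsilon\), and that the remaining weight \(\varepsilon\) is split evenly over the other categories. The function name `augment_with_pseudo_observations` and the parameter `epsilon` are hypothetical.

```python
from typing import List, Sequence, Tuple


def augment_with_pseudo_observations(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    n_categories: int,
    epsilon: float = 0.05,
) -> Tuple[List[List[float]], List[int], List[float]]:
    """Replicate each observation once per response category with a case weight.

    ASSUMPTION (illustrative only, not taken from the paper): the observed
    category gets weight 1 - epsilon and each other category gets
    epsilon / (n_categories - 1). Categories are coded 1..n_categories.
    """
    if n_categories < 2:
        raise ValueError("n_categories must be at least 2")
    if not 0.0 <= epsilon < 1.0:
        raise ValueError("epsilon must lie in [0, 1)")
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")

    rows: List[List[float]] = []
    responses: List[int] = []
    weights: List[float] = []
    for x_i, y_i in zip(X, y):
        if not 1 <= y_i <= n_categories:
            raise ValueError(f"response {y_i} outside 1..{n_categories}")
        for r in range(1, n_categories + 1):
            w = 1.0 - epsilon if r == y_i else epsilon / (n_categories - 1)
            if w == 0.0:
                continue  # epsilon = 0 reproduces the original data unchanged
            rows.append(list(x_i))
            responses.append(r)
            weights.append(w)
    return rows, responses, weights


# Toy data with complete separation: the response increases with x, no overlap.
X = [[0.2], [1.5], [3.1]]
y = [1, 2, 3]
X_aug, y_aug, w_aug = augment_with_pseudo_observations(
    X, y, n_categories=3, epsilon=0.1
)
```

Under this scheme the weights of each original observation sum to one, so the effective sample size is unchanged while every category receives some weight at every covariate value. The augmented rows, responses, and weights would then be handed to any POM fitting routine that accepts case weights.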

MSC:

62J12 Generalized linear models (logistic models)
62J07 Ridge regression; shrinkage estimators (Lasso)
62P10 Applications of statistics to biology and medical sciences; meta analysis

Software:

glmnet; Fahrmeir; R
Full Text: DOI
