Model selection in linear mixed effect models. (English) Zbl 1241.62105
Summary: Mixed effect models are fundamental tools for the analysis of longitudinal data, panel data and cross-sectional data. They are widely used in various fields of the social, medical and biological sciences. However, the complex nature of these models has made variable selection and parameter estimation challenging. We propose a simple iterative procedure that estimates and selects fixed and random effects for linear mixed models. In particular, we propose to utilize the partial consistency property of the random effect coefficients and to select groups of random effects simultaneously via a data-oriented penalty function (the smoothly clipped absolute deviation penalty function). We show that the proposed method is a consistent variable selection procedure and possesses some oracle properties. Simulation studies and a real data analysis are also conducted to examine the performance of this procedure empirically.
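The summary names the smoothly clipped absolute deviation (SCAD) penalty but does not state it. As a minimal sketch, the function below evaluates the standard SCAD penalty as defined by Fan and Li (reference [7]); it is not the paper's group-selection procedure. The function name `scad_penalty` is hypothetical, and the default `a = 3.7` is the value conventionally suggested in the SCAD literature, assumed here rather than taken from this record.

```python
import numpy as np


def scad_penalty(theta, lam, a=3.7):
    """Evaluate the SCAD penalty elementwise (hypothetical helper).

    theta : scalar or array of coefficients
    lam   : tuning parameter, lam > 0
    a     : shape parameter, a > 2 (3.7 is an assumed conventional default)
    """
    t = np.abs(np.asarray(theta, dtype=float))

    # Region 1: |theta| <= lam, same linear penalty as the lasso.
    linear = lam * t
    # Region 2: lam < |theta| <= a*lam, quadratic taper.
    quadratic = (2.0 * a * lam * t - t**2 - lam**2) / (2.0 * (a - 1.0))
    # Region 3: |theta| > a*lam, constant, so large effects are not shrunk further.
    constant = lam**2 * (a + 1.0) / 2.0

    return np.where(t <= lam, linear,
                    np.where(t <= a * lam, quadratic, constant))
```

The penalty is continuous at both breakpoints and flat beyond `a * lam`, which is the feature that lets large coefficients escape shrinkage while small ones are set to zero.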
MSC:
62J05 | Linear regression; mixed models |
62H12 | Estimation in multivariate analysis |
62J07 | Ridge regression; shrinkage estimators (Lasso) |
62F12 | Asymptotic properties of parametric estimators |
65C60 | Computational problems in statistics (MSC2010) |
References:
[1] | Akaike, H., Information theory and an extension of the maximum likelihood principle, (Petrov, B. N.; Csáki, F., Second International Symposium on Information Theory (1973), Akadémiai Kiadó: Budapest), 267-281 · Zbl 0283.62006 |
[2] | Antoniadis, A.; Fan, J., Regularized wavelet approximations (with discussion), Journal of the American Statistical Association, 96, 939-967 (2001) · Zbl 1072.62561 |
[3] | Breiman, L., Heuristics of instability and stabilization in model selection, Annals of Statistics, 24, 2350-2383 (1996) · Zbl 0867.62055 |
[4] | Bryk; Raudenbush, Hierarchical Linear Models: Applications and Data Analysis Methods (2001), Sage Publications |
[5] | Demidenko, E., Criteria for global minimum of sum of squares in nonlinear regression, Computational Statistics and Data Analysis, 51, 3, 1739-1753 (2006) · Zbl 1157.62456 |
[6] | Fan, J.; Li, R., New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis, Journal of the American Statistical Association, 99, 710-723 (2004) · Zbl 1117.62329 |
[7] | Fan, J.; Li, R., Variable selection via nonconcave penalized likelihood and its oracle properties, Journal of the American Statistical Association, 96, 456, 1348-1360 (2001) · Zbl 1073.62547 |
[8] | Fan, J.; Peng, H., Nonconcave penalized likelihood with a diverging number of parameters, The Annals of Statistics, 32, 928-961 (2004) · Zbl 1092.62031 |
[9] | Frank, I. E.; Friedman, J. H., A statistical view of some chemometrics regression tools, Technometrics, 35, 109-148 (1993) · Zbl 0775.62288 |
[10] | Goldstein, H., Multilevel Statistical Models (2002), A Hodder Arnold Publication |
[11] | Hodges, J. S.; Sargent, D. J., Counting degrees of freedom in hierarchical and other richly parameterized models, Biometrika, 88, 367-379 (2001) · Zbl 0984.62045 |
[12] | A. Krishna, 2008, Shrinkage-Based Variable Selection Methods for Linear Regression and Mixed-Effects Models, Dissertation, North Carolina State University. |
[13] | Laird, N. M.; Ware, J. H., Random-effects models for longitudinal data, Biometrics, 38, 963-974 (1982) · Zbl 0512.62107 |
[14] | Mallows, C. L., Some comments on \(C_p\), Technometrics, 15, 661-675 (1973) · Zbl 0269.62061 |
[15] | Nishii, R., Asymptotic properties of criteria for selection of variables in multiple regression, Annals of Statistics, 12, 758-765 (1984) · Zbl 0544.62063 |
[16] | Pu, W.; Niu, X., Selecting mixed-effects models based on a generalized information criterion, Journal of Multivariate Analysis, 97, 733-758 (2006) · Zbl 1085.62083 |
[17] | Rao, C. R.; Wu, Y., A strongly consistent procedure for model selection in a regression problem, Biometrika, 76, 369-374 (1989) · Zbl 0669.62051 |
[18] | Schwarz, G., Estimating the dimension of a model, Annals of Statistics, 6, 461-464 (1978) · Zbl 0379.62005 |
[19] | Shibata, R., Approximate efficiency of a selection procedure for the number of regression variables, Biometrika, 71, 43-49 (1984) · Zbl 0543.62053 |
[20] | Sun, Y.; Zhang, W.; Tong, H., Estimation of the covariance matrix of random effects in longitudinal studies, The Annals of Statistics, 35, 2795-2814 (2007) · Zbl 1129.62053 |
[21] | Tibshirani, R., Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society B, 58, 267-288 (1996) · Zbl 0850.62538 |
[22] | Van der Vaart, A. M., Asymptotic Statistics (1998), Cambridge University Press · Zbl 0910.62001 |
[23] | Wang, H.; Li, R.; Tsai, C.-L., Tuning parameter selectors for the smoothly clipped absolute deviation method, Biometrika, 94, 553-568 (2007) · Zbl 1135.62058 |
[24] | Yuan, M.; Lin, Y., Model selection and estimation in regression with grouped variables, Journal of the Royal Statistical Society, B, 68, 49-67 (2006) · Zbl 1141.62030 |
[25] | Zou, H., The adaptive lasso and its oracle properties, Journal of the American Statistical Association, 101, 1418-1429 (2006) · Zbl 1171.62326 |