Variable selection with incomplete covariate data. (English) Zbl 1152.62388

Summary: Classical model selection methods such as Akaike's information criterion (AIC) become problematic when observations are missing. We propose variations on the AIC that are applicable to missing covariate problems. The method is based directly on the expectation-maximization (EM) algorithm and is readily available for EM-based estimation methods without much additional computational effort. The missing-data AIC criteria are formally derived and shown to work in a simulation study and in an application to data on diabetic retinopathy.
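
To illustrate why the classical AIC is awkward with missing covariates, the following minimal sketch computes the standard Gaussian AIC (minus twice the maximized log-likelihood plus twice the number of parameters) on a toy data set. It does not implement the missing-data criteria proposed in the paper. The data values and the helper names `gaussian_aic` and `fit_line` are hypothetical and chosen only for illustration. The sketch uses a complete-case analysis, which is an assumption of this example and not the authors' EM-based method.

```python
import math


def gaussian_aic(residuals, n_params):
    """Classical AIC = -2 * (maximized log-likelihood) + 2 * n_params
    for a Gaussian model, using the ML estimate of the error variance."""
    n = len(residuals)
    sigma2 = sum(r * r for r in residuals) / n  # ML variance estimate
    loglik = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
    return -2 * loglik + 2 * n_params


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns the residuals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Hypothetical toy data: None marks a missing covariate value.
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
x = [1.0, 2.0, None, 4.0, None, 6.0]

# The intercept-only model does not need x, so it can use every row.
mean_all = sum(y) / len(y)
aic_null_all = gaussian_aic([yi - mean_all for yi in y], n_params=2)

# The model with the covariate can only use rows where x is observed.
complete = [(xi, yi) for xi, yi in zip(x, y) if xi is not None]
x_cc = [xi for xi, _ in complete]
y_cc = [yi for _, yi in complete]
aic_line_cc = gaussian_aic(fit_line(x_cc, y_cc), n_params=3)

# For a like-for-like comparison, the null model is refitted on the same rows.
mean_cc = sum(y_cc) / len(y_cc)
aic_null_cc = gaussian_aic([yi - mean_cc for yi in y_cc], n_params=2)

print("null model, all rows:       ", aic_null_all)
print("null model, complete cases: ", aic_null_cc)
print("line model, complete cases: ", aic_line_cc)
```

The two AIC values for the null model differ because they are computed from different subsets of the data. A naive comparison between candidate models that use different covariates can therefore mix likelihoods based on different samples, which is the kind of difficulty the summary refers to.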

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
62B10 Statistical aspects of information-theoretic topics
92C50 Medical applications (general)
65C60 Computational problems in statistics (MSC2010)
