Variable selection in measurement error models. (English) Zbl 1200.62071

Summary: Measurement error data or errors-in-variables data have been collected in many studies. Natural criterion functions are often unavailable for general functional measurement error models due to the lack of information on the distribution of the unobservable covariates. Typically, parameters are estimated by solving estimating equations. In addition, constructing such estimating equations routinely requires solving integral equations, so the computation is often much more intensive than for ordinary regression models. Because of these difficulties, traditional best subset variable selection procedures are not applicable, and in the measurement error model context, variable selection remains an unsolved issue.
We develop a framework for variable selection in measurement error models via penalized estimating equations. We first propose a class of selection procedures for general parametric measurement error models and for general semi-parametric measurement error models, and study the asymptotic properties of the proposed procedures. Then, under certain regularity conditions and with a properly chosen regularization parameter, we demonstrate that the proposed procedure performs as well as an oracle procedure. We assess the finite sample performance via Monte Carlo simulation studies and illustrate the proposed methodology through the empirical analysis of a familiar data set.
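To make the idea of a penalized estimating equation concrete, here is a minimal sketch for a deliberately simple special case. It is not the procedure from the paper: it assumes a linear model Y = X'β + ε with classical additive error W = X + U, a known error covariance `sigma_uu`, the standard corrected score for that model, a SCAD penalty with a = 3.7, and a local quadratic approximation to solve the equation. The function names, the simulated data, and the choice of `lam` are all hypothetical and chosen only for illustration.

```python
import numpy as np


def scad_derivative(theta, lam, a=3.7):
    """Derivative of the SCAD penalty evaluated at |theta|."""
    theta = np.abs(theta)
    tail = np.maximum(a * lam - theta, 0.0) / ((a - 1.0) * lam)
    return lam * ((theta <= lam) + tail * (theta > lam))


def penalized_corrected_score(W, y, sigma_uu, lam, n_iter=100, eps=1e-8, tol=1e-4):
    """Solve a SCAD-penalized corrected-score equation (illustrative only).

    The unpenalized corrected score for the linear model is
        sum_i [ W_i (y_i - W_i' beta) + sigma_uu beta ] = 0,
    i.e. (W'W/n - sigma_uu) beta = W'y/n.
    The penalty term p'(|beta_j|) sign(beta_j) is replaced by its local
    quadratic approximation {p'(|b_j|) / |b_j|} beta_j at the current
    iterate b, so every step is a linear solve.
    """
    n, d = W.shape
    A = W.T @ W / n - sigma_uu   # corrected Gram matrix, estimates E[X X']
    b = W.T @ y / n
    beta = np.linalg.solve(A, b)  # unpenalized corrected-score estimate
    for _ in range(n_iter):
        D = np.diag(scad_derivative(beta, lam) / (np.abs(beta) + eps))
        beta_new = np.linalg.solve(A + D, b)
        converged = np.max(np.abs(beta_new - beta)) < 1e-10
        beta = beta_new
        if converged:
            break
    beta[np.abs(beta) < tol] = 0.0  # coefficients shrunk to numerical zero
    return beta


# Simulated example (all settings are assumptions made for this sketch).
rng = np.random.default_rng(0)
n = 2000
beta_true = np.array([2.0, 0.0, 0.0, 1.5, 0.0])
X = rng.normal(size=(n, 5))                  # unobserved true covariates
sigma_uu = 0.25 * np.eye(5)                  # known measurement error covariance
W = X + rng.normal(scale=0.5, size=(n, 5))   # observed, error-prone covariates
y = X @ beta_true + rng.normal(scale=0.5, size=n)

beta_hat = penalized_corrected_score(W, y, sigma_uu, lam=0.2)
print(beta_hat)
```

In this toy setting the large coefficients lie beyond a·λ, where the SCAD derivative vanishes, so they are left unpenalized while small coefficients are driven to zero. That is the mechanism behind oracle-type behaviour in penalized procedures. The general functional and semiparametric models treated in the paper require estimating equations that are considerably harder to construct than the corrected score assumed here.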

MSC:

62J02 General nonlinear regression
62G08 Nonparametric regression and quantile regression
62F12 Asymptotic properties of parametric estimators
65C05 Monte Carlo methods
