Sparsity identification in ultra-high dimensional quantile regression models with longitudinal data. (English) Zbl 07529922

Summary: In this paper, we propose a doubly robust variable selection method for quantile regression models with ultra-high dimensional longitudinal data, called the weighted adaptive robust lasso (WAR-Lasso). We derive the consistency and the model selection oracle property of WAR-Lasso. Simulation studies show the double robustness of WAR-Lasso under both heavy-tailed error distributions and heavy contamination of the covariates. WAR-Lasso outperforms other methods such as SCAD. A real data analysis is carried out; it shows that WAR-Lasso tends to select fewer variables and that the estimated coefficients are in line with economic significance.
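
For orientation, the kind of objective that a weighted, adaptively penalized quantile lasso minimizes can be sketched as below. This is a generic illustration only, not the WAR-Lasso estimator of the paper: the summary does not give the exact weights or penalty, so the function names, the observation weights `obs_weights`, and the per-coefficient penalty weights `penalty_weights` are all hypothetical.

```python
import numpy as np


def check_loss(residuals, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0}), elementwise."""
    residuals = np.asarray(residuals, dtype=float)
    return residuals * (tau - (residuals < 0))


def penalized_quantile_objective(beta, X, y, tau, lam, penalty_weights,
                                 obs_weights=None):
    """Weighted check loss plus a weighted L1 penalty (generic sketch).

    obs_weights     -- hypothetical per-observation weights, e.g. to
                       downweight rows with contaminated covariates.
    penalty_weights -- hypothetical per-coefficient adaptive weights.
    Neither is taken from the paper; both are assumptions for illustration.
    """
    beta = np.asarray(beta, dtype=float)
    residuals = np.asarray(y, dtype=float) - np.asarray(X, dtype=float) @ beta
    if obs_weights is None:
        obs_weights = np.ones_like(residuals)
    fit = np.mean(obs_weights * check_loss(residuals, tau))
    penalty = lam * np.sum(np.asarray(penalty_weights) * np.abs(beta))
    return fit + penalty
```

Robustness to heavy-tailed errors comes from the check loss itself, which grows linearly rather than quadratically in the residual; robustness to covariate contamination would have to come from the observation weights, whose form is left open here.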

MSC:

62J05 Linear regression; mixed models
62J99 Linear inference, regression
62-XX Statistics

References:

[1] Belloni, A.; Chernozhukov, V., l_1-penalized quantile regression in high-dimensional sparse models, The Annals of Statistics, 39, 1, 82-130 (2011) · Zbl 1209.62064 · doi:10.1214/10-AOS827
[2] Bühlmann, P.; van de Geer, S., Statistics for high-dimensional data: Methods, theory and applications (2011), New York: Springer · Zbl 1273.62015
[3] Candes, E.; Tao, T., The Dantzig selector: statistical estimation when p is much larger than n, The Annals of Statistics, 35, 6, 2313-51 (2007) · Zbl 1139.62019 · doi:10.1214/009053606000001523
[4] Cheng, M. Y.; Honda, T.; Li, J.; Peng, H., Nonparametric independence screening and structure identification for ultra-high dimensional longitudinal data, The Annals of Statistics, 42, 5, 1819-49 (2014) · Zbl 1305.62169 · doi:10.1214/14-AOS1236
[5] Cheng, M. Y.; Honda, T.; Li, J., Efficient estimation in semivarying coefficient models for longitudinal/clustered data, The Annals of Statistics, 44, 5, 1988-2017 (2016) · Zbl 1349.62128 · doi:10.1214/15-AOS1385
[6] Fan, J.; Zhang, J. T., Two-step estimation of functional linear models with applications to longitudinal data, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 62, 2, 303-22 (2000) · doi:10.1111/1467-9868.00233
[7] Fan, J.; Li, R., Variable selection via nonconcave penalized likelihood and its oracle properties, Journal of the American Statistical Association, 96, 456, 1348-60 (2001) · Zbl 1073.62547 · doi:10.1198/016214501753382273
[8] Fan, J.; Li, R., New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis, Journal of the American Statistical Association, 99, 467, 710-23 (2004) · Zbl 1117.62329 · doi:10.1198/016214504000001060
[9] Fan, J.; Lv, J., Sure independence screening for ultrahigh dimensional feature space (with discussion), Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70, 5, 849-911 (2008) · Zbl 1411.62187 · doi:10.1111/j.1467-9868.2008.00674.x
[10] Fan, J.; Fan, Y.; Barut, E., Adaptive robust variable selection, The Annals of Statistics, 42, 1, 324-51 (2014) · Zbl 1296.62144 · doi:10.1214/13-AOS1191
[11] Fu, L.; Wang, Y. G., Efficient parameter estimation via Gaussian copulas for quantile regression with longitudinal data, Journal of Multivariate Analysis, 143, 492-502 (2016) · Zbl 1342.62060 · doi:10.1016/j.jmva.2015.07.004
[12] Lee, S.; Liao, Y.; Seo, M. H.; Shin, Y., Oracle estimation of a change point in high-dimensional quantile regression, Journal of the American Statistical Association, 113, 523, 1184-94 (2018) · Zbl 1402.62033 · doi:10.1080/01621459.2017.1319840
[13] Li, D.; Ke, Y.; Zhang, W., Model selection and structure specification in ultra-high dimensional generalised semi-varying coefficient models, The Annals of Statistics, 43, 6, 2676-705 (2015) · Zbl 1327.62262 · doi:10.1214/15-AOS1356
[14] Liu, Q., Asymptotic normality for the partially linear EV models with longitudinal data, Communications in Statistics - Theory and Methods, 40, 7, 1149-58 (2011) · Zbl 1220.62041 · doi:10.1080/03610920903556541
[15] Liu, J.; Li, R.; Wu, R., Feature selection for varying coefficient models with ultrahigh-dimensional covariates, Journal of the American Statistical Association, 109, 505, 266-74 (2014) · Zbl 1367.62048 · doi:10.1080/01621459.2013.850086
[16] Rousseeuw, P. J.; van Zomeren, B. C., Unmasking multivariate outliers and leverage points, Journal of the American Statistical Association, 85, 411, 633-9 (1990) · doi:10.1080/01621459.1990.10474920
[17] Shah, R. D.; Bühlmann, P., Goodness-of-fit tests for high dimensional linear models, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 80, 1, 113-35 (2018) · Zbl 06840459 · doi:10.1111/rssb.12234
[18] Sinha, S. K., Robust inference in generalized linear models for longitudinal data, Canadian Journal of Statistics, 34, 2, 261-78 (2006) · Zbl 1142.62384 · doi:10.1002/cjs.5550340205
[19] Tang, Y.; Wang, H. J.; Zhu, Z., Variable selection in quantile varying coefficient models with longitudinal data, Computational Statistics and Data Analysis, 57, 1, 435-49 (2013) · Zbl 1365.62285 · doi:10.1016/j.csda.2012.07.015
[20] Tibshirani, R., Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society: Series B (Methodological), 58, 1, 267-88 (1996) · Zbl 0850.62538 · doi:10.1111/j.2517-6161.1996.tb02080.x
[21] Wang, S.; Qian, L.; Carroll, R. J., Generalized empirical likelihood methods for analyzing longitudinal data, Biometrika, 97, 1, 79-93 (2010) · Zbl 1183.62060 · doi:10.1093/biomet/asp073
[22] Wang, L.; Zhou, J.; Qu, A., Penalized generalized estimating equations for high-dimensional longitudinal data analysis, Biometrics, 68, 2, 353-60 (2012) · Zbl 1251.62051 · doi:10.1111/j.1541-0420.2011.01678.x
[23] Wang, K.; Lin, L., Variable selection in robust semiparametric modeling for longitudinal data, Journal of the Korean Statistical Society, 43, 2, 303-14 (2014) · Zbl 1306.62065 · doi:10.1016/j.jkss.2013.10.003
[24] Wang, H. J.; McKeague, I. W.; Qian, M., Testing for marginal linear effects in quantile regression, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 80, 2, 433-52 (2018) · Zbl 06849262 · doi:10.1111/rssb.12258
[25] Wang, T.; Samworth, R. J., High dimensional change point estimation via sparse projection, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 80, 1, 57-83 (2018) · Zbl 1439.62199 · doi:10.1111/rssb.12243
[26] Xu, C.; Chen, J., The sparse MLE for ultra-high-dimensional feature screening, Journal of the American Statistical Association, 109, 507, 1257-69 (2014) · Zbl 1368.62295 · doi:10.1080/01621459.2013.879531
[27] Xu, Q.; Bai, Y., Semiparametric statistical inferences for longitudinal data with nonparametric covariance modelling, Statistics, 51, 6, 1280-303 (2017) · Zbl 1440.62138 · doi:10.1080/02331888.2017.1354861
[28] Xue, L.; Zhu, L., Empirical likelihood for a varying coefficient model with longitudinal data, Journal of the American Statistical Association, 102, 478, 642-54 (2007) · Zbl 1172.62306 · doi:10.1198/016214507000000293
[29] Yang, Y.; Tokdar, S. T., Minimax-optimal nonparametric regression in high dimensions, The Annals of Statistics, 43, 2, 652-74 (2015) · Zbl 1312.62052 · doi:10.1214/14-AOS1289
[30] Yang, G.; Yu, Y.; Li, R.; Buu, A., Feature screening in ultrahigh dimensional Cox’s model, Statistica Sinica, 26, 881-901 (2016) · Zbl 1356.62175
[31] Zeger, S. L.; Diggle, P. J., Semiparametric models for longitudinal data with application to CD4 cell numbers in HIV seroconverters, Biometrics, 50, 3, 689-99 (1994) · Zbl 0821.62093
[32] Zhang, W.; Fan, J.; Sun, Y., A semiparametric model for cluster data, The Annals of Statistics, 37, 5, 2377-408 (2009) · Zbl 1173.62030 · doi:10.1214/08-AOS662
[33] Zhang, X.; Park, B. U.; Wang, J. L., Time-varying additive models for longitudinal data, Journal of the American Statistical Association, 108, 503, 983-98 (2013) · Zbl 06224981 · doi:10.1080/01621459.2013.778776
[34] Zheng, Q.; Peng, L.; He, X., High dimensional censored quantile regression, The Annals of Statistics, 46, 1, 308-43 (2018) · Zbl 1416.62236 · doi:10.1214/17-AOS1551
[35] Zou, H.; Zhang, H. H., On the adaptive elastic net with a diverging number of parameters, The Annals of Statistics, 37, 4, 1733-51 (2009) · Zbl 1168.62064 · doi:10.1214/08-AOS625