
Scalable and efficient inference via CPE. (English) Zbl 07706299

Summary: Two primary concerns of inference for high-dimensional data are statistical accuracy and computational efficiency. Despite the appealing asymptotic properties of existing de-biasing methods, the de-biasing step is generally computationally intensive. In this article, we propose the constrained projection estimator (CPE) for constructing confidence intervals in a scalable and efficient way in high dimensions when the unknown parameters have an approximately sparse structure. The proposed method operates on the constrained projection spaces corresponding to the identifiable signals determined by a prescreening procedure, which significantly reduces the computational cost compared with the full de-biasing step. Theoretically, we show that the proposed inference method attains the same asymptotic efficiency as the full de-biasing procedure in terms of the lengths of the confidence intervals. We demonstrate the scalability and effectiveness of the proposed method through simulation and real-data studies.
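The computational saving described above comes from restricting the de-biasing step to a prescreened set of coordinates instead of all p variables. The following is a minimal, hypothetical sketch of that generic screen-then-debias idea, not the authors' exact CPE algorithm: it uses marginal-correlation prescreening in the spirit of sure independence screening [13] and the node-wise-lasso de-biasing of [26], applied only to the screened coordinates (all variable names, the choice of tuning parameter, and the screening size k are illustrative assumptions).

```python
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 200, 500
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]          # three true signals
y = X @ beta + rng.standard_normal(n)

# Step 1: prescreening by marginal correlation, keeping the top-k coordinates.
# Only these k coordinates will be de-biased, instead of all p of them.
k = 20
scores = np.abs(X.T @ y) / n
screened = np.argsort(scores)[-k:]

# Step 2: one initial lasso fit on the full design (tuning level is a common
# rule-of-thumb choice, an assumption here).
lam = np.sqrt(np.log(p) / n)
beta_hat = Lasso(alpha=lam, max_iter=5000).fit(X, y).coef_
resid = y - X @ beta_hat
sigma_hat = np.sqrt(resid @ resid / n)

# Step 3: node-wise-lasso de-biasing, but only k regressions instead of p.
cis = {}
for j in screened:
    others = np.delete(np.arange(p), j)
    gamma = Lasso(alpha=lam, max_iter=5000).fit(X[:, others], X[:, j]).coef_
    z = X[:, j] - X[:, others] @ gamma   # node-wise regression residual
    denom = z @ X[:, j]
    b_j = beta_hat[j] + z @ resid / denom   # de-biased coordinate estimate
    se = sigma_hat * np.sqrt(z @ z) / abs(denom)
    half = norm.ppf(0.975) * se
    cis[j] = (b_j - half, b_j + half)      # 95% confidence interval
```

The cost of step 3 is k node-wise regressions rather than p, which is where the scalability claim in the summary comes from; the paper's contribution is showing that, with a suitable screening step, this restriction loses no asymptotic efficiency in terms of interval length.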

MSC:

62-XX Statistics

References:

[1] Aitchison, J., Principal component analysis of compositional data, Biometrika, 70, 1, 57-65 (1983) · Zbl 0515.62057 · doi:10.1093/biomet/70.1.57
[2] Battey, H.; Fan, J.; Liu, H.; Lu, J.; Zhu, Z., Distributed testing and estimation under sparse high dimensional models, The Annals of Statistics, 46, 3, 1352-82 (2018) · Zbl 1392.62060 · doi:10.1214/17-AOS1587
[3] Belloni, A.; Chernozhukov, V.; Kato, K., Uniform post selection inference for least absolute deviation regression and other Z-estimation problems, Biometrika, 102, 1, 77-94 (2015) · Zbl 1345.62049 · doi:10.1093/biomet/asu056
[4] Berk, R.; Brown, L.; Buja, A.; Zhang, K.; Zhao, L., Valid post-selection inference, The Annals of Statistics, 41, 2, 802-37 (2013) · Zbl 1267.62080 · doi:10.1214/12-AOS1077
[5] Bickel, P. J.; Ritov, Y. A.; Tsybakov, A. B., Simultaneous analysis of Lasso and Dantzig selector, The Annals of Statistics, 37, 4, 1705-32 (2009) · Zbl 1173.62022 · doi:10.1214/08-AOS620
[6] Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J., Distributed optimization and statistical learning via the alternating direction method of multipliers, Foundations and Trends® in Machine Learning, 3, 1, 1-122 (2010) · Zbl 1229.90122 · doi:10.1561/2200000016
[7] Bühlmann, P., Statistical significance in high-dimensional linear models, Bernoulli, 19, 4, 1212-42 (2013) · Zbl 1273.62173 · doi:10.3150/12-BEJSP11
[8] Candès, E.; Tao, T., The Dantzig selector: Statistical estimation when p is much larger than n (with discussion), The Annals of Statistics, 35, 2313-404 (2007) · Zbl 1139.62019
[9] Chen, X.; Xie, M., A split-and-conquer approach for analysis of extraordinarily large data, Statistica Sinica, 24, 1655-84 (2014) · Zbl 1480.62258
[10] Efron, B.; Hastie, T.; Johnstone, I.; Tibshirani, R., Least angle regression, The Annals of Statistics, 32, 2, 407-99 (2004) · doi:10.1214/009053604000000067
[11] Fan, J.; Feng, Y.; Wu, Y., Network exploration via the adaptive Lasso and SCAD penalties, The Annals of Applied Statistics, 3, 2, 521-41 (2009) · Zbl 1166.62040 · doi:10.1214/08-AOAS215
[12] Fan, J.; Li, R., Variable selection via nonconcave penalized likelihood and its oracle properties, Journal of the American Statistical Association, 96, 456, 1348-60 (2001) · Zbl 1073.62547 · doi:10.1198/016214501753382273
[13] Fan, J.; Lv, J., Sure independence screening for ultrahigh dimensional feature space, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70, 849-911 (2008) · Zbl 1411.62187
[14] Fan, Y.; Lv, J., Asymptotic properties for combined L_1 and concave regularization, Biometrika, 101, 1, 57-70 (2014) · Zbl 1285.62074 · doi:10.1093/biomet/ast047
[15] Fan, Y.; Kong, Y.; Li, D.; Zheng, Z., Innovated interaction screening for high-dimensional nonlinear classification, The Annals of Statistics, 43, 3, 1243-72 (2015) · Zbl 1328.62383 · doi:10.1214/14-AOS1308
[16] Javanmard, A.; Montanari, A., Confidence intervals and hypothesis testing for high-dimensional regression, Journal of Machine Learning Research, 15, 2869-909 (2014) · Zbl 1319.62145
[17] Kong, Y.; Zheng, Z.; Lv, J., The constrained Dantzig selector with enhanced consistency, Journal of Machine Learning Research, 17, 1-22 (2016) · Zbl 1391.94276
[18] Lee, J.; Sun, D.; Sun, Y.; Taylor, J., Exact post-selection inference with application to the Lasso, The Annals of Statistics, 44, 3, 907-27 (2016) · Zbl 1341.62061 · doi:10.1214/15-AOS1371
[19] Lin, W.; Shi, P.; Feng, R.; Li, H., Variable selection in regression with compositional covariates, Biometrika, 101, 4, 785-97 (2014) · Zbl 1306.62164 · doi:10.1093/biomet/asu031
[20] Meinshausen, N.; Bühlmann, P., Stability selection, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72, 4, 417-73 (2010) · Zbl 1411.62142 · doi:10.1111/j.1467-9868.2010.00740.x
[21] Minnier, J.; Tian, L.; Cai, T., A perturbation method for inference on regularized regression estimates, Journal of the American Statistical Association, 106, 496, 1371-82 (2011) · Zbl 1323.62076 · doi:10.1198/jasa.2011.tm10382
[22] Ren, Z.; Sun, T.; Zhang, C.-H.; Zhou, H., Asymptotic normality and optimalities in estimation of large Gaussian graphical model, The Annals of Statistics, 43, 3, 901-1026 (2015) · Zbl 1328.62342 · doi:10.1214/14-AOS1286
[23] Sun, T.; Zhang, C.-H., Scaled sparse linear regression, Biometrika, 99, 4, 879-98 (2012) · Zbl 1452.62515 · doi:10.1093/biomet/ass043
[24] Tibshirani, R., Regression shrinkage and selection via the Lasso, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 58, 267-88 (1996) · Zbl 0850.62538
[25] Tibshirani, R.; Taylor, J.; Lockhart, R.; Tibshirani, R., Exact post-selection inference for sequential regression procedures, Journal of the American Statistical Association, 111, 514, 600-20 (2016) · doi:10.1080/01621459.2015.1108848
[26] van de Geer, S.; Bühlmann, P.; Ritov, Y.; Dezeure, R., On asymptotically optimal confidence regions and tests for high-dimensional models, The Annals of Statistics, 42, 3, 1166-202 (2014) · Zbl 1305.62259 · doi:10.1214/14-AOS1221
[27] Wang, X.; Leng, C., High dimensional ordinary least squares projection for screening variables, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78, 3, 589-611 (2016) · Zbl 1414.62313 · doi:10.1111/rssb.12127
[28] Wasserman, L.; Roeder, K., High dimensional variable selection, The Annals of Statistics, 37, 5, 2178-201 (2009) · Zbl 1173.62054 · doi:10.1214/08-AOS646
[29] Wright, S., Coordinate descent algorithms, Mathematical Programming, 151, 1, 3-34 (2015) · Zbl 1317.49038 · doi:10.1007/s10107-015-0892-3
[30] Zhang, C.-H.; Zhang, S., Confidence intervals for low dimensional parameters in high dimensional linear models, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76, 1, 217-42 (2014) · Zbl 1411.62196 · doi:10.1111/rssb.12026
[31] Zou, H., The adaptive Lasso and its oracle properties, Journal of the American Statistical Association, 101, 476, 1418-29 (2006) · Zbl 1171.62326 · doi:10.1198/016214506000000735
[32] Zou, H.; Hastie, T., Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67, 2, 301-20 (2005) · Zbl 1069.62054 · doi:10.1111/j.1467-9868.2005.00503.x