
Ensemble subset regression (ENSURE): efficient high-dimensional prediction. (English) Zbl 07767613

Summary: In high-dimensional prediction problems, we propose subsampling the predictors prior to the analysis. Specifically, we draw features by random sampling and then fit a model and make predictions based on the sampled feature subset. This greatly reduces the dimension of the problem and alleviates storage and computational bottlenecks. We explore this “subset regression” strategy under a linear regression framework. We propose an ensemble method, called ensemble subset regression (ENSURE), that combines multiple subset regressions and thereby reduces the uncertainty due to feature sampling. We provide a theoretical upper bound on the excess risk of the predictions computed in the subset regression, as well as theoretical support that the ensemble can improve on a single subset regression. Detailed empirical studies demonstrate that ENSURE performs well, outperforming methods that use all of the features.
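The following is a minimal sketch of the strategy described above, assuming ordinary least squares on each sampled feature subset and a plain average of the subset predictions as the ensemble; the function and parameter names (ensure_predict, subset_size, n_subsets) are illustrative and not taken from the paper.

```python
import numpy as np

def ensure_predict(X_train, y_train, X_test, subset_size, n_subsets=50, seed=0):
    """Illustrative ENSURE-style predictor: average least-squares fits on
    random feature subsets (assumes X and y are centered, so no intercept)."""
    rng = np.random.default_rng(seed)
    n, p = X_train.shape
    preds = np.zeros(X_test.shape[0])
    for _ in range(n_subsets):
        # Draw a random subset of the p features (sampling without replacement).
        idx = rng.choice(p, size=subset_size, replace=False)
        # Fit ordinary least squares using only the sampled columns.
        beta, *_ = np.linalg.lstsq(X_train[:, idx], y_train, rcond=None)
        preds += X_test[:, idx] @ beta
    # Ensemble step: averaging over the subset regressions reduces the extra
    # variability introduced by the random choice of features.
    return preds / n_subsets
```

With subset_size much smaller than p, each fit is a small least-squares problem, which is where the dimension, storage, and computation savings come from; averaging over n_subsets independent feature draws addresses the uncertainty due to feature sampling.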

MSC:

62-XX Statistics
