
On the sensitivity of the Lasso to the number of predictor variables. (English) Zbl 1442.62160

Summary: The Lasso is a computationally efficient regression regularization procedure that can produce sparse estimators when the number of predictors \((p)\) is large. Oracle inequalities provide bounds on the loss of the Lasso estimator, holding with high probability, at a deterministic choice of the regularization parameter. These bounds tend to zero if \(p\) is appropriately controlled, and are thus commonly cited as theoretical justification for the Lasso and its ability to handle high-dimensional settings. Unfortunately, in practice the regularization parameter is not a deterministic quantity but is instead chosen by a random, data-dependent procedure. To address this shortcoming of previous theoretical work, we study the loss of the Lasso estimator when it is tuned optimally for prediction. Assuming orthonormal predictors and a sparse true model, we prove that, with probability that is positive and can be made arbitrarily close to one given a sufficiently high signal-to-noise ratio and sufficiently large \(p\), the best possible predictive performance of the Lasso deteriorates as \(p\) increases. We further demonstrate empirically that the deterioration in performance can be far worse than the oracle inequalities suggest, and we provide a real data example in which deterioration is observed.
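
To make the orthonormal-design setting concrete, the following is a short worked sketch (a standard computation under the stated assumptions, not an excerpt from the paper). With \(X^\top X = I_p\) and \(y = X\beta + \varepsilon\), the Lasso with regularization parameter \(\lambda\) reduces to componentwise soft-thresholding of the least squares coefficients \(z = X^\top y = \beta + X^\top\varepsilon\), and its in-sample prediction loss coincides with the estimation loss:
\[
\hat\beta_j(\lambda) = \operatorname{sign}(z_j)\,\bigl(|z_j| - \lambda\bigr)_+, \qquad
\bigl\|X\hat\beta(\lambda) - X\beta\bigr\|_2^2 = \sum_{j=1}^{p}\bigl(\hat\beta_j(\lambda) - \beta_j\bigr)^2 .
\]
The best possible predictive performance referenced above then corresponds to \(\min_{\lambda \ge 0}\sum_{j=1}^{p}(\hat\beta_j(\lambda) - \beta_j)^2\). Heuristically, each of the \(p - k\) coordinates with \(\beta_j = 0\) contributes a term \((|z_j| - \lambda)_+^2\) in which \(z_j\) is pure noise, so suppressing all of them requires \(\lambda\) of order \(\sigma\sqrt{2\log p}\), which in turn biases each of the \(k\) nonzero coordinates by roughly \(\lambda\); this tension between the two groups of coordinates is the mechanism behind the deterioration of the optimally tuned loss as \(p\) grows.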

MSC:

62J07 Ridge regression; shrinkage estimators (Lasso)
