
Leave-one-out cross-validation is risk consistent for Lasso. (English) Zbl 1320.62172

Summary: The lasso procedure pervades the statistical and signal-processing literature and, as such, is the target of substantial theoretical and applied research. While much of this research focuses on the desirable properties that the lasso possesses (predictive risk consistency, sign consistency, correct model selection), these results assume that the tuning parameter is chosen in an oracle fashion. Yet this is impossible in practice. Instead, data analysts must use the data twice: once to choose the tuning parameter and again to estimate the model. Until now, only heuristics have justified such a procedure. To this end, we give the first definitive answer about the risk consistency of the lasso when the tuning parameter is chosen via cross-validation. We show that under some restrictions on the design matrix, the lasso estimator is still risk consistent with an empirically chosen tuning parameter.
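In the notation standard for this literature (ours, not necessarily the paper's), the lasso estimate at tuning parameter \(\lambda\) is
\[
\hat\beta(\lambda) = \operatorname*{arg\,min}_{\beta\in\mathbb{R}^p}\ \frac{1}{n}\lVert Y - X\beta\rVert_2^2 + \lambda\lVert\beta\rVert_1,
\]
and the question studied is whether risk consistency survives when \(\lambda\) is replaced by the minimizer \(\hat\lambda\) of the leave-one-out cross-validation error. Below is a minimal sketch of that two-stage procedure, assuming scikit-learn and NumPy; it is illustrative code on synthetic data, not the authors' implementation (scikit-learn's parametrization of the penalty also differs from the display above by a constant factor).

    # Minimal sketch (not the authors' code) of the two-stage procedure:
    # the data are used once to pick the lasso tuning parameter by
    # leave-one-out cross-validation, and again to fit the model.
    import numpy as np
    from sklearn.linear_model import Lasso, LassoCV
    from sklearn.model_selection import LeaveOneOut

    rng = np.random.default_rng(0)
    n, p = 100, 20
    X = rng.standard_normal((n, p))    # synthetic design matrix
    beta = np.zeros(p)
    beta[:3] = [2.0, -1.5, 1.0]        # sparse truth
    y = X @ beta + rng.standard_normal(n)

    # Stage 1: choose lambda by leave-one-out CV; cv=LeaveOneOut() makes
    # LassoCV score each candidate lambda on n single-observation folds.
    cv_fit = LassoCV(cv=LeaveOneOut()).fit(X, y)
    lam = cv_fit.alpha_                # empirically chosen tuning parameter

    # Stage 2: refit the lasso on the full data at the chosen lambda.
    model = Lasso(alpha=lam).fit(X, y)
    print(f"chosen lambda: {lam:.4f}, "
          f"nonzero coefficients: {np.sum(model.coef_ != 0)}")

Note that LassoCV already refits on the full data at the selected \(\hat\lambda\); the explicit second fit above simply mirrors the "use the data twice" description in the summary.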

MSC:

62J07 Ridge regression; shrinkage estimators (Lasso)
62G05 Nonparametric estimation

Software:

PDCO; ElemStatLearn

References:

[1] Bickel, P. J., Ritov, Y., & Tsybakov, A. B. (2009). Simultaneous analysis of lasso and Dantzig selector. The Annals of Statistics, 37(4), 1705-1732. · Zbl 1173.62022 · doi:10.1214/08-AOS620
[2] Bousquet, O., & Elisseeff, A. (2002). Stability and generalization. The Journal of Machine Learning Research, 2, 499-526. · Zbl 1007.68083
[3] Bunea, F., Tsybakov, A., & Wegkamp, M. (2007). Sparsity oracle inequalities for the lasso. Electronic Journal of Statistics, 1, 169-194. · Zbl 1146.62028 · doi:10.1214/07-EJS008
[4] Chatterjee, A., & Lahiri, S. (2011). Strong consistency of lasso estimators. Sankhyā A: Mathematical Statistics and Probability, 73(1), 55-78. · Zbl 1395.62208
[5] Chen, S. S., Donoho, D. L., & Saunders, M. A. (1998). Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20(1), 33-61. · Zbl 0919.94002 · doi:10.1137/S1064827596304010
[6] Davidson, J. (1994). Stochastic limit theory: An introduction for econometricians. Oxford: Oxford University Press. · Zbl 0904.60002 · doi:10.1093/0198774036.001.0001
[7] Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004). Least angle regression. The Annals of Statistics, 32(2), 407-499. · Zbl 1091.62054 · doi:10.1214/009053604000000067
[8] Knight, K., & Fu, W. (2000). Asymptotics for lasso-type estimators. The Annals of Statistics, 28(5), 1356-1378. · Zbl 1105.62357 · doi:10.1214/aos/1015957397
[9] van de Geer, S., & Lederer, J. (2013). The Lasso, correlated design, and improved oracle inequalities. http://arxiv.org/abs/1107.0189 · Zbl 1327.62426
[10] Grandvalet, Y. (1998). Least absolute shrinkage is equivalent to quadratic penalization. In ICANN 98 (pp. 201-206). London: Springer.
[11] Greenshtein, E., & Ritov, Y. A. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli, 10(6), 971-988. · Zbl 1055.62078 · doi:10.3150/bj/1106314846
[12] Györfi, L., Kohler, M., Krzyżak, A., & Walk, H. (2002). A distribution-free theory of nonparametric regression. New York: Springer. · Zbl 1021.62024
[13] Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction. New York: Springer. · Zbl 1273.62005
[14] Lee, S., Zhu, J., & Xing, E. P. (2010). Adaptive multi-task Lasso: With application to eQTL detection. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. Zemel, & A. Culotta (Eds.), Advances in neural information processing systems, vol. 23 (pp. 1306-1314).
[15] Leng, C., Lin, Y., & Wahba, G. (2006). A note on the lasso and related procedures in model selection. Statistica Sinica, 16(4), 1273-1284. · Zbl 1109.62056
[16] Meinshausen, N., & Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 34(3), 1436-1462. · Zbl 1113.62082 · doi:10.1214/009053606000000281
[17] Newey, W. K. (1991). Uniform convergence in probability and stochastic equicontinuity. Econometrica, 59(4), 1161-1167. · Zbl 0743.60012 · doi:10.2307/2938179
[18] Osborne, M., Presnell, B., & Turlach, B. (2000). On the lasso and its dual. Journal of Computational and Graphical Statistics, 9(2), 319-337.
[19] Schaffer, C. (1993). Selecting a classification method by cross-validation. Machine Learning, 13, 135-143.
[20] Shao, J. (1993). Linear model selection by cross-validation. Journal of the American Statistical Association, 88, 486-494. · Zbl 0773.62051 · doi:10.1080/01621459.1993.10476299
[21] Shi, W., Wahba, G., Wright, S., Lee, K., Klein, R., & Klein, B. (2008). LASSO-patternsearch algorithm with application to ophthalmology and genomic data. Statistics and Its Interface, 1(1), 137. · Zbl 1230.62093 · doi:10.4310/SII.2008.v1.n1.a12
[22] Stromberg, K. (1994). Probability for analysts. London: Chapman & Hall. · Zbl 0803.60001
[23] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267-288. · Zbl 0850.62538
[24] Tibshirani, R. (2011). Regression shrinkage and selection via the lasso: A retrospective. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(3), 273-282. · Zbl 1411.62212 · doi:10.1111/j.1467-9868.2011.00771.x
[25] Tibshirani, R. J. (2013). The lasso problem and uniqueness. Electronic Journal of Statistics, 7, 1456-1490. · Zbl 1337.62173 · doi:10.1214/13-EJS815
[26] Tibshirani, R. J., & Taylor, J. (2012). Degrees of freedom in lasso problems. The Annals of Statistics, 40, 1198-1232. · Zbl 1274.62469 · doi:10.1214/12-AOS1003
[27] Wang, H., & Leng, C. (2007). Unified lasso estimation by least squares approximation. Journal of the American Statistical Association, 102(479), 1039-1048. · Zbl 1306.62167 · doi:10.1198/016214507000000509
[28] Xu, H., Mannor, S., & Caramanis, C. (2008). Sparse algorithms are not stable: A no-free-lunch theorem. In Proceedings of the IEEE 46th Annual Allerton Conference on Communication, Control, and Computing (pp. 1299-1303).
[29] Zou, H., Hastie, T., & Tibshirani, R. (2007). On the degrees of freedom of the lasso. The Annals of Statistics, 35(5), 2173-2192. · Zbl 1126.62061 · doi:10.1214/009053607000000127