
The smooth-Lasso and other \(\ell _{1}+\ell _{2}\)-penalized methods. (English) Zbl 1274.62443

Summary: We consider a linear regression problem in a high-dimensional setting where the number of covariates \(p\) can be much larger than the sample size \(n\). In such a situation, one often assumes sparsity of the regression vector, i.e., that it contains many zero components. We propose a Lasso-type estimator \(\hat \beta^{\text{Quad}}\) (where ‘Quad’ stands for quadratic) which is based on two penalty terms. The first is the \(\ell _{1}\) norm of the regression coefficients, used to exploit the sparsity of the regression vector as the Lasso estimator does; the second is a quadratic penalty term introduced to capture additional information on the setting of the problem. We detail two special cases: the Elastic-Net \(\hat \beta^{\text{EN}}\), introduced in [H. Zou and T. Hastie, J. R. Stat. Soc., Ser. B, Stat. Methodol. 67, No. 2, 301–320 (2005; Zbl 1069.62054)], which deals with sparse problems where correlations between variables may exist; and the Smooth-Lasso \(\hat \beta^{\text{SL}}\), which addresses sparse problems where successive regression coefficients are known to vary slowly (in some situations, this can also be interpreted in terms of correlations between successive variables). From a theoretical point of view, we establish variable selection consistency results and show that \(\hat \beta^{\text{Quad}}\) achieves a Sparsity Inequality, i.e., a bound in terms of the number of non-zero components of the ‘true’ regression vector. These results are provided under a weaker assumption on the Gram matrix than the one used by the Lasso; in some situations this guarantees a significant improvement over the Lasso. Furthermore, a simulation study shows that the S-Lasso \(\hat \beta^{\text{SL}}\) performs better in estimation accuracy than known methods such as the Lasso, the Elastic-Net \(\hat \beta^{\text{EN}}\), and the Fused-Lasso (introduced in [R. Tibshirani et al., J. R. Stat. Soc., Ser. B, Stat. Methodol. 67, No. 1, 91–108 (2005; Zbl 1060.62049)]). This is especially the case when the regression vector is ‘smooth’, i.e., when the variations between successive coefficients of the unknown regression parameter are small. The study also reveals that the theoretical calibration of the tuning parameters and calibration by 10-fold cross-validation yield two S-Lasso solutions with similar performance.
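To make the construction concrete: writing the quadratic penalty as \(\mu\,\beta^{\top}M\beta\), the Elastic-Net corresponds to \(M\) equal to the identity (a squared \(\ell_2\) norm), while the Smooth-Lasso takes \(M = D^{\top}D\) with \(D\) the first-difference matrix, so that the penalty is \(\mu\sum_{j}(\beta_{j+1}-\beta_{j})^{2}\). Because the quadratic term can be absorbed into an augmented least-squares problem, any standard Lasso solver can compute the estimator. The following Python sketch illustrates this reduction; it is a minimal illustration, not the authors' code, and the function name smooth_lasso and the use of scikit-learn are assumptions made for the example.

import numpy as np
from sklearn.linear_model import Lasso

def smooth_lasso(X, y, lam, mu):
    """Minimize ||y - X b||^2 + lam * ||b||_1 + mu * sum_j (b_{j+1} - b_j)^2.

    Illustrative sketch only, not the authors' implementation.
    """
    n, p = X.shape
    # First-difference matrix D of shape (p-1, p): (D b)_j = b_{j+1} - b_j,
    # so ||D b||^2 penalizes variations between successive coefficients.
    D = np.diff(np.eye(p), axis=0)
    # Augment the design and the response: the extra rows turn the quadratic
    # smoothness penalty into ordinary squared residuals.
    X_aug = np.vstack([X, np.sqrt(mu) * D])
    y_aug = np.concatenate([y, np.zeros(p - 1)])
    # scikit-learn's Lasso minimizes (1/(2m)) * ||y - X b||^2 + alpha * ||b||_1
    # over m samples, so rescale lam to match the unnormalized criterion.
    m = X_aug.shape[0]
    solver = Lasso(alpha=lam / (2 * m), fit_intercept=False, max_iter=10000)
    solver.fit(X_aug, y_aug)
    return solver.coef_

Replacing D by the identity matrix in this sketch yields the Elastic-Net (up to the solver's normalization), which is the usual data-augmentation device of Zou and Hastie.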

MSC:

62J05 Linear regression; mixed models
62J07 Ridge regression; shrinkage estimators (Lasso)
62H20 Measures of association (correlation, canonical correlation, etc.)
62F12 Asymptotic properties of parametric estimators

References:

[1] Bach, F. (2008). Consistency of the group Lasso and multiple kernel learning. J. Mach. Learn. Res., 9:1179-1225. · Zbl 1225.68147
[2] Belloni, A. and Chernozhukov, V. (2010). Post-\(\ell_1\)-penalized estimation in high-dimensional sparse linear regression models. Submitted. · Zbl 1209.62064
[3] Bickel, P. and Ritov, Y. and Tsybakov, A. (2009). Simultaneous analysis of Lasso and Dantzig selector. Ann. Statist., 37(4):1705-1732. · Zbl 1173.62022 · doi:10.1214/08-AOS620
[4] Bunea, F. (2008). Consistent selection via the Lasso for high dimensional approximating regression models. IMS Collections, B. Clarke and S. Ghosal Editors, 3:122-138. · doi:10.1214/074921708000000101
[5] Bunea, F. (2008). Honest variable selection in linear and logistic regression models via \(\ell_1\) and \(\ell_1+\ell_2\) penalization. Electron. J. Stat., 2:1153-1194. · Zbl 1320.62170 · doi:10.1214/08-EJS287
[6] Bunea, F. and Tsybakov, A. and Wegkamp, M. (2007). Aggregation for Gaussian regression. Ann. Statist., 35(4):1674-1697. · Zbl 1209.62065 · doi:10.1214/009053606000001587
[7] Bunea, F. and Tsybakov, A. and Wegkamp, M. (2007). Sparsity oracle inequalities for the Lasso. Electron. J. Stat., 1:169-194. · Zbl 1146.62028 · doi:10.1214/07-EJS008
[8] Chesneau, C. and Hebiri, M. (2008). Some theoretical results on the grouped variables Lasso. Math. Methods Statist., 17(4):317-326. · Zbl 1282.62159 · doi:10.3103/S1066530708040030
[9] Dalalyan, A. and Tsybakov, A. (2007). Aggregation by exponential weighting and sharp oracle inequalities. In Learning Theory, volume 4539 of Lecture Notes in Comput. Sci., pages 97-111. Springer, Berlin. · Zbl 1203.62063 · doi:10.1007/978-3-540-72927-3_9
[10] Daye, Z. J. and Jeng, X. J. (2009). Shrinkage and model selection with correlated variables via weighted fusion. Comput. Statist. Data Anal., 53(4):1284-1298. · Zbl 1452.62049
[11] Dümbgen, L. and van de Geer, S. and Veraar, M. and Wellner, J. (2010). Nemirovski’s inequalities revisited. Amer. Math. Monthly, 117(2):138-160. · Zbl 1213.60039 · doi:10.4169/000298910X476059
[12] Efron, B. and Hastie, T. and Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Ann. Statist., 32(2):407-499. With discussion, and a rejoinder by the authors. · Zbl 1091.62054 · doi:10.1214/009053604000000067
[13] Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc., 96(456):1348-1360. · Zbl 1073.62547 · doi:10.1198/016214501753382273
[14] Hebiri, M. (2008). Regularization with the Smooth-Lasso procedure. Preprint, Laboratoire de Probabilités et Modèles Aléatoires. · Zbl 1282.62159
[15] Jia, J. and Yu, B. (2008). On model selection consistency of elastic net when \(p \gg n\). Tech. Report 756, Statistics, UC Berkeley. · Zbl 1187.62125
[16] Kim, S. and Koh, K. and Boyd, S. and Gorinevsky, D. (2009). \(\ell_1\) trend filtering. SIAM Rev., 51(2):339-360. · Zbl 1171.37033 · doi:10.1137/070690274
[17] Land, S. and Friedman, J. (1996). Variable fusion: a new method of adaptive signal regression. Manuscript.
[18] Lounici, K. (2008). Sup-norm convergence rate and sign concentration property of Lasso and Dantzig estimators. Electron. J. Stat., 2:90-102. · Zbl 1306.62155 · doi:10.1214/08-EJS177
[19] Meier, L. and van de Geer, S. and Bühlmann, P. (2008). The group Lasso for logistic regression. J. R. Stat. Soc. Ser. B Stat. Methodol., 70(1):53-71. · Zbl 1400.62276 · doi:10.1111/j.1467-9868.2007.00627.x
[20] Meinshausen, N. (2007). Relaxed Lasso. Comput. Statist. Data Anal., 52(1):374-393. · Zbl 1452.62522
[21] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the Lasso. Ann. Statist., 34(3):1436-1462. · Zbl 1113.62082 · doi:10.1214/009053606000000281
[22] Meinshausen, N. and Meier, L. and Bühlmann, P. (2009). p-values for high-dimensional regression. J. Amer. Statist. Assoc., 104:1671-1681. · Zbl 1205.62089 · doi:10.1198/jasa.2009.tm08647
[23] Meinshausen, N. and Yu, B. (2009). Lasso-type recovery of sparse representations for high-dimensional data. Ann. Statist., 37(1):246-270. · Zbl 1155.62050 · doi:10.1214/07-AOS582
[24] Nesterov, Yu. (2007). Gradient methods for minimizing composite objective function. CORE Discussion Paper 2007076, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE). September 2007.
[25] Raskutti, G. and Wainwright, M. and Yu, B. (2009). Minimax rates of estimation for high-dimensional linear regression over \(\ell_q\)-balls. Submitted. · Zbl 1365.62276
[26] Rigollet, P. and Tsybakov, A. (2010). Exponential screening and optimal rates of sparse estimation. Submitted. · Zbl 1215.62043 · doi:10.1214/10-AOS854
[27] Rinaldo, A. (2009). Properties and refinements of the fused lasso. Ann. Statist., 37(5B):2922-2952. · Zbl 1173.62027 · doi:10.1214/08-AOS665
[28] Rosset, S. and Zhu, J. (2007). Piecewise linear regularized solution paths. Ann. Statist., 35(3):1012-1030. · Zbl 1194.62094 · doi:10.1214/009053606000001370
[29] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B, 58(1):267-288. · Zbl 0850.62538
[30] Tibshirani, R. and Saunders, M. and Rosset, S. and Zhu, J. and Knight, K. (2005). Sparsity and smoothness via the fused lasso. J. R. Stat. Soc. Ser. B Stat. Methodol., 67(1):91-108. · Zbl 1060.62049 · doi:10.1111/j.1467-9868.2005.00490.x
[31] Tibshirani, R. J. and Taylor, J. (2010). Regularization paths for least squares problems with generalized \(\ell_1\) penalties. Submitted.
[32] Tsybakov, A. and van de Geer, S. (2005). Square root penalty: adaptation to the margin in classification and in edge estimation. Ann. Statist., 33(3):1203-1224. · Zbl 1080.62047 · doi:10.1214/009053604000001066
[33] van de Geer, S. and Bühlmann, P. (2009). On the conditions used to prove oracle results for the Lasso. Electron. J. Stat., 3:1360-1392. · Zbl 1327.62425 · doi:10.1214/09-EJS506
[34] Wainwright, M. (2006). Sharp thresholds for noisy and high-dimensional recovery of sparsity using \(\ell_1\)-constrained quadratic programming. Manuscript.
[35] Ye, F. and Zhang, C. (2010). Rate minimaxity of the Lasso and Dantzig selector for the \(\ell_q\) loss in \(\ell_r\) balls. J. Mach. Learn. Res., 11:3519-3540. · Zbl 1242.62074
[36] Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B Stat. Methodol., 68(1):49-67. · Zbl 1141.62030 · doi:10.1111/j.1467-9868.2005.00532.x
[37] Yuan, M. and Lin, Y. (2007). On the non-negative garrote estimator. J. R. Stat. Soc. Ser. B Stat. Methodol., 69(2):143-161. · Zbl 1120.62052 · doi:10.1111/j.1467-9868.2007.00581.x
[38] Yueh, W.-C. (2005). Eigenvalues of several tridiagonal matrices. Appl. Math. E-Notes, 5:66-74 (electronic). · Zbl 1157.15307
[39] Zhang, C.-H. and Huang, J. (2008). The sparsity and bias of the LASSO selection in high-dimensional linear regression. Ann. Statist., 36(4):1567-1594. · Zbl 1142.62044 · doi:10.1214/07-AOS520
[40] Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Mach. Learn. Res., 7:2541-2563. · Zbl 1222.62008
[41] Zou, H. (2006). The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc., 101(476):1418-1429. · Zbl 1171.62326 · doi:10.1198/016214506000000735
[42] Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat. Methodol., 67(2):301-320. · Zbl 1069.62054 · doi:10.1111/j.1467-9868.2005.00503.x
[43] Zou, H. and Zhang, H. (2009). On the adaptive elastic-net with a diverging number of parameters. Ann. Statist., 37(4):1733-1751. · Zbl 1168.62064 · doi:10.1214/08-AOS625