Multi-stage convex relaxation for feature selection. (English) Zbl 1359.62293

Summary: A number of recent works have studied the effectiveness of feature selection using Lasso. It is known that under the restricted isometry property (RIP), Lasso does not generally lead to exact recovery of the set of nonzero coefficients, due to the looseness of the convex relaxation. This paper considers the feature selection property of nonconvex regularization, where the solution is given by a multi-stage convex relaxation scheme. The nonconvex regularizer requires two tuning parameters (compared to one for Lasso). Although the method is more complex than Lasso, we show that, under appropriate conditions including the dependence of a tuning parameter on the support set size, the local solution obtained by this procedure recovers the set of nonzero coefficients without suffering from the bias of the Lasso relaxation; this complements the parameter estimation results for this procedure in [T. Zhang, J. Mach. Learn. Res. 11, 1081–1107 (2010; Zbl 1242.68262)].
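The summary names the scheme but not its form; in [T. Zhang, J. Mach. Learn. Res. 11, 1081–1107 (2010; Zbl 1242.68262)] the nonconvex regularizer is the capped-\(\ell_{1}\) penalty \(\sum_{j}\min(|\beta_{j}|,\alpha)\) with the two tuning parameters \(\lambda\) and \(\alpha\), and each stage solves a reweighted Lasso. The following is a minimal sketch under that assumption; the names (multi_stage_capped_l1, lam, alpha, stages) are illustrative rather than taken from the paper, and the inner weighted Lasso is solved by plain ISTA to keep the sketch self-contained, not by the solver used in the paper.

```python
import numpy as np

def soft_threshold(z, t):
    # Elementwise soft-thresholding: proximal operator of the weighted l1 norm.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def multi_stage_capped_l1(X, y, lam, alpha, stages=5, n_iter=1000):
    # Multi-stage convex relaxation for the capped-l1 penalty
    # sum_j min(|beta_j|, alpha).  Each stage solves the weighted Lasso
    #   (1/2n) ||y - X beta||^2 + sum_j w_j |beta_j|
    # by ISTA, with w_j = lam if the previous stage's |beta_j| < alpha
    # and w_j = 0 otherwise (large coefficients escape the shrinkage bias).
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2  # 1/L, L = Lipschitz const. of gradient
    beta = np.zeros(p)
    for _ in range(stages):
        # Stage 1 starts from beta = 0, so all weights equal lam: ordinary Lasso.
        w = lam * (np.abs(beta) < alpha).astype(float)
        for _ in range(n_iter):
            grad = X.T @ (X @ beta - y) / n
            beta = soft_threshold(beta - step * grad, step * w)
    return beta

# Toy usage (hypothetical data): a sparse signal with n = 200, p = 50.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
beta_true = np.zeros(50)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.1 * rng.standard_normal(200)
beta_hat = multi_stage_capped_l1(X, y, lam=0.05, alpha=0.5)
```

Stage one is exactly the Lasso; subsequent stages zero out the penalty on coefficients already above \(\alpha\), which is the mechanism by which the procedure avoids the Lasso bias discussed in the summary.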

MSC:

62J05 Linear regression; mixed models
62J07 Ridge regression; shrinkage estimators (Lasso)
68T05 Learning and adaptive systems in artificial intelligence

Citations:

Zbl 1242.68262

Software:

foba

References:

[1] Bickel, P.J., Ritov, Y. and Tsybakov, A.B. (2009). Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37 1705-1732. · Zbl 1173.62022 · doi:10.1214/08-AOS620
[2] Bunea, F., Tsybakov, A. and Wegkamp, M. (2007). Sparsity oracle inequalities for the Lasso. Electron. J. Stat. 1 169-194. · Zbl 1146.62028 · doi:10.1214/07-EJS008
[3] Candès, E.J. and Tao, T. (2005). Decoding by linear programming. IEEE Trans. Inform. Theory 51 4203-4215. · Zbl 1264.94121 · doi:10.1109/TIT.2005.858979
[4] Candès, E.J., Wakin, M.B. and Boyd, S.P. (2008). Enhancing sparsity by reweighted \(l_{1}\) minimization. J. Fourier Anal. Appl. 14 877-905. · Zbl 1176.94014 · doi:10.1007/s00041-008-9045-x
[5] Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348-1360. · Zbl 1073.62547 · doi:10.1198/016214501753382273
[6] Koltchinskii, V. (2009). Sparsity in penalized empirical risk minimization. Ann. Inst. Henri Poincaré Probab. Stat. 45 7-57. · Zbl 1168.62044 · doi:10.1214/07-AIHP146
[7] Lounici, K. (2008). Sup-norm convergence rate and sign concentration property of Lasso and Dantzig estimators. Electron. J. Stat. 2 90-102. · Zbl 1306.62155 · doi:10.1214/08-EJS177
[8] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436-1462. · Zbl 1113.62082 · doi:10.1214/009053606000000281
[9] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267-288. · Zbl 0850.62538
[10] van de Geer, S.A. and Bühlmann, P. (2009). On the conditions used to prove oracle results for the Lasso. Electron. J. Stat. 3 1360-1392. · Zbl 1327.62425 · doi:10.1214/09-EJS506
[11] Wainwright, M.J. (2009). Sharp thresholds for high-dimensional and noisy sparsity recovery using \(\ell_{1}\)-constrained quadratic programming (Lasso). IEEE Trans. Inform. Theory 55 2183-2202. · Zbl 1367.62220 · doi:10.1109/TIT.2009.2016018
[12] Wipf, D.P. and Nagarajan, S. (2010). Iterative reweighted \(\ell_{1}\) and \(\ell_{2}\) methods for finding sparse solutions. IEEE J. Sel. Topics Signal Process. 4 317-329.
[13] Zhang, C.H. (2010). Nearly unbiased variable selection under minimax concave penalty. Ann. Statist. 38 894-942. · Zbl 1183.62120 · doi:10.1214/09-AOS729
[14] Zhang, C.H. and Huang, J. (2008). The sparsity and bias of the LASSO selection in high-dimensional linear regression. Ann. Statist. 36 1567-1594. · Zbl 1142.62044 · doi:10.1214/07-AOS520
[15] Zhang, T. (2009). Some sharp performance bounds for least squares regression with \(L_{1}\) regularization. Ann. Statist. 37 2109-2144. · Zbl 1173.62029 · doi:10.1214/08-AOS659
[16] Zhang, T. (2010). Analysis of multi-stage convex relaxation for sparse regularization. J. Mach. Learn. Res. 11 1081-1107. · Zbl 1242.68262
[17] Zhang, T. (2011). Adaptive forward-backward greedy algorithm for learning sparse representations. IEEE Trans. Inform. Theory 57 4689-4708. · Zbl 1365.62288 · doi:10.1109/TIT.2011.2146690
[18] Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Mach. Learn. Res. 7 2541-2563. · Zbl 1222.62008
[19] Zou, H. (2006). The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418-1429. · Zbl 1171.62326 · doi:10.1198/016214506000000735
[20] Zou, H. and Li, R. (2008). One-step sparse estimates in nonconcave penalized likelihood models. Ann. Statist. 36 1509-1533. · Zbl 1142.62027 · doi:10.1214/009053607000000802