
A convex-nonconvex strategy for grouped variable selection. (English) Zbl 07784487

Summary: This paper deals with the grouped variable selection problem. A widely used strategy is to augment the negative log-likelihood function with a sparsity-promoting penalty. Existing methods include the group Lasso, group SCAD, and group MCP. The group Lasso solves a convex optimization problem but suffers from underestimation bias. The group SCAD and group MCP avoid this estimation bias but require solving a nonconvex optimization problem that may be plagued by suboptimal local optima. In this work, we propose an alternative method based on the generalized minimax concave (GMC) penalty, which is a folded concave penalty that maintains the convexity of the objective function. We develop a new method for grouped variable selection in linear regression, the group GMC, that generalizes the strategy of the original GMC estimator. We present a primal-dual algorithm for computing the group GMC estimator and also prove properties of the solution path to guide its numerical computation and tuning parameter selection in practice. We establish error bounds for both the group GMC and original GMC estimators. A rich set of simulation studies and a real data application indicate that the proposed group GMC approach outperforms existing methods in several different aspects under a wide array of scenarios.
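The grouped-penalty framework described above can be made concrete with a minimal sketch of the group Lasso, the convex baseline the paper improves upon: proximal gradient descent on the least-squares loss, where each iteration applies block soft-thresholding groupwise. This is a generic illustration under our own naming (`block_soft_threshold`, `group_lasso`), not the authors' group GMC estimator or their primal-dual algorithm.

```python
import numpy as np

def block_soft_threshold(v, t):
    """Proximal operator of t * ||v||_2: shrinks the whole group toward zero,
    setting it exactly to zero when its norm falls below the threshold."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v

def group_lasso(X, y, groups, lam, step=None, n_iter=500):
    """Proximal gradient descent for
        0.5 * ||y - X b||^2 + lam * sum_g ||b_g||_2,
    where `groups` is a list of index arrays partitioning the coordinates."""
    n, p = X.shape
    if step is None:
        # 1 / Lipschitz constant of the least-squares gradient
        step = 1.0 / np.linalg.norm(X, 2) ** 2
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y)
        z = b - step * grad
        for g in groups:
            b[g] = block_soft_threshold(z[g], step * lam)
    return b
```

The block soft-thresholding step is what zeroes out entire groups at once; it is also the source of the underestimation bias the summary mentions, since active groups are shrunk by a constant amount, which is exactly what the folded concave penalties (SCAD, MCP, GMC) are designed to avoid.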

MSC:

62-XX Statistics

Software:

FASTA

References:

[1] ABE, J., YAMAGISHI, M. and YAMADA, I. (2019). Convexity-edge-preserving signal recovery with linearly involved generalized minimax concave penalty function. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 4918-4922. IEEE.
[2] ABE, J., YAMAGISHI, M. and YAMADA, I. (2020). Linearly involved generalized Moreau enhanced models and their proximal splitting algorithm under overall convexity condition. Inverse Problems 36 035012. MathSciNet: MR4068241 · Zbl 1435.90042
[3] BAUSCHKE, H. H. and COMBETTES, P. L. (2011). Convex Analysis and Monotone Operator Theory in Hilbert Spaces 408. Springer. MathSciNet: MR2798533 · Zbl 1218.47001
[4] BAYRAM, I. (2015). On the convergence of the iterative shrinkage/thresholding algorithm with a weakly convex penalty. IEEE Transactions on Signal Processing 64 1597-1608. MathSciNet: MR3548876 · Zbl 1412.94018
[5] BLAKE, A. and ZISSERMAN, A. (1987). Visual Reconstruction. MIT Press. MathSciNet: MR0919733
[6] BREHENY, P. and HUANG, J. (2015). Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors. Statistics and Computing 25 173-187. MathSciNet: MR3306699 · Zbl 1331.62359
[7] CHAMBOLLE, A. and POCK, T. (2011). A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision 40 120-145. MathSciNet: MR2782122 · Zbl 1255.68217
[8] CHEN, Y., YAMAGISHI, M. and YAMADA, I. (2023). A unified design of generalized Moreau enhancement matrix for sparsity aware LiGME models. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences 2022EAP1118.
[9] ESSER, E., ZHANG, X. and CHAN, T. (2009). A general framework for a class of first order primal-dual algorithms for TV minimization. UCLA CAM Report 09-67.
[10] FAN, J. and LI, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96 1348-1360. MathSciNet: MR1946581 · Zbl 1073.62547
[11] FAN, J., XUE, L. and ZOU, H. (2014). Strong oracle optimality of folded concave penalized estimation. Annals of Statistics 42 819. MathSciNet: MR3210988 · Zbl 1305.62252
[12] GOLDSTEIN, T., LI, M. and YUAN, X. (2015). Adaptive primal-dual splitting methods for statistical learning and image processing. Advances in Neural Information Processing Systems 28 2089-2097.
[13] GOLDSTEIN, T., STUDER, C. and BARANIUK, R. (2014). A field guide to forward-backward splitting with a FASTA implementation. arXiv eprint arXiv:1411.3406.
[14] GOLDSTEIN, T., STUDER, C. and BARANIUK, R. (2015). FASTA: A generalized implementation of forward-backward splitting. arXiv preprint arXiv:1501.04979.
[15] GOLDSTEIN, T., LI, M., YUAN, X., ESSER, E. and BARANIUK, R. (2013). Adaptive primal-dual hybrid gradient methods for saddle-point problems. arXiv preprint arXiv:1305.0546.
[16] HASTIE, T., TIBSHIRANI, R. and WAINWRIGHT, M. (2015). Statistical Learning with Sparsity. Monographs on Statistics and Applied Probability 143. MathSciNet: MR3616141 · Zbl 1319.68003
[17] HOSMER, D. W. and LEMESHOW, S. (1989). Applied Logistic Regression. John Wiley & Sons. · Zbl 0967.62045
[18] HUANG, J., BREHENY, P. and MA, S. (2012). A selective review of group selection in high-dimensional models. Statistical Science 27 481-499. MathSciNet: MR3025130 · Zbl 1331.62347
[19] HUBER, P. J. (1992). Robust estimation of a location parameter. In Breakthroughs in Statistics 492-518. Springer.
[20] KIM, Y., KIM, J. and KIM, Y. (2006). Blockwise sparse regression. Statistica Sinica 16 375-390. MathSciNet: MR2267240 · Zbl 1096.62076
[21] LANZA, A., MORIGI, S., SELESNICK, I. W. and SGALLARI, F. (2019). Sparsity-inducing nonconvex nonseparable regularization for convex image processing. SIAM Journal on Imaging Sciences 12 1099-1134. MathSciNet: MR3968243 · Zbl 1524.94023
[22] LIU, X. and CHI, E. C. (2022). Revisiting convexity-preserving signal recovery with the linearly involved GMC penalty. Pattern Recognition Letters 156 60-66.
[23] LOH, P.-L. and WAINWRIGHT, M. J. (2015). Regularized \(M\)-estimators with nonconvexity: Statistical and algorithmic theory for local optima. Journal of Machine Learning Research 16 559-616. MathSciNet: MR3335800 · Zbl 1360.62276
[24] MEIER, L., VAN DE GEER, S. and BÜHLMANN, P. (2008). The group lasso for logistic regression. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 70 53-71. MathSciNet: MR2412631 · Zbl 1400.62276
[25] NEGAHBAN, S. N., RAVIKUMAR, P., WAINWRIGHT, M. J. and YU, B. (2012). A unified framework for high-dimensional analysis of \(M\)-estimators with decomposable regularizers. Statistical Science 27 538-557. MathSciNet: MR3025133 · Zbl 1331.62350 · doi:10.1214/12-STS400
[26] NIKOLOVA, M. (1998). Estimation of binary images by minimizing convex criteria. In Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No. 98CB36269) 2 108-112. IEEE.
[27] NIKOLOVA, M., NG, M. K. and TAM, C.-P. (2010). Fast nonconvex nonsmooth minimization methods for image restoration and reconstruction. IEEE Transactions on Image Processing 19 3073-3088. MathSciNet: MR2789705 · Zbl 1371.94277
[28] ROCKAFELLAR, R. T. (1970). Convex Analysis. Princeton Mathematical Series. Princeton University Press, Princeton, N.J. MathSciNet: MR0274683 · Zbl 0193.18401
[29] SELESNICK, I. (2017a). Total variation denoising via the Moreau envelope. IEEE Signal Processing Letters 24 216-220.
[30] SELESNICK, I. (2017b). Sparse regularization via convex analysis. IEEE Transactions on Signal Processing 65 4481-4494. MathSciNet: MR3684078 · Zbl 1414.94545
[31] SELESNICK, I., LANZA, A., MORIGI, S. and SGALLARI, F. (2020). Non-convex total variation regularization for convex denoising of signals. Journal of Mathematical Imaging and Vision 62 825-841. MathSciNet: MR4126452 · Zbl 1485.94026
[32] TIBSHIRANI, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58 267-288. MathSciNet: MR1379242 · Zbl 0850.62538
[33] TIBSHIRANI, R. J. (2013). The lasso problem and uniqueness. Electronic Journal of Statistics 7 1456-1490. MathSciNet: MR3066375 · Zbl 1337.62173
[34] WAINWRIGHT, M. J. (2019). High-Dimensional Statistics: A Non-Asymptotic Viewpoint. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press. MathSciNet: MR3967104 · Zbl 1457.62011 · doi:10.1017/9781108627771
[35] WANG, L., CHEN, G. and LI, H. (2007). Group SCAD regression analysis for microarray time course gene expression data. Bioinformatics 23 1486-1494.
[36] WANG, L., LI, H. and HUANG, J. Z. (2008). Variable selection in nonparametric varying-coefficient models for analysis of repeated measurements. Journal of the American Statistical Association 103 1556-1569. MathSciNet: MR2504204 · Zbl 1286.62034
[37] WEI, F. and HUANG, J. (2010). Consistent group selection in high-dimensional linear regression. Bernoulli 16 1369-1384. MathSciNet: MR2759183 · Zbl 1207.62146 · doi:10.3150/10-BEJ252
[38] YATA, W., YAMAGISHI, M. and YAMADA, I. (2022). A constrained LiGME model and its proximal splitting algorithm under overall convexity condition. Journal of Applied & Numerical Optimization 4.
[39] YUAN, M. and LIN, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68 49-67. MathSciNet: MR2212574 · Zbl 1141.62030
[40] ZENG, Y., YANG, T. and BREHENY, P. (2021). Hybrid safe-strong rules for efficient optimization in lasso-type problems. Computational Statistics & Data Analysis 153 107063. MathSciNet: MR4146815 · Zbl 1510.62313
[41] ZHANG, C.-H. (2010). Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics 38 894-942. MathSciNet: MR2604701 · Zbl 1183.62120
[42] ZHANG, C.-H. and ZHANG, T. (2012). A general theory of concave regularization for high-dimensional sparse estimation problems. Statistical Science 27 576-593. MathSciNet: MR3025135 · Zbl 1331.62353
[43] ZHAO, P., ROCHA, G. and YU, B. (2009). The composite absolute penalties family for grouped and hierarchical variable selection. The Annals of Statistics 37 3468-3497. MathSciNet: MR2549566 · Zbl 1369.62164
[44] ZOU, J., SHEN, M., ZHANG, Y., LI, H., LIU, G. and DING, S. (2018). Total variation denoising with non-convex regularizers. IEEE Access 7 4422-4431.