
Efficient sparse semismooth Newton methods for the clustered Lasso problem. (English) Zbl 1427.90200

Summary: We focus on solving the clustered Lasso problem, a least squares problem with \(\ell_1\)-type penalties imposed on both the coefficients and their pairwise differences so as to learn the group structure of the regression parameters. We first reformulate the clustered Lasso regularizer as a weighted ordered-Lasso regularizer, which is essential in reducing the computational cost from \(O(n^2)\) to \(O(n\log (n))\). We then propose an inexact semismooth Newton augmented Lagrangian (Ssnal) algorithm that, via this equivalent formulation, solves either the clustered Lasso problem or its dual, depending on whether the sample size is larger than the feature dimension. An essential component of the Ssnal algorithm is the computation of the generalized Jacobian of the proximal mapping of the clustered Lasso regularizer, and the new formulation yields an efficient procedure for this computation. Comprehensive results on the global convergence and local linear convergence of the Ssnal algorithm are established. For exposition and comparison, we also summarize and design several first-order methods for the problem under consideration, incorporating the key improvement afforded by the new formulation of the clustered Lasso regularizer. Numerical experiments on the clustered Lasso problem demonstrate the applicability of our algorithms and show that the Ssnal algorithm substantially outperforms the best alternative algorithm for this problem.
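The following is a hedged sketch of the identity behind the reformulation; the problem form and weights reflect the standard clustered Lasso setup, not a verbatim quotation from the paper. The clustered Lasso estimator solves
\[
\min_{\beta\in\mathbb{R}^n}\ \tfrac{1}{2}\|X\beta-y\|^2+\rho_1\|\beta\|_1+\rho_2\sum_{1\le i<j\le n}|\beta_i-\beta_j|,
\]
and the pairwise-difference penalty admits the sorted representation
\[
\sum_{1\le i<j\le n}|\beta_i-\beta_j|=\sum_{i=1}^{n}(n+1-2i)\,\beta_{(i)},
\]
where \(\beta_{(1)}\ge\cdots\ge\beta_{(n)}\) are the entries of \(\beta\) arranged in nonincreasing order. Evaluating the right-hand side only requires a sort, which is the source of the \(O(n^2)\)-to-\(O(n\log (n))\) reduction. Below is a minimal Python sketch that verifies the identity numerically; the function names are illustrative and not taken from the authors' code.

```python
import numpy as np

def pairwise_naive(beta):
    """O(n^2): sum of |beta_i - beta_j| over all pairs i < j."""
    n = len(beta)
    return sum(abs(beta[i] - beta[j]) for i in range(n) for j in range(i + 1, n))

def pairwise_sorted(beta):
    """O(n log n): weighted sum of the entries sorted in nonincreasing order."""
    n = len(beta)
    s = np.sort(beta)[::-1]                # beta_(1) >= ... >= beta_(n)
    w = n + 1 - 2 * np.arange(1, n + 1)    # weights n-1, n-3, ..., 1-n
    return float(w @ s)

rng = np.random.default_rng(0)
beta = rng.standard_normal(500)
assert np.isclose(pairwise_naive(beta), pairwise_sorted(beta))
```

As the summary indicates, the same sorting device is what makes the proximal mapping of the regularizer, and hence its generalized Jacobian, cheap to compute within the Ssnal framework and the first-order methods considered in the paper.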

MSC:

90C06 Large-scale problems in mathematical programming
90C25 Convex programming
90C90 Applications of mathematical programming

Software:

LIBSVM; FPC_AS
