×

A coordinate gradient descent method for \(\ell_{1}\)-regularized convex minimization. (English) Zbl 1220.90092

Summary: In applications such as signal processing and statistics, many problems involve finding sparse solutions to under-determined linear systems of equations. These problems can be formulated as a structured nonsmooth optimization problems, i.e., the problem of minimizing \(\ell_{1}\)-regularized linear least squares problems. In this paper, we propose a block coordinate gradient descent method (abbreviated as CGD) to solve the more general \(\ell_{1}\)-regularized convex minimization problems, i.e., the problem of minimizing an \(\ell_{1}\)-regularized convex smooth function. We establish a Q-linear convergence rate for our method when the coordinate block is chosen by a Gauss-Southwell-type rule to ensure sufficient descent. We propose efficient implementations of the CGD method and report numerical results for solving large-scale \(\ell_{1}\)-regularized linear least squares problems arising in compressed sensing and image deconvolution as well as large-scale \(\ell_{1}\)-regularized logistic regression problems for feature selection in data classification. Comparison with several state-of-the-art algorithms specifically designed for solving large-scale \(\ell_{1}\)-regularized linear least squares or logistic regression problems suggests that an efficiently implemented CGD method may outperform these algorithms despite the fact that the CGD method is not specifically designed just to solve these special classes of problems.

MSC:

90C25 Convex programming
90C52 Methods of reduced gradient type

Software:

RCV1; LIBSVM; PDCO
Full Text: DOI

References:

[1] Bertsekas, D.P.: Nonlinear Programming, 2nd edn. Athena Scientific, Belmont (1999)
[2] Bradley, P.S., Fayyad, U.M., Mangasarian, O.L.: Mathematical programming for data mining: formulations and challenges. INFORMS J. Comput. 11, 217–238 (1999) · Zbl 0973.90096 · doi:10.1287/ijoc.11.3.217
[3] Candès, E.J., Tao, T.: Nearly optimal signal recovery from random projections: Universal encoding strategies. IEEE Trans. Inf. Theory 52, 5406–5425 (2006) · Zbl 1309.94033 · doi:10.1109/TIT.2006.885507
[4] Candès, E.J., Romberg, J., Tao, T.: Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 52, 489–509 (2006) · Zbl 1231.94017 · doi:10.1109/TIT.2005.862083
[5] Chang, C.-C., Lin, C.-J.: LIBSVM–A library for support vector machines. http://www.csie.ntu.edu.tw/\(\sim\)cjlin/libsvmtools/datasets/
[6] Chen, S., Donoho, D., Saunders, M.: Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20, 33–61 (1999) · Zbl 0919.94002 · doi:10.1137/S1064827596304010
[7] Daubechies, I., De Friese, M., De Mol, C.: An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Commun. Pure Appl. Math. 57, 1413–1457 (2004) · Zbl 1077.65055 · doi:10.1002/cpa.20042
[8] Donoho, D.: Compressed sensing. IEEE Trans. Inf. Theory 52, 1289–1306 (2006) · Zbl 1288.94016 · doi:10.1109/TIT.2006.871582
[9] Donoho, D., Tsaig, Y.: Fast solution of 1-norm minimization problems when the solution may be sparse. IEEE Trans. Inf. Theory 54, 4789–4812 (2008) · Zbl 1247.94009 · doi:10.1109/TIT.2008.929958
[10] Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression. Ann. Stat. 32, 407–499 (2004) · Zbl 1091.62054 · doi:10.1214/009053604000000067
[11] Figueiredo, M., Nowak, R.: An EM algorithm for wavelet-based image restoration. IEEE Trans. Image Process. 12, 906–916 (2003) · Zbl 1279.94015 · doi:10.1109/TIP.2003.814255
[12] Figueiredo, M., Nowak, R.: A bound optimization approach to wavelet-based image deconvolution. In: IEEE Int. Conf. on Image Processing–ICIP’05, 2005
[13] Figueiredo, M., Nowak, R., Wright, S.J.: Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems. IEEE J. Sel. Top. Signal Process. 1, 586–598 (2007) · doi:10.1109/JSTSP.2007.910281
[14] Fuchs, J.-J.: On sparse representations in arbitrary redundant bases. IEEE Trans. Inf. Theory 50, 1341–1344 (2004) · Zbl 1284.94018 · doi:10.1109/TIT.2004.828141
[15] Genkin, A., Lewis, D., Madigan, D.: Large-scale Bayesian logistic regression for text categorization. Technometrics 49, 291–304 (2007) · doi:10.1198/004017007000000245
[16] Golub, T.R., Slonim, D.K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J.P., Coller, H., Loh, M.L., Downing, J.R., Caligiuri, M.A., Bloomfield, C.D., Lander, E.S.: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286, 531–537 (1999) · doi:10.1126/science.286.5439.531
[17] Hale, E.T., Yin, W., Zhang, Y.: A fixed-point continuation method for 1-regularized minimization with applications to compressed sensing. CAAM Technical report TR07-07, Department of Computational and Applied Mathematics, Rice University (July 2007)
[18] Kim, S.-J., Koh, K., Lustig, M., Boyd, S., Gorinevsky, D.: An interior-point method for large-scale 1-regularized least squares. IEEE J. Sel. Top. Signal Process. 1, 606–617 (2007) · doi:10.1109/JSTSP.2007.910971
[19] Koh, K., Kim, S.-J., Boyd, S.: An interior-point method for large-scale 1-regularized logistic regression. J. Mach. Learn. Res. 8, 1519–1555 (2007) · Zbl 1222.62092
[20] Lee, S., Lee, H., Abeel, P., Ng, A.: Efficient 1-regularized logistic regression. In: Proceedings of the 21st National Conference on Artificial Intelligence, 2006
[21] Lewis, D.D., Yang, Y., Rose, T.G., Li, F.: RCV1: A new benchmark collection for text categorization research. J. Mach. Learn. Res. 5, 361–397 (2004)
[22] Lin, C.-J., Weng, R.C., Keerthi, S.S.: Trust region Newton method for large-scale logistic regression. J. Mach. Learn. Res. 9, 627–650 (2008) · Zbl 1225.68195
[23] Luo, Z.-Q., Tseng, P.: Error bounds and convergence analysis of feasible descent methods: a general approach. Ann. Oper. Res. 46, 157–178 (1993) · Zbl 0793.90076 · doi:10.1007/BF02096261
[24] Mangasarian, O.L., Musicant, D.R.: Large scale kernel regression via linear programming. Mach. Learn. 46, 255–269 (2002) · Zbl 0998.68106 · doi:10.1023/A:1012422931930
[25] Ng, A.Y.: Feature selection, 1 vs. 2 regularization, and rotational invariance, In: Proceedings of the 21st International Conference on Machine Learning, 2004
[26] Osborne, M., Presnell, B., Turlach, B.: A new approach to variable selection in least squares problems. IMA J. Numer. Anal. 20, 389–403 (2000) · Zbl 0962.65036 · doi:10.1093/imanum/20.3.389
[27] Park, M., Hastie, T.: An 1 regularization-path algorithm for generalized linear models. J. R. Stat. Soc. B 69, 659–677 (2007) · doi:10.1111/j.1467-9868.2007.00607.x
[28] Sardy, S., Tseng, P.: AMlet, RAMlet, and GAMlet: automatic nonlinear fitting of additive models, robust and generalized, with wavelets. J. Comput. Graph. Stat. 13, 283–309 (2004) · doi:10.1198/1061860043434
[29] Shi, W., Wahba, G., Wright, S.J., Lee, K., Klein, R., Klein, B.: Lasso-patternsearch algorithm with application to ophthalmology and genomic data. Stat. Interface 1, 137–153 (2008) · Zbl 1230.62093 · doi:10.4310/SII.2008.v1.n1.a12
[30] Starck, J.-L., Nguyen, M., Murtagh, F.: Wavelets and curvelets for image deconvolution: a combined approach. Signal Process. 83, 2279–2283 (2003) · Zbl 1145.94329 · doi:10.1016/S0165-1684(03)00150-6
[31] Tropp, J.A.: Just relax: Convex programming methods for identifying sparse signals. IEEE Trans. Inf. Theory 51, 1030–1051 (2006) · Zbl 1288.94025 · doi:10.1109/TIT.2005.864420
[32] Tsaig, Y., Donoho, D.: Extensions of compressed sensing. Signal Process. 86, 533–548 (2005) · Zbl 1163.94398 · doi:10.1016/j.sigpro.2005.05.028
[33] Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. 117, 387–423 (2009) · Zbl 1166.90016 · doi:10.1007/s10107-007-0170-0
[34] Wang, L.: Efficient regularized solution path algorithms with applications in machine learning and data mining. Ph.D. thesis, University of Michigan (2008)
[35] Wright, S.J., Nowak, R., Figueiredo, M.: Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. (2007, to appear) · Zbl 1391.94442
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.