
Frank-Wolfe methods with an unbounded feasible region and applications to structured learning. (English) Zbl 1508.90062

Summary: The Frank-Wolfe method is a popular algorithm for solving the large-scale convex optimization problems that arise in structured statistical learning. However, the traditional Frank-Wolfe method can only be applied when the feasible region is bounded, which limits its applicability in practice. Motivated by two applications in statistical learning, the \(\ell_1\) trend filtering problem and matrix optimization problems with generalized nuclear norm constraints, we study a family of convex optimization problems whose unbounded feasible region is the direct sum of an unbounded linear subspace and a bounded constraint set. We propose two new Frank-Wolfe methods, the unbounded Frank-Wolfe method and the unbounded away-step Frank-Wolfe method, for solving convex optimization problems over this class of unbounded feasible regions. We show that, under proper regularity conditions, the unbounded Frank-Wolfe method has an \(O(1/k)\) sublinear convergence rate and the unbounded away-step Frank-Wolfe method has a linear convergence rate, matching the best-known results for the Frank-Wolfe method when the feasible region is bounded. Furthermore, computational experiments indicate that the proposed methods outperform alternative solvers.
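To make the problem structure concrete, the following is a minimal Python sketch, not the authors' implementation, of one plausible way a Frank-Wolfe iteration can handle a feasible region of the form \(S + V\), with \(S\) a bounded set admitting a cheap linear minimization oracle (here an \(\ell_1\) ball) and \(V\) a linear subspace. The function name fw_unbounded_sketch, the exact least-squares update over the subspace, the step-size rule, and the choice of oracle are all illustrative assumptions rather than details taken from the paper.

import numpy as np

def fw_unbounded_sketch(A, b, radius, U, iters=200):
    """Minimize f(x) = 0.5 * ||A x - b||^2 over x = s + U v, where
    ||s||_1 <= radius (bounded component) and v is unconstrained
    (columns of U span the subspace V). Hypothetical illustration only."""
    n = A.shape[1]
    s = np.zeros(n)          # bounded component, kept in the l1 ball
    AU = A @ U
    for k in range(iters):
        # Exactly minimize the quadratic over the free subspace coordinates v.
        v, *_ = np.linalg.lstsq(AU, b - A @ s, rcond=None)
        grad = A.T @ (A @ s + AU @ v - b)
        # Linear minimization oracle on the l1 ball: a signed coordinate vertex.
        i = int(np.argmax(np.abs(grad)))
        vertex = np.zeros(n)
        vertex[i] = -radius * np.sign(grad[i])
        # Open-loop step size 2/(k+2), the classical Frank-Wolfe choice.
        gamma = 2.0 / (k + 2.0)
        s = (1 - gamma) * s + gamma * vertex
    return s + U @ v

# Example usage on synthetic data (illustrative only):
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
b = rng.standard_normal(50)
U = rng.standard_normal((20, 3))   # columns spanning the free subspace V
x = fw_unbounded_sketch(A, b, radius=5.0, U=U)

The point the sketch illustrates is that the bounded component still requires only a linear minimization oracle, while the unbounded subspace component is treated separately; the paper's actual updates, away steps, step sizes, and regularity conditions are given in the work under review and its preprint [49].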

MSC:

90C25 Convex programming
90C22 Semidefinite programming
90C06 Large-scale problems in mathematical programming
90-08 Computational methods for problems pertaining to operations research and mathematical programming

Software:

CVXPY; SCS; Mosek; softImpute

References:

[1] C. M. Alaíz, A. Barbero, and J. Dorronsoro, Group fused lasso, in International Conference on Artificial Neural Networks, 2013, pp. 66-73. · Zbl 1341.68302
[2] E. D. Andersen and K. D. Andersen, The MOSEK interior point optimizer for linear programming: An implementation of the homogeneous algorithm, in High Performance Optimization, Springer, 2000, pp. 197-232. · Zbl 1028.90022
[3] R. Angst, C. Zach, and M. Pollefeys, The generalized trace-norm and its application to structure-from-motion problems, in International Conference on Computer Vision, IEEE, 2011, pp. 2502-2509.
[4] A. Beck and S. Shtern, Linearly convergent away-step conditional gradient for non-strongly convex functions, Math. Program., 164 (2017), pp. 1-27. · Zbl 1370.90010
[5] I. M. Bomze, F. Rinaldi, and S. R. Bulò, First-order methods for the impatient: Support identification in finite time with convergent Frank-Wolfe variants, SIAM J. Optim., 29 (2019), pp. 2211-2226, https://doi.org/10.1137/18M1206953. · Zbl 1461.65132
[6] I. M. Bomze, F. Rinaldi, and D. Zeffiro, Active set complexity of the away-step Frank-Wolfe algorithm, SIAM J. Optim., 30 (2020), pp. 2470-2500, https://doi.org/10.1137/19M1309419. · Zbl 1462.65063
[7] J. F. Bonnans and A. Shapiro, Optimization problems with perturbations: A guided tour, SIAM Rev., 40 (1998), pp. 228-264, https://doi.org/10.1137/S0036144596302644. · Zbl 0915.49021
[8] E. J. Candès and B. Recht, Exact matrix completion via convex optimization, Found. Comput. Math., 9 (2009), pp. 717-772. · Zbl 1219.90124
[9] K. Chiang, C. Hsieh, and I. Dhillon, Matrix completion with noisy side information, in Advances in Neural Information Processing Systems, Curran Associates, 2015, pp. 3447-3455.
[10] K. L. Clarkson, Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm, ACM Trans. Algorithms, 6 (2010), pp. 1-30. · Zbl 1300.90026
[11] V. Demyanov and A. Rubinov, Approximate Methods in Optimization Problems, Elsevier, 1970. · Zbl 0217.46203
[12] S. Diamond and S. Boyd, CVXPY: A Python-embedded modeling language for convex optimization, J. Mach. Learn. Res., 17 (2016), pp. 2909-2913. · Zbl 1360.90008
[13] J. C. Dunn and S. Harshbarger, Conditional gradient algorithms with open loop step size rules, J. Math. Anal. Appl., 62 (1978), pp. 432-444. · Zbl 0374.49017
[14] A. Eftekhari, D. Yang, and M. B. Wakin, Weighted matrix completion and recovery with prior subspace information, IEEE Trans. Inform. Theory, 64 (2018), pp. 4044-4071. · Zbl 1395.15020
[15] W. Fithian and R. Mazumder, Flexible low-rank statistical modeling with missing data and side information, Statist. Sci., 33 (2018), pp. 238-260. · Zbl 1397.62180
[16] M. Frank and P. Wolfe, An algorithm for quadratic programming, Naval Res. Logist., 3 (1956), pp. 95-110.
[17] R. M. Freund and P. Grigas, New analysis and results for the Frank-Wolfe method, Math. Program., 155 (2016), pp. 199-230. · Zbl 1342.90101
[18] R. M. Freund, P. Grigas, and R. Mazumder, An extended Frank-Wolfe method with “in-face” directions, and its application to low-rank matrix completion, SIAM J. Optim., 27 (2017), pp. 319-346, https://doi.org/10.1137/15M104726X. · Zbl 1357.90115
[19] D. Garber and E. Hazan, A linearly convergent variant of the conditional gradient algorithm under strong convexity, with applications to online and stochastic optimization, SIAM J. Optim., 26 (2016), pp. 1493-1528, https://doi.org/10.1137/140985366. · Zbl 1342.65142
[20] G. Golub and C. Van Loan, Matrix Computations, 3rd ed., Johns Hopkins University Press, Baltimore, MD, 1996. · Zbl 0865.65009
[21] M. L. Gonçalves, J. G. Melo, and R. D. C. Monteiro, Projection-free accelerated method for convex optimization, Optim. Methods Softw., 37 (2022), pp. 214-240. · Zbl 1501.90067
[22] J. Guélat and P. Marcotte, Some comments on Wolfe’s “away step,” Math. Program., 35 (1986), pp. 110-119. · Zbl 0592.90074
[23] D. H. Gutman and J. F. Peña, The Condition of a Function Relative to a Polytope, preprint, https://arxiv.org/abs/1802.00271, 2018.
[24] D. H. Gutman and J. F. Peña, The condition number of a function relative to a set, Math. Program., 188 (2021), pp. 255-294. · Zbl 1470.90077
[25] Z. Harchaoui, A. Juditsky, and A. Nemirovski, Conditional gradient algorithms for norm-regularized smooth convex optimization, Math. Program., 152 (2015), pp. 75-112. · Zbl 1336.90069
[26] E. Hazan, Sparse approximate solutions to semidefinite programs, in Latin American Symposium on Theoretical Informatics, Springer, 2008, pp. 306-316. · Zbl 1136.90430
[27] M. Jaggi, Revisiting Frank-Wolfe: Projection-free sparse convex optimization, in Proceedings of the 30th International Conference on Machine Learning, JMLR.org, 2013, pp. 427-435.
[28] N. Johnson, A dynamic programming algorithm for the fused lasso and \(\ell_0\)-segmentation, J. Comput. Graph. Statist., 22 (2013), pp. 246-260.
[29] S.-J. Kim, K. Koh, S. Boyd, and D. Gorinevsky, \( \ell_1\) trend filtering, SIAM Rev., 51 (2009), pp. 339-360, https://doi.org/10.1137/070690274. · Zbl 1171.37033
[30] S. Lacoste-Julien and M. Jaggi, On the global linear convergence of Frank-Wolfe optimization variants, in Advances in Neural Information Processing Systems, Curran Associates, 2015, pp. 496-504.
[31] G. Lan, The Complexity of Large-Scale Convex Programming under a Linear Optimization Oracle, preprint, https://arxiv.org/abs/1309.5550, 2013.
[32] G. Lan, First-order and Stochastic Optimization Methods for Machine Learning, Springer, Cham, Switzerland, 2020. · Zbl 1442.68003
[33] G. Lan and Y. Zhou, Conditional gradient sliding for convex optimization, SIAM J. Optim., 26 (2016), pp. 1379-1409, https://doi.org/10.1137/140992382. · Zbl 1342.90132
[34] R. Mazumder, T. Hastie, and R. Tibshirani, Spectral regularization algorithms for learning large incomplete matrices, J. Mach. Learn. Res., 11 (2010), pp. 2287-2322. · Zbl 1242.68237
[35] Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, Springer Science & Business Media, 2003. · Zbl 1086.90045
[36] B. O’Donoghue, E. Chu, N. Parikh, and S. Boyd, Conic optimization via operator splitting and homogeneous self-dual embedding, J. Optim. Theory Appl., 169 (2016), pp. 1042-1068. · Zbl 1342.90136
[37] N. Parikh and S. Boyd, Proximal algorithms, Found. Trends Optim., 1 (2014), pp. 127-239.
[38] K. Pelckmans, J. De Brabanter, J. Suykens, and B. De Moor, Convex clustering shrinkage, in PASCAL Workshop on Statistics and Optimization of Clustering Workshop, 2005.
[39] J. Peña and D. Rodríguez, Polytope conditioning and linear convergence of the Frank-Wolfe algorithm, Math. Oper. Res., 44 (2019), pp. 1-18. · Zbl 1440.90048
[40] J. Peña, D. Rodríguez, and N. Soheili, On the von Neumann and Frank-Wolfe algorithms with away steps, SIAM J. Optim., 26 (2016), pp. 499-512, https://doi.org/10.1137/15M1009937. · Zbl 1382.90071
[41] A. Ramdas and R. J. Tibshirani, Fast and flexible ADMM algorithms for trend filtering, J. Comput. Graph. Statist., 25 (2016), pp. 839-858.
[42] R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, NJ, 1970. · Zbl 0193.18401
[43] L. I. Rudin, S. Osher, and E. Fatemi, Nonlinear total variation based noise removal algorithms, Phys. D, 60 (1992), pp. 259-268. · Zbl 0780.49028
[44] N. Srebro and R. R. Salakhutdinov, Collaborative filtering in a non-uniform world: Learning with the weighted trace norm, in Advances in Neural Information Processing Systems, Curran Associates, 2010, pp. 2056-2064.
[45] R. Tibshirani, M. Saunders, S. Rosset, J. Zhu, and K. Knight, Sparsity and smoothness via the fused lasso, J. Roy. Statist. Soc. Ser. B, 67 (2005), pp. 91-108. · Zbl 1060.62049
[46] R. Tibshirani and J. Taylor, The solution path of the generalized lasso, Ann. Statist., 39 (2011), pp. 1335-1371. · Zbl 1234.62107
[47] R. J. Tibshirani, Adaptive piecewise polynomial estimation via trend filtering, Ann. Statist., 42 (2014), pp. 285-323. · Zbl 1307.62118
[48] B. Wahlberg, S. Boyd, M. Annergren, and Y. Wang, An ADMM algorithm for a class of total variation regularized estimation problems, IFAC Proc. Vol., 45 (2012), pp. 83-88.
[49] H. Wang, H. Lu, and R. Mazumder, Frank-Wolfe Methods with an Unbounded Feasible Region and Applications to Structured Learning, preprint, https://arxiv.org/abs/2012.15361, 2020. · Zbl 1508.90062
[50] P. Wolfe, Convergence theory in nonlinear programming, in Integer and Nonlinear Programming, North-Holland, Amsterdam, 1970, pp. 1-36. · Zbl 0336.90045
This reference list is based on information provided by the publisher or drawn from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases, the data have been complemented or enhanced with data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without any claim of completeness or perfect matching.