
A linearly convergent variant of the conditional gradient algorithm under strong convexity, with applications to online and stochastic optimization. (English) Zbl 1342.65142

Summary: Linear optimization is often algorithmically simpler than nonlinear convex optimization. Linear optimization over matroid polytopes, matching polytopes, and path polytopes is an example of a problem class for which we have simple and efficient combinatorial algorithms, whereas the nonlinear convex counterparts are harder and admit significantly less efficient algorithms. This motivates the computational model of convex optimization, in the offline, online, and stochastic settings, using only a linear optimization oracle. In this computational model we give several new results that improve on the previous state of the art. Our main result is a novel conditional gradient algorithm for smooth and strongly convex optimization over polyhedral sets that performs only a single linear optimization step over the domain on each iteration and enjoys a linear convergence rate, an exponential improvement over previous results. Based on this new conditional gradient algorithm, we give the first algorithms for online convex optimization over polyhedral sets that perform only a single linear optimization step over the domain per iteration while attaining optimal regret guarantees, answering an open question of Kalai and Vempala and of Hazan and Kale. Our online algorithms also imply conditional gradient algorithms for nonsmooth and stochastic convex optimization with the same convergence rates as projected (sub)gradient methods.
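
For context, the classical conditional gradient (Frank-Wolfe) method [11], which the paper builds on, already has the access pattern described above: each iteration makes a single call to the linear optimization oracle and moves to a convex combination of the current iterate and the returned vertex, so feasibility is maintained without projections. Below is a minimal Python sketch of that baseline scheme (the classical method, not the paper's linearly convergent variant), assuming hypothetical user-supplied callables grad and lin_oracle:

    import numpy as np

    def conditional_gradient(grad, lin_oracle, x0, num_iters=100):
        # Classical Frank-Wolfe / conditional gradient method [11]:
        # one linear optimization per iteration; for smooth convex
        # objectives the approximation error decays at the rate O(1/t).
        #   grad(x)       -- gradient of the objective at x
        #   lin_oracle(c) -- a vertex minimizing <c, v> over the polytope
        #   x0            -- any feasible starting point
        x = np.asarray(x0, dtype=float)
        for t in range(num_iters):
            v = lin_oracle(grad(x))            # the single oracle call
            gamma = 2.0 / (t + 2.0)            # standard step-size schedule
            x = (1.0 - gamma) * x + gamma * v  # convex step keeps x feasible
        return x

Over the probability simplex, for example, the oracle is simply lambda c: np.eye(len(c))[np.argmin(c)], the vertex minimizing the linear objective. The paper's main result is a variant of this template that, for smooth and strongly convex objectives over polyhedral sets, retains the single oracle call per iteration yet converges linearly (geometrically) rather than at the O(1/t) rate of the sketch above.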

MSC:

65K05 Numerical mathematical programming methods
90C05 Linear programming
90C06 Large-scale problems in mathematical programming
90C25 Convex programming
90C30 Nonlinear programming
90C27 Combinatorial optimization
90C15 Stochastic programming

References:

[1] S. D. Ahipasaoglu, P. Sun, and M. J. Todd, {\it Linear convergence of a modified Frank-Wolfe algorithm for computing minimum-volume enclosing ellipsoids}, Optim. Methods Softw., 23 (2008), pp. 5-19. · Zbl 1146.90047
[2] F. Bach, S. Lacoste-Julien, and G. Obozinski, {\it On the equivalence between herding and conditional gradient algorithms}, in Proceedings of the 29th International Conference on Machine Learning (ICML), 2012, pp. 1359-1366.
[3] A. Beck and S. Shtern, {\it Linearly Convergent Away-Step Conditional Gradient for Non-strongly Convex Functions}, preprint, arXiv:1504.05002, 2015. · Zbl 1370.90010
[4] A. Beck and M. Teboulle, {\it A conditional gradient method with linear rate of convergence for solving convex linear systems}, Math. Methods Oper. Res., 59 (2004), pp. 235-247. · Zbl 1138.90440
[5] N. Cesa-Bianchi, A. Conconi, and C. Gentile, {\it On the generalization ability of on-line learning algorithms}, IEEE Trans. Inform. Theory, 50 (2004), pp. 2050-2057. · Zbl 1295.68182
[6] N. Cesa-Bianchi and G. Lugosi, {\it Prediction, Learning, and Games}, Cambridge University Press, New York, 2006. · Zbl 1114.91001
[7] K. L. Clarkson, {\it Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm}, in Proceedings of the Symposium on Discrete Algorithms (SODA), 2008, pp. 922-931. · Zbl 1192.90142
[8] M. Dudík, Z. Harchaoui, and J. Malick, {\it Lifted coordinate descent for learning with trace-norm regularization}, in Proceedings of the 15th International Conference on Artificial Intelligence and Statistics (AISTATS), 2012, pp. 327-336.
[9] J. Dunn, {\it Convergence rates for conditional gradient sequences generated by implicit step length rules}, SIAM J. Control Optim., 18 (1980), pp. 473-487. · Zbl 0457.65048
[10] A. Flaxman, A. T. Kalai, and H. B. McMahan, {\it Online convex optimization in the bandit setting: Gradient descent without a gradient}, in Proceedings of the Symposium on Discrete Algorithms (SODA), 2005, pp. 385-394. · Zbl 1297.90117
[11] M. Frank and P. Wolfe, {\it An algorithm for quadratic programming}, Naval Res. Logist. Quart., 3 (1956), pp. 95-110.
[12] D. Garber and E. Hazan, {\it A Linearly Convergent Conditional Gradient Algorithm with Applications to Online and Stochastic Optimization}, CoRR abs/1301.4666, 2013. · Zbl 1342.65142
[13] G. H. Golub and C. F. Van Loan, {\it Matrix Computations,} 3rd ed., Johns Hopkins University Press, Baltimore, MD, 1996. · Zbl 0865.65009
[14] J. Guélat and P. Marcotte, {\it Some comments on Wolfe’s “away step”}, Math. Program., 35 (1986), pp. 110-119. · Zbl 0592.90074
[15] Z. Harchaoui, M. Douze, M. Paulin, M. Dudík, and J. Malick, {\it Large-scale image classification with trace-norm regularization}, in Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 3386-3393.
[16] Z. Harchaoui, A. Juditsky, and A. Nemirovski, {\it Conditional gradient algorithms for norm-regularized smooth convex optimization}, Math. Program., 152 (2015), pp. 75-112. · Zbl 1336.90069
[17] E. Hazan, {\it Sparse approximate solutions to semidefinite programs}, in Proceedings of the Latin American Theoretical Informatics Symposium (LATIN), 2008, pp. 306-316. · Zbl 1136.90430
[18] E. Hazan, {\it A survey: The convex optimization approach to regret minimization}, in Optimization for Machine Learning, MIT Press, Cambridge, MA, 2011, pp. 287-302.
[19] E. Hazan, A. Agarwal, and S. Kale, {\it Logarithmic regret algorithms for online convex optimization}, Machine Learning, 69 (2007), pp. 169-192. · Zbl 1143.90371
[20] E. Hazan and S. Kale, {\it Beyond the regret minimization barrier: An optimal algorithm for stochastic strongly-convex optimization}, J. Mach. Learn. Res. Proc. Track, 19 (2011), pp. 421-436.
[21] E. Hazan and S. Kale, {\it Projection-free online learning}, in Proceedings of the 29th International Conference on Machine Learning (ICML), 2012. · Zbl 1433.68347
[22] M. Jaggi, {\it Revisiting Frank-Wolfe: Projection-free sparse convex optimization}, in Proceedings of the International Conference on Machine Learning (ICML), 2013.
[23] M. Jaggi and M. Sulovský, {\it A simple algorithm for nuclear norm regularized problems}, in Proceedings of the International Conference on Machine Learning (ICML), 2010, pp. 471-478.
[24] A. T. Kalai and S. Vempala, {\it Efficient algorithms for online decision problems}, J. Comput. Systems Sci., 71 (2005), pp. 291-307. · Zbl 1094.68112
[25] W. M. Koolen, M. K. Warmuth, and J. Kivinen, {\it Hedging structured concepts}, in Proceedings of the Conference on Learning Theory (COLT), 2010, pp. 93-105.
[26] S. Lacoste-Julien and M. Jaggi, {\it An affine invariant linear convergence analysis for Frank-Wolfe algorithms}, in Proceedings of the NIPS Workshop on Greedy Algorithms, Frank-Wolfe and Friends, 2013.
[27] S. Lacoste-Julien, M. Jaggi, M. Schmidt, and P. Pletscher, {\it Block-coordinate Frank-Wolfe optimization for structural SVMs}, in Proceedings of the International Conference on Machine Learning (ICML), 2013.
[28] G. Lan, {\it The Complexity of Large-Scale Convex Programming Under a Linear Optimization Oracle}, CoRR abs/1309.5550, 2013.
[29] S. Laue, {\it A hybrid algorithm for convex semidefinite optimization}, in Proceedings of the International Conference on Machine Learning (ICML), 2012.
[30] A. Migdalas, {\it A regularization of the Frank-Wolfe method and unification of certain nonlinear programming methods}, Math. Program., 65 (1994), pp. 331-345. · Zbl 0834.90124
[31] A. S. Nemirovski and D. B. Yudin, {\it Problem Complexity and Method Efficiency in Optimization}, John Wiley, New York, 1983. · Zbl 0501.90062
[32] Y. Nesterov, {\it Gradient Methods for Minimizing Composite Objective Function}, Technical report, Center for Operations Research and Econometrics (CORE), Catholic University of Louvain, 2007.
[33] Y. Nesterov and A. Nemirovskii, {\it Interior-Point Polynomial Algorithms in Convex Programming}, SIAM, Philadelphia, 1994. · Zbl 0824.90112
[34] A. Schrijver, {\it Combinatorial Optimization: Polyhedra and Efficiency}, Springer, New York, 2003. · Zbl 1041.90001
[35] S. Shalev-Shwartz, {\it Online learning and online convex optimization}, Found. Trends Machine Learning, 4 (2012), pp. 107-194. · Zbl 1253.68190
[36] S. Shalev-Shwartz, A. Gonen, and O. Shamir, {\it Large-scale convex minimization with a low-rank constraint}, in Proceedings of the International Conference on Machine Learning (ICML), 2011, pp. 329-336.
[37] N. Z. Shor, K. C. Kiwiel, and A. Ruszczyński, {\it Minimization Methods for Non-differentiable Functions}, Springer-Verlag, New York, 1985. · Zbl 0561.90058
[38] G. Wahba, {\it A least squares estimate of satellite attitude}, SIAM Rev., 7 (1965), p. 409.
[39] P. Wolfe, {\it Convergence theory in nonlinear programming}, in Integer and Nonlinear Programming, J. Abadie, ed., North-Holland, Amsterdam, 1970. · Zbl 0336.90045
[40] M. Zinkevich, {\it Online convex programming and generalized infinitesimal gradient ascent}, in Proceedings of the International Conference on Machine Learning (ICML), 2003, pp. 928-936.