
A continuous-time approach to online optimization. (English) Zbl 1359.90095

Summary: We consider a family of mirror descent strategies for online optimization in continuous time and show that they lead to no regret. From a more traditional, discrete-time viewpoint, this continuous-time approach allows us to derive the no-regret properties of a large class of discrete-time algorithms, including as special cases the exponential weights algorithm, online mirror descent, smooth fictitious play, and vanishingly smooth fictitious play. In so doing, we obtain a unified view of many classical regret bounds and show that each can be decomposed into a term stemming from continuous-time considerations and a term that measures the disparity between discrete and continuous time. This generalizes the continuous-time analysis of the exponential weights algorithm in [S. Sorin, Math. Program. 116, No. 1–2 (B), 513–528 (2009; Zbl 1181.68319)]. As a result, we obtain a general class of infinite-horizon learning strategies that guarantee an \(\mathcal{O}(n^{-1/2})\) regret bound without having to resort to a doubling trick.
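To make the decomposition concrete, the following is a schematic instance of this type of bound (the constants \(H\), \(V\), \(K\) and the exact form are illustrative assumptions, not the paper's statement): if the strategy is driven by a \(K\)-strongly convex regularizer whose range over the feasible set is at most \(H\), and the payoff vectors are bounded in norm by \(V\), then a variable-parameter mirror descent scheme typically obeys
\[ \operatorname{Reg}(n) \;\le\; \frac{H}{\eta_n} \;+\; \frac{V^2}{2K} \sum_{k=1}^{n} \eta_k, \]
where the first term comes from the continuous-time analysis and the second measures the discrete/continuous disparity. Taking \(\eta_k \propto k^{-1/2}\) makes both terms \(\mathcal{O}(\sqrt{n})\) (using \(\sum_{k=1}^{n} k^{-1/2} \le 2\sqrt{n}\)), so the average regret is \(\mathcal{O}(n^{-1/2})\) for every horizon \(n\) simultaneously, with no doubling trick.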

MSC:

90C25 Convex programming

Citations:

Zbl 1181.68319

References:

[1] P. Auer, Adaptive and self-confident on-line learning algorithms, Journal of Computer and System Sciences, 64, 48 (2002) · Zbl 1006.68162 · doi:10.1006/jcss.2001.1795
[2] A. Beck, Mirror descent and nonlinear projected subgradient methods for convex optimization, Operations Research Letters, 31, 167 (2003) · Zbl 1046.90057 · doi:10.1016/S0167-6377(02)00231-6
[3] M. Benaïm, Consistency of vanishingly smooth fictitious play, Mathematics of Operations Research, 38, 437 (2013) · Zbl 1297.91027 · doi:10.1287/moor.1120.0568
[4] M. Benaïm, Stochastic approximations and differential inclusions, part II: Applications, Mathematics of Operations Research, 31, 673 (2006) · Zbl 1284.62511 · doi:10.1287/moor.1060.0213
[5] L. M. Bregman, The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming, USSR Computational Mathematics and Mathematical Physics, 7, 200 (1967) · Zbl 0186.23807
[6] S. Bubeck, Regret analysis of stochastic and nonstochastic multi-armed bandit problems, Foundations and Trends in Machine Learning, 5, 1 (2012) · Zbl 1281.91051 · doi:10.1561/2200000024
[7] S. Bubeck, Introduction to online optimization, Lecture Notes (2011)
[8] N. Cesa-Bianchi, Analysis of two gradient-based algorithms for on-line regression, in COLT ’97: Proceedings of the 10th Annual Conference on Computational Learning Theory, 163 (1997) · Zbl 0961.68148 · doi:10.1145/267460.267492
[9] N. Cesa-Bianchi, How to use expert advice, Journal of the ACM, 44, 427 (1997) · Zbl 0890.68066 · doi:10.1145/258128.258179
[10] N. Cesa-Bianchi, <em>Prediction, Learning, and Games</em>, Cambridge University Press (2006) · Zbl 1114.91001 · doi:10.1017/CBO9780511546921
[11] D. Fudenberg, Consistency and cautious fictitious play, Journal of Economic Dynamics and Control, 19, 1065 (1995) · Zbl 0900.90423 · doi:10.1016/0165-1889(94)00819-4
[12] D. Fudenberg, <em>The Theory of Learning in Games</em>, vol. 2 of Economic Learning and Social Evolution, MIT Press (1998) · Zbl 0939.91004
[13] D. Fudenberg, Conditional universal consistency, Games and Economic Behavior, 29, 104 (1999) · Zbl 1131.91309 · doi:10.1006/game.1998.0705
[14] J. Hannan, Approximation to Bayes risk in repeated play, in Contributions to the Theory of Games, 97 (1957) · Zbl 0078.32804
[15] E. Hazan, A survey: The convex optimization approach to regret minimization, in Optimization for Machine Learning (eds. S. Sra, S. Nowozin and S. J. Wright), 287 (2012)
[16] J. Hofbauer, On the global convergence of stochastic fictitious play, Econometrica, 70, 2265 (2002) · Zbl 1141.91336 · doi:10.1111/1468-0262.00376
[17] S. M. Kakade, Regularization techniques for learning with matrices, The Journal of Machine Learning Research, 13, 1865 (2012) · Zbl 1432.68388
[18] J. Kivinen, Exponentiated gradient versus gradient descent for linear predictors, Information and Computation, 132, 1 (1997) · Zbl 0872.68158 · doi:10.1006/inco.1996.2612
[19] N. Littlestone, The weighted majority algorithm, Information and Computation, 108, 212 (1994) · Zbl 0804.68121 · doi:10.1006/inco.1994.1009
[20] S. Mannor, Approachability, fast and slow, Journal of Machine Learning Research: Workshop and Conference Proceedings, 30, 1 (2013)
[21] A. Mas-Colell, <em>Microeconomic Theory</em>, Oxford University Press (1995) · Zbl 1256.91002
[22] A. S. Nemirovski, Robust stochastic approximation approach to stochastic programming, SIAM Journal on Optimization, 19, 1574 (2009) · Zbl 1189.90109 · doi:10.1137/070704277
[23] A. S. Nemirovski, <em>Problem Complexity and Method Efficiency in Optimization</em>, Wiley (1983) · Zbl 0501.90062
[24] Y. Nesterov, Primal-dual subgradient methods for convex problems, Mathematical Programming, 120, 221 (2009) · Zbl 1191.90038 · doi:10.1007/s10107-007-0149-x
[25] R. T. Rockafellar, <em>Convex Analysis</em>, Princeton University Press (1970) · Zbl 0193.18401
[26] R. T. Rockafellar, <em>Variational Analysis</em>, vol. 317 of A Series of Comprehensive Studies in Mathematics, Springer-Verlag (1998) · Zbl 0888.49001 · doi:10.1007/978-3-642-02431-3
[27] S. Shalev-Shwartz, <em>Online Learning: Theory, Algorithms, and Applications</em>, PhD thesis (2007) · Zbl 1203.68168 · doi:10.1017/CBO9781107298019.022
[28] S. Shalev-Shwartz, Online learning and online convex optimization, Foundations and Trends in Machine Learning, 4, 107 (2011) · Zbl 1253.68190 · doi:10.1561/2200000018
[29] S. Sorin, Exponential weight algorithm in continuous time, Mathematical Programming, 116, 513 (2009) · Zbl 1181.68319 · doi:10.1007/s10107-007-0111-y
[30] V. G. Vovk, Aggregating strategies, in COLT ’90: Proceedings of the Third Workshop on Computational Learning Theory, 371 (1990) · doi:10.1016/B978-1-55860-146-8.50032-1
[31] V. G. Vovk, A game of prediction with expert advice, in COLT ’95: Proceedings of the 8th Annual Conference on Computational Learning Theory, 51 (1995) · Zbl 0945.68528 · doi:10.1145/225298.225304
[32] M. K. Warmuth, Continuous and discrete-time nonlinear gradient descent: Relative loss bounds and convergence, in Electronic Proceedings of the 5th International Symposium on Artificial Intelligence and Mathematics (1997)
[33] M. Zinkevich, Online convex programming and generalized infinitesimal gradient ascent, in ICML ’03: Proceedings of the 20th International Conference on Machine Learning (2003)