
A continuous-time approach to online optimization

Abstract
We consider a family of mirror descent strategies for online optimization in continuous time and we show that they lead to no regret. From a more traditional, discrete-time viewpoint, this continuous-time approach allows us to derive the no-regret properties of a large class of discrete-time algorithms, including as special cases the exponential weights algorithm, online mirror descent, smooth fictitious play and vanishingly smooth fictitious play. In so doing, we obtain a unified view of many classical regret bounds and show that they can be decomposed into a term stemming from continuous-time considerations and a term which measures the disparity between discrete and continuous time. This generalizes the continuous-time analysis of the exponential weights algorithm of [29]. As a result, we obtain a general class of infinite-horizon learning strategies that guarantee an $\mathcal{O}(n^{-1/2})$ regret bound without having to resort to a doubling trick.

    Mathematics Subject Classification: Primary: 68Q32, 68T05, 91A26; Secondary: 90C25.

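The simplest instance of such an infinite-horizon strategy is the EW' variant summarized in Table 1 below: exponential weights run with the vanishing step size $\eta_n = \eta/\sqrt{n}$, which attains the $\mathcal{O}(n^{-1/2})$ regret rate mentioned above without a doubling trick. The following Python snippet is a minimal sketch of that variant; the function name, array-based interface and numerical stabilization are illustrative choices, not code from the paper.

```python
import numpy as np

def exponential_weights(payoffs, eta=1.0):
    """Anytime exponential weights (the EW' row of Table 1): play
    x_n proportional to exp(eta_n * U_{n-1}) with eta_n = eta / sqrt(n),
    where U_{n-1} is the cumulative payoff vector observed so far."""
    n, d = payoffs.shape                 # payoffs[k] is the payoff vector v_{k+1}
    score = np.zeros(d)                  # cumulative payoff vector, U_0 = 0
    plays = np.empty((n, d))
    for k in range(n):
        eta_k = eta / np.sqrt(k + 1)     # vanishing step size: no doubling trick
        logits = eta_k * score
        w = np.exp(logits - logits.max())  # shift for numerical stability
        plays[k] = w / w.sum()           # mixed strategy on the simplex
        score += payoffs[k]              # v_n is revealed after x_n is played
    return plays
```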
Figure 1.  Graphical illustration of the greedy (dashed) and lazy (solid) branches of the projected subgradient (PSG) method

Figure 2.  Greedy and lazy mirror descent with $\gamma_n = 1$
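Figures 1 and 2 contrast the greedy branch, which projects back to the feasible set after every gradient step, with the lazy branch, which aggregates the steps without projecting and only projects when a point has to be played. The sketch below illustrates the two update orders for the projected subgradient method, taking a Euclidean ball as an example feasible set and a precomputed gradient stream as input; the function and parameter names are illustrative, not the paper's.

```python
import numpy as np

def project_ball(y, radius=1.0):
    """Euclidean projection onto a ball of the given radius (an example C)."""
    norm = np.linalg.norm(y)
    return y if norm <= radius else y * (radius / norm)

def psg_greedy(grads, x0, gammas):
    """Greedy PSG (dashed branch of Fig. 1): project after every step,
    so the internal state always stays inside C."""
    x, plays = np.asarray(x0, dtype=float), []
    for g, gamma in zip(grads, gammas):
        x = project_ball(x - gamma * np.asarray(g))
        plays.append(x)
    return plays

def psg_lazy(grads, x0, gammas):
    """Lazy PSG (solid branch of Fig. 1): aggregate the steps without
    projecting, and project only to generate the point that is played."""
    y, plays = np.asarray(x0, dtype=float), []
    for g, gamma in zip(grads, gammas):
        y = y - gamma * np.asarray(g)       # unprojected aggregation
        plays.append(project_ball(y))       # projection only at play time
    return plays
```

In an actual run of the lazy method the gradients would be evaluated at the played points, matching the input column $-\gamma_{n}\nabla f(x_{n})$ of Table 1; here they are passed in precomputed purely to keep the two branches directly comparable.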

Table 1.  Summary of the algorithms discussed in Section 6. The suffix "L" indicates a "lazy" variant; the INPUT column gives the stream of payoff vectors that is used as input for the algorithm and the NORM column specifies the norm of the ambient space; finally, $\xi_{n}$ denotes a zero-mean stochastic process with values in $\mathbb{R}^{d}$.

ALGORITHM | $\mathcal{C}$ | $h(x)$ | $\eta_{n}$ | INPUT | NORM
EW | $\Delta_{d}$ | $\sum_{i} x_{i}\log x_{i}$ | CONSTANT | $v_{n}$ | $\ell^{1}$
EW' | $\Delta_{d}$ | $\sum_{i} x_{i}\log x_{i}$ | $\eta/\sqrt{n}$ | $v_{n}$ | $\ell^{1}$
SFP | $\Delta_{d}$ | ANY | $\eta/n$ | $v_{n}$ | $\ell^{1}$
VSFP | $\Delta_{d}$ | ANY | $\eta n^{-\alpha}$ $(0<\alpha<1)$ | $v_{n}$ | $\ell^{1}$
OGD-L | ANY | $\frac{1}{2}\left\| x \right\|_2^2$ | CONSTANT | $-\nabla f_{n}(x_{n})$ | $\ell^{2}$
OMD-L | ANY | ANY | CONSTANT | $-\nabla f_{n}(x_{n})$ | ANY
PSG-L | ANY | $\frac{1}{2}\left\| x \right\|_2^2$ | $1$ | $-\gamma_{n}\nabla f(x_{n})$ | $\ell^{2}$
MD-L | ANY | ANY | $1$ | $-\gamma_{n}\nabla f(x_{n})$ | ANY
RSA-L | ANY | ANY | $1$ | $-\gamma_{n}(\nabla f(x_{n})+\xi_{n})$ | ANY
SPSG-L | ANY | $\frac{1}{2}\left\| x \right\|_2^2$ | $1$ | $-\gamma_{n}(\nabla f(x_{n})+\xi_{n})$ | $\ell^{2}$
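All of the algorithms in Table 1 can be read as the same lazy template: aggregate the input stream in the dual and map the running total to the feasible set through the mirror (choice) map induced by the regularizer $h$. The snippet below is a sketch of this template, instantiated with the entropic regularizer on the simplex, which yields the exponential-weights formula; the names and interfaces are mine, not the paper's.

```python
import numpy as np

def lazy_mirror_descent(inputs, mirror_map, d):
    """Generic lazy template behind Table 1: aggregate the inputs in the dual,
    y_n = y_{n-1} + input_n, and play x_n = Q(y_{n-1}), where Q is the mirror
    map induced by the regularizer h over the feasible set C."""
    y, plays = np.zeros(d), []
    for v in inputs:                  # v_n, -grad f_n(x_n), -gamma_n grad f(x_n), ...
        plays.append(mirror_map(y))   # map the dual aggregate to the feasible set
        y = y + np.asarray(v, dtype=float)
    return plays

def entropic_map(y, eta=1.0):
    """Q for C = simplex and h(x) = sum_i x_i log x_i: the softmax of eta * y,
    so lazy_mirror_descent with this map reproduces exponential weights (EW)."""
    w = np.exp(eta * (y - y.max()))
    return w / w.sum()
```

Swapping in the Euclidean projection from the previous sketch recovers the OGD-L and PSG-L rows, while feeding the noisy stream $-\gamma_{n}(\nabla f(x_{n})+\xi_{n})$ gives the stochastic variants RSA-L and SPSG-L.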
[1] P. Auer, N. Cesa-Bianchi and C. Gentile, Adaptive and self-confident on-line learning algorithms, Journal of Computer and System Sciences, 64 (2002), 48-75. doi: 10.1006/jcss.2001.1795.
    [2] A. Beck and M. Teboulle, Mirror descent and nonlinear projected subgradient methods for convex optimization, Operations Research Letters, 31 (2003), 167-175.  doi: 10.1016/S0167-6377(02)00231-6.
    [3] M. Benaïm and M. Faure, Consistency of vanishingly smooth fictitious play, Mathematics of Operations Research, 38 (2013), 437-450.  doi: 10.1287/moor.1120.0568.
[4] M. Benaïm, J. Hofbauer and S. Sorin, Stochastic approximations and differential inclusions, part II: Applications, Mathematics of Operations Research, 31 (2006), 673-695. doi: 10.1287/moor.1060.0213.
    [5] L. M. Bregman, The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming, USSR Computational Mathematics and Mathematical Physics, 7 (1967), 200-217. 
[6] S. Bubeck and N. Cesa-Bianchi, Regret analysis of stochastic and nonstochastic multi-armed bandit problems, Foundations and Trends in Machine Learning, 5 (2012), 1-122. doi: 10.1561/2200000024.
    [7] S. Bubeck, Introduction to online optimization, Lecture Notes, 2011.
    [8] N. Cesa-Bianchi, Analysis of two gradient-based algorithms for on-line regression, in COLT '97: Proceedings of the 10th Annual Conference on Computational Learning Theory, 1997, 163–170. doi: 10.1145/267460.267492.
[9] N. Cesa-Bianchi, Y. Freund, D. Haussler, D. P. Helmbold, R. E. Schapire and M. K. Warmuth, How to use expert advice, Journal of the ACM, 44 (1997), 427-485. doi: 10.1145/258128.258179.
    [10] N. Cesa-Bianchi and G. Lugosi, Prediction, Learning, and Games, Cambridge University Press, 2006. doi: 10.1017/CBO9780511546921.
    [11] D. Fudenberg and D. K. Levine, Consistency and cautious fictitious play, Journal of Economic Dynamics and Control, 19 (1995), 1065-1089.  doi: 10.1016/0165-1889(94)00819-4.
    [12] D. Fudenberg and D. K. Levine, The Theory of Learning in Games, vol. 2 of Economic learning and social evolution, MIT Press, Cambridge, MA, 1998.
    [13] D. Fudenberg and D. K. Levine, Conditional universal consistency, Games and Economic Behavior, 29 (1999), 104-130.  doi: 10.1006/game.1998.0705.
[14] J. Hannan, Approximation to Bayes risk in repeated play, in Contributions to the Theory of Games, Volume III (eds. M. Dresher, A. W. Tucker and P. Wolfe), vol. 39 of Annals of Mathematics Studies, Princeton University Press, Princeton, NJ, 1957, 97–139.
[15] E. Hazan, A survey: The convex optimization approach to regret minimization, in Optimization for Machine Learning (eds. S. Sra, S. Nowozin and S. J. Wright), MIT Press, 2012, 287–304.
    [16] J. Hofbauer and W. H. Sandholm, On the global convergence of stochastic fictitious play, Econometrica, 70 (2002), 2265-2294.  doi: 10.1111/1468-0262.00376.
[17] S. M. Kakade, S. Shalev-Shwartz and A. Tewari, Regularization techniques for learning with matrices, The Journal of Machine Learning Research, 13 (2012), 1865-1890.
    [18] J. Kivinen and M. K. Warmuth, Exponentiated gradient versus gradient descent for linear predictors, Information and Computation, 132 (1997), 1-63.  doi: 10.1006/inco.1996.2612.
    [19] N. Littlestone and M. K. Warmuth, The weighted majority algorithm, Information and Computation, 108 (1994), 212-261.  doi: 10.1006/inco.1994.1009.
[20] S. Mannor and V. Perchet, Approachability, fast and slow, Journal of Machine Learning Research: Workshop and Conference Proceedings, 30 (2013), 1-16.
    [21] A. Mas-Colell, M. D. Whinston and J. R. Green, Microeconomic Theory, Oxford University Press, New York, NY, USA, 1995.
[22] A. S. Nemirovski, A. Juditsky, G. Lan and A. Shapiro, Robust stochastic approximation approach to stochastic programming, SIAM Journal on Optimization, 19 (2009), 1574-1609. doi: 10.1137/070704277.
    [23] A. S. Nemirovski and D. B. Yudin, Problem Complexity and Method Efficiency in Optimization, Wiley, New York, NY, 1983.
    [24] Y. Nesterov, Primal-dual subgradient methods for convex problems, Mathematical Programming, 120 (2009), 221-259.  doi: 10.1007/s10107-007-0149-x.
    [25] R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, NJ, 1970.
    [26] R. T. Rockafellar and R. J. B. Wets, Variational Analysis, vol. 317 of A Series of Comprehensive Studies in Mathematics, Springer-Verlag, Berlin, 1998. doi: 10.1007/978-3-642-02431-3.
    [27] S. Shalev-Shwartz, Online Learning: Theory, Algorithms, and Applications, PhD thesis, Hebrew University of Jerusalem, 2007. doi: 10.1017/CBO9781107298019.022.
    [28] S. Shalev-Shwartz, Online learning and online convex optimization, Foundations and Trends in Machine Learning, 4 (2011), 107-194.  doi: 10.1561/2200000018.
    [29] S. Sorin, Exponential weight algorithm in continuous time, Mathematical Programming, 116 (2009), 513-528.  doi: 10.1007/s10107-007-0111-y.
[30] V. G. Vovk, Aggregating strategies, in COLT '90: Proceedings of the Third Workshop on Computational Learning Theory, 1990, 371–383. doi: 10.1016/B978-1-55860-146-8.50032-1.
    [31] V. G. Vovk, A game of prediction with expert advice, in COLT '95: Proceedings of the 8th Annual Conference on Computational Learning Theory, 1995, 51–60. doi: 10.1145/225298.225304.
    [32] M. K. Warmuth and A. K. Jagota, Continuous and discrete-time nonlinear gradient descent: Relative loss bounds and convergence, in Electronic Proceedings of the 5th International Symposium on Artificial Intelligence and Mathematics, 1997.
    [33] M. Zinkevich, Online convex programming and generalized infinitesimal gradient ascent, in ICML '03: Proceedings of the 20th International Conference on Machine Learning, 2003.
