Abstract
In a Hilbert space setting \({{\mathcal {H}}}\), we study the fast convergence properties as \(t \rightarrow + \infty \) of the trajectories of the second-order differential equation
$$\ddot{x}(t) + \frac{\alpha }{t}\,\dot{x}(t) + \nabla \Phi (x(t)) = g(t),$$
where \(\nabla \Phi \) is the gradient of a convex continuously differentiable function \(\Phi : {{\mathcal {H}}} \rightarrow {{\mathbb {R}}}\), \(\alpha \) is a positive parameter, and \(g: [t_0, + \infty [ \rightarrow {{\mathcal {H}}}\) is a small perturbation term. In this inertial system, the viscous damping coefficient \(\frac{\alpha }{t}\) vanishes asymptotically, but not too rapidly. For \(\alpha \ge 3\), and \(\int _{t_0}^{+\infty } t \Vert g(t)\Vert dt < + \infty \), just assuming that \({{\mathrm{argmin\,}}}\Phi \ne \emptyset \), we show that any trajectory of the above system satisfies the fast convergence property
$$\Phi (x(t)) - \min _{{\mathcal {H}}}\Phi = {{\mathcal {O}}}\left( t^{-2}\right) .$$
Moreover, for \(\alpha > 3\), any trajectory converges weakly to a minimizer of \(\Phi \). The strong convergence is established in various practical situations. These results complement the \({{\mathcal {O}}}(t^{-2})\) rate of convergence for the values obtained by Su, Boyd and Candès in the unperturbed case \(g=0\). Time discretization of this system, and some of its variants, provides new fast converging algorithms, expanding the field of rapid methods for structured convex minimization introduced by Nesterov, and further developed by Beck and Teboulle with FISTA. This study also complements recent advances due to Chambolle and Dossal.
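Time discretization of the system leads to inertial algorithms of Nesterov/FISTA type. The following is a minimal sketch (ours; the scalar test problem, step size, and function names are illustrative assumptions, not taken from the paper) of such an iteration, with extrapolation coefficient \((k-1)/(k+\alpha -1)\) mirroring the vanishing damping \(\alpha /t\):

```python
# Minimal sketch (ours) of a time-discretized inertial gradient dynamic.
# The extrapolation coefficient (k-1)/(k+alpha-1) mirrors the damping alpha/t;
# the quadratic test problem and the step size are illustrative choices.

def inertial_gradient(grad, x0, step, alpha=3.0, iters=500):
    """Inertial (accelerated) gradient iteration, scalar version."""
    x_prev = x = x0
    for k in range(1, iters + 1):
        y = x + (k - 1.0) / (k + alpha - 1.0) * (x - x_prev)  # extrapolation
        x_prev, x = x, y - step * grad(y)                     # gradient step
    return x

# Test problem: Phi(x) = 0.5*(x - 3)**2, whose unique minimizer is x* = 3.
x_star = inertial_gradient(lambda x: x - 3.0, x0=0.0, step=0.5)
print(abs(x_star - 3.0) < 1e-6)
```

For \(\alpha = 3\) the coefficient \((k-1)/(k+2)\) is of the classical Nesterov type; larger \(\alpha \) corresponds to heavier damping.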
Notes
In fact, W decreases strictly, as long as the trajectory is not stationary.
References
Abbas, B., Attouch, H., Svaiter, B.F.: Newton-like dynamics and forward–backward methods for structured monotone inclusions in Hilbert spaces. J. Optim. Theory Appl. 161(2), 331–360 (2014)
Alvarez, F.: On the minimizing property of a second-order dissipative system in Hilbert spaces. SIAM J. Control Optim. 38(4), 1102–1119 (2000)
Alvarez, F., Attouch, H.: An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set Valued Anal. 9(1–2), 3–11 (2001)
Alvarez, F., Attouch, H., Bolte, J., Redont, P.: A second-order gradient-like dissipative dynamical system with Hessian-driven damping. Application to optimization and mechanics. J. Math. Pures Appl. 81(8), 747–779 (2002)
Alvarez, F., Peypouquet, J.: Asymptotic almost-equivalence of Lipschitz evolution systems in Banach spaces. Nonlinear Anal. 73(9), 3018–3033 (2010)
Alvarez, F., Peypouquet, J.: A unified approach to the asymptotic almost-equivalence of evolution systems without Lipschitz conditions. Nonlinear Anal. 74(11), 3440–3444 (2011)
Attouch, H., Buttazzo, G., Michaille, G.: Variational Analysis in Sobolev and BV Spaces: Applications to PDEs and Optimization, 2nd edn. MOS-SIAM Series on Optimization. SIAM, Philadelphia (2014)
Attouch, H., Cabot, A., Redont, P.: The dynamics of elastic shocks via epigraphical regularization of a differential inclusion. Adv. Math. Sci. Appl. 12(1), 273–306 (2002)
Attouch, H., Peypouquet, J., Redont, P.: A dynamical approach to an inertial forward-backward algorithm for convex minimization. SIAM J. Optim. 24(1), 232–256 (2014)
Aujol, J.-F., Dossal, C.: Stability of over-relaxations for the forward–backward algorithm, application to FISTA. SIAM J. Optim. (2015). HAL: hal-01163432
Baillon, J.-B.: Un exemple concernant le comportement asymptotique de la solution du problème \(\frac{du}{dt} + \partial \phi (u) \ni 0\). J. Funct. Anal. 28, 369–376 (1978)
Baillon, J.-B., Haddad, G.: Quelques propriétés des opérateurs angle-bornés et n-cycliquement monotones. Isr. J. Math. 26, 137–150 (1977)
Bauschke, H., Combettes, P.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics. Springer, New York (2011)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Brézis, H.: Opérateurs maximaux monotones dans les espaces de Hilbert et équations d’évolution, Lecture Notes 5, North Holland (1972)
Bruck, R.E.: Asymptotic convergence of nonlinear contraction semigroups in Hilbert spaces. J. Funct. Anal. 18, 15–26 (1975)
Cabot, A., Engler, H., Gadat, S.: On the long time behavior of second order differential equations with asymptotically small dissipation. Trans. Am. Math. Soc. 361, 5983–6017 (2009)
Cabot, A., Engler, H., Gadat, S.: Second order differential equations with asymptotically small dissipation and piecewise flat potentials. Electron. J. Differ. Equ. 17, 33–38 (2009)
Chambolle, A., Dossal, C.: On the convergence of the iterates of FISTA. Preprint hal-01060130v3, https://hal.inria.fr/hal-01060130v3 (2014)
Ghisi, M., Gobbino, M., Haraux, A.: The remarkable effectiveness of time-dependent damping terms for second order evolution equations (2015). arXiv:1506.06915v1 [math.AP]
Güler, O.: New proximal point algorithms for convex minimization. SIAM J. Optim. 2(4), 649–664 (1992)
Kim, D., Fessler, J.A.: Optimized first-order methods for smooth convex minimization (2015). arXiv:1406.5468v2 [math.OC]
Knopp, K.: Theory and Application of Infinite Series. Blackie & Son, Glasgow (1951)
Lorenz, D.A., Pock, T.: An inertial forward–backward algorithm for monotone inclusions. J. Math. Imaging Vis. 51(2), 1–15 (2015)
Moudafi, A., Oliny, M.: Convergence of a splitting inertial proximal method for monotone operators. J. Comput. Appl. Math. 155(2), 447–454 (2003)
Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(O(1/k^2)\). Sov. Math. Dokl. 27, 372–376 (1983)
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, Volume 87 of Applied Optimization. Kluwer Academic Publishers, Boston (2004)
Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103(1), 127–152 (2005)
Nesterov, Y.: Gradient methods for minimizing composite objective function. CORE Discussion Papers (2007)
Opial, Z.: Weak convergence of the sequence of successive approximations for nonexpansive mappings. Bull. Am. Math. Soc. 73, 591–597 (1967)
Parikh, N., Boyd, S.: Proximal algorithms. Found. Trends Optim. 1, 123–231 (2013)
Peypouquet, J.: Convex Optimization in Normed Spaces: Theory, Methods and Examples. Springer, Berlin (2015)
Peypouquet, J., Sorin, S.: Evolution equations for maximal monotone operators: asymptotic analysis in continuous and discrete time. J. Convex Anal. 17(3–4), 1113–1163 (2010)
Schmidt, M., Le Roux, N., Bach, F.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: NIPS’11—25th Annual Conference on Neural Information Processing Systems, Granada, Spain (2011). HAL: inria-00618152v3
Su, W., Boyd, S., Candès, E.J.: A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights. Neural Inf. Process. Syst. 27, 2510–2518 (2014)
Villa, S., Salzo, S., Baldassarre, L., Verri, A.: Accelerated and inexact forward–backward algorithms. SIAM J. Optim. 23(3), 1607–1633 (2013)
Additional information
Effort sponsored by the Air Force Office of Scientific Research, Air Force Material Command, USAF, under Grant Number FA9550-14-1-0056. Also supported by Fondecyt Grant 1140829, Conicyt Anillo ACT-1106, ECOS-Conicyt Project C13E03, Millenium Nucleus ICM/FIC RC130003, Conicyt Project MATHAMSUD 15MATH-02, Conicyt Redes 140183, and Basal Project CMM Universidad de Chile.
Appendix: Some auxiliary results
In this section, we present some auxiliary lemmas to be used later on. The following result can be found in [1]:
Lemma 5.6
Let \(\delta >0\), \(1\le p<\infty \) and \(1\le r\le \infty \). Suppose \(F\in L^{p}([\delta ,\infty [)\) is a locally absolutely continuous nonnegative function, \(G\in L^{r}([\delta ,\infty [)\) and
$$\frac{d}{dt}F(t) \le G(t)$$
for almost every \(t>\delta \). Then \(\lim _{t\rightarrow \infty }F(t)=0\).
To establish the weak convergence of the solutions of (1), we will use Opial’s Lemma [30], that we recall in its continuous form. This argument was first used in [16] to establish the convergence of nonlinear contraction semigroups.
Lemma 5.7
Let S be a nonempty subset of \({{\mathcal {H}}}\) and let \(x:[0,+\infty )\rightarrow {{\mathcal {H}}}\). Assume that
(i) for every \(z\in S\), \(\lim _{t\rightarrow \infty }\Vert x(t)-z\Vert \) exists;

(ii) every weak sequential limit point of \(x(t)\), as \(t\rightarrow \infty \), belongs to S.
Then x(t) converges weakly as \(t\rightarrow \infty \) to a point in S.
Its discrete version is
Lemma 5.8
Let S be a non empty subset of \({{\mathcal {H}}}\), and \((x_k)\) a sequence of elements of \({{\mathcal {H}}}\). Assume that
(i) for every \(z\in S\), \(\lim _{k\rightarrow +\infty }\Vert x_k-z\Vert \) exists;

(ii) every weak sequential limit point of \((x_k)\), as \(k\rightarrow \infty \), belongs to S.
Then \(x_k\) converges weakly as \(k\rightarrow \infty \) to a point in S.
The following allows us to establish the existence of a limit for a real-valued function, as \(t\rightarrow +\infty \):
Lemma 5.9
Let \(\delta >0\), and let \(w: [\delta , +\infty [ \rightarrow {\mathbb {R}}\) be a continuously differentiable function which is bounded from below. Assume
$$t\,\ddot{w}(t) + \alpha \,\dot{w}(t) \le g(t) \quad (81)$$
for some \(\alpha > 1\), almost every \(t>\delta \), and some nonnegative function \(g\in L^1 (\delta , +\infty )\). Then, the positive part \([{{\dot{w}}}]_+\) of \({{\dot{w}}}\) belongs to \(L^1(\delta ,+\infty )\) and \(\lim _{t\rightarrow +\infty }w(t)\) exists.
Proof
Multiply (81) by \(t^{\alpha -1}\) to obtain
$$\frac{d}{dt}\left( t^{\alpha }\dot{w}(t)\right) \le t^{\alpha -1}g(t).$$
By integration, we obtain
$$t^{\alpha }\dot{w}(t) \le \delta ^{\alpha }\dot{w}(\delta ) + \int _{\delta }^{t}s^{\alpha -1}g(s)\,ds.$$
Hence,
$$\dot{w}(t) \le \frac{\delta ^{\alpha }|\dot{w}(\delta )|}{t^{\alpha }} + \frac{1}{t^{\alpha }}\int _{\delta }^{t}s^{\alpha -1}g(s)\,ds,$$
and so,
$$[\dot{w}]_+(t) \le \frac{\delta ^{\alpha }|\dot{w}(\delta )|}{t^{\alpha }} + \frac{1}{t^{\alpha }}\int _{\delta }^{t}s^{\alpha -1}g(s)\,ds.$$
Applying Fubini’s Theorem, we deduce that
$$\int _{\delta }^{+\infty }\frac{1}{t^{\alpha }}\left( \int _{\delta }^{t}s^{\alpha -1}g(s)\,ds\right) dt = \int _{\delta }^{+\infty }s^{\alpha -1}g(s)\left( \int _{s}^{+\infty }\frac{dt}{t^{\alpha }}\right) ds = \frac{1}{\alpha -1}\int _{\delta }^{+\infty }g(s)\,ds < +\infty .$$
As a consequence,
$$[\dot{w}]_+ \in L^1(\delta ,+\infty ).$$
Finally, the function \(\theta :[\delta ,+\infty )\rightarrow {\mathbb {R}}\), defined by
$$\theta (t) := w(t) - \int _{\delta }^{t}[\dot{w}]_+(s)\,ds,$$
is nonincreasing and bounded from below. It follows that
$$\lim _{t\rightarrow +\infty }\theta (t)$$
exists, and hence so does \(\lim _{t\rightarrow +\infty }w(t)=\lim _{t\rightarrow +\infty }\theta (t)+\int _{\delta }^{+\infty }[\dot{w}]_+(s)\,ds\). \(\square \)
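Lemma 5.9 can be checked numerically. The following sketch (ours) integrates the equality case of (81), \(t\ddot{w}(t)+\alpha \dot{w}(t)=g(t)\), by forward Euler with the integrable source \(g(t)=1/t^2\); the scheme, data and tolerances are illustrative assumptions.

```python
# Numerical sanity check (ours) for Lemma 5.9. We integrate the equality case
#   t*w''(t) + alpha*w'(t) = g(t),  g(t) = 1/t**2  (nonnegative, in L^1(1, +oo)),
# by forward Euler. For alpha = 3, w(1) = 0, w'(1) = 1, one checks that
# t**3 * w'(t) = t, so w(t) = 1 - 1/t, which approaches the finite limit 1.
alpha, dt, T = 3.0, 1e-3, 200.0
t, w, v = 1.0, 0.0, 1.0                # w(1) = 0, w'(1) = 1
while t < T:
    a = (1.0 / t**2 - alpha * v) / t   # w'' = (g(t) - alpha*w') / t
    w, v, t = w + dt * v, v + dt * a, t + dt
print(abs(w - (1.0 - 1.0 / T)) < 1e-2)
```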
In the study of the corresponding algorithms, we use the following result, which is a discrete version of Lemma 5.9:
Lemma 5.10
Let \(\alpha \ge 3\), and let \((a_k)\) and \((\omega _k)\) be two sequences of nonnegative numbers such that
$$a_{k+1} \le \frac{k-1}{k+\alpha -1}\,a_k + \omega _k$$
for all \(k\ge 1\). If \(\sum _{k\in {{\mathbb {N}}}} k\,\omega _{k} <+\infty \), then \(\sum _{k \in {{\mathbb {N}}}} a_k < +\infty \).
Proof
Since \(\alpha \ge 3\) we have \(\alpha -1\ge 2\), and hence
$$a_{k+1} \le \frac{k-1}{k+2}\,a_k + \omega _k.$$
Multiplying this expression by \((k+1)^2\), we obtain
$$(k+1)^2 a_{k+1} \le \frac{(k-1)(k+1)^2}{k+2}\,a_k + (k+1)^2\omega _k \le k^2 a_k + (k+1)^2\omega _k.$$
Summing this inequality (written with index \(j\)) for \(j=1,2,\ldots ,k\), and telescoping, we obtain
$$(k+1)^2 a_{k+1} \le a_1 + \sum _{j=1}^{k}(j+1)^2\omega _j.$$
Dividing by \(k^2\), and summing for \(k=2,\ldots ,K\), we obtain
$$\sum _{k=2}^{K} a_k \le a_1\sum _{k=2}^{K}\frac{1}{k^2} + \sum _{k=2}^{K}\frac{1}{k^2}\sum _{j=1}^{k-1}(j+1)^2\omega _j.$$
Applying Fubini’s Theorem to this last sum, and observing that \(\sum _{k=j+1}^{\infty }\frac{1}{k^2} \le \int _{j}^{\infty }\frac{1}{t^2}dt = \frac{1}{j}\), we obtain
$$\sum _{k=2}^{K}\frac{1}{k^2}\sum _{j=1}^{k-1}(j+1)^2\omega _j \le \sum _{j=1}^{K-1}(j+1)^2\omega _j\sum _{k=j+1}^{\infty }\frac{1}{k^2} \le \sum _{j=1}^{K-1}\frac{(j+1)^2}{j}\,\omega _j \le 4\sum _{j=1}^{\infty }j\,\omega _j,$$
and the result follows, since \((j+1)^2\le 4j^2\) for \(j\ge 1\). \(\square \)
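A quick numerical check (ours) of a recursion of this type: we take the equality case \(a_{k+1}=\frac{k-1}{k+\alpha -1}a_k+\omega _k\) with \(\alpha =3\) and \(\omega _k=k^{-3}\), for which \(\sum _k k\omega _k<+\infty \), and confirm that the partial sums of \(a_k\) stabilize. The parameter choices are illustrative assumptions.

```python
# Numerical sanity check (ours): equality case of a Lemma 5.10-type recursion,
#   a_{k+1} = (k-1)/(k+alpha-1) * a_k + omega_k,  omega_k = k**-3, alpha = 3.
# Since sum_k k*omega_k < +oo, the partial sums of a_k should stabilize.
alpha, a, total, half = 3.0, 1.0, 1.0, 0.0   # a_1 = 1
N = 200000
for k in range(1, N):
    a = (k - 1.0) / (k + alpha - 1.0) * a + k**-3.0
    total += a
    if k == N // 2:
        half = total                          # partial sum at the halfway point
print(total - half < 1e-4)                    # the tail contribution is tiny
```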
The following is a continuous version of Kronecker’s Theorem for series (see, for example, [23, page 129]):
Lemma 5.11
Take \(\delta >0\), and let \(f \in L^1 (\delta , +\infty )\) be nonnegative and continuous. Consider a nondecreasing function \(\psi :(\delta ,+\infty )\rightarrow (0,+\infty )\) such that \(\lim \limits _{t\rightarrow +\infty }\psi (t)=+\infty \). Then,
$$\lim _{t\rightarrow +\infty }\frac{1}{\psi (t)}\int _{\delta }^{t}\psi (s)f(s)\,ds = 0.$$
Proof
Given \(\epsilon >0\), fix \(t_\epsilon \) sufficiently large so that
$$\int _{t_\epsilon }^{+\infty } f(s)\,ds \le \epsilon .$$
Then, for \(t \ge t_\epsilon \), split the integral \(\int _{\delta }^t \psi (s)f(s) ds\) into two parts to obtain
$$\frac{1}{\psi (t)}\int _{\delta }^{t}\psi (s)f(s)\,ds \le \frac{1}{\psi (t)}\int _{\delta }^{t_\epsilon }\psi (s)f(s)\,ds + \frac{1}{\psi (t)}\int _{t_\epsilon }^{t}\psi (s)f(s)\,ds \le \frac{1}{\psi (t)}\int _{\delta }^{t_\epsilon }\psi (s)f(s)\,ds + \epsilon ,$$
since \(\psi \) is nondecreasing. Now let \(t\rightarrow +\infty \) to deduce that
$$\limsup _{t\rightarrow +\infty }\frac{1}{\psi (t)}\int _{\delta }^{t}\psi (s)f(s)\,ds \le \epsilon .$$
Since this is true for any \(\epsilon >0\), the result follows. \(\square \)
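A concrete instance (ours, not from the paper): with \(f(s)=1/s^2\) and \(\psi (t)=t\), one has \(\psi (s)f(s)=1/s\), so the weighted average equals \(\log (t)/t\), which tends to \(0\). A small midpoint-rule check:

```python
import math

# Illustration (ours) of Lemma 5.11 with f(s) = 1/s**2 (nonnegative, in
# L^1(1, +oo)) and psi(t) = t: here psi(s)*f(s) = 1/s, so the weighted
# average (1/psi(t)) * int_1^t psi(s)*f(s) ds = log(t)/t -> 0.
def weighted_avg(t, n=200000):
    ds = (t - 1.0) / n
    s = 1.0 + ds / 2.0                        # midpoint rule
    integral = 0.0
    for _ in range(n):
        integral += (s * (1.0 / s**2)) * ds   # psi(s) * f(s) * ds
        s += ds
    return integral / t                       # divide by psi(t) = t

print(weighted_avg(1e2) < 1.0 and weighted_avg(1e4) < weighted_avg(1e2))
```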
Using the previous result, we also obtain the following vector-valued version of Lemma 5.9:
Lemma 5.12
Take \(\delta >0\), and let \(F \in L^1 (\delta , +\infty ; {{\mathcal {H}}})\) be continuous. Let \(x:[\delta , +\infty [ \rightarrow {{\mathcal {H}}}\) be a solution of
$$t\,\ddot{x}(t) + \alpha \,\dot{x}(t) = F(t) \quad (82)$$
with \(\alpha >1\). Then, x(t) converges strongly in \({{\mathcal {H}}}\) as \(t \rightarrow + \infty \).
Proof
As in the proof of Lemma 5.9, multiply (82) by \(t^{\alpha -1}\) and integrate to obtain
$$\dot{x}(t) = \frac{\delta ^{\alpha }\dot{x}(\delta )}{t^{\alpha }} + \frac{1}{t^{\alpha }}\int _{\delta }^{t}s^{\alpha -1}F(s)\,ds.$$
Integrate again to deduce that
$$x(t) = x(\delta ) + \delta ^{\alpha }\dot{x}(\delta )\int _{\delta }^{t}\frac{d\tau }{\tau ^{\alpha }} + \int _{\delta }^{t}\frac{1}{\tau ^{\alpha }}\left( \int _{\delta }^{\tau }s^{\alpha -1}F(s)\,ds\right) d\tau .$$
Fubini’s Theorem applied to the last integral gives
$$x(t) = x(\delta ) + \delta ^{\alpha }\dot{x}(\delta )\int _{\delta }^{t}\frac{d\tau }{\tau ^{\alpha }} + \frac{1}{\alpha -1}\int _{\delta }^{t}F(s)\,ds - \frac{1}{(\alpha -1)\,t^{\alpha -1}}\int _{\delta }^{t}s^{\alpha -1}F(s)\,ds. \quad (83)$$
Finally, apply Lemma 5.11 to the last integral with \(\psi (s)=s^{\alpha -1}\) and \(f(s)=\Vert F(s)\Vert \) to conclude that all the terms in the right-hand side of (83) have a limit as \(t \rightarrow + \infty \). \(\square \)
Finally, in the analysis of the solutions for the perturbed system, we shall use the following Gronwall-Bellman Lemma (see [15, Lemme A.5]):
Lemma 5.13
Let \(m:[\delta ,T]\rightarrow [0,+\infty [\) be integrable, and let \(c\ge 0\). Suppose \(w:[\delta ,T]\rightarrow {\mathbb {R}}\) is continuous and
$$\frac{1}{2}\,w(t)^2 \le \frac{1}{2}\,c^2 + \int _{\delta }^{t} m(\tau )\,w(\tau )\,d\tau $$
for all \(t\in [\delta ,T]\). Then, \(\displaystyle {|w(t)| \le c + \int _{\delta }^t m(\tau ) d\tau }\) for all \(t\in [\delta ,T]\).
We shall also make use of the following discrete version of the preceding result:
Lemma 5.14
Let \((a_k)\) be a sequence of nonnegative numbers such that
$$a_k^2 \le c^2 + \sum _{j=1}^{k}\beta _j\,a_j$$
for all \(k\in {{\mathbb {N}}}\), where \((\beta _j )\) is a summable sequence of nonnegative numbers, and \(c\ge 0\). Then, \(\displaystyle a_k \le c + \sum \nolimits _{j=1}^{\infty } \beta _j\) for all \(k\in {{\mathbb {N}}}\).
Proof
For \(k\in {{\mathbb {N}}}\), set \(A_k := \max _{1\le m \le k} a_m \). Then, for \(1\le m\le k\), we have
$$a_m^2 \le c^2 + \sum _{j=1}^{m}\beta _j\,a_j \le c^2 + A_k\sum _{j=1}^{\infty }\beta _j.$$
Taking the maximum over \(1\le m\le k\), we obtain
$$A_k^2 \le c^2 + A_k\sum _{j=1}^{\infty }\beta _j.$$
Bounding \(A_k\) by the largest root of the corresponding quadratic equation, we obtain
$$A_k \le \frac{1}{2}\left( \sum _{j=1}^{\infty }\beta _j + \sqrt{\Big (\sum _{j=1}^{\infty }\beta _j\Big )^2 + 4c^2}\right) \le c + \sum _{j=1}^{\infty }\beta _j,$$
and the result follows. \(\square \)
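The discrete Gronwall bound can be probed numerically. The sketch below (ours) builds the extremal sequence with equality, \(a_k^2 = c^2 + \sum _{j\le k}\beta _j a_j\), by solving for the largest root at each step; the choices \(\beta _j = 2^{-j}\) and \(c=1\) are illustrative assumptions.

```python
import math

# Numerical sanity check (ours) of a discrete Gronwall bound of Lemma 5.14
# type: build the extremal sequence a_k**2 = c**2 + sum_{j<=k} beta_j * a_j
# (equality, largest root at each step) and verify a_k <= c + sum_j beta_j.
c = 1.0
betas = [2.0**-j for j in range(1, 30)]
S = sum(betas)                    # sum of all beta_j (approx. 1 here)
acc = 0.0                         # running value of sum_{j<k} beta_j * a_j
seq = []
for b in betas:
    # largest root of a**2 - b*a - (c**2 + acc) = 0
    a = (b + math.sqrt(b * b + 4.0 * (c * c + acc))) / 2.0
    acc += b * a
    seq.append(a)
print(all(a <= c + S + 1e-12 for a in seq))
```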
Attouch, H., Chbani, Z., Peypouquet, J. et al. Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity. Math. Program. 168, 123–175 (2018). https://doi.org/10.1007/s10107-016-0992-8
Keywords
- Convex optimization
- Fast convergent methods
- Dynamical systems
- Gradient flows
- Inertial dynamics
- Vanishing viscosity
- Nesterov method