×

On dissipative symplectic integration with applications to gradient-based optimization. (English) Zbl 1489.82010

Summary: Recently, continuous-time dynamical systems have proved useful in providing conceptual and quantitative insights into gradient-based optimization, widely used in modern machine learning and statistics. An important question that arises in this line of work is how to discretize the system in such a way that its stability and rates of convergence are preserved. In this paper we propose a geometric framework in which such discretizations can be realized systematically, enabling the derivation of ‘rate-matching’ algorithms without the need for a discrete convergence analysis. More specifically, we show that a generalization of symplectic integrators to non-conservative and in particular dissipative Hamiltonian systems is able to preserve rates of convergence up to a controlled error. Moreover, such methods preserve a shadow Hamiltonian despite the absence of a conservation law, extending key results of symplectic integrators to non-conservative cases. Our arguments rely on a combination of backward error analysis with fundamental results from symplectic geometry. We stress that although the original motivation for this work was the application to optimization, where dissipative systems play a natural role, they are fully general and not only provide a differential geometric framework for dissipative Hamiltonian systems but also substantially extend the theory of structure-preserving integration.

MSC:

82B20 Lattice systems (Ising, dimer, Potts, etc.) and systems on graphs arising in equilibrium statistical mechanics
82D40 Statistical mechanics of magnetic materials
82M31 Monte Carlo methods applied to problems in statistical mechanics
65D30 Numerical integration
65P10 Numerical methods for Hamiltonian systems including symplectic integrators
37M15 Discretization methods and integrators (symplectic, variational, geometric, etc.) for dynamical systems
90C52 Methods of reduced gradient type

References:

[1] Su W, Boyd S and Candès E J 2016 J. Mach. Learn. Res.17 1-43 · Zbl 1391.90667
[2] Wibisono A, Wilson A C and Jordan M I 2016 Proc. Natl Acad. Sci.113 E7351-8 · Zbl 1404.90098 · doi:10.1073/pnas.1614734113
[3] França G, Robinson D P and Vidal R 2018 A nonsmooth dynamical systems perspective on accelerated extensions of ADMM (arXiv:1808.04048)
[4] França G, Robinson D P and Vidal R 2018 ADMM and accelerated ADMM as continuous dynamical systems ICML Proc. 35th Int. Conf. on Machine Learning pp 1559-67
[5] França G, Sulam J, Robinson D P and Vidal R 2020 J. Stat. Mech. 124008 · Zbl 1539.37089 · doi:10.1088/1742-5468/abcaee
[6] Krichene W, Bayen A and Bartlett P L 2015 Accelerated mirror descent in continuous and discrete time Advances in Neural Information Processing Systems ((New York:: Curran Associates)) vol 28
[7] Wilson A C, Recht B and Jordan M I 2018 A Lyapunov analysis of momentum methods in optimization (arXiv:1611.02635)
[8] Scieur D, Roulet V, Bach F and d’Aspremont A 2017 Integration methods and optimization algorithms Advances in Neural Information Processing Systems ((New York:: Curran Associates)) vol 30
[9] Betancourt M, Jordan M I and Wilson A C 2018 On symplectic optimization (arXiv:1802.03653)
[10] Zhang J, Mokhtari A, Sra S and Jadbabaie A 2018 Direct Runge-Kutta discretization achieves acceleration Advances in Neural Information Processing Systems ((New York:: Curran Associates)) vol 30
[11] Shi B, Du S S, Su W J and Jordan M I 2019 Acceleration via symplectic discretization of high-resolution differential equations Advances in Neural Information Processing Systems ((New York:: Curran Associates)) vol 32
[12] Muehlebach M and Jordan M I 2019 A dynamical systems perspective on Nesterov acceleration Proc. 36th Int. Conf. on Machine Learning 4656-62
[13] Diakonikolas J and Jordan M I 2020 Generalized momentum-based methods: a Hamiltonian perspective (arXiv:1906.00436)
[14] Nesterov Y 1983 Sov. Math. - Dokl.27 372-6 · Zbl 0535.90071
[15] Polyak B T 1964 Some methods of speeding up the convergence of iteration methods USSR Comput. Math. Math. Phys.4 1-17 · Zbl 0147.35301 · doi:10.1016/0041-5553(64)90137-5
[16] Berndt R 2001 An Introduction to Symplectic Geometry (Providence, RI: American Mathematical Society) · Zbl 0986.53028
[17] Aebischer B, Borer M, Kalin M and Leuenberger C 1994 Symplectic Geometry (Berlin: Springer) · Zbl 0932.53002 · doi:10.1007/978-3-0348-7512-7
[18] Benettin G and Giorgilli A 1994 J. Stat. Phys.74 1117-43 · Zbl 0842.58020 · doi:10.1007/bf02188219
[19] Reich S 1999 SIAM J. Numer. Anal.36 1549-70 · Zbl 0935.65142 · doi:10.1137/s0036142997329797
[20] Hairer E 1994 Ann. Numer. Math.1 107-32 · Zbl 0828.65097
[21] Leimkuhler B and Reich S 2004 Simulating Hamiltonian Dynamics (Cambridge: Cambridge University Press) · Zbl 1069.65139
[22] Hairer E, Lubich C and Wanner G 2006 Geometric Numerical Integration (Berlin: Springer) · Zbl 1094.65125
[23] Sanz-Serna J M 1992 Acta Numer.1 243-86 · Zbl 0762.65043 · doi:10.1017/s0962492900002282
[24] McLachlan R I and Quispel G R W 2006 J. Phys. A: Math. Gen.39 5251-85 · Zbl 1092.65055 · doi:10.1088/0305-4470/39/19/s01
[25] Iserles A and Quispel G R W 2018 Why geometric numerical integration? Discrete Mechanics, Geometric Integration and Lie-Butcher Series (Berlin: Springer) pp 1-28 · doi:10.1007/978-3-030-01397-4_1
[26] Bhatt A, Floyd D and Moore B E 2016 J. Sci. Comput.66 1234-59 · Zbl 1377.65165 · doi:10.1007/s10915-015-0062-z
[27] Moore B E 2017 J. Comput. Appl. Math.323 1-15 · Zbl 1364.65169 · doi:10.1016/j.cam.2017.04.008
[28] Bhatt A and Moore B E 2019 J. Comput. Appl. Math.352 341-51 · Zbl 1410.65298 · doi:10.1016/j.cam.2018.12.003
[29] Shang X and Öttinger H C 2020 Proc. R. Soc. A 476 20190446 · Zbl 1439.82069 · doi:10.1098/rspa.2019.0446
[30] Nussle T, Thibaudeau P and Nicolis S 2019 Eur. Phys. J. B 92 1-10 · Zbl 1515.82170 · doi:10.1140/epjb/e2019-90539-6
[31] Nussle T, Thibaudeau P and Nicolis S 2019 Phys. Rev. B 100 214428 · doi:10.1103/physrevb.100.214428
[32] McLachlan R and Perlmutter M 2001 J. Geom. Phys.39 276-300 · Zbl 1005.53058 · doi:10.1016/s0393-0440(01)00020-1
[33] Caldirola P 1941 Il Nuovo Cimento18 393-400 · doi:10.1007/bf02960144
[34] Kanai E 1948 Prog. Theor. Phys.3 440-2 · doi:10.1143/ptp/3.4.440
[35] Caldeira A O and Leggett A J 1981 Phys. Rev. Lett.46 211-4 · doi:10.1103/physrevlett.46.211
[36] Hairer E and Lubich C 1997 Numer. Math.76 441-62 · Zbl 0874.65061 · doi:10.1007/s002110050271
[37] Sternberg S 1964 Lectures on Differential Geometry (Englewood Cliffs, NJ: Prentice-Hall) · Zbl 0129.13102
[38] Tao M 2016 Phys. Rev. E 94 043303 · doi:10.1103/physreve.94.043303
[39] Torlai G and Melko R G 2016 Phys. Rev. B 94 165134 · doi:10.1103/physrevb.94.165134
[40] Carleo G and Troyer M 2017 Science355 602-6 · Zbl 1404.81313 · doi:10.1126/science.aag2302
[41] Melko R G, Carleo G, Carrasquilla J and Cirac J I 2019 Nat. Phys.15 887-92 · doi:10.1038/s41567-019-0545-1
[42] Morningstar A and Melko R G 2018 J. Mach. Learn. Res.18 1-17 · Zbl 1467.68157
[43] Hinton G E 2002 Neural Comput.14 1771-800 · Zbl 1010.68111 · doi:10.1162/089976602760128018
[44] Arnold V I, Kozlov V V and Neishtadt A I 2006 Mathematical Aspects of Classical and Celestial Mechanics (Berlin: Springer) · Zbl 1105.70002 · doi:10.1007/978-3-540-48926-9
[45] Abraham R and Marsden J E 1978 Foundations of Mechanics (Reading, MA: Addison-Wesley)
[46] Suzuki M 1990 Phys. Lett. A 146 319-23 · doi:10.1016/0375-9601(90)90962-n
[47] Yoshida H 1990 Phys. Lett. A 150 262-8 · doi:10.1016/0375-9601(90)90092-3
[48] McLachlan R I 2002 Numer. Algorithms31 233-46 · Zbl 1016.65044 · doi:10.1023/a:1021195019574
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.