Deep relaxation of controlled stochastic gradient descent via singular perturbations. (English) Zbl 1542.93276

Summary: We consider a singularly perturbed system of stochastic differential equations proposed by P. Chaudhari et al. [Res. Math. Sci. 5, No. 3, Paper No. 30, 30 p. (2018; Zbl 1427.82032)] to approximate the entropic gradient descent in the optimization of deep neural networks via homogenization. We embed it in a much larger class of two-scale stochastic control problems and rely on convergence results for Hamilton-Jacobi-Bellman equations with unbounded data proved recently by ourselves [ESAIM, Control Optim. Calc. Var. 29, Paper No. 52, 25 p. (2023; Zbl 1526.49014)]. We show that the limit of the value functions is itself the value function of an effective control problem with extended controls and that the trajectories of the perturbed system converge in a suitable sense to the trajectories of the limiting effective control system. These rigorous results improve the understanding of the convergence of the algorithms used by Chaudhari et al. [loc. cit.], as well as of their possible extensions where some tuning parameters are modeled as dynamic controls.
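
Illustration (not part of the original summary): the summary does not reproduce the perturbed system itself, so the sketch below assumes a scalar slow-fast system of the form dx = -(x - y)/γ ds, dy = -(1/ε)[f'(y) + (y - x)/γ] ds + (βε)^(-1/2) dW, which is one common way such two-scale dynamics are written. The function name `simulate_two_scale`, the quadratic loss, and all parameter values are hypothetical choices made only to show how a small ε separates the time scales in an Euler-Maruyama discretization.

```python
import math
import random


def simulate_two_scale(grad_f, x0, y0, gamma, beta, eps, dt, n_steps, seed=0):
    """Euler-Maruyama for an ASSUMED scalar slow-fast system:

        dx = -(x - y) / gamma ds                                  (slow)
        dy = -(grad_f(y) + (y - x) / gamma) / eps ds
             + (beta * eps) ** -0.5 dW                            (fast)

    This form is an illustrative assumption, not quoted from the paper.
    """
    rng = random.Random(seed)
    x, y = x0, y0
    # Standard deviation of the Brownian increment, scaled by the
    # diffusion coefficient (beta * eps) ** -0.5.
    noise_scale = math.sqrt(dt / (beta * eps))
    for _ in range(n_steps):
        dw = rng.gauss(0.0, 1.0)
        # Both updates use the values from the previous step.
        x_new = x - (x - y) / gamma * dt
        y_new = y - (grad_f(y) + (y - x) / gamma) / eps * dt + noise_scale * dw
        x, y = x_new, y_new
    return x, y


if __name__ == "__main__":
    # Hypothetical test problem: f(y) = y**2 / 2, so grad_f(y) = y.
    # dt is chosen smaller than eps to keep the explicit scheme stable.
    x_T, y_T = simulate_two_scale(
        grad_f=lambda y: y, x0=1.0, y0=1.0,
        gamma=1.0, beta=10.0, eps=1e-3, dt=1e-4, n_steps=20000,
    )
    print("slow variable at T = 2:", x_T)
    print("averaged prediction   :", math.exp(-1.0))
```

For this quadratic example the fast variable equilibrates around x/(1 + γ), so the averaged slow dynamics are dx = -x/(1 + γ) ds and predict x(T) ≈ x0·exp(-T/(1 + γ)); the simulated slow variable should approach this value as ε decreases. The sketch has no controls and does not represent the extended-control limit problem studied in the paper.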

MSC:

93C70 Time-scale analysis and singular perturbations in control/observation systems
93E20 Optimal stochastic control
68T07 Artificial neural networks and deep learning

Software:

Entropy-SGD

References:

[1] Aubin, J.-P. and Frankowska, H., Set-Valued Analysis, Birkhäuser, Boston, 1990. · Zbl 0713.49021
[2] Baldassi, C., Ingrosso, A., Lucibello, C., Saglietti, L., and Zecchina, R., Subdominant dense clusters allow for simple learning and high computational performance in neural networks with discrete synapses, Phys. Rev. Lett., 115 (2015), 128101.
[3] Bardi, M. and Cesaroni, A., Optimal control with random parameters: A multiscale approach, Eur. J. Control, 17 (2011), pp. 30-45. · Zbl 1248.49038
[4] Bardi, M., Cesaroni, A., and Manca, L., Convergence by viscosity methods in multiscale financial models with stochastic volatility, SIAM J. Financial Math., 1 (2010), pp. 230-265, doi:10.1137/090748147. · Zbl 1189.35020
[5] Bardi, M. and Kouhkouh, H., Singular perturbations in stochastic optimal control with unbounded data, ESAIM Control Optim. Calc. Var., 29 (2023), 52. · Zbl 1526.49014
[6] Bhatia, R., Matrix Analysis, Springer-Verlag, New York, 1997.
[7] Billingsley, P., Probability and Measure, John Wiley & Sons, New York, 2008.
[8] Bogachev, V., Kirillov, A., and Shaposhnikov, S., Invariant measures of diffusions with gradient drifts, Dokl. Math., 82 (2010), pp. 790-793. · Zbl 1222.60058
[9] Bogachev, V. I., Kirillov, A. I., and Shaposhnikov, S. V., The Kantorovich and variation distances between invariant measures of diffusions and nonlinear stationary Fokker-Planck-Kolmogorov equations, Math. Notes, 96 (2014), pp. 855-863. · Zbl 1315.35221
[10] Bogachev, V. and Röckner, M., A generalization of Khasminskii’s theorem on the existence of invariant measures for locally integrable drifts, Theory Probab. Appl., 45 (2000), pp. 363-378, doi:10.1137/S0040585X97978348. · Zbl 1004.60061
[11] Bogachev, V. I., Röckner, M., and Stannat, W., Uniqueness of solutions of elliptic equations and uniqueness of invariant measures of diffusions, Sb. Math., 193 (2002), pp. 945-976. · Zbl 1055.58009
[12] Borkar, V. and Gaitsgory, V., Averaging of singularly perturbed controlled stochastic differential equations, Appl. Math. Optim., 56 (2007), pp. 169-209. · Zbl 1139.93022
[13] Borkar, V. S. and Gaitsgory, V., Singular perturbations in ergodic control of diffusions, SIAM J. Control Optim., 46 (2007), pp. 1562-1577, doi:10.1137/060657327. · Zbl 1143.93024
[14] Chaudhari, P., Choromanska, A., Soatto, S., LeCun, Y., Baldassi, C., Borgs, C., Chayes, J., Sagun, L., and Zecchina, R., Entropy-SGD: Biasing gradient descent into wide valleys, J. Stat. Mech. Theory Exp., 2019 (2019), 124018. · Zbl 1459.65091
[15] Chaudhari, P., Oberman, A., Osher, S., Soatto, S., and Carlier, G., Deep relaxation: Partial differential equations for optimizing deep neural networks, Res. Math. Sci., 5 (2018), 30. · Zbl 1427.82032
[16] Clarke, F. H., Ledyaev, Y. S., Stern, R. J., and Wolenski, P. R., Nonsmooth Analysis and Control Theory, Springer-Verlag, New York, 1998. · Zbl 1047.49500
[17] Da Lio, F. and Ley, O., Uniqueness results for second-order Bellman-Isaacs equations under quadratic growth assumptions and applications, SIAM J. Control Optim., 45 (2006), pp. 74-106, doi:10.1137/S0363012904440897. · Zbl 1116.49017
[18] Djete, M. F., Possamaï, D., and Tan, X., McKean-Vlasov optimal control: The dynamic programming principle, Ann. Probab., 50 (2022), pp. 791-833. · Zbl 1491.49018
[19] El Karoui, N. and Tan, X., Capacities, Measurable Selection and Dynamic Programming. Part II: Application in Stochastic Control Problems, preprint, arXiv:1310.3364, 2015.
[20] Fleming, W. H. and Soner, H. M., Controlled Markov Processes and Viscosity Solutions, 2nd ed., Springer, New York, 2006. · Zbl 1105.60005
[21] Fouque, J.-P., Papanicolaou, G., Sircar, R., and Sølna, K., Multiscale Stochastic Volatility for Equity, Interest Rate, and Credit Derivatives, Cambridge University Press, Cambridge, UK, 2011. · Zbl 1248.91003
[22] Kokotović, P., Khalil, H. K., and O’Reilly, J., Singular Perturbation Methods in Control: Analysis and Design, Academic Press, London, 1986. · Zbl 0646.93001
[23] Kouhkouh, H., Some Asymptotic Problems for Hamilton-Jacobi-Bellman Equations and Applications to Global Optimization, Ph.D. thesis, University of Padova, 2022, https://hdl.handle.net/11577/3444759.
[24] Kushner, H., Weak Convergence Methods and Singularly Perturbed Stochastic Control and Filtering Problems, Birkhäuser, Boston, 1990. · Zbl 0931.93003
[25] LeCun, Y., Bengio, Y., and Hinton, G., Deep learning, Nature, 521 (2015), pp. 436-444.
[26] Li, Q., Tai, C., and E, W., Stochastic modified equations and adaptive stochastic gradient algorithms, in Proceedings of the International Conference on Machine Learning, PMLR, 2017, pp. 2101-2110.
[27] Pardoux, E. and Veretennikov, A. Y., On the Poisson equation and diffusion approximation, I, Ann. Probab., 29 (2001), pp. 1061-1085. · Zbl 1029.60053
[28] Pardoux, E. and Veretennikov, A. Y., On Poisson equation and diffusion approximation, II, Ann. Probab., 31 (2003), pp. 1166-1192. · Zbl 1054.60064
[29] Pardoux, E. and Veretennikov, A. Y., On the Poisson equation and diffusion approximation, III, Ann. Probab., 33 (2005), pp. 1111-1133. · Zbl 1071.60022
[30] Pavon, M., On local entropy, stochastic control, and deep neural networks, IEEE Control Syst. Lett., 7 (2022), pp. 437-441.
[31] Pittorino, F., Lucibello, C., Feinauer, C., Perugini, G., Baldassi, C., Demyanenko, E., and Zecchina, R., Entropic gradient descent algorithms and wide flat minima, J. Stat. Mech. Theory Exp., 2021 (2021), 124015. · Zbl 1539.68318
[32] Stroock, D. W. and Varadhan, S. S., Multidimensional Diffusion Processes, Springer-Verlag, Berlin, New York, 1979. · Zbl 0426.60069
[33] Wihler, T. P., On the Hölder continuity of matrix functions for normal matrices, JIPAM J. Inequal. Pure Appl. Math., 10 (2009), 91. · Zbl 1180.15022
[34] Yong, J. and Zhou, X. Y., Stochastic Controls: Hamiltonian Systems and HJB Equations, Springer-Verlag, New York, 1999. · Zbl 0943.93002