×

A consensus-based global optimization method for high dimensional machine learning problems. (English) Zbl 1480.60195

Summary: We improve recently introduced consensus-based optimization method, proposed in [R. Pinnau et al., Math. Models Methods Appl. Sci. 27, No. 1, 183–204 (2017; Zbl 1388.90098)], which is a gradient-free optimization method for general non-convex functions. We first replace the isotropic geometric Brownian motion by the component-wise one, thus removing the dimensionality dependence of the drift rate, making the method more competitive for high dimensional optimization problems. Secondly, we utilize the random mini-batch ideas to reduce the computational cost of calculating the weighted average which the individual particles tend to relax toward. For its mean-field limit – a nonlinear Fokker-Planck equation – we prove, in both time continuous and semi-discrete settings, that the convergence of the method, which is exponential in time, is guaranteed with parameter constraints independent of the dimensionality. We also conduct numerical tests to high dimensional problems to check the success rate of the method.

MSC:

60H35 Computational methods for stochastic equations (aspects of stochastic analysis)
65K10 Numerical optimization and variational techniques
68T05 Learning and adaptive systems in artificial intelligence
70F10 \(n\)-body problems

Citations:

Zbl 1388.90098

References:

[1] G. Albi and L. Pareschi, Binary interaction algorithms for the simulation of flocking and swarming dynamics. Multisc. Model. Simul. 11 (2013) 1-29. · Zbl 1274.92053
[2] J.J. Alonso and J. Hicken, Introduction to multidisciplinary design optimization. In Vol. 222 of Aeronautics & Astronautics. Standford University (2012).
[3] N. Bellomo, A. Bellouquid and D. Knopoff, From the microscale to collective crowd dynamics. Multis. Model. Simul. 11 (2013) 943-963. · Zbl 1280.90019
[4] C.M. Bender and S.A. Orszag, Advanced Mathematical Methods for Scientists and Engineers. International Series in Pure and Applied Mathematics. McGraw-Hill (1978). · Zbl 0417.34001
[5] Y. Bengio, P. Simard and P. Frasconi, Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 5 (1994) 157-166. · doi:10.1109/72.279181
[6] A.L. Bertozzi, J. Rosado, M.B. Short and L. Wang, Contagion shocks in one dimension. J. Stat. Phys. 158 (2015) 647-664. · Zbl 1318.35135
[7] F. Bolley, J.A. Cañizo and J.A. Carrillo, Stochastic mean-field limit: non-Lipschitz forces and swarming. Math. Models Methods Appl. Sci. 21 (2011) 2179-2210. · Zbl 1273.82041
[8] L. Bottou, Online learning and stochastic approximations. On-line Learn. Neural Netw. 17 (1998) 142. · Zbl 0968.68127
[9] S. Bubeck, Convex optimization: algorithms and complexity. Found. Trends® Mach. Learn. 8 (2015) 231-357. · Zbl 1365.90196
[10] J.A. Carrillo, Y.-P. Choi and M. Hauray, The derivation of swarming models: mean-field limit and Wasserstein distances. Collective dynamics from bacteria to crowds, volume 553 of CISM Courses and Lectures. Springer, Vienna (2014) 1-46.
[11] J.A. Carrillo, Y.-P. Choi, C. Totzeck and O. Tse, An analytical framework for consensus-based global optimization method. Math. Models Methods Appl. Sci. 28 (2018) 1037-1066. · Zbl 1397.35311
[12] J.A. Carrillo, M. Fornasier, J. Rosado and G. Toscani, Asymptotic flocking dynamics for the kinetic Cucker-Smale model. SIAM J. Math. Anal. 42 (2010) 218-236. · Zbl 1223.35058
[13] J.A. Carrillo, M. Fornasier, G. Toscani and F. Vecil, Particle, kinetic, and hydrodynamic models of swarming. In Mathematical modeling of collective behavior in socio-economic and life sciences, Modelling and Simulation in Materials Science and Engineering. Birkhäuser Boston, Inc., Boston, MA (2010) 297-336. · Zbl 1211.91213
[14] J.A. Carrillo, L. Pareschi and M. Zanella, Particle based gPC methods for mean-field models of swarming with uncertainty. Commun. Comput. Phys. 25 (2018) 508-531. · Zbl 1473.65252
[15] F. Cucker and S. Smale, On the mathematics of emergence. Jpn. J. Math. 2 (2007) 197-227. · Zbl 1166.92323
[16] X. Dai and Y. Zhu, Towards theoretical understanding of large batch training in stochastic gradient descent. Preprint arXiv:1812.00542 (2018).
[17] A. Dembo and O. Zeitouni, Vol. 38 of Large deviations techniques and applications. Springer Science & Business Media (2009). · Zbl 0793.60030
[18] R. Eberhart and J. Kennedy, Particle swarm optimization. In Proc. of IEEE International Conference on Neural Networks 4 (1995) 1942-1948.
[19] S.-Y. Ha, S. Jin and D. Kim, Convergence of a first-order consensus-based global optimization algorithm. Preprint arXiv:1910.08239 (2019).
[20] S.-Y. Ha and E. Tadmor, From particle to kinetic and hydrodynamic descriptions of flocking. Kinetic Related Models 1 (2008) 415-435. · Zbl 1402.76108
[21] B. Hanin, Which neural net architectures give rise to exploding and vanishing gradients? In Adv. Neural Inf. Process. Syst. (2018) 582-591.
[22] M. Hauray and P.-E. Jabin, N-particles approximation of the Vlasov equations with singular potential. Arch. Ratl. Mech. Anal. 183 (2007) 489-524. · Zbl 1107.76066
[23] J.H. Holland, Genetic algorithms. Sci. Am. 267 (1992) 66-73.
[24] R.A. Holley, S. Kusuoka and D.W. Stroock, Asymptotics of the spectral gap with applications to the theory of simulated annealing. J. Funct. Anal. 83 (1989) 333-347. · Zbl 0706.58075
[25] L.C. Hsu, A theorem on the asymptotic behavior of a multiple integral. Duke Math. J. 15 (1948) 623-6323. · Zbl 0040.18002
[26] C.-R. Hwang and S.-J. Sheu, Large-time behavior of perturbed diffusion Markov processes with applications to the second eigenvalue problem for Fokker-Planck operators and simulated annealing. Acta Appl. Math. 19 (1990) 253-295. · Zbl 0708.60056
[27] T. Inglot and P. Majerski, Simple upper and lower bounds for the multivariate Laplace approximation. J. Approx. Theory 186 (2014) 1-11. · Zbl 1333.41009
[28] P.-E. Jabin, A review of the mean field limits for Vlasov equations. Kinetic Related Models 7 (2014) 661-711. · Zbl 1318.35129
[29] P.-E. Jabin and Z. Wang, Quantitative estimates of propagation of chaos for stochastic systems with W^−1,∞ kernels. Invent. Math. 214 (2018) 523-591. · Zbl 1402.35208
[30] S. Jastrzebski, Z. Kenton, D. Arpit, N. Ballas, A. Fischer, Y. Bengio and A. Storkey, Three factors influencing minima in SGD. Preprint arXiv:1711.04623 (2017).
[31] S. Jin, L. Li and J.-G. Liu, Random batch methods (RBM) for interacting particle systems. J. Comput. Phys. 400 (2020) 108877. · Zbl 1453.82065
[32] J. Kennedy, Swarm intelligence, handbook of nature-inspired and innovative computing. Springer (2006) 187-219.
[33] N.S. Keskar, D. Mudigere, J. Nocedal, M. Smelyanskiy and P.T.P. Tang, On large-batch training for deep learning: generalization gap and sharp minima. In International Conference on Learning Representations (2017).
[34] S. Kirkpatrick, C. Daniel Gelatt and M.P. Vecchi, Optimization by simulated annealing. Science 220 (1983) 671-680. · Zbl 1225.90162
[35] T. Kolokolnikov, J.A. Carrillo, A. Bertozzi, R. Fetecau and M. Lewis, Emergent behaviour in multi-particle systems with non-local interactions
[36] J.P. McClure and R. Wong, Error bounds for multidimensional Laplace approximation. J. Approx. Theory 37 (1983) 372-390. · Zbl 0514.41028
[37] P.J.M. van Laarhoven and E.H.L. Aarts, Simulated annealing: theory and applications. D. Reidel Publishing Co., Dordrecht (1987) 37. · Zbl 0643.65028
[38] S. Liu, D. Papailiopoulos and D. Achlioptas, Bad global minima exist and SGD can reach them. Preprint arXiv:1906.02613 (2019).
[39] P.D. Miller, Applied asymptotic analysis. American Mathematical Society (2006). · Zbl 1101.41031
[40] S. Motsch and E. Tadmor, Heterophilious dynamics enhances consensus. SIAM Rev. 56 (2014) 577-621. · Zbl 1310.92064
[41] J.A. Nelder and R. Mead, A simplex method for function minimization. Comput. J. 7 (1965) 308-313. · Zbl 0229.65053
[42] R. Pinnau, C. Totzeck, O. Tse and S. Martin, A consensus-based model for global optimization and its mean-field limit. Math. Models Methods Appl. Sci. 27 (2017) 183-204. · Zbl 1388.90098
[43] H. Robbins and S. Monro, A stochastic approximation method. Ann. Math. Stat. (1951) 400-407. · Zbl 0054.05901
[44] G. Toscani, Kinetic models of opinion formation. Commun. Math. Sci. 4 (2006) 481-496. · Zbl 1195.91128
[45] C. Totzeck, R. Pinnau, S. Blauth and S. Schotthófer, A numerical comparison of consensus-based global optimization to other particle-based global optimization schemes. Proc. Appl. Math. Mech. 18 (2018) e201800291.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.