
LSOS: line-search second-order stochastic optimization methods for nonconvex finite sums. (English) Zbl 1511.65048

Summary: We develop a line-search second-order algorithmic framework for minimizing finite sums. We do not make any convexity assumptions, but require the terms of the sum to be continuously differentiable and to have Lipschitz-continuous gradients. The methods fitting into this framework combine line searches and suitably decaying step lengths. A key ingredient is a two-step sampling strategy at each iteration, which allows us to control the error present in the line-search procedure. Stationarity of limit points is proved in the almost-sure sense, while almost-sure convergence of the sequence of approximations to the solution holds under the additional hypothesis that the functions are strongly convex. Numerical experiments, including comparisons with state-of-the-art stochastic optimization methods, show the efficiency of our approach.
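The minimal Python sketch below illustrates the general scheme outlined in the summary: a first subsample builds a Newton-type direction, an independent second subsample supplies the function values for an Armijo line search, and the accepted step is combined with a decaying step length. The sample sizes, the damped-Newton direction, and the 1/k decay are illustrative assumptions, not the exact LSOS choices of the paper.

import numpy as np

def lsos_sketch(fi, grad_fi, hess_fi, x0, n, max_iter=100,
                batch=32, c1=1e-4, rho=0.5, seed=0):
    # Illustrative loop for min (1/n) sum_i f_i(x); fi(i, x), grad_fi(i, x)
    # and hess_fi(i, x) evaluate a single term.  All parameters below are
    # assumptions for illustration, not the paper's LSOS settings.
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for k in range(1, max_iter + 1):
        # First subsample: subsampled gradient and Hessian for the direction.
        S1 = rng.choice(n, size=min(batch, n), replace=False)
        g = np.mean([grad_fi(i, x) for i in S1], axis=0)
        H = np.mean([hess_fi(i, x) for i in S1], axis=0)
        # Damped Newton-type direction; the regularization keeps the system
        # solvable when the subsampled Hessian is indefinite.
        d = -np.linalg.solve(H + 1e-3 * np.eye(x.size), g)
        # Second, independent subsample: function values for the line search,
        # so the line-search error is controlled separately from the direction.
        S2 = rng.choice(n, size=min(batch, n), replace=False)
        phi = lambda z: np.mean([fi(i, z) for i in S2])
        # Backtracking (Armijo) line search on the sampled objective.
        t, f0, slope = 1.0, phi(x), float(g @ d)
        while phi(x + t * d) > f0 + c1 * t * slope and t > 1e-8:
            t *= rho
        # Combine the accepted line-search step with a decaying factor 1/k.
        x = x + (t / k) * d
    return x

For example, a least-squares finite sum with fi(i, x) = 0.5 * (a[i] @ x - b[i])**2 and the corresponding gradient and Hessian callables would plug directly into this loop.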

MSC:

65K05 Numerical mathematical programming methods
90C15 Stochastic programming
62L20 Stochastic approximation

Software:

SONIA; SAGA; SGDLibrary

References:

[1] Bellavia, Stefania, Adaptive cubic regularization methods with dynamic inexact Hessian information and applications to finite-sum minimization, IMA J. Numer. Anal., 764-799 (2021) · Zbl 1460.65076 · doi:10.1093/imanum/drz076
[2] Bellavia, Stefania, Subsampled inexact Newton methods for minimizing large sums of convex functions, IMA J. Numer. Anal., 2309-2341 (2020) · Zbl 1466.65041 · doi:10.1093/imanum/drz027
[3] Berahas, A. S., Global convergence rate analysis of a generic line search algorithm with noise, SIAM J. Optim., 1489-1518 (2021) · Zbl 1470.90129 · doi:10.1137/19M1291832
[4] Berahas, Albert S., A robust multi-batch L-BFGS method for machine learning, Optim. Methods Softw., 191-219 (2020) · Zbl 1430.90523 · doi:10.1080/10556788.2019.1658107
[5] Bertsekas, Dimitri P., Nonlinear Programming, Athena Scientific Optimization and Computation Series, xiv+777 pp. (1999), Athena Scientific, Belmont, MA · Zbl 1015.90077
[6] Bollapragada, Raghu, Exact and inexact subsampled Newton methods for optimization, IMA J. Numer. Anal., 545-578 (2019) · Zbl 1462.65077 · doi:10.1093/imanum/dry009
[7] Bottou, Léon, Optimization methods for large-scale machine learning, SIAM Rev., 223-311 (2018) · Zbl 1397.65085 · doi:10.1137/16M1080173
[8] Byrd, R. H., A stochastic quasi-Newton method for large-scale optimization, SIAM J. Optim., 1008-1031 (2016) · Zbl 1382.65166 · doi:10.1137/140954362
[9] Byrd, Richard H., On the use of stochastic Hessian information in optimization methods for machine learning, SIAM J. Optim., 977-995 (2011) · Zbl 1245.65062 · doi:10.1137/10079923X
[10] Byrd, Richard H., Sample size selection in optimization methods for machine learning, Math. Program., 127-155 (2012) · Zbl 1252.49044 · doi:10.1007/s10107-012-0572-5
[11] Chen, Huiming, A stochastic quasi-Newton method for large-scale nonconvex optimization with applications, IEEE Trans. Neural Netw. Learn. Syst., 4776-4790 (2020)
[12] Curtis, Frank E., A fully stochastic second-order trust region method, Optim. Methods Softw., 844-877 (2022) · Zbl 1502.90115 · doi:10.1080/10556788.2020.1852403
[13] A. Defazio, F. Bach, and S. Lacoste-Julien, SAGA: a fast incremental gradient method with support for non-strongly convex composite objectives, Advances in Neural Information Processing Systems 27, Curran Associates, Inc., 2014, pp. 1646-1654.
[14] D. di Serafino, G. Toraldo, and M. Viola, A gradient-based globalization strategy for the Newton method, Numerical Computations: Theory and Algorithms (Cham), Lecture Notes in Computer Science, vol. 11973, Springer International Publishing, 2020, pp. 177-185. · Zbl 07249926
[15] di Serafino, Daniela, Using gradient directions to get global convergence of Newton-type methods, Appl. Math. Comput., Paper No. 125612, 14 pp. (2021) · Zbl 1510.65116 · doi:10.1016/j.amc.2020.125612
[16] Goodfellow, Ian, Deep Learning, Adaptive Computation and Machine Learning, xxii+775 pp. (2016), MIT Press, Cambridge, MA · Zbl 1373.68009
[17] R. M. Gower, D. Goldfarb, and P. Richtárik, Stochastic block BFGS: squeezing more curvature out of data, Proceedings of the 33rd International Conference on International Conference on Machine Learning, vol. 48, 2016, pp. 1869-1878, JMLR.org.
[18] Gower, Robert M., Stochastic quasi-gradient methods: variance reduction via Jacobian sketching, Math. Program., 135-192 (2021) · Zbl 1471.65051 · doi:10.1007/s10107-020-01506-0
[19] S. Horváth and P. Richtárik, Nonconvex variance reduced optimization with arbitrary sampling, Proceedings of the 36th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 97, PMLR, June 9-15, 2019, pp. 2781-2789.
[20] Iusem, Alfredo N., Variance-based extragradient methods with line search for stochastic variational inequalities, SIAM J. Optim., 175-206 (2019) · Zbl 1415.65145 · doi:10.1137/17M1144799
[21] M. Jahani, M. Nazari, R. Tappenden, A. Berahas, and M. Takáč, SONIA: a symmetric blockwise truncated optimization algorithm, Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, vol. 130, PMLR, April 13-15, 2021, pp. 487-495.
[22] R. Johnson and T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction, Advances in Neural Information Processing Systems 26, Curran Associates, Inc., 2013, pp. 315-323.
[23] Klenke, Achim, Probability Theory, Universitext, xii+638 pp. (2014), Springer, London · Zbl 1295.60001 · doi:10.1007/978-1-4471-5361-0
[24] Krejić, Nataša, Descent direction method with line search for unconstrained optimization in noisy environment, Optim. Methods Softw., 1164-1184 (2015) · Zbl 1328.90091 · doi:10.1080/10556788.2015.1025403
[25] Mokhtari, Aryan, IQN: an incremental quasi-Newton method with local superlinear convergence rate, SIAM J. Optim., 1670-1698 (2018) · Zbl 1401.90121 · doi:10.1137/17M1122943
[26] Mokhtari, Aryan, RES: regularized stochastic BFGS algorithm, IEEE Trans. Signal Process., 6089-6104 (2014) · Zbl 1394.94405 · doi:10.1109/TSP.2014.2357775
[27] Mokhtari, Aryan, Global convergence of online limited memory BFGS, J. Mach. Learn. Res., 3151-3181 (2015) · Zbl 1351.90124
[28] P. Moritz, R. Nishihara, and M. Jordan, A linearly-convergent stochastic L-BFGS algorithm, Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (Cadiz, Spain), Proceedings of Machine Learning Research, vol. 51, PMLR, May 9-11, 2016, pp. 249-258.
[29] Paquette, Courtney, A stochastic line search method with expected complexity analysis, SIAM J. Optim., 349-376 (2020) · Zbl 1431.90153 · doi:10.1137/18M1216250
[30] Park, Seonho, Combining stochastic adaptive cubic regularization with negative curvature for nonconvex optimization, J. Optim. Theory Appl., 953-971 (2020) · Zbl 1432.90096 · doi:10.1007/s10957-019-01624-6
[31] Robbins, Herbert, A stochastic approximation method, Ann. Math. Statistics, 400-407 (1951) · Zbl 0054.05901 · doi:10.1214/aoms/1177729586
[32] Robbins, H., A convergence theorem for non negative almost supermartingales and some applications, Optimizing Methods in Statistics, 233-257 (1971), Academic Press, New York · Zbl 0286.60025
[33] Ruppert, David, A Newton-Raphson version of the multivariate Robbins-Monro procedure, Ann. Statist., 236-245 (1985) · Zbl 0571.62072 · doi:10.1214/aos/1176346589
[34] J. C. Spall, A second order stochastic approximation algorithm using only function measurements, Proceedings of 1994 33rd IEEE Conference on Decision and Control, vol. 3, 1994, pp. 2472-2477.
[35] J. C. Spall, Stochastic version of second-order (Newton-Raphson) optimization using only function measurements, Proceedings of the 27th Conference on Winter Simulation (USA), WSC ’95, IEEE Computer Society, 1995, pp. 347-352.
[36] J. C. Spall, Accelerated second-order stochastic optimization using only function measurements, Proceedings of the 36th IEEE Conference on Decision and Control, vol. 2, 1997, pp. 1417-1424.
[37] Wang, Xiao, Stochastic quasi-Newton methods for nonconvex stochastic optimization, SIAM J. Optim., 927-956 (2017) · Zbl 1365.90182 · doi:10.1137/15M1053141
[38] Wang, Xiaoyu, Stochastic proximal quasi-Newton methods for non-convex composite optimization, Optim. Methods Softw., 922-948 (2019) · Zbl 07111868 · doi:10.1080/10556788.2018.1471141
[39] P. Xu, F. Roosta, and M. W. Mahoney, Second-order optimization for non-convex machine learning: an empirical study, Proceedings of the 2020 SIAM International Conference on Data Mining (SDM), 2020, pp. 199-207.