
Provably faster gradient descent via long steps. (English) Zbl 07887992

Summary: This work establishes new convergence guarantees for gradient descent in smooth convex optimization via a computer-assisted analysis technique. Our theory allows nonconstant stepsize policies with frequent long steps that may violate descent, analyzing the overall effect of many iterations at once rather than the one-iteration inductions typical of most first-order method analyses. We show that long steps, which may increase the objective value in the short term, lead to provably faster convergence in the long term. A conjecture towards proving a faster \(O(1/(T\log T))\) rate for gradient descent is also motivated, along with simple numerical validation.
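To illustrate the kind of simple numerical validation described above, here is a minimal sketch (assuming a random least-squares test problem and a hypothetical periodic long-step pattern; neither the problem nor the particular stepsizes are taken from the paper) comparing a nonconstant schedule with occasional long steps against the classical constant stepsize \(1/L\):

```python
# Minimal sketch: gradient descent on a smooth convex least-squares objective,
# comparing the textbook constant stepsize 1/L with a hypothetical periodic
# schedule that inserts occasional "long" steps well above 2/L.
# The long-step pattern below is illustrative only, not the paper's certified schedule.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
b = rng.standard_normal(50)
L = np.linalg.norm(A, 2) ** 2              # smoothness constant of f(x) = 0.5*||Ax - b||^2
f = lambda x: 0.5 * np.linalg.norm(A @ x - b) ** 2
grad = lambda x: A.T @ (A @ x - b)
f_star = f(np.linalg.lstsq(A, b, rcond=None)[0])

def run_gd(stepsizes):
    """Run x_{k+1} = x_k - (h_k / L) * grad f(x_k) and return the objective gaps."""
    x = np.zeros(A.shape[1])
    gaps = []
    for h in stepsizes:
        x = x - (h / L) * grad(x)
        gaps.append(f(x) - f_star)
    return gaps

T = 200
constant = [1.0] * T                                     # classic stepsize 1/L
# Hypothetical pattern: mostly moderate steps, every 8th step is a long one.
long_steps = [5.0 if (k + 1) % 8 == 0 else 1.4 for k in range(T)]

print("final gap, constant 1/L :", run_gd(constant)[-1])
print("final gap, long steps   :", run_gd(long_steps)[-1])
```

The specific pattern above only illustrates the mechanism (long steps can transiently increase the objective yet improve progress over a full cycle); the paper's certified stepsize schedules and rates come from the computer-assisted analysis described in the summary.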

MSC:

90C22 Semidefinite programming
90C25 Convex programming
65K05 Numerical mathematical programming methods

Software:

SPECTRA; ProxSDP

References:

[1] Agarwal, N., Goel, S., and Zhang, C., Acceleration via fractal learning rate schedules, in Proceedings of Machine Learning Research, PMLR, 2021, pp. 87-99.
[2] Altschuler, J., Greed, Hedging, and Acceleration in Convex Optimization, Ph.D. thesis, Massachusetts Institute of Technology, 2018.
[3] Barré, M., Taylor, A., and Bach, F., Principled analyses and design of first-order methods with inexact proximal operators, Math. Program., 201 (2023), pp. 185-230. · Zbl 1522.90074
[4] Bellavia, S., Gondzio, J., and Porcelli, M., A relaxed interior point method for low-rank semidefinite programming problems with applications to matrix completion, J. Sci. Comput., 89 (2021), 46. · Zbl 1479.90152
[5] Bertsekas, D. P., Convex Optimization Algorithms, Athena Scientific, 2015. · Zbl 1347.90001
[6] Daccache, A., Performance Estimation of the Gradient Method with Fixed Arbitrary Step Sizes, master’s thesis, Université Catholique de Louvain, 2019.
[7] de Klerk, E., Glineur, F., and Taylor, A., On the worst-case complexity of the gradient method with exact line search for smooth strongly convex functions, Optim. Lett., 11 (2017), pp. 1185-1199. · Zbl 1381.90067
[8] de Klerk, E., Glineur, F., and Taylor, A. B., Worst-case convergence analysis of inexact gradient and Newton methods through semidefinite programming performance estimation, SIAM J. Optim., 30 (2020), pp. 2053-2082, doi:10.1137/19M1281368. · Zbl 1448.90070
[9] Díaz, M. and Grimmer, B., Optimal convergence rates for the proximal bundle method, SIAM J. Optim., 33 (2023), pp. 424-454, doi:10.1137/21M1428601. · Zbl 1519.90167
[10] Ding, L. and Grimmer, B., Revisiting spectral bundle methods: Primal-dual (sub)linear convergence rates, SIAM J. Optim., 33 (2023), pp. 1305-1332, doi:10.1137/21M1402340. · Zbl 1523.90270
[11] Ding, L., Yurtsever, A., Cevher, V., Tropp, J. A., and Udell, M., An optimal-storage approach to semidefinite programming using approximate complementarity, SIAM J. Optim., 31 (2021), pp. 2695-2725, doi:10.1137/19M1244603. · Zbl 1480.90188
[12] Dragomir, R.-A., Taylor, A., d’Aspremont, A., and Bolte, J., Optimal complexity and certification of Bregman first-order methods, Math. Program., 194 (2022), pp. 41-83. · Zbl 1494.90076
[13] Drori, Y., Contributions to the Complexity Analysis of Optimization Algorithms, Ph.D. thesis, Tel-Aviv University, 2014.
[14] Drori, Y. and Taylor, A., Efficient first-order methods for convex minimization: A constructive approach, Math. Program., 184 (2020), pp. 183-220. · Zbl 1451.90118
[15] Drori, Y. and Teboulle, M., Performance of first-order methods for smooth convex minimization: A novel approach, Math. Program., 145 (2014), pp. 451-482. · Zbl 1300.90068
[16] Eloi, D., Worst-Case Functions for the Gradient Method with Fixed Variable Step Sizes, master’s thesis, Université Catholique de Louvain, 2022.
[17] Goujaud, B., Scieur, D., Dieuleveut, A., Taylor, A. B., and Pedregosa, F., Super-acceleration with cyclical step-sizes, in Proceedings of the 25th International Conference on Artificial Intelligence and Statistics, PMLR, 2022, pp. 3028-3065.
[18] Gu, G. and Yang, J., Tight sublinear convergence rate of the proximal point algorithm for maximal monotone inclusion problems, SIAM J. Optim., 30 (2020), pp. 1905-1921, doi:10.1137/19M1299049. · Zbl 1468.90083
[19] Gupta, S. D., Van Parys, B. P., and Ryu, E., Branch-and-bound performance estimation programming: A unified methodology for constructing optimal optimization methods, Math. Program., 204 (2024), pp. 567-639. · Zbl 07807818
[20] Henrion, D., Naldi, S., and Safey El Din, M., SPECTRA—A Maple library for solving linear matrix inequalities in exact arithmetic, Optim. Methods Softw., 34 (2019), pp. 62-78.
[21] Jang, U., Gupta, S. D., and Ryu, E. K., Computer-Assisted Design of Accelerated Composite Optimization Methods: OptISTA, https://arxiv.org/abs/2305.15704, 2023.
[22] Kim, D., Accelerated proximal point method for maximally monotone operators, Math. Program., 190 (2021), pp. 57-87, doi:10.1007/s10107-021-01643-0. · Zbl 1478.90089
[23] Kim, D. and Fessler, J. A., Optimized first-order methods for smooth convex minimization, Math. Program., 159 (2016), pp. 81-107, doi:10.1007/s10107-015-0949-3. · Zbl 1345.90113
[24] Kim, D. and Fessler, J. A., Optimizing the efficiency of first-order methods for decreasing the gradient of smooth convex functions, J. Optim. Theory Appl., 188 (2021), pp. 192-219, doi:10.1007/s10957-020-01770-2. · Zbl 1468.90085
[25] Lebedev, V. and Finogenov, S., Ordering of the iterative parameters in the cyclical Chebyshev iterative method, USSR Comput. Math. Math. Phys., 11 (1971), pp. 155-170, doi:10.1016/0041-5553(71)90169-8. · Zbl 0249.65043
[26] Lee, C.-P. and Wright, S., First-order algorithms converge faster than \(O(1/k)\) on convex problems, in Proceedings of the 36th International Conference on Machine Learning, PMLR, 2019, pp. 3754-3762.
[27] Lieder, F., On the convergence rate of the Halpern-iteration, Optim. Lett., 15 (2021), pp. 405-418. · Zbl 1466.90067
[28] Loshchilov, I. and Hutter, F., SGDR: Stochastic Gradient Descent with Warm Restarts, https://arxiv.org/abs/1608.03983, 2016.
[29] Nesterov, Y., Introductory Lectures on Convex Optimization: A Basic Course, 1st ed., Springer, 2014.
[30] Oymak, S., Provable super-convergence with a large cyclical learning rate, IEEE Signal Process. Lett., 28 (2021), pp. 1645-1649, doi:10.1109/LSP.2021.3101131.
[31] Pedregosa, F., Acceleration without momentum, 2021.
[32] Ryu, E. K., Taylor, A. B., Bergeling, C., and Giselsson, P., Operator splitting performance estimation: Tight contraction factors and optimal parameter selection, SIAM J. Optim., 30 (2020), pp. 2251-2271, doi:10.1137/19M1304854. · Zbl 1511.90322
[33] Smith, L. N., Cyclical learning rates for training neural networks, in 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), IEEE, 2017, pp. 464-472.
[34] Smith, L. N. and Topin, N., Super-convergence: Very fast training of neural networks using large learning rates, in Defense + Commercial Sensing, 2018.
[35] Souto, M., Garcia, J. D., and Veiga, Á., Exploiting low-rank structure in semidefinite programming by approximate operator splitting, Optimization, 71 (2022), pp. 117-144, doi:10.1080/02331934.2020.1823387. · Zbl 1486.90141
[36] Taylor, A. and Bach, F., Stochastic first-order methods: Non-asymptotic and computer-aided analyses via potential functions, Proc. Mach. Learn. Res. (PMLR), 99 (2019), pp. 2934-2992.
[37] Taylor, A. and Drori, Y., An optimal gradient method for smooth strongly convex minimization, Math. Program., 199 (2023), pp. 557-594, doi:10.1007/s10107-022-01839-y. · Zbl 1518.90071
[38] Taylor, A. B., Hendrickx, J. M., and Glineur, F., Exact worst-case performance of first-order methods for composite convex optimization, SIAM J. Optim., 27 (2017), pp. 1283-1313, doi:10.1137/16M108104X. · Zbl 1371.90108
[39] Taylor, A. B., Hendrickx, J. M., and Glineur, F., Smooth strongly convex interpolation and exact worst-case performance of first-order methods, Math. Program., 161 (2017), pp. 307-345. · Zbl 1359.90098
[40] Taylor, A., Van Scoy, B., and Lessard, L., Lyapunov functions for first-order methods: Tight automated convergence guarantees, in Proceedings of the 35th International Conference on Machine Learning, PMLR, 2018, pp. 4897-4906.
[41] Teboulle, M. and Vaisbourd, Y., An elementary approach to tight worst case complexity analysis of gradient based methods, Math. Program., 201 (2023), pp. 63-96, doi:10.1007/s10107-022-01899-0. · Zbl 1522.90105
[42] Wang, A. L. and Kılınç-Karzan, F., Accelerated first-order methods for a class of semidefinite programs, Math. Program., (2024).
[43] Wang, A. L., Lu, Y., and Kılınç-Karzan, F., Implicit regularity and linear convergence rates for the generalized trust-region subproblem, SIAM J. Optim., 33 (2023), pp. 1250-1278, doi:10.1137/21M1468073. · Zbl 1522.90069
[44] Young, D., On Richardson’s method for solving linear systems with positive definite matrices, J. Math. Phys., 32 (1953), pp. 243-255, doi:10.1002/sapm1953321243. · Zbl 0055.11202
[45] Yurtsever, A., Tropp, J. A., Fercoq, O., Udell, M., and Cevher, V., Scalable semidefinite programming, SIAM J. Math. Data Sci., 3 (2021), pp. 171-200, doi:10.1137/19M1305045. · Zbl 1470.90068