Abstract
We study the gradient method under the assumption that only an additively inexact gradient is available for, generally speaking, non-convex problems. Both the non-convexity of the objective function and the use of an inexact gradient at each iteration can cause difficulties. For example, the trajectory of the gradient method may drift far from the starting point, and in the presence of noise such unbounded drift can take the iterates away from the desired global solution. We analyze the behavior of the trajectory of the gradient method under the assumptions of an inexact gradient and the gradient dominance condition. This condition is known to hold for many important non-convex problems and, moreover, yields good complexity guarantees for the gradient method. We propose an early stopping rule for the gradient method that, first, guarantees an acceptable quality of the output point in terms of the objective function and, second, keeps this point at a fairly moderate distance from the chosen starting point. In addition to the gradient method with a constant step size, we also study in detail a variant with an adaptive step size, which makes the developed technique applicable when the Lipschitz constant of the gradient is unknown. Computational experiments demonstrate the effectiveness of the proposed stopping rule for the gradient methods investigated.
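For intuition, the following Python sketch illustrates the setting described above: gradient descent driven by an additively inexact gradient, terminated early once the observed gradient norm drops to the noise scale. This is a minimal illustration, not the paper's exact procedure; the helper name grad_approx, the constant c, and the threshold form c * delta are assumptions made for the example.

```python
import numpy as np

def noisy_gd(grad_approx, x0, L, delta, c=2.0, max_iter=10_000):
    """Gradient descent with constant step 1/L and an inexact gradient.

    grad_approx(x) is assumed to return grad f(x) + v with ||v|| <= delta
    (additive noise). The early stopping rule terminates once the observed
    gradient norm falls to the noise scale, which keeps the trajectory from
    drifting far from x0 while the function value is already acceptably
    small. The threshold c * delta is an illustrative placeholder, not the
    paper's precise rule.
    """
    x = np.asarray(x0, dtype=float).copy()
    for k in range(max_iter):
        g = grad_approx(x)
        if np.linalg.norm(g) <= c * delta:  # early stopping at noise level
            return x, k
        x -= g / L                          # constant step size 1/L
    return x, max_iter
```

The adaptive-step variant studied in the paper addresses the case of an unknown Lipschitz constant; in the spirit of that variant, the fixed 1/L step above would be replaced by a locally estimated step size, with the stopping idea unchanged.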
Acknowledgements
The research by F. Stonyakin in Sects. 2 and 3.1 was supported by the strategic academic leadership program "Priority 2030" (Agreement 075-02-2021-1316, 30.09.2021). The research by B. Polyak in Sect. 1 and by I. Kuruzov in Sect. 4 was supported by the Russian Science Foundation (project No. 21-71-30005), https://rscf.ru/en/project/21-71-30005/.
Communicated by Paulo J.S. Silva.
Cite this article
Stonyakin, F., Kuruzov, I. & Polyak, B. Stopping Rules for Gradient Methods for Non-convex Problems with Additive Noise in Gradient. J Optim Theory Appl 198, 531–551 (2023). https://doi.org/10.1007/s10957-023-02245-w