Abstract
We study the gradient method under the assumption that only an additively inexact gradient is available for, generally speaking, non-convex problems. Both the non-convexity of the objective function and the use of an inexact gradient at each iteration can cause difficulties. For example, the trajectory of the gradient method may drift far from the starting point, and in the presence of noise such unbounded drift can take the iterates away from the desired global solution. We analyze the behavior of the trajectory of the gradient method under the assumptions of an inexact gradient and the gradient dominance condition. This condition is known to hold for many important non-convex problems and, moreover, yields good complexity guarantees for the gradient method. We propose an early stopping rule for the gradient method that, first, guarantees an acceptable quality of the output point in terms of the objective function and, second, keeps this point at a fairly moderate distance from the chosen starting point. In addition to the gradient method with a constant step size, we also study in detail a variant with an adaptive step size, which makes the developed technique applicable when the Lipschitz constant of the gradient is unknown. Computational experiments demonstrate the effectiveness of the proposed stopping rule for the gradient methods investigated.
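For intuition, the following Python sketch illustrates the setting described above: gradient descent driven by an additively inexact gradient, terminated early once the observed gradient norm drops to the noise scale. This is a minimal illustration, not the paper's exact procedure; the helper name grad_approx, the constant c, and the threshold form c * delta are assumptions made for the example.

```python
import numpy as np

def noisy_gd(grad_approx, x0, L, delta, c=2.0, max_iter=10_000):
    """Gradient descent with constant step 1/L and an inexact gradient.

    grad_approx(x) is assumed to return grad f(x) + v with ||v|| <= delta
    (additive noise). The early stopping rule terminates once the observed
    gradient norm falls to the noise scale, which keeps the trajectory from
    drifting far from x0 while the function value is already acceptably
    small. The threshold c * delta is an illustrative placeholder, not the
    paper's precise rule.
    """
    x = np.asarray(x0, dtype=float).copy()
    for k in range(max_iter):
        g = grad_approx(x)
        if np.linalg.norm(g) <= c * delta:  # early stopping at noise level
            return x, k
        x -= g / L                          # constant step size 1/L
    return x, max_iter
```

The adaptive-step variant studied in the paper addresses the case of an unknown Lipschitz constant; in the spirit of that variant, the fixed 1/L step above would be replaced by a locally estimated step size, with the stopping idea unchanged.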
Acknowledgements
The research by F. Stonyakin in Sects. 2 and 3.1 was supported by the strategic academic leadership program "Priority 2030" (Agreement 075-02-2021-1316, 30.09.2021). The research by B. Polyak in Sect. 1 and by I. Kuruzov in Sect. 4 was supported by the Russian Science Foundation (project No. 21-71-30005), https://rscf.ru/en/project/21-71-30005/.
Communicated by Paulo J.S. Silva.
Cite this article
Stonyakin, F., Kuruzov, I. & Polyak, B. Stopping Rules for Gradient Methods for Non-convex Problems with Additive Noise in Gradient. J Optim Theory Appl 198, 531–551 (2023). https://doi.org/10.1007/s10957-023-02245-w