Computer Science > Machine Learning

arXiv:2410.16561 (cs)

[Submitted on 21 Oct 2024]

Title:Gradient Normalization with(out) Clipping Ensures Convergence of Nonconvex SGD under Heavy-Tailed Noise with Improved Results

Abstract:This paper investigates Gradient Normalization Stochastic Gradient Descent without Clipping (NSGDC) and its variance reduction variant (NSGDC-VR) for nonconvex optimization under heavy-tailed noise. We present significant improvements in the theoretical results for both algorithms, including the removal of logarithmic factors from the convergence rates and the recovery of the convergence rate to match the deterministic case when the noise variance {\sigma} is zero. Additionally, we demonstrate that gradient normalization alone, assuming individual Lipschitz smoothness, is sufficient to ensure convergence of SGD under heavy-tailed noise, eliminating the need for gradient clipping. Furthermore, we introduce accelerated nonconvex algorithms that utilize second-order Lipschitz smoothness to achieve enhanced convergence rates in the presence of heavy-tailed noise. Our findings offer a deeper understanding of how gradient normalization and variance reduction techniques can be optimized for robust performance in challenging optimization scenarios.

Subjects:	Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Cite as:	arXiv:2410.16561 [cs.LG]
	(or arXiv:2410.16561v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2410.16561

Submission history

From: Tao Sun [view email]
[v1] Mon, 21 Oct 2024 22:40:42 UTC (27 KB)

Computer Science > Machine Learning

Title:Gradient Normalization with(out) Clipping Ensures Convergence of Nonconvex SGD under Heavy-Tailed Noise with Improved Results

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Machine Learning

Title:Gradient Normalization with(out) Clipping Ensures Convergence of Nonconvex SGD under Heavy-Tailed Noise with Improved Results

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators