Using Stochastic Gradient Descent to Smooth Nonconvex Functions: Analysis of Implicit Graduated Optimization

Abstract

The graduated optimization approach is a method for finding global optimal solutions for nonconvex functions by using a function smoothing operation with stochastic noise. This paper makes three contributions regarding graduated optimization. First, we extend the definition of function smoothing that is traditionally achieved through convolution with Gaussian noise and characterize for the first time function smoothing with heavy-tailed noise. Second, we show that light- or heavy-tailed stochastic noise in stochastic gradient descent (SGD) has the effect of smoothing the objective function, the degree of which is determined by the learning rate, batch size, and the moment of the stochastic noise. Using this finding, we propose and analyze a new graduated optimization algorithm that varies the degree of smoothing by varying the learning rate and batch size. Third, we relax the $σ$-nice property, a standard but restrictive condition in the analysis of graduated optimization. Our refinement enables convergence guarantees for a broader class of non-convex functions, thereby bridging the gap between theoretical assumptions and practical optimization landscapes.

Quick Read (beta)

loading the full paper ...