Laplacian Smoothing Gradient Descent

Abstract

We propose a very simple modification of gradient descent and stochastic gradient descent. We show that when applied to a variety of machine learning models, including softmax regression, convolutional neural nets, generative adversarial nets, and deep reinforcement learning, this simple surrogate can dramatically reduce the variance and improve generalization accuracy. The new algorithm, which depends on one nonnegative parameter, tends to avoid sharp local minima when applied to non-convex minimization; instead, it seeks somewhat flatter local (and often global) minima. The method only involves preconditioning the gradient by the inverse of a positive definite tri-diagonal matrix. The motivation comes from the theory of Hamilton-Jacobi partial differential equations. This theory demonstrates that the new algorithm is almost the same as doing gradient descent on a new function which (a) has the same global minima as the original function and (b) is "more convex". The method is minimal in cost, complexity, and programming effort. We implement our algorithm in both PyTorch and TensorFlow; the implementations will be made publicly available.
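To make the preconditioning step concrete, here is a minimal sketch in plain Python. The abstract does not spell out the matrix, so this sketch assumes one simple choice that fits the description: A = I + sigma * T, where T is the tri-diagonal second-difference matrix with 2 on the diagonal and -1 on the off-diagonals, and sigma is the nonnegative parameter. The function names `smooth_gradient` and `lsgd_step` are hypothetical and are not taken from the authors' released code.

```python
def smooth_gradient(grad, sigma):
    """Return x solving (I + sigma * T) x = grad, with T = tridiag(-1, 2, -1).

    Assumed form of the preconditioner (not confirmed by the abstract).
    The tridiagonal system is solved in O(n) with the Thomas algorithm.
    """
    if sigma < 0:
        raise ValueError("sigma must be nonnegative")
    n = len(grad)
    if n == 0:
        return []
    diag = 1.0 + 2.0 * sigma  # every main-diagonal entry of A
    off = -sigma              # every sub- and super-diagonal entry of A

    # Forward sweep: eliminate the sub-diagonal.
    c = [0.0] * n  # modified super-diagonal coefficients
    d = [0.0] * n  # modified right-hand side
    c[0] = off / diag
    d[0] = grad[0] / diag
    for i in range(1, n):
        denom = diag - off * c[i - 1]
        c[i] = off / denom
        d[i] = (grad[i] - off * d[i - 1]) / denom

    # Back substitution.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def lsgd_step(w, grad, lr, sigma):
    """One descent step using the smoothed gradient in place of the raw one."""
    g = smooth_gradient(grad, sigma)
    return [wi - lr * gi for wi, gi in zip(w, g)]
```

With `sigma = 0` the matrix is the identity and the step reduces to ordinary gradient descent. For `sigma > 0` the assumed matrix is strictly diagonally dominant and positive definite, so the solve is well posed and costs only linear time in the number of parameters.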
