Algorithmic Stability of Heavy-Tailed SGD with General Loss Functions

  • 2023-01-30 05:14:51
  • Anant Raj, Lingjiong Zhu, Mert Gürbüzbalaban, Umut Şimşekli
  • 0

Abstract

Heavy-tail phenomena in stochastic gradient descent (SGD) have been reportedin several empirical studies. Experimental evidence in previous works suggestsa strong interplay between the heaviness of the tails and generalizationbehavior of SGD. To address this empirical phenomena theoretically, severalworks have made strong topological and statistical assumptions to link thegeneralization error to heavy tails. Very recently, new generalization boundshave been proven, indicating a non-monotonic relationship between thegeneralization error and heavy tails, which is more pertinent to the reportedempirical observations. While these bounds do not require additionaltopological assumptions given that SGD can be modeled using a heavy-tailedstochastic differential equation (SDE), they can only apply to simple quadraticproblems. In this paper, we build on this line of research and developgeneralization bounds for a more general class of objective functions, whichincludes non-convex functions as well. Our approach is based on developingWasserstein stability bounds for heavy-tailed SDEs and their discretizations,which we then convert to generalization bounds. Our results do not require anynontrivial assumptions; yet, they shed more light to the empiricalobservations, thanks to the generality of the loss functions.

 

Quick Read (beta)

loading the full paper ...