A Guide Through the Zoo of Biased SGD

Abstract

Stochastic Gradient Descent (SGD) is arguably the most important single algorithm in modern machine learning. Although SGD with unbiased gradient estimators has been studied extensively over at least half a century, SGD variants relying on biased estimators are rare. Nevertheless, there has been an increased interest in this topic in recent years. However, existing literature on SGD with biased estimators (BiasedSGD) lacks coherence since each new paper relies on a different set of assumptions, without any clear understanding of how they are connected, which may lead to confusion. We address this gap by establishing connections among the existing assumptions, and presenting a comprehensive map of the underlying relationships. Additionally, we introduce a new set of assumptions that is provably weaker than all previous assumptions, and use it to present a thorough analysis of BiasedSGD in both convex and non-convex settings, offering advantages over previous results. We also provide examples where biased estimators outperform their unbiased counterparts or where unbiased versions are simply not available. Finally, we demonstrate the effectiveness of our framework through experimental results that validate our theoretical findings.
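As a minimal illustration of what "SGD with a biased estimator" means in practice, the sketch below runs the iteration x_{t+1} = x_t - γ g_t, where g_t is a noisy gradient passed through a Top-k sparsifier. Top-k keeps only the k largest-magnitude entries, so its output generally does not average to the true gradient, which makes the estimator biased. The Top-k compressor, the toy quadratic objective, the step size, and the helper names (`top_k`, `biased_sgd`) are illustrative assumptions chosen for this example, not the paper's framework or its assumptions.

```python
import random
from typing import Callable, List

Vector = List[float]


def top_k(v: Vector, k: int) -> Vector:
    """Keep the k largest-magnitude entries of v and zero out the rest.

    Top-k is a common example of a biased compressor: in general
    E[top_k(g)] != E[g], so the resulting gradient estimator is biased.
    """
    keep = sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:k]
    out = [0.0] * len(v)
    for i in keep:
        out[i] = v[i]
    return out


def biased_sgd(grad: Callable[[Vector], Vector], x0: Vector, step: float,
               iters: int, k: int, noise: float, seed: int = 0) -> Vector:
    """Run x_{t+1} = x_t - step * g_t with g_t = top_k(grad(x_t) + noise, k)."""
    rng = random.Random(seed)  # seeded for reproducibility
    x = list(x0)
    for _ in range(iters):
        # Unbiased stochastic gradient: exact gradient plus zero-mean Gaussian noise.
        g = [gi + rng.gauss(0.0, noise) for gi in grad(x)]
        # Biased estimator: compress the stochastic gradient with Top-k.
        g = top_k(g, k)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


# Hypothetical toy problem: f(x) = 0.5 * ||x - b||^2, minimized at x = b.
b = [3.0, -2.0, 1.0, 0.5]


def f(x: Vector) -> float:
    return 0.5 * sum((xi - bi) ** 2 for xi, bi in zip(x, b))


def grad_f(x: Vector) -> Vector:
    return [xi - bi for xi, bi in zip(x, b)]


x0 = [0.0] * len(b)
x_final = biased_sgd(grad_f, x0, step=0.5, iters=200, k=1, noise=0.01)
print(f(x0), f(x_final))
```

Even though only one coordinate is updated per step and the update direction is biased, the loss on this toy problem drops from its initial value to roughly the noise level, which is the kind of behaviour a BiasedSGD analysis aims to explain.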
