Expectigrad: Fast Stochastic Optimization with Robust Convergence Properties

Abstract

Many popular adaptive gradient methods such as Adam and RMSProp rely on an exponential moving average (EMA) to normalize their stepsizes. While the EMA makes these methods highly responsive to new gradient information, recent research has shown that it also causes divergence on at least one convex optimization problem. We propose a novel method called Expectigrad, which adjusts stepsizes according to a per-component unweighted mean of all historical gradients and computes a bias-corrected momentum term jointly between the numerator and denominator. We prove that Expectigrad cannot diverge on any instance of the optimization problem known to cause Adam to diverge. We also establish a regret bound in the general stochastic nonconvex setting that suggests Expectigrad is less susceptible to gradient variance than existing methods are. Testing Expectigrad on several high-dimensional machine learning tasks, we find it often compares favorably with state-of-the-art methods with little hyperparameter tuning.
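
To make the two ingredients named above concrete, the following is a minimal NumPy sketch of an Expectigrad-style update. It is an illustration under stated assumptions, not the authors' reference implementation: the abstract does not give the update rule, so the sketch assumes the unweighted mean is taken over squared gradients (as in RMSProp-style normalization), that the momentum is an EMA of the normalized ratio with an Adam-style bias correction, and that `lr`, `beta`, and `eps` are conventional hyperparameters. The names `init_state` and `expectigrad_like_step` are hypothetical.

```python
import numpy as np


def init_state(x):
    """Create optimizer state for a parameter vector x (hypothetical helper)."""
    return {"t": 0, "sum_sq": np.zeros_like(x), "m": np.zeros_like(x)}


def expectigrad_like_step(x, grad, state, lr=1e-3, beta=0.9, eps=1e-8):
    """One Expectigrad-style update (illustrative sketch, see assumptions above)."""
    state["t"] += 1
    t = state["t"]

    # Assumption: normalize by the unweighted (arithmetic) mean of all past
    # squared gradients, per component, instead of an EMA.
    state["sum_sq"] += grad ** 2
    mean_sq = state["sum_sq"] / t

    # Normalize first, then apply momentum to the whole ratio, so numerator
    # and denominator are averaged jointly.
    ratio = grad / (eps + np.sqrt(mean_sq))
    state["m"] = beta * state["m"] + (1.0 - beta) * ratio

    # Assumption: Adam-style correction for the zero-initialized momentum.
    m_hat = state["m"] / (1.0 - beta ** t)
    return x - lr * m_hat
```

Because the denominator averages every past squared gradient with equal weight, a single large gradient shrinks all later steps on that component rather than being forgotten at an exponential rate, which is the behavior the abstract contrasts with EMA-based methods.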
