Double Adaptive Stochastic Gradient Optimization

Abstract

Adaptive moment methods have been remarkably successful in deep learning optimization, particularly in the presence of noisy and/or sparse gradients. We further the advantages of adaptive moment techniques by proposing a family of double adaptive stochastic gradient methods, DASGrad. They leverage the complementary ideas of the adaptive moment algorithms widely used by the deep learning community and recent advances in adaptive probabilistic algorithms. We analyze the theoretical convergence improvements of our approach in a stochastic convex optimization setting, and provide empirical validation of our findings with convex and non-convex objectives. We observe that the benefits of DASGrad increase with the model complexity and the variability of the gradients, and we explore the resulting utility in extensions of distribution-matching multitask learning.
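The abstract does not spell out the DASGrad update, so the following is only a minimal sketch of what "double adaptive" can mean, under stated assumptions. The first adaptation is assumed to be an AMSGrad-style per-coordinate moment update. The second is assumed to be adaptive importance sampling of training examples, here scored by each example's most recently observed gradient norm and mixed with a uniform distribution. The function `dasgrad_sketch`, the `mix` parameter, and the scoring rule are hypothetical illustrations, not the authors' algorithm. The sampled gradient is reweighted by `1 / (n * p_i)` so that it remains an unbiased estimate of the mean gradient.

```python
import math
import random


def grad_i(w, x, y):
    # Gradient of the per-example squared loss 0.5 * (x . w - y)^2 w.r.t. w.
    r = sum(wj * xj for wj, xj in zip(w, x)) - y
    return [r * xj for xj in x]


def dasgrad_sketch(X, Y, steps=5000, lr=0.05, b1=0.9, b2=0.999,
                   eps=1e-8, mix=0.3, seed=0):
    """Hypothetical double adaptive SGD: AMSGrad moments + adaptive sampling."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [0.0] * d
    m = [0.0] * d      # first-moment estimate
    v = [0.0] * d      # second-moment estimate
    vhat = [0.0] * d   # running maximum of v (AMSGrad)
    scores = [1.0] * n  # hypothetical per-example importance scores
    for _ in range(steps):
        # Adaptive sampling distribution, mixed with uniform so every p_i > 0.
        total = sum(scores)
        p = [(1 - mix) * s / total + mix / n for s in scores]
        i = rng.choices(range(n), weights=p)[0]
        gi = grad_i(w, X[i], Y[i])
        # Importance weight keeps E[g] equal to the mean gradient over examples.
        g = [gj / (n * p[i]) for gj in gi]
        # Adaptive moment step (AMSGrad form, no bias correction).
        for j in range(d):
            m[j] = b1 * m[j] + (1 - b1) * g[j]
            v[j] = b2 * v[j] + (1 - b2) * g[j] ** 2
            vhat[j] = max(vhat[j], v[j])
            w[j] -= lr * m[j] / (math.sqrt(vhat[j]) + eps)
        # Refresh the sampled example's score with its current gradient norm.
        scores[i] = math.sqrt(sum(gj * gj for gj in gi)) + 1e-12
    return w


# Usage: a small noiseless least-squares problem with true weights (2, -1).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [1.0, 3.0]]
Y = [2.0 * a - b for a, b in X]
w = dasgrad_sketch(X, Y)
print(w)
```

Mixing with the uniform distribution bounds every importance weight by `1 / mix`, which keeps the reweighted gradients from blowing up when an example's score becomes very small.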
