L4: Practical loss-based stepsize adaptation for deep learning

Abstract

We propose a stepsize adaptation scheme for stochastic gradient descent. It operates directly with the loss function and rescales the gradient in order to make fixed predicted progress on the loss. We demonstrate its capabilities by strongly improving the performance of Adam and Momentum optimizers. The enhanced optimizers with default hyperparameters consistently outperform their constant stepsize counterparts, even the best ones, without a measurable increase in computational cost. The performance is validated on multiple architectures including ResNets and the Differential Neural Computer. A prototype implementation as a TensorFlow optimizer is released.
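To make the idea of "fixed predicted progress on the loss" concrete, here is a minimal sketch in plain Python. It is not the released TensorFlow optimizer. It assumes a plain gradient update direction, a first-order (linearized) prediction of the loss, and a known minimum loss of zero. The names `loss_and_grad`, `adaptive_stepsize`, `run`, and the parameters `alpha`, `loss_min`, and `eps` are hypothetical and chosen only for illustration.

```python
def loss_and_grad(theta):
    """Toy quadratic loss L(theta) = 0.5 * sum(theta_i^2) and its gradient."""
    loss = 0.5 * sum(t * t for t in theta)
    grad = list(theta)  # gradient of the quadratic is theta itself
    return loss, grad


def adaptive_stepsize(loss, grad, alpha=0.15, loss_min=0.0, eps=1e-12):
    """Stepsize whose linearized prediction removes a fixed fraction of the loss.

    Linearization: L(theta - eta * g) ~ L(theta) - eta * <g, g>.
    Requiring eta * <g, g> = alpha * (loss - loss_min) gives the formula below.
    alpha and loss_min are illustrative assumptions, not values from the text.
    """
    g_sq = sum(g * g for g in grad)
    return alpha * (loss - loss_min) / (g_sq + eps)  # eps guards against g = 0


def run(theta, steps=50, alpha=0.15):
    """Gradient descent where every step uses the loss-based stepsize."""
    losses = []
    for _ in range(steps):
        loss, grad = loss_and_grad(theta)
        losses.append(loss)
        eta = adaptive_stepsize(loss, grad, alpha=alpha)
        theta = [t - eta * g for t, g in zip(theta, grad)]
    return theta, losses


if __name__ == "__main__":
    final_theta, losses = run([3.0, -4.0])
    print("first loss:", losses[0])
    print("last loss:", losses[-1])
```

In this sketch the stepsize is recomputed from the current loss and gradient at every iteration, so large losses with small gradients yield large steps, and the step shrinks as the loss approaches the assumed minimum. A stochastic setting would additionally need some estimate of the minimum loss, which this toy example simply fixes at zero.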
