Gradient Descent: The Ultimate Optimizer

Abstract

Working with any gradient-based machine learning algorithm involves thetedious task of tuning the optimizer's hyperparameters, such as the learningrate. There exist many techniques for automated hyperparameter optimization,but they typically introduce even more hyperparameters to control thehyperparameter optimization process. We propose to instead learn thehyperparameters themselves by gradient descent, and furthermore to learn thehyper-hyperparameters by gradient descent as well, and so on ad infinitum. Asthese towers of gradient-based optimizers grow, they become significantly lesssensitive to the choice of top-level hyperparameters, hence decreasing theburden on the user to search for optimal values.

Quick Read (beta)

loading the full paper ...