Zap Meets Momentum: Stochastic Approximation Algorithms with Optimal Convergence Rate

Abstract

There are two well known Stochastic Approximation techniques that are knownto have optimal rate of convergence (measured in terms of asymptotic variance):the Ruppert-Polyak averaging technique, and stochastic Newton-Raphson (SNR) (amatrix gain algorithm that resembles the deterministic Newton-Raphson method).The Zap algorithms introduced by the authors are a version of SNR designed tobehave more closely like their deterministic cousin. It is found that estimatesfrom the Zap Q-learning algorithm converge remarkably quickly, but theper-iteration complexity can be high. This paper introduces an entirely new class of stochastic approximationalgorithms based on matrix momentum. For a special choice of the matrixmomentum and gain sequences, it is found in simulations that the parameterestimates obtained from the algorithm couple with those obtained from the morecomplex stochastic Newton-Raphson algorithm. Conditions under which coupling isguaranteed are established for a class of linear recursions. Optimal finite-$n$error bounds are also obtained. The main objective of this work is to create more efficient algorithms forapplications to reinforcement learning. Numerical results illustrate the valueof these techniques in this setting.

Quick Read (beta)

loading the full paper ...