Regularized Anderson Acceleration for Off-Policy Deep Reinforcement Learning

Abstract

Model-free deep reinforcement learning (RL) algorithms have been widely used for a range of complex control tasks. However, slow convergence and sample inefficiency remain challenging problems in RL, especially when handling continuous and high-dimensional state spaces. To tackle this problem, we propose a general acceleration method for model-free, off-policy deep RL algorithms by drawing on the idea underlying regularized Anderson acceleration (RAA), which is an effective approach to accelerating the solution of fixed-point problems with perturbations. Specifically, we first explain how Anderson acceleration can be applied directly to policy iteration. Then we extend RAA to the case of deep RL by introducing a regularization term to control the impact of perturbation induced by function approximation errors. We further propose two strategies, i.e., progressive update and adaptive restart, to enhance performance. The effectiveness of our method is evaluated on a variety of benchmark tasks, including Atari 2600 and MuJoCo. Experimental results show that our approach substantially improves both the learning speed and final performance of state-of-the-art deep RL algorithms.
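
To make the fixed-point view concrete, the following is a minimal sketch of Tikhonov-regularized Anderson acceleration on a generic fixed-point problem x = g(x). It is a textbook-style formulation and not the update rule of the paper, which is not given in this abstract. The function names `raa_weights` and `raa_fixed_point`, the window size `m`, the regularization strength `lam`, and the choice to scale the regularizer by the mean squared residual are all assumptions made for illustration. The progressive update and adaptive restart strategies are not modeled.

```python
import numpy as np


def raa_weights(residuals, lam=1e-6):
    """Mixing weights for the last m iterates (hypothetical helper).

    residuals: array of shape (d, m); column i holds g(x_i) - x_i.
    Solves  min_a ||R a||^2 + mu ||a||^2  subject to  sum(a) = 1,
    where mu = lam * mean squared residual norm (assumed scaling).
    """
    m = residuals.shape[1]
    gram = residuals.T @ residuals
    mu = lam * np.trace(gram) / m
    # Closed form: a is proportional to (R^T R + mu I)^{-1} 1.
    z = np.linalg.solve(gram + mu * np.eye(m), np.ones(m))
    return z / z.sum()


def raa_fixed_point(g, x0, m=5, lam=1e-6, tol=1e-10, max_iter=200):
    """Iterate x <- sum_i a_i g(x_i) over a sliding window of m iterates."""
    xs, gs = [], []
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        gx = g(x)
        if np.linalg.norm(gx - x) < tol:
            return x, k  # converged after k accelerated steps
        xs.append(x)
        gs.append(gx)
        xs, gs = xs[-m:], gs[-m:]  # keep only the most recent m pairs
        R = np.stack([gi - xi for gi, xi in zip(gs, xs)], axis=1)
        alpha = raa_weights(R, lam)
        x = np.stack(gs, axis=1) @ alpha  # weighted mix of past g(x_i)
    return x, max_iter


if __name__ == "__main__":
    # Toy linear contraction standing in for a Bellman-style operator.
    A = np.array([[0.5, 0.1], [0.1, 0.3]])
    b = np.array([1.0, 2.0])
    x, iters = raa_fixed_point(lambda v: A @ v + b, np.zeros(2))
    print(x, iters)
```

In this sketch a larger `lam` pulls the weights toward a uniform average of recent iterates, which trades acceleration for robustness when the residuals are noisy, loosely mirroring the role the abstract assigns to the regularization term.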
