Iterated $Q$-Network: Beyond One-Step Bellman Updates in Deep Reinforcement Learning

Abstract

The vast majority of Reinforcement Learning methods are strongly affected by the computational effort and data requirements needed to obtain effective estimates of action-value functions, which in turn determine the quality of the overall performance and the sample efficiency of the learning procedure. Typically, action-value functions are estimated through an iterative scheme that alternates between applying an empirical approximation of the Bellman operator and projecting the result onto a chosen function space. It has been observed that this scheme can be generalized to carry out multiple iterations of the Bellman operator at once, benefiting the underlying learning algorithm. However, until now, it has been challenging to implement this idea effectively, especially in high-dimensional problems. In this paper, we introduce iterated $Q$-Network (i-QN), a novel principled approach that enables multiple consecutive Bellman updates by learning a tailored sequence of action-value functions where each serves as the target for the next. We show that i-QN is theoretically grounded and that it can be seamlessly used in value-based and actor-critic methods. We empirically demonstrate the advantages of i-QN in Atari $2600$ games and MuJoCo continuous control problems.
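To make the idea of "a sequence of action-value functions where each serves as the target for the next" concrete, the following is a minimal tabular sketch, not the paper's implementation. It assumes a small, fully known MDP and exact Bellman updates, whereas the abstract refers to neural networks trained on sampled data. The names `bellman_update` and `train_chain`, the random MDP, and the simultaneous update rule with the previous table held fixed are all illustrative assumptions.

```python
import numpy as np


def bellman_update(q, rewards, transitions, gamma):
    """Apply the Bellman optimality operator to a tabular Q of shape (S, A)."""
    next_values = q.max(axis=1)  # greedy value of each next state, shape (S,)
    # transitions has shape (S, A, S); the matmul gives expected next values (S, A).
    return rewards + gamma * transitions @ next_values


def train_chain(rewards, transitions, gamma=0.9, num_updates=3, lr=0.5, steps=200):
    """Learn num_updates consecutive Bellman updates at once (illustrative assumption).

    chain[0] stays fixed; every chain[k] with k >= 1 is moved toward the
    Bellman update of chain[k - 1], so each table is the target for the next.
    """
    num_states, num_actions = rewards.shape
    chain = np.zeros((num_updates + 1, num_states, num_actions))
    for _ in range(steps):
        # Targets are computed from the current predecessors and held fixed
        # for this step; all elements of the chain are updated simultaneously.
        targets = np.stack(
            [bellman_update(chain[k], rewards, transitions, gamma)
             for k in range(num_updates)]
        )
        chain[1:] += lr * (targets - chain[1:])
    return chain


# Hypothetical random MDP with 4 states and 2 actions, for illustration only.
rng = np.random.default_rng(0)
rewards = rng.random((4, 2))
transitions = rng.random((4, 2, 4))
transitions /= transitions.sum(axis=2, keepdims=True)  # rows sum to 1

chain = train_chain(rewards, transitions, gamma=0.9, num_updates=3)
```

In this idealized setting, `chain[k]` converges to the result of applying the Bellman operator `k` times to `chain[0]`, which is the sense in which several consecutive updates are learned together rather than one at a time.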
