Deep Q-Learning with Gradient Target Tracking

Abstract

This paper introduces Q-learning with gradient target tracking, a novel reinforcement learning framework that provides a learned continuous target update mechanism as an alternative to the conventional hard update paradigm. In the standard deep Q-network (DQN), the target network is a copy of the online network's weights, held fixed for a number of iterations before being periodically replaced via a hard update. While this stabilizes training by providing consistent targets, it introduces a new challenge: the hard update period must be carefully tuned to achieve optimal performance. To address this issue, we propose two gradient-based target update methods: DQN with asymmetric gradient target tracking (AGT2-DQN) and DQN with symmetric gradient target tracking (SGT2-DQN). These methods replace the conventional hard target updates with continuous and structured updates using gradient descent, which effectively eliminates the need for manual tuning of the update period. We provide a theoretical analysis proving the convergence of these methods in tabular settings. Additionally, empirical evaluations demonstrate their advantages over standard DQN baselines, suggesting that gradient-based target updates can serve as an effective alternative to conventional target update mechanisms in Q-learning.
