Qualitative Measurements of Policy Discrepancy for Return-based Deep Q-Network

Abstract

The deep Q-network (DQN) and return-based reinforcement learning are two promising approaches proposed in recent years. DQN has brought advances to complex sequential decision problems, while return-based algorithms make effective use of sample trajectories. In this paper, we propose R-DQN, a general framework that combines DQN with most return-based reinforcement learning algorithms. We show that introducing return-based reinforcement learning effectively improves the performance of traditional DQN. To further improve R-DQN, we design a strategy with two measurements that qualitatively measure the policy discrepancy, and we derive bounds for both measurements within the R-DQN framework. We show that algorithms using our strategy can accurately express the trace coefficient and achieve a better approximation of the return. Experiments on several representative tasks from the OpenAI Gym library validate the effectiveness of the proposed measurements, and the results show that algorithms using our strategy outperform state-of-the-art methods.
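The abstract does not spell out the R-DQN update or the two measurements. For orientation, return-based algorithms of the kind it refers to are often written in one general form, as in Retrace (Munos et al., 2016): the target for Q(x_0, a_0) is Q(x_0, a_0) + sum_t gamma^t (c_1 ... c_t) delta_t, where delta_t = r_t + gamma E_pi Q(x_{t+1}, .) - Q(x_t, a_t), and each algorithm differs only in its trace coefficient c_s. The sketch below computes that general target for a single sampled trajectory. The function names, the argument layout, and the four coefficient choices (Retrace, tree backup, Q(lambda), importance sampling) are illustrative assumptions, not the paper's measurements or code.

```python
from typing import Sequence


def trace_coefficient(kind: str, pi_prob: float, mu_prob: float, lam: float) -> float:
    """Hypothetical helper: trace coefficient c_s for a few well-known
    return-based algorithms. pi_prob / mu_prob are the target / behaviour
    policy probabilities of the action actually taken (mu_prob > 0)."""
    if kind == "retrace":
        return lam * min(1.0, pi_prob / mu_prob)
    if kind == "tree_backup":
        return lam * pi_prob
    if kind == "q_lambda":
        return lam
    if kind == "importance_sampling":
        return pi_prob / mu_prob
    raise ValueError(f"unknown trace kind: {kind}")


def return_based_target(
    q_taken: Sequence[float],          # Q(x_t, a_t) for each step t
    expected_next_q: Sequence[float],  # E_{a~pi} Q(x_{t+1}, a) for each step t
    rewards: Sequence[float],
    dones: Sequence[bool],             # True if x_{t+1} is terminal
    pi_probs: Sequence[float],
    mu_probs: Sequence[float],
    gamma: float,
    lam: float,
    kind: str = "retrace",
) -> float:
    """Hypothetical sketch of the general return-based target for Q(x_0, a_0)."""
    target = q_taken[0]
    coef = 1.0      # running product c_1 * ... * c_t (empty product at t = 0)
    discount = 1.0  # running gamma ** t
    for t in range(len(rewards)):
        if t > 0:
            coef *= trace_coefficient(kind, pi_probs[t], mu_probs[t], lam)
        bootstrap = 0.0 if dones[t] else gamma * expected_next_q[t]
        delta = rewards[t] + bootstrap - q_taken[t]  # TD error delta_t
        target += discount * coef * delta
        discount *= gamma
        if dones[t]:
            break  # no corrections past a terminal state
    return target


# Usage: with lam = 0 every c_s vanishes and the target reduces to the
# one-step DQN-style target r_0 + gamma * E_pi Q(x_1, .).
one_step = return_based_target(
    [1.0, 2.0], [3.0, 4.0], [0.5, 1.0], [False, False],
    [0.5, 0.5], [0.5, 0.5], gamma=0.9, lam=0.0,
)
```

Setting lam = 0 recovers the one-step target used by DQN, while larger coefficients propagate more of the sampled trajectory; how well c_s reflects the discrepancy between pi and mu is the quantity the paper's measurements address.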
