ETGL-DDPG: A Deep Deterministic Policy Gradient Algorithm for Sparse Reward Continuous Control

Abstract

We consider deep deterministic policy gradient (DDPG) in the context of reinforcement learning with sparse rewards. To enhance exploration, we introduce a search procedure, \emph{$\epsilon t$-greedy}, which generates exploratory options for exploring less-visited states. We prove that search using $\epsilon t$-greedy has polynomial sample complexity under mild MDP assumptions. To more efficiently use the information provided by rewarded transitions, we develop a new dual experience replay buffer framework, \emph{GDRB}, and implement \emph{longest $n$-step returns}. The resulting algorithm, \emph{ETGL-DDPG}, integrates all three techniques: \bm{$\epsilon t$}-greedy, \textbf{G}DRB, and \textbf{L}ongest $n$-step, into DDPG. We evaluate ETGL-DDPG on standard benchmarks and demonstrate that it outperforms DDPG, as well as other state-of-the-art methods, across all tested sparse-reward continuous environments. Ablation studies further highlight how each strategy individually enhances the performance of DDPG in this setting.
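
The abstract does not define longest $n$-step returns. As a minimal sketch, assuming the term means that each transition's return accumulates discounted rewards up to the end of its episode (the largest $n$ available), the computation could look like the following. The function name `longest_n_step_returns` and the `bootstrap_value` parameter are hypothetical and not taken from the paper.

```python
from typing import List


def longest_n_step_returns(
    rewards: List[float],
    gamma: float,
    bootstrap_value: float = 0.0,
) -> List[float]:
    """Discounted return from each step to the end of one episode.

    Assumption (not from the paper): "longest n-step" is read as using
    n = T - t for the transition at step t, where T is the episode length.
    `bootstrap_value` is a hypothetical value estimate for the state that
    follows the last transition; use 0.0 when the episode truly terminated.
    """
    returns = [0.0] * len(rewards)
    running = bootstrap_value
    # Sweep backwards so each return reuses the one computed after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    # A sparse-reward episode: only the final transition is rewarded.
    print(longest_n_step_returns([0.0, 0.0, 1.0], gamma=0.5))
```

Under this reading, a single rewarded transition at the end of an episode propagates to every earlier transition in one backward pass, which is one way the information in rewarded transitions could be used more efficiently in a sparse-reward setting.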
