GEP-PG: Decoupling Exploration and Exploitation in Deep Reinforcement Learning Algorithms

Abstract

In continuous action domains, standard deep reinforcement learning algorithms like DDPG suffer from inefficient exploration when facing sparse or deceptive reward problems. Conversely, evolutionary and developmental methods focusing on exploration, like novelty search, quality-diversity or goal exploration processes, are less sample efficient during exploitation. In this paper, we present the GEP-PG approach, taking the best of both worlds by sequentially combining two variants of a goal exploration process and two variants of DDPG. We study the learning performance of these components and their combination on a low dimensional deceptive reward problem and on the larger Half-Cheetah benchmark. Among other things, we show that DDPG fails on the former and that GEP-PG obtains performance above the state-of-the-art on the latter.
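
To make the exploration half of this sequential scheme concrete, the following is a minimal sketch of a goal exploration process on a toy problem. Everything in it is an illustrative assumption rather than the paper's setup: the 1-D environment, the two-parameter linear policy, the outcome space (final position), and the names `rollout` and `goal_exploration` are all hypothetical. The sketch also assumes that the transitions gathered during exploration are kept in a buffer that an off-policy learner such as DDPG could later train on; the abstract does not state how the two phases are connected, and the DDPG phase itself is not implemented here.

```python
import random


def rollout(theta, horizon=20):
    """Run a linear policy a = theta[0] * s + theta[1] in a toy 1-D world.

    Hypothetical environment: the state is a position, the reward is sparse
    (1.0 only beyond position 1.5). Returns the transitions and the final
    position, which serves as the behavioural outcome.
    """
    s, transitions = 0.0, []
    for _ in range(horizon):
        a = max(-1.0, min(1.0, theta[0] * s + theta[1]))  # clip to [-1, 1]
        s_next = s + 0.1 * a
        r = 1.0 if s_next > 1.5 else 0.0
        transitions.append((s, a, r, s_next))
        s = s_next
    return transitions, s


def goal_exploration(n_bootstrap=10, n_iters=90, sigma=0.1, seed=0):
    """Toy goal exploration loop: random bootstrap, then goal-directed search."""
    rng = random.Random(seed)
    archive = []  # (policy parameters, outcome) pairs
    buffer = []   # all transitions seen, kept for a later off-policy learner
    for i in range(n_bootstrap + n_iters):
        if i < n_bootstrap:
            # Bootstrap phase: sample policy parameters uniformly at random.
            theta = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        else:
            # Sample a random goal in outcome space, pick the stored policy
            # whose outcome is closest to it, and perturb its parameters.
            goal = rng.uniform(-2.0, 2.0)
            nearest, _ = min(archive, key=lambda e: abs(e[1] - goal))
            theta = [p + rng.gauss(0, sigma) for p in nearest]
        transitions, outcome = rollout(theta)
        archive.append((theta, outcome))
        buffer.extend(transitions)
    return archive, buffer


if __name__ == "__main__":
    archive, buffer = goal_exploration()
    outcomes = [o for _, o in archive]
    print(len(archive), len(buffer), min(outcomes), max(outcomes))
```

Note that the loop never consults the reward when choosing what to try next: it only tries to spread outcomes across the outcome space, which is why this family of methods is not misled by a deceptive reward signal. Exploiting the reward is left to the second, gradient-based phase.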
