Reinforcement Learning via Implicit Imitation Guidance

Abstract

We study the problem of sample-efficient reinforcement learning, where prior data such as demonstrations are provided for initialization in lieu of a dense reward signal. A natural approach is to incorporate an imitation learning objective, either as regularization during training or to acquire a reference policy. However, imitation learning objectives can ultimately degrade long-term performance, as they do not directly align with reward maximization. In this work, we propose to use prior data solely for guiding exploration via noise added to the policy, sidestepping the need for explicit behavior cloning constraints. The key insight in our framework, Data-Guided Noise (DGN), is that demonstrations are most useful for identifying which actions should be explored, rather than forcing the policy to take certain actions. Our approach achieves up to 2-3x improvement over prior methods for reinforcement learning from offline data across seven simulated continuous control tasks.
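
To make the idea of data-guided exploration noise concrete, the sketch below shows one simple way noise could be shaped by demonstrations. It is an illustration only, not the authors' implementation: it assumes state-independent Gaussian noise fitted to the per-dimension gap between demonstrated actions and the current policy's actions, and the function names `fit_residual_noise` and `explore_action` are hypothetical.

```python
import random
import statistics


def fit_residual_noise(demo_actions, policy_actions):
    """Return per-dimension mean and std of (demo action - policy action).

    Assumption: the gap between what the demonstrator did and what the
    current policy would do indicates which directions are worth exploring.
    """
    dims = len(demo_actions[0])
    means, stds = [], []
    for d in range(dims):
        residuals = [a[d] - p[d] for a, p in zip(demo_actions, policy_actions)]
        means.append(statistics.fmean(residuals))
        stds.append(statistics.pstdev(residuals))
    return means, stds


def explore_action(policy_action, means, stds, rng):
    """Add data-shaped Gaussian noise to the policy's action.

    The policy itself is never constrained toward the demonstrations;
    only the exploration noise depends on them.
    """
    return [a + rng.gauss(m, s) for a, m, s in zip(policy_action, means, stds)]


# Toy data: demonstrations push strongly along dimension 0, barely along 1.
demo_actions = [[1.0, 0.0], [0.8, 0.1], [1.2, -0.1]]
policy_actions = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]

means, stds = fit_residual_noise(demo_actions, policy_actions)
rng = random.Random(0)
action = explore_action([0.0, 0.0], means, stds, rng)
```

In this toy setup the fitted noise is centered near 1.0 along the first action dimension and near 0.0 along the second, so exploration is biased toward the demonstrated behavior without adding any behavior cloning term to the policy's training objective.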
