Forgetful Experience Replay in Hierarchical Reinforcement Learning from Demonstrations

Abstract

Deep reinforcement learning (RL) currently shows impressive results in complex gaming and robotic environments. These results often come at a huge computational cost and require a very large number of episodes of interaction between the agent and the environment. There are two main approaches to improving the sample efficiency of reinforcement learning methods: hierarchical methods and expert demonstrations. In this paper, we propose a combination of these approaches that allows the agent to use low-quality demonstrations in complex vision-based environments with multiple related goals. Our forgetful experience replay (ForgER) algorithm effectively handles errors in expert data and reduces quality losses when adapting the action space and state representation to the agent's capabilities. Our proposed goal-oriented structuring of the replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations. Our method is universal and can be integrated into various off-policy methods. It surpasses all known state-of-the-art RL methods that use expert demonstrations on various model environments. The solution based on our algorithm beats all other solutions in the MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
