Jointly Pre-training with Supervised, Autoencoder, and Value Losses for Deep Reinforcement Learning

Abstract

Deep Reinforcement Learning (DRL) algorithms are known to be data inefficient. One reason is that a DRL agent learns both the features and the policy tabula rasa. Integrating prior knowledge into DRL algorithms is one way to improve learning efficiency, since it helps to build useful representations. In this work, we consider incorporating human knowledge to accelerate the asynchronous advantage actor-critic (A3C) algorithm by pre-training on a small amount of non-expert human demonstrations. We leverage the supervised autoencoder framework and propose a novel pre-training strategy that jointly trains a weighted supervised classification loss, an unsupervised reconstruction loss, and an expected return loss. The resulting pre-trained model learns more useful features than models trained independently in a supervised or unsupervised fashion. Our pre-training method drastically improved the learning performance of the A3C agent in the Atari games Pong and MsPacman, exceeding the performance of state-of-the-art algorithms with far fewer game interactions. Our method is lightweight and easy to implement on a single machine. For reproducibility, our code is available at github.com/gabrieledcjr/DeepRL/tree/A3C-ALA2019
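
The abstract names three loss terms that are trained jointly but does not give their exact form. The following is a minimal sketch of how such a joint objective could be combined for a single demonstration sample, assuming softmax cross-entropy for the classification term, mean squared error for the reconstruction term, and squared error between a predicted value and an observed return for the expected return term. The function names and the weights `w_sup`, `w_ae`, and `w_val` are hypothetical and are not taken from the paper or its code.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy between softmax(logits) and an integer class label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def mean_squared_error(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def joint_pretraining_loss(logits, action, reconstruction, observation,
                           value, observed_return,
                           w_sup=1.0, w_ae=1.0, w_val=1.0):
    """Weighted sum of the three terms (weights are illustrative assumptions)."""
    # Supervised term: predict the demonstrated action from the state.
    sup = softmax_cross_entropy(logits, action)
    # Unsupervised term: reconstruct the input observation.
    ae = mean_squared_error(reconstruction, observation)
    # Return term: assumed here to be a squared error on the predicted value.
    val = (value - observed_return) ** 2
    return w_sup * sup + w_ae * ae + w_val * val
```

In a real network the three terms would share one encoder, so the gradient of each term shapes the same features; this sketch only shows how the scalar objective is assembled.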
