Abstract
Deep reinforcement learning (deep RL) has achieved superior performance in complex sequential tasks by using a deep neural network as its function approximator and by learning directly from raw images. A drawback of using raw images is that deep RL must learn the state feature representation from the raw images in addition to learning a policy. As a result, deep RL can require a prohibitively large amount of training time and data to reach reasonable performance, making it difficult to use in real-world applications, especially when data is expensive. In this work, we speed up training by addressing half of what deep RL is trying to solve: learning features. Our approach is to learn some of the important features by pre-training the hidden layers of a deep RL network via supervised learning on a small set of human demonstrations. We empirically evaluate our approach using the deep Q-network (DQN) and asynchronous advantage actor-critic (A3C) algorithms on the Atari 2600 games of Pong, Freeway, and Beamrider. Our results show that 1) pre-training with human demonstrations in a supervised learning manner is better at discovering features than pre-training naively in DQN, and 2) initializing a deep RL network with a pre-trained model provides a significant improvement in training time, even when pre-training from a small number of human demonstrations.
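The two-stage idea described above (supervised pre-training on demonstrations, then initializing the RL network from the pre-trained model) can be illustrated with a minimal sketch. This is not the networks or data used in this work: it assumes a tiny one-hidden-layer MLP in NumPy, synthetic (state, action) pairs standing in for human demonstrations, and hypothetical helper names (`init_params`, `pretrain`, `init_q_network`). Reinitializing the output layer when building the Q-network is also an assumption made for illustration.

```python
import numpy as np


def init_params(rng, n_in, n_hidden, n_out):
    """Small random weights for a one-hidden-layer MLP (illustrative sizes)."""
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }


def forward(params, x):
    """Return ReLU hidden features and the output-layer activations."""
    h = np.maximum(0.0, x @ params["W1"] + params["b1"])
    out = h @ params["W2"] + params["b2"]
    return h, out


def softmax_cross_entropy(logits, actions):
    """Mean cross-entropy of the demonstrated actions, plus softmax probabilities."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    loss = -log_probs[np.arange(len(actions)), actions].mean()
    return loss, np.exp(log_probs)


def pretrain(params, states, actions, lr=0.1, epochs=300):
    """Supervised pre-training: classify the demonstrator's action from the state."""
    n = len(actions)
    losses = []
    for _ in range(epochs):
        h, logits = forward(params, states)
        loss, probs = softmax_cross_entropy(logits, actions)
        losses.append(loss)
        # Gradient of the mean cross-entropy with respect to the logits.
        grad_logits = probs.copy()
        grad_logits[np.arange(n), actions] -= 1.0
        grad_logits /= n
        # Backpropagate through the ReLU hidden layer (uses W2 before its update).
        grad_h = grad_logits @ params["W2"].T
        grad_h[h <= 0.0] = 0.0
        # Plain full-batch gradient descent step.
        params["W2"] -= lr * (h.T @ grad_logits)
        params["b2"] -= lr * grad_logits.sum(axis=0)
        params["W1"] -= lr * (states.T @ grad_h)
        params["b1"] -= lr * grad_h.sum(axis=0)
    return losses


def init_q_network(rng, pretrained, n_actions):
    """Copy the pre-trained hidden layer; start the Q-value head from scratch (assumption)."""
    n_hidden = pretrained["W1"].shape[1]
    return {
        "W1": pretrained["W1"].copy(),
        "b1": pretrained["b1"].copy(),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_actions)),
        "b2": np.zeros(n_actions),
    }


rng = np.random.default_rng(0)
# Synthetic stand-in for demonstrations: 8-dimensional states, 3 discrete actions.
demo_states = rng.normal(size=(256, 8))
demo_actions = np.argmax(demo_states[:, :3], axis=1)

classifier = init_params(rng, n_in=8, n_hidden=16, n_out=3)
losses = pretrain(classifier, demo_states, demo_actions)
q_params = init_q_network(rng, classifier, n_actions=3)
```

After this step, `q_params` would be handed to an RL algorithm such as DQN for further training; that stage is omitted here. In a real Atari setting the hidden layers would be convolutional and the inputs would be stacked game frames rather than random vectors.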