Pretraining Deep Actor-Critic Reinforcement Learning Algorithms With Expert Demonstrations

Abstract

Pretraining with expert demonstrations has been found useful in speeding up the training process of deep reinforcement learning algorithms, since less online simulation data is required. Some approaches use supervised learning to speed up feature learning; others pretrain the policies by imitating expert demonstrations. However, these methods are unstable and not suitable for actor-critic reinforcement learning algorithms. Also, some existing methods rely on the global optimum assumption, which does not hold in most scenarios. In this paper, we employ expert demonstrations in an actor-critic reinforcement learning framework, while ensuring that the performance is not affected by the fact that expert demonstrations are not globally optimal. We theoretically derive a method for computing policy gradients and value estimators with only expert demonstrations. Our method is theoretically plausible for actor-critic reinforcement learning algorithms and pretrains both policy and value functions. We apply our method to two typical actor-critic reinforcement learning algorithms, DDPG and ACER, and demonstrate with experiments that our method not only outperforms the RL algorithms without a pretraining process but is also more simulation-efficient.
