BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning

Abstract

The field of Deep Reinforcement Learning (DRL) has recently seen a surge in research in batch reinforcement learning, which aims for sample-efficient learning from a given data set without additional interactions with the environment. In the batch DRL setting, commonly employed off-policy DRL algorithms can perform poorly and sometimes even fail to learn altogether. In this paper, we propose a new algorithm, Best-Action Imitation Learning (BAIL), which, unlike many off-policy DRL algorithms, does not involve maximizing Q functions over the action space. Striving for simplicity as well as performance, BAIL first selects from the batch the actions it believes to be high-performing for their corresponding states; it then uses those state-action pairs to train a policy network using imitation learning. Although BAIL is simple, we demonstrate that it achieves state-of-the-art performance on the MuJoCo benchmark.
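The two-stage structure described above (select promising actions from the batch, then imitate them) can be illustrated with a minimal sketch. The abstract does not say how BAIL decides which actions are high-performing, so the selection rule below (keep the top fraction of samples ranked by discounted Monte Carlo return) is an assumption made only for illustration, as is the linear least-squares policy standing in for the policy network. All function names and parameters (`discounted_returns`, `select_best_actions`, `fit_linear_policy`, `keep_fraction`) are hypothetical.

```python
import numpy as np


def discounted_returns(rewards, dones, gamma=0.99):
    """Monte Carlo return-to-go for each step of a batch stored in time order."""
    returns = np.zeros(len(rewards), dtype=float)
    running = 0.0
    for t in reversed(range(len(rewards))):
        if dones[t]:
            running = 0.0  # episode ends at t, so nothing carries over from t + 1
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def select_best_actions(returns, keep_fraction=0.25):
    """Boolean mask keeping the highest-return samples (assumed selection rule)."""
    threshold = np.quantile(returns, 1.0 - keep_fraction)
    return returns >= threshold


def fit_linear_policy(states, actions):
    """Behavior cloning with a linear policy (a stand-in for a policy network)."""
    features = np.hstack([states, np.ones((len(states), 1))])  # append bias column
    weights, *_ = np.linalg.lstsq(features, actions, rcond=None)

    def policy(s):
        s = np.asarray(s, dtype=float)
        return np.hstack([s, np.ones((len(s), 1))]) @ weights

    return policy
```

Under these assumptions, a batch stored as arrays `states`, `actions`, `rewards`, and `dones` would be processed by computing `mask = select_best_actions(discounted_returns(rewards, dones))` and then calling `fit_linear_policy(states[mask], actions[mask])`. No maximization of a Q function over the action space appears anywhere in the sketch, which is the property the abstract emphasizes.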
