Continuous Control with Action Quantization from Demonstrations

Abstract

In Reinforcement Learning (RL), discrete actions, as opposed to continuous actions, result in less complex exploration problems and allow the immediate computation of the maximum of the action-value function, which is central to dynamic programming-based methods. In this paper, we propose a novel method, Action Quantization from Demonstrations (AQuaDem), to learn a discretization of continuous action spaces by leveraging the priors of demonstrations. This dramatically reduces the exploration problem, since the actions faced by the agent are not only finite in number but also plausible in light of the demonstrator's behavior. By discretizing the action space, we can apply any discrete-action deep RL algorithm to the continuous control problem. We evaluate the proposed method on three different setups: RL with demonstrations, RL with play data (demonstrations of a human playing in an environment but not solving any specific task), and Imitation Learning. For all three setups, we only consider human data, which is more challenging than synthetic data. We find that AQuaDem consistently outperforms state-of-the-art continuous control methods, both in terms of performance and sample efficiency. We provide visualizations and videos on the paper's website: https://google-research.github.io/aquadem.
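
To illustrate why a finite set of candidate actions makes the maximum of the action-value function immediate, here is a minimal sketch. It is not the AQuaDem implementation: `toy_candidates` and `toy_q` are hypothetical stand-ins for a learned, state-conditioned discretization and a learned action-value function, and the numbers are made up for illustration only.

```python
import random
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]
Action = Tuple[float, ...]


def greedy_action(
    candidates: Sequence[Action],
    q_value: Callable[[State, Action], float],
    state: State,
) -> Action:
    """Exact maximisation of Q over a finite candidate set (K evaluations)."""
    return max(candidates, key=lambda a: q_value(state, a))


def epsilon_greedy(
    candidates: Sequence[Action],
    q_value: Callable[[State, Action], float],
    state: State,
    epsilon: float,
    rng: random.Random,
) -> Action:
    """Explore only among the K candidates instead of the full continuous space."""
    if rng.random() < epsilon:
        return rng.choice(list(candidates))
    return greedy_action(candidates, q_value, state)


def toy_candidates(state: State) -> List[Action]:
    # Hypothetical: in practice these would come from a model trained on
    # demonstrations; here they are fixed 2-D continuous actions.
    return [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.5)]


def toy_q(state: State, action: Action) -> float:
    # Hypothetical Q-function: negative squared distance to a made-up target.
    target = (0.8, 0.4)
    return -sum((a - t) ** 2 for a, t in zip(action, target))


state: State = (0.0,)
best = greedy_action(toy_candidates(state), toy_q, state)
```

With a continuous action space, the same maximisation would require an inner optimisation or a separate actor network; with K candidates it reduces to K evaluations of the action-value function and an argmax.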
