Latent Action Priors for Locomotion with Deep Reinforcement Learning

Abstract

Deep Reinforcement Learning (DRL) enables robots to learn complex behaviors through interaction with the environment. However, due to the unrestricted nature of the learning algorithms, the resulting solutions are often brittle and appear unnatural. This is especially true for learning direct joint-level torque control, as inductive biases are difficult to integrate into the learning process. We propose an inductive bias for learning locomotion that is especially useful for torque control: latent actions learned from a small dataset of expert demonstrations. This prior allows the policy to directly leverage knowledge contained in the expert's actions and facilitates more efficient exploration. We observe that the agent is not restricted to the reward levels of the demonstration, and performance in transfer tasks is improved significantly. Latent action priors combined with style rewards for imitation lead to a closer replication of the expert's behavior. Videos and code are available at https://sites.google.com/view/latent-action-priors.
