PLAS: Latent Action Space for Offline Reinforcement Learning

Abstract

The goal of offline reinforcement learning is to learn a policy from a fixed dataset, without further interactions with the environment. This setting will be an increasingly important paradigm for real-world applications of reinforcement learning such as robotics, in which data collection is slow and potentially dangerous. Existing off-policy algorithms have limited performance on static datasets due to extrapolation errors from out-of-distribution actions. This leads to the challenge of constraining the policy to select actions within the support of the dataset during training. We propose to simply learn the Policy in the Latent Action Space (PLAS) such that this requirement is naturally satisfied. We evaluate our method on continuous control benchmarks in simulation and a deformable object manipulation task with a physical robot. We demonstrate that our method provides competitive performance consistently across various continuous control tasks and different types of datasets, outperforming existing offline reinforcement learning methods with explicit constraints. Videos and code are available at https://sites.google.com/view/latent-policy.
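
The idea of acting in a latent action space can be illustrated with a toy sketch. It is not the authors' implementation: it assumes the latent space comes from a generative model fit to the dataset actions (for example, the decoder of a conditional VAE), and every name and constant below (`MAX_LATENT`, `latent_policy`, `decoder`, the linear forms) is hypothetical. The point is only that a policy which outputs a bounded latent, decoded by a model of the dataset, cannot produce actions far from those the decoder has learned to generate.

```python
import math

MAX_LATENT = 2.0  # hypothetical bound on the latent action


def latent_policy(state, weights):
    """Hypothetical linear policy mapping a state to a bounded latent action."""
    raw = sum(w * s for w, s in zip(weights, state))
    # tanh squashes any raw output into [-MAX_LATENT, MAX_LATENT]
    return MAX_LATENT * math.tanh(raw)


def decoder(state, z):
    """Stand-in for a generative model trained on the dataset (assumed).

    For this toy, dataset actions are pretended to cluster around
    0.5 * state[0] with a small spread; a real decoder would be learned.
    """
    center = 0.5 * state[0]
    spread = 0.1
    return center + spread * z


def act(state, weights):
    """Select an action by decoding the policy's latent output."""
    return decoder(state, latent_policy(state, weights))


state = [1.0, -2.0]
extreme_weights = [100.0, 0.0]  # deliberately extreme policy parameters
action = act(state, extreme_weights)
# Even with extreme parameters, the action stays within
# center +/- spread * MAX_LATENT of the toy decoder.
print(action)
```

In this sketch the constraint is implicit: no penalty term or explicit distance to the dataset appears in the policy, yet the reachable actions are limited to what the decoder can produce from bounded latents.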
