Temporal Abstraction in Reinforcement Learning with Offline Data

Abstract

Standard reinforcement learning (RL) algorithms that use a single policy perform poorly on tasks in complex environments involving sparse rewards, diverse behaviors, or long-term planning. This has motivated the study of algorithms that incorporate temporal abstraction by training a hierarchy of policies that plan over different time scales. The options framework was introduced to implement such temporal abstraction: it learns low-level options that act as temporally extended actions controlled by a high-level policy. The main challenge in applying these algorithms to real-world problems is their high sample complexity, since multiple levels of the hierarchy must be trained, which is infeasible in online settings. Motivated by this, we propose an offline hierarchical RL method that learns options from existing offline datasets collected by other, unknown agents. This is a challenging problem because of the distribution mismatch between the learned options and the policies that generated the offline dataset; to our knowledge, this is the first work in this direction. Specifically, we propose a framework by which an online hierarchical RL algorithm can be trained on an offline dataset of transitions collected by an unknown behavior policy. We validate our method on Gym MuJoCo locomotion environments and robotic gripper block-stacking tasks, in the standard as well as the transfer and goal-conditioned settings.
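As a point of reference for the options framework described above, here is a minimal sketch of generic call-and-return option execution: a high-level policy picks an option, and that option's own policy issues primitive actions until its termination condition fires. This is only an illustration of the standard (initiation set, policy, termination) formulation, not the offline method proposed in this paper. The `Option` class, the toy `chain_step` environment, `run_episode`, and the `right3` option are all hypothetical names introduced for this example.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

State = int
Action = int


@dataclass
class Option:
    """Hypothetical option: initiation set, intra-option policy, termination."""
    name: str
    can_start: Callable[[State], bool]      # initiation set I(s)
    policy: Callable[[State], Action]       # intra-option policy pi(s)
    termination: Callable[[State], float]   # termination probability beta(s)


def chain_step(state: State, action: Action, n_states: int = 10) -> Tuple[State, float, bool]:
    """Hypothetical toy chain MDP: move by `action`, sparse reward 1.0 at the last state."""
    next_state = min(max(state + action, 0), n_states - 1)
    done = next_state == n_states - 1
    return next_state, (1.0 if done else 0.0), done


def run_episode(
    high_level: Callable[[State, List[Option]], Option],
    options: List[Option],
    start: State = 0,
    max_steps: int = 100,
    rng: Optional[random.Random] = None,
) -> Tuple[State, float, List[Tuple[str, int]]]:
    """Call-and-return execution: each chosen option runs until beta fires or the episode ends."""
    rng = rng or random.Random(0)
    state, total_reward, steps, done = start, 0.0, 0, False
    trace: List[Tuple[str, int]] = []  # (option name, primitive steps it ran)
    while not done and steps < max_steps:
        # The high-level policy only chooses among options whose initiation set contains `state`.
        available = [o for o in options if o.can_start(state)]
        option = high_level(state, available)
        k = 0
        while True:
            state, reward, done = chain_step(state, option.policy(state))
            total_reward += reward
            steps += 1
            k += 1
            # Terminate the option with probability beta(s') in the newly reached state.
            if done or steps >= max_steps or rng.random() < option.termination(state):
                break
        trace.append((option.name, k))
    return state, total_reward, trace


# Usage: an option that moves right and terminates deterministically on multiples of 3.
right3 = Option(
    name="right3",
    can_start=lambda s: True,
    policy=lambda s: +1,
    termination=lambda s: 1.0 if s % 3 == 0 else 0.0,
)
print(run_episode(lambda s, opts: opts[0], [right3]))
# → (9, 1.0, [('right3', 3), ('right3', 3), ('right3', 3)])
```

Here the sparse reward is reached after only three high-level decisions instead of nine primitive ones, which is the kind of temporal abstraction the options framework is meant to provide.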
