Compositional Transfer in Hierarchical Reinforcement Learning

Abstract

The successful application of general reinforcement learning algorithms to real-world robotics applications is often limited by their high data requirements. We introduce Regularized Hierarchical Policy Optimization (RHPO) to improve data-efficiency for domains with multiple dominant tasks and ultimately reduce required platform time. To this end, we employ compositional inductive biases on multiple levels and corresponding mechanisms for sharing off-policy transition data across low-level controllers and tasks as well as scheduling of tasks. The presented algorithm enables stable and fast learning for complex, real-world domains in the parallel multitask and sequential transfer case. We show that the investigated types of hierarchy enable positive transfer while partially mitigating negative interference and evaluate the benefits of additional incentives for efficient, compositional task solutions in single task domains. Finally, we demonstrate substantial data-efficiency and final performance gains over competitive baselines in a week-long, physical robot stacking experiment.
