Hierarchical Reinforcement Learning for Concurrent Discovery of Compound and Composable Policies

Abstract

A common strategy to deal with the expensive reinforcement learning (RL) of complex tasks is to decompose them into a collection of subtasks that are usually simpler to learn as well as reusable for new problems. However, when a robot learns the policies for these subtasks, common approaches treat every policy learning process separately. Therefore, all of these individual (composable) policies need to be learned before tackling the complex task through policy composition. Moreover, such composition of individual policies is usually performed sequentially, which is not suitable for tasks that require the subtasks to be performed concurrently. In this paper, we propose to combine a set of composable Gaussian policies corresponding to these subtasks using a set of activation vectors, resulting in a complex Gaussian policy that is a function of the means and covariance matrices of the composable policies. Moreover, we propose an algorithm for learning both compound and composable policies within the same learning process by exploiting the off-policy data generated from the compound policy. The algorithm is built on a maximum entropy RL approach to favor exploration during the learning process. The experimental results show that the experience collected with the compound policy makes it possible not only to solve the complex task but also to obtain useful composable policies that perform successfully in their corresponding subtasks.
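The abstract does not state the composition rule itself, so the following is only a minimal sketch of one plausible reading: an activation-weighted product of Gaussians, assuming diagonal covariances. The function name `compose_gaussian_policies`, the array shapes, and the precision-weighting formula are illustrative assumptions, not the authors' stated method.

```python
import numpy as np


def compose_gaussian_policies(means, variances, activations):
    """Fuse K diagonal Gaussian policies into one Gaussian over D action dims.

    All inputs have shape (K, D). ASSUMPTION: the compound policy is an
    activation-weighted product of Gaussians (precision weighting); the
    abstract only says it is a function of the means and covariances.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    activations = np.asarray(activations, dtype=float)

    # Each policy contributes its precision (1 / variance), scaled by its
    # activation for that action dimension.
    weighted_precisions = activations / variances            # shape (K, D)

    # Compound variance is the inverse of the summed weighted precisions.
    compound_var = 1.0 / weighted_precisions.sum(axis=0)     # shape (D,)

    # Compound mean is the precision-weighted average of the policy means.
    compound_mean = compound_var * (weighted_precisions * means).sum(axis=0)
    return compound_mean, compound_var


# Two hypothetical composable policies acting on a 2-D action space.
means = np.array([[0.0, 1.0], [2.0, 3.0]])
variances = np.array([[1.0, 1.0], [1.0, 1.0]])
# Dimension 0 blends both policies equally; dimension 1 uses only the first.
activations = np.array([[0.5, 1.0], [0.5, 0.0]])

mean, var = compose_gaussian_policies(means, variances, activations)
print(mean, var)
```

Under this assumed rule, an activation of zero removes a policy's influence on that action dimension, while equal activations with equal variances average the means. At least one activation per dimension must be nonzero, otherwise the summed precision is zero and the variance is undefined.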
