Abstract
Meta-Reinforcement Learning (Meta-RL) enables fast adaptation to new testing tasks. Despite recent advancements, it is still challenging to learn performant policies across multiple complex and high-dimensional tasks. To address this, we propose a novel architecture with three hierarchical levels for 1) learning task representations, 2) discovering task-agnostic macro-actions in an automated manner, and 3) learning primitive actions. The macro-actions guide the low-level primitive policy so that it learns to transition to goal states more efficiently. This guidance can mitigate the problem of the policy forgetting previously learned behavior while learning new, conflicting tasks. Moreover, the macro-actions are made task-agnostic by removing task-specific components from the state space, which makes them amenable to re-composition across different tasks and leads to promising fast adaptation to new tasks. In addition, the potential instability of the tri-level hierarchy is effectively mitigated by our independently tailored training schemes. Experiments in the MetaWorld framework demonstrate the improved sample efficiency and success rate of our approach compared to previous state-of-the-art methods.