Abstract
Meta reinforcement learning (meta-RL) extracts knowledge from previous tasks and achieves fast adaptation to new tasks. Despite recent progress, efficient exploration in meta-RL remains a key challenge in sparse-reward tasks, as it requires quickly finding informative task-relevant experiences in both meta-training and adaptation. To address this challenge, we explicitly model an exploration policy learning problem for meta-RL, which is separated from exploitation policy learning, and introduce a novel empowerment-driven exploration objective, which aims to maximize information gain for task identification. We derive a corresponding intrinsic reward and develop a new off-policy meta-RL framework, which efficiently learns separate context-aware exploration and exploitation policies by sharing the knowledge of task inference. Experimental evaluation shows that our meta-RL method significantly outperforms state-of-the-art baselines on various sparse-reward MuJoCo locomotion tasks and more complex sparse-reward Meta-World tasks.