Meta-Reinforcement Learning of Structured Exploration Strategies

Abstract

Exploration is a fundamental challenge in reinforcement learning (RL). Manyof the current exploration methods for deep RL use task-agnostic objectives,such as information gain or bonuses based on state visitation. However, manypractical applications of RL involve learning more than a single task, andprior tasks can be used to inform how exploration should be performed in newtasks. In this work, we explore how prior tasks can inform an agent about howto explore effectively in new situations. We introduce a novel gradient-basedfast adaptation algorithm -- model agnostic exploration with structured noise(MAESN) -- to learn exploration strategies from prior experience. The priorexperience is used both to initialize a policy and to acquire a latentexploration space that can inject structured stochasticity into a policy,producing exploration strategies that are informed by prior knowledge and aremore effective than random action-space noise. We show that MAESN is moreeffective at learning exploration strategies when compared to prior meta-RLmethods, RL without learned exploration strategies, and task-agnosticexploration methods. We evaluate our method on a variety of simulated tasks:locomotion with a wheeled robot, locomotion with a quadrupedal walker, andobject manipulation.

Quick Read (beta)

loading the full paper ...