Abstract
The goal of meta-reinforcement learning (meta-RL) is to build agents that can quickly learn new tasks by leveraging prior experience on related tasks. Learning a new task often requires both exploring to gather task-relevant information and exploiting this information to solve the task. In principle, optimal exploration and exploitation can be learned end-to-end by simply maximizing task performance. However, such meta-RL approaches struggle with local optima due to a chicken-and-egg problem: learning to explore requires good exploitation to gauge the exploration's utility, but learning to exploit requires information gathered via exploration. Optimizing separate objectives for exploration and exploitation can avoid this problem, but prior meta-RL exploration objectives yield suboptimal policies that gather information irrelevant to the task. We alleviate both concerns by constructing an exploitation objective that automatically identifies task-relevant information and an exploration objective to recover only this information. This avoids local optima in end-to-end training, without sacrificing optimal exploration. Empirically, DREAM substantially outperforms existing approaches on complex meta-RL problems, such as sparse-reward 3D visual navigation. Videos of DREAM: https://ezliu.github.io/dream/