MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning

Abstract

Exploration in reinforcement learning is a challenging problem: in the worstcase, the agent must search for reward states that could be hidden anywhere inthe state space. Can we define a more tractable class of RL problems, where theagent is provided with examples of successful outcomes? In this problemsetting, the reward function can be obtained automatically by training aclassifier to categorize states as successful or not. If trained properly, sucha classifier can not only afford a reward function, but actually provide awell-shaped objective landscape that both promotes progress toward good statesand provides a calibrated exploration bonus. In this work, we we show that anuncertainty aware classifier can solve challenging reinforcement learningproblems by both encouraging exploration and provided directed guidance towardspositive outcomes. We propose a novel mechanism for obtaining these calibrated,uncertainty-aware classifiers based on an amortized technique for computing thenormalized maximum likelihood (NML) distribution, also showing how thesetechniques can be made computationally tractable by leveraging tools frommeta-learning. We show that the resulting algorithm has a number of intriguingconnections to both count-based exploration methods and prior algorithms forlearning reward functions, while also providing more effective guidance towardsthe goal. We demonstrate that our algorithm solves a number of challengingnavigation and robotic manipulation tasks which prove difficult or impossiblefor prior methods.

Quick Read (beta)

loading the full paper ...