Abstract
Exploration in reinforcement learning is a challenging problem: in the worstcase, the agent must search for high-reward states that could be hiddenanywhere in the state space. Can we define a more tractable class of RLproblems, where the agent is provided with examples of successful outcomes? Inthis problem setting, the reward function can be obtained automatically bytraining a classifier to categorize states as successful or not. If trainedproperly, such a classifier can provide a well-shaped objective landscape thatboth promotes progress toward good states and provides a calibrated explorationbonus. In this work, we show that an uncertainty aware classifier can solvechallenging reinforcement learning problems by both encouraging exploration andprovided directed guidance towards positive outcomes. We propose a novelmechanism for obtaining these calibrated, uncertainty-aware classifiers basedon an amortized technique for computing the normalized maximum likelihood (NML)distribution. To make this tractable, we propose a novel method for computingthe NML distribution by using meta-learning. We show that the resultingalgorithm has a number of intriguing connections to both count-basedexploration methods and prior algorithms for learning reward functions, whilealso providing more effective guidance towards the goal. We demonstrate thatour algorithm solves a number of challenging navigation and roboticmanipulation tasks which prove difficult or impossible for prior methods.