Go-Explore: a New Approach for Hard-Exploration Problems

Abstract

A grand challenge in reinforcement learning is intelligent exploration, especially when rewards are sparse or deceptive. Two Atari games serve as benchmarks for such hard-exploration domains: Montezuma's Revenge and Pitfall. On both games, current RL algorithms perform poorly, even those with intrinsic motivation, which is the dominant method to improve performance on hard-exploration domains. To address this shortfall, we introduce a new algorithm called Go-Explore. It exploits the following principles: (1) remember previously visited states, (2) first return to a promising state (without exploration), then explore from it, and (3) solve simulated environments through any available means (including by introducing determinism), then robustify via imitation learning. The combined effect of these principles is a dramatic performance improvement on hard-exploration problems. On Montezuma's Revenge, Go-Explore scores a mean of over 43k points, almost 4 times the previous state of the art. Go-Explore can also harness human-provided domain knowledge and, when augmented with it, scores a mean of over 650k points on Montezuma's Revenge. Its max performance of nearly 18 million surpasses the human world record, meeting even the strictest definition of "superhuman" performance. On Pitfall, Go-Explore with domain knowledge is the first algorithm to score above zero. Its mean score of almost 60k points exceeds expert human performance. Because Go-Explore produces high-performing demonstrations automatically and cheaply, it also outperforms imitation learning work where humans provide solution demonstrations. Go-Explore opens up many new research directions into improving it and weaving its insights into current RL algorithms. It may also enable progress on previously unsolvable hard-exploration problems in many domains, especially those that harness a simulator during training (e.g. robotics).
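The first two principles above (remember visited states; return to a promising state, then explore from it) can be illustrated with a minimal sketch on a toy problem. Everything in it is an assumption made for illustration and is not taken from the paper: the `ChainEnv` corridor environment, the `go_explore` function, the choice of one archive cell per position, the selection weights that favor less-visited cells, and the rule that a cell is replaced when a higher-scoring or equally scoring but shorter trajectory reaches it. Returning is done by restoring a saved simulator state, which relies on the environment being deterministic. The robustification step via imitation learning is not shown.

```python
import random


class ChainEnv:
    """Hypothetical deterministic toy environment (not from the paper):
    a 1-D corridor where reward is given only on arriving at the far end."""

    def __init__(self, length=20):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def get_state(self):
        # The full simulator state is just the position.
        return self.pos

    def set_state(self, state):
        self.pos = state

    def step(self, action):
        # action is -1 (left) or +1 (right); walls clamp the position.
        old = self.pos
        self.pos = max(0, min(self.length, self.pos + action))
        reward = 1.0 if (self.pos == self.length and old != self.length) else 0.0
        return self.pos, reward


def go_explore(env, iterations=2000, explore_steps=10, seed=0):
    """Exploration loop sketch: select a cell, return to it, explore, update archive."""
    rng = random.Random(seed)
    start = env.reset()
    # Principle 1: remember visited states. Maps cell -> saved state and bookkeeping.
    archive = {start: {"state": env.get_state(), "score": 0.0, "traj": [], "visits": 0}}

    for _ in range(iterations):
        # Assumed heuristic: prefer cells that were selected less often.
        cells = list(archive)
        weights = [1.0 / (1 + archive[c]["visits"]) ** 0.5 for c in cells]
        cell = rng.choices(cells, weights=weights)[0]
        entry = archive[cell]
        entry["visits"] += 1

        # Principle 2: first return (by restoring the saved state), without exploring.
        env.set_state(entry["state"])
        score, traj = entry["score"], list(entry["traj"])

        # Then explore from it, here with purely random actions.
        for _ in range(explore_steps):
            action = rng.choice((-1, 1))
            obs, reward = env.step(action)
            score += reward
            traj.append(action)
            known = archive.get(obs)
            better = (
                known is None
                or score > known["score"]
                or (score == known["score"] and len(traj) < len(known["traj"]))
            )
            if better:
                archive[obs] = {
                    "state": env.get_state(),
                    "score": score,
                    "traj": list(traj),
                    "visits": 0 if known is None else known["visits"],
                }
    return archive
```

Because the toy environment is deterministic, the action sequence stored for any cell can be replayed from `env.reset()` to reproduce both the final position and the score. Such stored trajectories play the role of the demonstrations that the abstract says are later robustified via imitation learning.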
