Large-Scale Study of Curiosity-Driven Learning

Abstract

Reinforcement learning algorithms rely on carefully engineering environment rewards that are extrinsic to the agent. However, annotating each environment with hand-designed, dense rewards is not scalable, motivating the need for developing reward functions that are intrinsic to the agent. Curiosity is a type of intrinsic reward function which uses prediction error as reward signal. In this paper: (a) We perform the first large-scale study of purely curiosity-driven learning, i.e. without any extrinsic rewards, across 54 standard benchmark environments, including the Atari game suite. Our results show surprisingly good performance, and a high degree of alignment between the intrinsic curiosity objective and the hand-designed extrinsic rewards of many game environments. (b) We investigate the effect of using different feature spaces for computing prediction error and show that random features are sufficient for many popular RL game benchmarks, but learned features appear to generalize better (e.g. to novel game levels in Super Mario Bros.). (c) We demonstrate limitations of the prediction-based rewards in stochastic setups. Game-play videos and code are at https://pathak22.github.io/large-scale-curiosity/
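
As a rough illustration of the idea that prediction error in a feature space can serve as the reward signal, the following minimal Python sketch embeds an observation with a fixed random linear map and scores a prediction by its squared error. It is only a sketch under stated assumptions, not the implementation used in the paper: the names make_random_features and intrinsic_reward, the linear embedding, the squared-error form, and the toy dimensions are all hypothetical choices made for this example.

```python
import random


def make_random_features(obs_dim, feat_dim, seed=0):
    """Build a fixed, untrained random linear embedding phi(x).

    Assumption: a plain linear map stands in for whatever random
    feature network an actual agent would use.
    """
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(obs_dim)]
               for _ in range(feat_dim)]

    def phi(obs):
        # One dot product per output feature.
        return [sum(w * x for w, x in zip(row, obs)) for row in weights]

    return phi


def intrinsic_reward(predicted_next_features, actual_next_features):
    """Squared prediction error in feature space, used as the reward."""
    return sum((p - a) ** 2
               for p, a in zip(predicted_next_features, actual_next_features))


# Toy usage: a perfect prediction earns no reward, a poor one earns more.
phi = make_random_features(obs_dim=4, feat_dim=3)
next_obs = [0.1, -0.2, 0.3, 0.0]
target = phi(next_obs)
perfect = intrinsic_reward(target, target)
poor = intrinsic_reward([0.0, 0.0, 0.0], target)
```

In this toy setup the agent would be rewarded most where its forward prediction is worst, which is the sense in which the reward is intrinsic: it depends only on the agent's own model and not on any score supplied by the environment.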
