Abstract
Self-supervised representation learning has achieved remarkable success in recent years. By removing the need for supervised labels, such approaches are able to utilize the numerous unlabeled images that exist on the Internet and in photographic datasets. Yet to build truly intelligent agents, we must construct representation learning algorithms that can learn not only from datasets but also from environments. An agent in a natural environment will not typically be fed curated data. Instead, it must explore its environment to acquire the data it will learn from. We propose a framework, curious representation learning (CRL), which jointly learns a reinforcement learning policy and a visual representation model. The policy is trained to maximize the error of the representation learner, and in doing so is incentivized to explore its environment. At the same time, the learned representation becomes stronger and stronger as the policy feeds it ever harder data to learn from. Our learned representations enable promising transfer to downstream navigation tasks, performing better than or comparably to ImageNet pretraining without using any supervision at all. In addition, despite being trained in simulation, our learned representations can obtain interpretable results on real images.
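
The coupling between policy and representation learner described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the representation error is a per-observation contrastive (InfoNCE-style) loss between two embedded views of each observation; the abstract itself only specifies "the error of the representation learner". The function names `contrastive_errors` and `intrinsic_rewards` and the `temperature` parameter are hypothetical and are not taken from the CRL implementation.

```python
import math


def _normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def contrastive_errors(views_a, views_b, temperature=0.1):
    """Per-observation InfoNCE-style loss (an assumed choice of error).

    Row i of views_a should match row i of views_b; every other row of
    views_b acts as a negative. Returns one non-negative loss per row.
    """
    a = [_normalize(v) for v in views_a]
    b = [_normalize(v) for v in views_b]
    errors = []
    for i, za in enumerate(a):
        logits = [sum(x * y for x, y in zip(za, zb)) / temperature for zb in b]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        errors.append(log_denom - logits[i])
    return errors


def intrinsic_rewards(views_a, views_b):
    """Reward the policy with the representation model's error on each observation.

    The representation model is trained to minimize these errors, while the
    policy is trained (by any RL algorithm) to maximize them.
    """
    return contrastive_errors(views_a, views_b)
```

Under this assumed loss, observations whose two views the model already matches well yield rewards near zero, while observations it matches poorly yield large rewards, which is what pushes the policy toward data the representation has not yet mastered.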