Successor-Predecessor Intrinsic Exploration

Abstract

Exploration is essential in reinforcement learning, particularly in environments where external rewards are sparse. Here we focus on exploration with intrinsic rewards, where the agent transiently augments the external rewards with self-generated intrinsic rewards. Although the study of intrinsic rewards has a long history, existing methods focus on composing the intrinsic reward based on measures of future prospects of states, ignoring the information contained in the retrospective structure of transition sequences. Here we argue that the agent can utilise retrospective information to generate explorative behaviour with structure-awareness, facilitating efficient exploration based on global instead of local information. We propose Successor-Predecessor Intrinsic Exploration (SPIE), an exploration algorithm based on a novel intrinsic reward combining prospective and retrospective information. We show that SPIE yields more efficient and ethologically plausible exploratory behaviour in environments with sparse rewards and bottleneck states than competing methods. We also implement SPIE in deep reinforcement learning agents, and show that the resulting agent achieves stronger empirical performance than existing methods on sparse-reward Atari games.
