Self-Imitation Learning

Abstract

This paper proposes Self-Imitation Learning (SIL), a simple off-policy actor-critic algorithm that learns to reproduce the agent's past good decisions. This algorithm is designed to verify our hypothesis that exploiting past good experiences can indirectly drive deep exploration. Our empirical results show that SIL significantly improves advantage actor-critic (A2C) on several hard exploration Atari games and is competitive with state-of-the-art count-based exploration methods. We also show that SIL improves proximal policy optimization (PPO) on MuJoCo tasks.
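
As a rough illustration of what "reproducing past good decisions" could look like in practice, the sketch below computes a loss over stored transitions that counts only those whose observed return exceeded the critic's value estimate. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `sil_loss`, its argument names, and the default `value_coef` are hypothetical, and the replay buffer and gradient computation are omitted.

```python
import numpy as np

def sil_loss(log_probs, values, returns, value_coef=0.01):
    """Hypothetical self-imitation loss over a batch of stored transitions.

    log_probs: log pi(a|s) of the stored actions under the current policy
    values:    current critic estimates V(s) for the stored states
    returns:   discounted returns actually observed from those states
    value_coef: assumed weight on the value term (illustrative default)
    """
    log_probs = np.asarray(log_probs, dtype=float)
    values = np.asarray(values, dtype=float)
    returns = np.asarray(returns, dtype=float)

    # Keep only the positive part of (return - value): a transition
    # contributes only if it turned out better than the critic expected.
    clipped_adv = np.maximum(returns - values, 0.0)

    # Policy term: raise the log-probability of past actions that did well.
    # In an autodiff framework the clipped advantage would typically be
    # treated as a constant in this term.
    policy_loss = np.mean(-log_probs * clipped_adv)

    # Value term: pull V(s) up toward returns that exceeded it.
    value_loss = 0.5 * np.mean(clipped_adv ** 2)

    total = policy_loss + value_coef * value_loss
    return total, policy_loss, value_loss
```

With this form, transitions whose return fell below the value estimate contribute nothing, so the update imitates only the better-than-expected part of the agent's own history.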
