Prioritizing Starting States for Reinforcement Learning

Abstract

Online, off-policy reinforcement learning algorithms are able to use an experience memory to remember and replay past experiences. In prior work, this approach was used to stabilize training by breaking the temporal correlations of the updates and avoiding the rapid forgetting of possibly rare experiences. In this work, we propose a conceptually simple framework that uses an experience memory to help exploration by prioritizing the starting states from which the agent starts acting in the environment and, importantly, does so in a fashion that is also compatible with on-policy algorithms. Given the capacity to restart the agent in states corresponding to its past observations, we achieve this objective by (i) enabling the agent to restart in states belonging to significant past experiences (e.g., nearby goals), and (ii) promoting faster coverage of the state space through starting from a more diverse set of states. Given a good measure of priority to identify significant past transitions, we expect case (i) to help exploration more considerably in certain problems (e.g., sparse reward tasks), whereas we hypothesize that case (ii) will generally be beneficial, even without any prioritization. We show empirically that our approach improves learning performance for both off-policy and on-policy deep reinforcement learning methods, with the most notable improvement in a significantly sparse reward task.
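
The two cases in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `StartStateMemory`, `choose_start_state`, and `restart_prob`, the FIFO eviction rule, and sampling proportional to a scalar priority are all assumptions made here for illustration. The abstract does not specify how priority is measured or how often the agent restarts from memory.

```python
import random


class StartStateMemory:
    """Bounded memory of past states the agent may be restarted in (hypothetical)."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.states = []
        self.priorities = []
        self.rng = random.Random(seed)

    def add(self, state, priority=1.0):
        # Assumption: evict the oldest entry once the memory is full.
        if len(self.states) >= self.capacity:
            self.states.pop(0)
            self.priorities.pop(0)
        self.states.append(state)
        # Priorities are assumed to be positive scalars.
        self.priorities.append(float(priority))

    def sample(self, prioritized=True):
        if not self.states:
            raise ValueError("memory is empty")
        if prioritized:
            # Case (i): favor states from significant past experiences.
            return self.rng.choices(self.states, weights=self.priorities, k=1)[0]
        # Case (ii): uniform sampling, which only diversifies the start states.
        return self.rng.choice(self.states)


def choose_start_state(memory, default_start, restart_prob, prioritized=True):
    """Restart from memory with probability restart_prob, else use the default start."""
    if memory.states and memory.rng.random() < restart_prob:
        return memory.sample(prioritized)
    return default_start
```

Because the sketch only changes where an episode begins, it leaves the learning update untouched, which is consistent with the abstract's point that the idea applies to on-policy as well as off-policy methods.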
