Recall Traces: Backtracking Models for Efficient Reinforcement Learning

Abstract

In many environments only a tiny subset of all states yields high reward. In these cases, few of the interactions with the environment provide a relevant learning signal. Hence, we may want to preferentially train on those high-reward states and the probable trajectories leading to them. To this end, we advocate for the use of a backtracking model that predicts the preceding states of trajectories that terminate at a given high-reward state. We can train a model which, starting from a high-value state (or one that is estimated to have high value), predicts and samples which (state, action)-tuples may have led to that high-value state. These traces of (state, action) pairs, which we refer to as Recall Traces, sampled from this backtracking model starting from a high-value state, are informative as they terminate in good states, and hence we can use these traces to improve a policy. We provide a variational interpretation for this idea and a practical algorithm in which the backtracking model samples from an approximate posterior distribution over trajectories which lead to large rewards. Our method improves the sample efficiency of both on- and off-policy RL algorithms across several environments and tasks.
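
To make the backtracking idea concrete, here is a minimal sketch, not the paper's implementation. It assumes a toy one-dimensional chain environment and replaces the learned backtracking model with a simple table of empirical predecessor counts; the names `fit_backward_model` and `sample_recall_trace`, the chain itself, and the choice of state 5 as the high-reward state are all hypothetical. It shows only the sampling step: starting from a high-value state and walking backward to recover (state, action) pairs that may have led to it. How the traces are then used to improve a policy is not shown.

```python
import random
from collections import defaultdict


def fit_backward_model(transitions):
    """Tabular stand-in for a backtracking model (an assumption of this sketch).

    For each next_state, count how often each (state, action) pair preceded it.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for state, action, next_state in transitions:
        counts[next_state][(state, action)] += 1
    return counts


def sample_recall_trace(counts, high_value_state, length, rng):
    """Walk backward from high_value_state, sampling predecessors by frequency.

    Returns the (state, action) pairs in forward order, so that the last pair
    is the one that leads directly into high_value_state.
    """
    trace, current = [], high_value_state
    for _ in range(length):
        predecessors = counts.get(current)
        if not predecessors:
            break  # no observed way to reach this state
        pairs = list(predecessors)
        weights = [predecessors[p] for p in pairs]
        prev_state, action = rng.choices(pairs, weights=weights, k=1)[0]
        trace.append((prev_state, action))
        current = prev_state
    trace.reverse()
    return trace


# Hypothetical toy chain: states 0..5, actions -1/+1, moves clipped to the ends.
rng = random.Random(0)
transitions = []
for _ in range(200):
    s = rng.randrange(6)
    a = rng.choice((-1, 1))
    s_next = min(5, max(0, s + a))
    transitions.append((s, a, s_next))

counts = fit_backward_model(transitions)
# State 5 plays the role of the high-reward state in this sketch.
trace = sample_recall_trace(counts, high_value_state=5, length=4, rng=rng)
print(trace)
```

Each sampled trace ends in the chosen high-value state by construction, which is the property the abstract relies on when it calls such traces informative. A learned model would replace the count table with a parametric distribution over previous states and actions.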
