Hindsight policy gradients

Abstract

A reinforcement learning agent that needs to pursue different goals acrossepisodes requires a goal-conditional policy. In addition to their potential togeneralize desirable behavior to unseen goals, such policies may also enablehigher-level planning based on subgoals. In sparse-reward environments, thecapacity to exploit information about the degree to which an arbitrary goal hasbeen achieved while another goal was intended appears crucial to enable sampleefficient learning. However, reinforcement learning agents have only recentlybeen endowed with such capacity for hindsight. In this paper, we demonstratehow hindsight can be introduced to policy gradient methods, generalizing thisidea to a broad class of successful algorithms. Our experiments on a diverseselection of sparse-reward environments show that hindsight leads to aremarkable increase in sample efficiency.

Quick Read (beta)

loading the full paper ...