Outcome-Driven Reinforcement Learning via Variational Inference

  • 2021-04-20 18:16:21
  • Tim G. J. Rudner, Vitchyr H. Pong, Rowan McAllister, Yarin Gal, Sergey Levine

Abstract

While reinforcement learning algorithms provide automated acquisition of optimal policies, practical application of such methods requires a number of design decisions, such as manually designing reward functions that not only define the task, but also provide sufficient shaping to accomplish it. In this paper, we discuss a new perspective on reinforcement learning, recasting it as the problem of inferring actions that achieve desired outcomes, rather than a problem of maximizing rewards. To solve the resulting outcome-directed inference problem, we establish a novel variational inference formulation that allows us to derive a well-shaped reward function which can be learned directly from environment interactions. From the corresponding variational objective, we also derive a new probabilistic Bellman backup operator reminiscent of the standard Bellman backup operator and use it to develop an off-policy algorithm to solve goal-directed tasks. We empirically demonstrate that this method eliminates the need to design reward functions and leads to effective goal-directed behaviors.
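The abstract positions the paper's probabilistic Bellman backup against the standard Bellman backup operator. For reference only, here is a minimal sketch of that standard operator applied to a tabular Q-function: (T Q)(s, a) = r(s, a) + γ Σ_{s'} p(s' | s, a) max_{a'} Q(s', a'). This is not the paper's operator. The function name `bellman_backup`, the two-state example MDP, and γ = 0.9 are illustrative assumptions. The reward table is also specified by hand, and that manual specification is exactly the design burden the paper aims to remove.

```python
def bellman_backup(q, rewards, transitions, gamma=0.9):
    """One standard Bellman optimality backup on a tabular Q-function.

    Illustrative sketch (hypothetical helper, not the paper's method).
    q[s][a]              current Q-value estimate
    rewards[s][a]        hand-specified expected reward r(s, a)
    transitions[s][a][t] probability p(t | s, a) of reaching state t
    """
    # V(s') = max over a' of Q(s', a') for every next state s'
    next_values = [max(row) for row in q]
    # (T Q)(s, a) = r(s, a) + gamma * E_{s' ~ p(.|s,a)}[V(s')]
    return [
        [
            rewards[s][a]
            + gamma * sum(p * v for p, v in zip(transitions[s][a], next_values))
            for a in range(len(q[s]))
        ]
        for s in range(len(q))
    ]


# Hypothetical two-state MDP: state 1 is an absorbing goal state.
rewards = [[0.0, 1.0], [0.0, 0.0]]  # reward 1 only for moving to the goal
transitions = [
    [[1.0, 0.0], [0.0, 1.0]],  # state 0: action 0 stays, action 1 reaches the goal
    [[0.0, 1.0], [0.0, 1.0]],  # state 1: both actions stay in the goal
]
q = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(50):
    q = bellman_backup(q, rewards, transitions)
print(q)  # [[0.9, 1.0], [0.0, 0.0]]
```

Repeated backups reach a fixed point after two iterations here. In state 0, moving to the goal (Q = 1.0) is preferred over waiting (Q = 0.9 = γ · 1.0).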
