Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review

  • 2018-05-20 20:03:59
  • Sergey Levine

Abstract

The framework of reinforcement learning or optimal control provides a mathematical formalization of intelligent decision making that is powerful and broadly applicable. While the general form of the reinforcement learning problem enables effective reasoning about uncertainty, the connection between reinforcement learning and inference in probabilistic models is not immediately obvious. However, such a connection has considerable value when it comes to algorithm design: formalizing a problem as probabilistic inference in principle allows us to bring to bear a wide array of approximate inference tools, extend the model in flexible and powerful ways, and reason about compositionality and partial observability. In this article, we will discuss how a generalization of the reinforcement learning or optimal control problem, which is sometimes termed maximum entropy reinforcement learning, is equivalent to exact probabilistic inference in the case of deterministic dynamics, and variational inference in the case of stochastic dynamics. We will present a detailed derivation of this framework, overview prior work that has drawn on this and related ideas to propose new reinforcement learning and control algorithms, and describe perspectives on future research.

 
