A Unifying View of Optimism in Episodic Reinforcement Learning

  • 2020-07-03 18:10:30
  • Gergely Neu, Ciara Pike-Burke
  • 0

Abstract

The principle of optimism in the face of uncertainty underpins manytheoretically successful reinforcement learning algorithms. In this paper weprovide a general framework for designing, analyzing and implementing suchalgorithms in the episodic reinforcement learning problem. This framework isbuilt upon Lagrangian duality, and demonstrates that every model-optimisticalgorithm that constructs an optimistic MDP has an equivalent representation asa value-optimistic dynamic programming algorithm. Typically, it was thoughtthat these two classes of algorithms were distinct, with model-optimisticalgorithms benefiting from a cleaner probabilistic analysis whilevalue-optimistic algorithms are easier to implement and thus more practical.With the framework developed in this paper, we show that it is possible to getthe best of both worlds by providing a class of algorithms which have acomputationally efficient dynamic-programming implementation and also a simpleprobabilistic analysis. Besides being able to capture many existing algorithmsin the tabular setting, our framework can also address largescale problemsunder realizable function approximation, where it enables a simple model-basedanalysis of some recently proposed methods.

 

Quick Read (beta)

loading the full paper ...