Abstract
The principle of optimism in the face of uncertainty underpins manytheoretically successful reinforcement learning algorithms. In this paper weprovide a general framework for designing, analyzing and implementing suchalgorithms in the episodic reinforcement learning problem. This framework isbuilt upon Lagrangian duality, and demonstrates that every model-optimisticalgorithm that constructs an optimistic MDP has an equivalent representation asa value-optimistic dynamic programming algorithm. Typically, it was thoughtthat these two classes of algorithms were distinct, with model-optimisticalgorithms benefiting from a cleaner probabilistic analysis whilevalue-optimistic algorithms are easier to implement and thus more practical.With the framework developed in this paper, we show that it is possible to getthe best of both worlds by providing a class of algorithms which have acomputationally efficient dynamic-programming implementation and also a simpleprobabilistic analysis. Besides being able to capture many existing algorithmsin the tabular setting, our framework can also address largescale problemsunder realizable function approximation, where it enables a simple model-basedanalysis of some recently proposed methods.