Discounted Reinforcement Learning Is Not an Optimization Problem

Abstract

Discounted reinforcement learning is fundamentally incompatible with function approximation for control in continuing tasks. It is not an optimization problem in its usual formulation, so when using function approximation there is no optimal policy. We substantiate these claims, then go on to address some misconceptions about discounting and its connection to the average reward formulation. We encourage researchers to adopt rigorous optimization approaches, such as maximizing average reward, for reinforcement learning in continuing tasks.
