Discounted Reinforcement Learning is Not an Optimization Problem

Abstract

Discounted reinforcement learning is fundamentally incompatible with function approximation for control in continuing tasks. This is because it is not an optimization problem: it lacks an objective function. After substantiating these claims, we go on to address some misconceptions about discounting and its connection to the average reward formulation. We encourage researchers to adopt rigorous optimization approaches for reinforcement learning in continuing tasks, such as average reward.
