Three Dogmas of Reinforcement Learning

Abstract

Modern reinforcement learning has been conditioned by at least three dogmas. The first is the environment spotlight, which refers to our tendency to focus on modeling environments rather than agents. The second is our treatment of learning as finding the solution to a task, rather than adaptation. The third is the reward hypothesis, which states that all goals and purposes can be well thought of as maximization of a reward signal. These three dogmas shape much of what we think of as the science of reinforcement learning. While each of the dogmas has played an important role in developing the field, it is time we bring them to the surface and reflect on whether they belong as basic ingredients of our scientific paradigm. In order to realize the potential of reinforcement learning as a canonical frame for researching intelligent agents, we suggest that it is time we shed dogmas one and two entirely, and embrace a nuanced approach to the third.
