Abstract
Algorithms and approaches for continual reinforcement learning have gained increasing attention. Much of this early progress rests on the foundations and standard practices of traditional reinforcement learning, without questioning if they are well-suited to the challenges of continual learning agents. We suggest that many core foundations of traditional RL are, in fact, antithetical to the goals of continual reinforcement learning. We enumerate four such foundations: the Markov decision process formalism, a focus on optimal policies, the expected sum of rewards as the primary evaluation metric, and episodic benchmark environments that embrace the other three foundations. Shedding such sacredly held and taught concepts is not easy. They are self-reinforcing in that each foundation depends upon and holds up the others, making it hard to rethink each in isolation. We propose an alternative set of all four foundations that are better suited to the continual learning setting. We hope to spur on others in rethinking the traditional foundations, proposing and critiquing alternatives, and developing new algorithms and approaches enabled by better-suited foundations.