A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning

Abstract

In this paper we consider the problem of how a reinforcement learning agentthat is tasked with solving a sequence of reinforcement learning problems (asequence of Markov decision processes) can use knowledge acquired early in itslifetime to improve its ability to solve new problems. We argue that previousexperience with similar problems can provide an agent with information abouthow it should explore when facing a new but related problem. We show that thesearch for an optimal exploration strategy can be formulated as a reinforcementlearning problem itself and demonstrate that such strategy can leveragepatterns found in the structure of related problems. We conclude withexperiments that show the benefits of optimizing an exploration strategy usingour proposed approach.

Quick Read (beta)

loading the full paper ...