A Framework for Reinforcement Learning and Planning

  • 2020-06-26 14:30:41
  • Thomas M. Moerland, Joost Broekens, Catholijn M. Jonker
Sequential decision making, commonly formalized as Markov Decision Process (MDP) optimization, is a key challenge in artificial intelligence. Two successful approaches to MDP optimization are planning and reinforcement learning. Both research fields largely have their own research communities. However, if both research fields solve the same problem, then we should be able to disentangle the common factors in their solution approaches. Therefore, this paper presents a unifying framework for reinforcement learning and planning (FRAP), which identifies the underlying dimensions on which any planning or learning algorithm has to decide. At the end of the paper, we compare, in a single table, a variety of well-known planning, model-free and model-based RL algorithms along the dimensions of our framework, illustrating the validity of the framework. Altogether, FRAP provides deeper insight into the algorithmic space of planning and reinforcement learning, and also suggests new approaches to integrating both fields.


