Lipschitz Lifelong Reinforcement Learning

  • 2020-01-15 16:29:30
  • Erwan Lecarpentier, David Abel, Kavosh Asadi, Yuu Jinnai, Emmanuel Rachelson, Michael L. Littman

Abstract

We consider the problem of knowledge transfer when an agent is facing a series of Reinforcement Learning (RL) tasks. We introduce a novel metric between Markov Decision Processes (MDPs) and establish that close MDPs have close optimal value functions. Formally, the optimal value functions are Lipschitz continuous with respect to the task space. These theoretical results lead us to a value transfer method for Lifelong RL, which we use to build a PAC-MDP algorithm with an improved convergence rate. We illustrate the benefits of the method in Lifelong RL experiments.
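
As a schematic illustration of the Lipschitz claim (not the paper's exact statement), the generic form of such a bound can be written as follows. The symbols are assumptions introduced here for illustration: M and M' are two MDPs over the same state space, d is some distance between MDPs, L is a Lipschitz constant, and V*_M is the optimal value function of M. The abstract does not specify the metric, the constant, or whether the result is stated for state values or state-action values.

```latex
% Generic Lipschitz continuity of the optimal value function in the task
% (hypothetical notation: d, L, M, M' are not taken from the abstract)
\[
  \bigl| V^*_{M}(s) - V^*_{M'}(s) \bigr| \;\le\; L \, d(M, M')
  \qquad \text{for all states } s .
\]
% Immediate consequence: a value function known for M' gives an upper bound
% on the optimal value function of a nearby task M.
\[
  V^*_{M}(s) \;\le\; V^*_{M'}(s) + L \, d(M, M') .
\]
```

Under these assumptions, the second inequality suggests one way a bound of this kind could support value transfer: values learned on an earlier task yield an upper bound on the optimal values of a new, nearby task.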

 
