Accelerating Goal-Directed Reinforcement Learning by Model Characterization

  • 2019-01-04 19:04:37
  • Shoubhik Debnath, Gaurav Sukhatme, Lantao Liu
  • 7

Abstract

We propose a hybrid approach aimed at improving sample efficiency in goal-directed reinforcement learning. We do this via a two-step mechanism: first, we approximate a model from model-free reinforcement learning; then, we leverage this approximate model, together with a notion of reachability based on Mean First Passage Times, to perform model-based reinforcement learning. Building on this novel observation, we design two new algorithms, Mean First Passage Time based Q-Learning (MFPT-Q) and Mean First Passage Time based DYNA (MFPT-DYNA), which fundamentally modify state-of-the-art reinforcement learning techniques. Preliminary results show that our hybrid approaches converge in far fewer iterations than their state-of-the-art counterparts, and therefore require far fewer samples and training trials to converge.
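The reachability notion above rests on Mean First Passage Times (MFPT): the expected number of steps a Markov chain needs to first reach a target state. The abstract does not specify how the model is estimated or how MFPT-Q and MFPT-DYNA consume these values, so the Python below is only a minimal sketch under stated assumptions. It estimates a state-to-state transition matrix by counting (state, next_state) pairs that a model-free learner might log, then solves the standard MFPT equations m_g = 0 and m_i = 1 + Σ_j P_ij m_j for i ≠ g. The function names, the uniform fallback for unvisited states, and the toy chain are hypothetical choices made for illustration.

```python
def estimate_transition_matrix(transitions, n_states):
    """Maximum-likelihood estimate of P[s][s'] from observed (s, s') pairs.

    Assumption (hypothetical choice): states never visited get a uniform row.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for s, s_next in transitions:
        counts[s][s_next] += 1
    P = []
    for s in range(n_states):
        total = sum(counts[s])
        if total == 0:
            P.append([1.0 / n_states] * n_states)
        else:
            P.append([c / total for c in counts[s]])
    return P


def mean_first_passage_times(P, goal):
    """Expected steps to first reach `goal` from each state under P.

    Solves (I - Q) m = 1, where Q is P restricted to the non-goal states,
    which is the system m[i] = 1 + sum_j P[i][j] * m[j] with m[goal] = 0.
    """
    n = len(P)
    others = [i for i in range(n) if i != goal]
    k = len(others)
    # Augmented matrix [I - Q | 1].
    A = [[(1.0 if r == c else 0.0) - P[others[r]][others[c]] for c in range(k)] + [1.0]
         for r in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            # Singular system: some state cannot reach the goal under P.
            raise ValueError("goal is unreachable from some state under P")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                for c in range(col, k + 1):
                    A[r][c] -= f * A[col][c]
    m = [0.0] * n
    for r, i in enumerate(others):
        m[i] = A[r][k] / A[r][r]
    return m


if __name__ == "__main__":
    # Toy 4-state chain 0-1-2-3 with goal 3: state 0 always moves to 1,
    # states 1 and 2 step left or right with equal frequency.
    logged = [(0, 1), (0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 3)]
    P = estimate_transition_matrix(logged, 4)
    print(mean_first_passage_times(P, goal=3))  # [9.0, 8.0, 5.0, 0.0]
```

On the toy chain, state 0 is three steps from the goal along the shortest path, yet its expected passage time is 9 steps because the walk can drift away from the goal. This gap between shortest-path distance and expected reachability is the kind of information an MFPT value encodes.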
