Universal Value Density Estimation for Imitation Learning and Goal-Conditioned Reinforcement Learning

  • 2020-02-15 23:46:29
  • Yannick Schroecker, Charles Isbell

Abstract

This work considers two distinct settings: imitation learning and goal-conditioned reinforcement learning. In either case, effective solutions require the agent to reliably reach a specified state (a goal) or set of states (a demonstration). Drawing a connection between probabilistic long-term dynamics and the desired value function, this work introduces an approach that utilizes recent advances in density estimation to effectively learn to reach a given state. As our first contribution, we use this approach for goal-conditioned reinforcement learning and show that it is both efficient and does not suffer from hindsight bias in stochastic domains. As our second contribution, we extend the approach to imitation learning and show that it achieves state-of-the-art demonstration sample-efficiency on standard benchmark tasks.
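The connection between long-term dynamics and a goal-reaching value function can be illustrated in a small tabular Markov chain. This is a minimal sketch under stated assumptions, not the authors' method: it assumes a fixed policy (so the dynamics reduce to a transition matrix P), a reward of 1 at every step where the state equals the goal g, and it uses empirical counts as a stand-in for a learned density model. Under these assumptions, if the horizon T is drawn with Pr(T = t) = (1 - gamma) * gamma^t, then V(s, g) = Pr(s_T = g | s_0 = s) / (1 - gamma). Estimating the density of future states therefore gives the value for every goal at once. All function and variable names (discounted_occupancy, goal_value, sample_future_states, P, gamma) are hypothetical and chosen for illustration.

```python
import numpy as np


def discounted_occupancy(P, gamma):
    """Closed form of Pr(s_T = g | s_0 = s) with Pr(T = t) = (1 - gamma) gamma^t.

    Equals (1 - gamma) * sum_t gamma^t P^t = (1 - gamma) (I - gamma P)^{-1}.
    """
    n = P.shape[0]
    return (1.0 - gamma) * np.linalg.inv(np.eye(n) - gamma * P)


def goal_value(P, gamma):
    """V[s, g] = sum_{t>=0} gamma^t Pr(s_t = g | s_0 = s), i.e. reward 1 whenever s_t == g."""
    n = P.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * P)


def sample_future_states(P, gamma, s0, num_samples, rng):
    """Sample s_T for many rollouts from s0, each with its own geometric horizon T >= 0."""
    n = P.shape[0]
    cum = np.cumsum(P, axis=1)  # per-row CDFs for inverse-CDF sampling
    # numpy's geometric is supported on {1, 2, ...}; subtract 1 to get T >= 0
    horizons = rng.geometric(1.0 - gamma, size=num_samples) - 1
    states = np.full(num_samples, s0)
    for step in range(horizons.max()):
        active = horizons > step  # rollouts that still need to take a step
        u = rng.random(active.sum())
        nxt = (u[:, None] > cum[states[active]]).sum(axis=1)
        states[active] = np.minimum(nxt, n - 1)  # guard against float round-off in the CDF
    return states


# Hypothetical 3-state chain under a fixed policy.
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.0, 0.2, 0.8]])
gamma = 0.9
rng = np.random.default_rng(0)

samples = sample_future_states(P, gamma, s0=0, num_samples=20_000, rng=rng)
# Empirical "density model": frequencies of s_T, an estimate of Pr(s_T = g | s_0 = 0).
estimated = np.bincount(samples, minlength=3) / len(samples)
exact = discounted_occupancy(P, gamma)[0]

print("Monte Carlo occupancy:", estimated.round(3))
print("closed-form occupancy:", exact.round(3))
print("value V(0, g) estimate:", (estimated / (1.0 - gamma)).round(2))
```

The Monte Carlo frequencies should agree with the closed form up to sampling noise. In continuous state spaces the counts would have to be replaced by a conditional density model over future states, which is roughly where density estimation enters; the model used here is purely illustrative.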
