Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion

  • 2018-07-04 16:51:56
  • Jacob Buckman, Danijar Hafner, George Tucker, Eugene Brevdo, Honglak Lee

Abstract

Integrating model-free and model-based approaches in reinforcement learning has the potential to achieve the high performance of model-free algorithms with low sample complexity. However, this is difficult because an imperfect dynamics model can degrade the performance of the learning algorithm, and in sufficiently complex environments, the dynamics model will almost always be imperfect. As a result, a key challenge is to combine model-based approaches with model-free learning in such a way that errors in the model do not degrade performance. We propose stochastic ensemble value expansion (STEVE), a novel model-based technique that addresses this issue. By dynamically interpolating between model rollouts of various horizon lengths for each individual example, STEVE ensures that the model is only utilized when doing so does not introduce significant errors. Our approach outperforms model-free baselines on challenging continuous control benchmarks with an order-of-magnitude increase in sample efficiency, and in contrast to previous model-based approaches, performance does not degrade in complex environments.
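
The interpolation between rollout horizons can be illustrated with a small sketch. The abstract does not state the weighting rule, so the code below assumes one plausible choice: each horizon's candidate target is weighted by the inverse of its variance across an ensemble, so horizons where the ensemble disagrees contribute little. The function name `interpolate_targets`, the `eps` parameter, and all numbers are hypothetical and serve only as illustration.

```python
from statistics import fmean, pvariance


def interpolate_targets(candidates, eps=1e-8):
    """Combine candidate value targets from several rollout horizons.

    candidates[h] is a list of ensemble estimates of the target obtained
    with a model rollout of horizon h (h = 0 means no model rollout).
    Assumption: weights are proportional to inverse ensemble variance.
    """
    means = [fmean(c) for c in candidates]
    variances = [pvariance(c) for c in candidates]
    # eps avoids division by zero when an ensemble agrees exactly.
    inverse = [1.0 / (v + eps) for v in variances]
    total = sum(inverse)
    weights = [w / total for w in inverse]
    target = sum(w * m for w, m in zip(weights, means))
    return target, weights


# Hypothetical ensemble estimates for a single training example.
candidates = [
    [1.0, 1.1, 0.9],   # horizon 0: ensemble members agree
    [1.5, 0.2, 2.8],   # horizon 1: members disagree
    [3.0, -1.0, 4.0],  # horizon 2: members disagree strongly
]
target, weights = interpolate_targets(candidates)
print(target, weights)
```

In this example nearly all of the weight falls on horizon 0, so the longer rollouts, where the hypothetical ensemble is uncertain, barely affect the combined target. Because the weights are computed per example, a different example with a reliable model prediction would lean more on longer horizons.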

 
