Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models

Abstract

Model-based reinforcement learning (RL) algorithms can attain excellentsample efficiency, but often lag behind the best model-free algorithms in termsof asymptotic performance. This is especially true with high-capacityparametric function approximators, such as deep networks. In this paper, westudy how to bridge this gap, by employing uncertainty-aware dynamics models.We propose a new algorithm called probabilistic ensembles with trajectorysampling (PETS) that combines uncertainty-aware deep network dynamics modelswith sampling-based uncertainty propagation. Our comparison to state-of-the-artmodel-based and model-free deep RL algorithms shows that our approach matchesthe asymptotic performance of model-free algorithms on several challengingbenchmark tasks, while requiring significantly fewer samples (e.g., 8 and 125times fewer samples than Soft Actor Critic and Proximal Policy Optimizationrespectively on the half-cheetah task).

Quick Read (beta)

loading the full paper ...