Model Predictive Actor-Critic: Accelerating Robot Skill Acquisition with Deep Reinforcement Learning

Abstract

Substantial advancements to model-based reinforcement learning algorithms have been impeded by the model bias induced by the collected data, which generally hurts performance. Meanwhile, their inherent sample efficiency warrants utility for most robot applications, limiting potential damage to the robot and its environment during training. Inspired by information-theoretic model predictive control and advances in deep reinforcement learning, we introduce Model Predictive Actor-Critic (MoPAC), a hybrid model-based/model-free method that combines model predictive rollouts with policy optimization so as to mitigate model bias. MoPAC leverages optimal trajectories to guide policy learning, but explores via its model-free method, allowing the algorithm to learn more expressive dynamics models. This combination guarantees optimal skill learning up to an approximation error and reduces necessary physical interaction with the environment, making it suitable for real-robot training. We provide extensive results showcasing how our proposed method generally outperforms the current state of the art and conclude by evaluating MoPAC for learning on a physical robotic hand performing valve rotation and finger gaiting, a task that requires grasping, manipulation, and then regrasping of an object.
