Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards

Abstract

We propose a general and model-free approach for Reinforcement Learning (RL) on real robotics with sparse rewards. We build upon the Deep Deterministic Policy Gradient (DDPG) algorithm to use demonstrations. Both demonstrations and actual interactions are used to fill a replay buffer, and the sampling ratio between demonstrations and transitions is automatically tuned via a prioritized replay mechanism. Typically, carefully engineered shaping rewards are required to enable the agents to efficiently explore on high-dimensional control problems such as robotics. They are also required for model-based acceleration methods relying on local solvers such as iLQG (e.g. Guided Policy Search and Normalized Advantage Function). The demonstrations replace the need for carefully engineered rewards, and reduce the exploration problem encountered by classical RL approaches in these domains. Demonstrations are collected by a robot kinesthetically force-controlled by a human demonstrator. Results on four simulated insertion tasks show that DDPG from demonstrations outperforms DDPG, and does not require engineered rewards. Finally, we demonstrate the method on a real robotics task consisting of inserting a clip (flexible object) into a rigid object.
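The replay mechanism described in the abstract can be illustrated with a minimal sketch. It assumes proportional prioritized replay, in which each stored transition is sampled with probability proportional to its priority raised to a power alpha, and the priority is the absolute TD error plus a small constant. The class name MixedPrioritizedReplay, the parameters alpha, eps and demo_bonus, and their default values are hypothetical choices made for illustration; the abstract does not specify them. The sketch covers only the buffer, not the DDPG update itself.

```python
import random


class MixedPrioritizedReplay:
    """Toy proportional prioritized replay holding demo and agent transitions.

    Illustrative only: names and constants are assumptions, not the paper's.
    """

    def __init__(self, alpha=0.3, eps=1e-3, demo_bonus=1.0, seed=0):
        self.alpha = alpha            # assumed exponent; 0 gives uniform sampling
        self.eps = eps                # keeps every priority strictly positive
        self.demo_bonus = demo_bonus  # assumed extra priority for demonstrations
        self.items = []               # list of (transition, is_demo) pairs
        self.priorities = []          # one priority per stored item
        self.rng = random.Random(seed)

    def _priority(self, td_error, is_demo):
        bonus = self.demo_bonus if is_demo else 0.0
        return abs(td_error) + self.eps + bonus

    def add(self, transition, is_demo, td_error=1.0):
        """Store a transition from either a demonstration or the agent."""
        self.items.append((transition, is_demo))
        self.priorities.append(self._priority(td_error, is_demo))

    def update(self, index, td_error):
        """Refresh a priority after the critic has been retrained."""
        _, is_demo = self.items[index]
        self.priorities[index] = self._priority(td_error, is_demo)

    def sample(self, batch_size):
        """Draw indices with probability proportional to priority ** alpha."""
        weights = [p ** self.alpha for p in self.priorities]
        indices = self.rng.choices(
            range(len(self.items)), weights=weights, k=batch_size
        )
        return indices, [self.items[i] for i in indices]

    def demo_fraction(self, batch_size=10000):
        """Estimate the share of demonstrations in a sampled batch."""
        _, batch = self.sample(batch_size)
        return sum(is_demo for _, is_demo in batch) / batch_size
```

Because both kinds of transition share one buffer and one priority rule, the demonstration-to-agent ratio in a batch is never set by hand: it rises while demonstration transitions carry large TD errors and falls as the critic fits them.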
