Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning

Abstract

In this work, we present and study a training set-up that achieves fast policy generation for real-world robotic tasks by using massive parallelism on a single workstation GPU. We analyze and discuss the impact of different training algorithm components in the massively parallel regime on the final policy performance and training times. In addition, we present a novel game-inspired curriculum that is well suited for training with thousands of simulated robots in parallel. We evaluate the approach by training the quadrupedal robot ANYmal to walk on challenging terrain. The parallel approach allows training policies for flat terrain in under four minutes, and in twenty minutes for uneven terrain. This represents a speedup of multiple orders of magnitude compared to previous work. Finally, we transfer the policies to the real robot to validate the approach. We open-source our training code to help accelerate further research in the field of learned legged locomotion.
