Sim-to-Real Transfer for Biped Locomotion

Abstract

We present a new approach for transferring dynamic robot control policies, such as biped locomotion, from simulation to real hardware. Key to our approach is to perform system identification of the model parameters {\mu} of the hardware (e.g. friction, center-of-mass) in two distinct stages: before policy learning (pre-sysID) and after policy learning (post-sysID). Pre-sysID begins by collecting trajectories from the physical hardware based on a set of generic motion sequences. Because the trajectories may not be related to the task of interest, pre-sysID does not attempt to accurately identify the true value of {\mu}, but only to approximate the range of {\mu} to guide the policy learning. Next, a Projected Universal Policy (PUP) is created by simultaneously training a network that projects {\mu} to a low-dimensional latent variable {\eta} and a family of policies that are conditioned on {\eta}. The second round of system identification (post-sysID) is then carried out by deploying the PUP on the robot hardware using task-relevant trajectories. We use Bayesian Optimization to determine the values of {\eta} that optimize the performance of the PUP on the real hardware. We have used this approach to create three successful biped locomotion controllers (walk forward, walk backwards, walk sideways) on the Darwin OP2 robot.
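
The post-sysID stage described above can be illustrated with a minimal sketch of Bayesian Optimization over the latent variable {\eta}. Everything specific in it is an assumption made for illustration and is not taken from the paper: the Gaussian-process surrogate with an RBF kernel, the upper-confidence-bound acquisition rule, the search box [-1, 1], the two-dimensional {\eta}, and the names `post_sysid` and `fake_hardware_return`. The trained PUP and the robot are abstracted into a single function that maps a candidate {\eta} to an episode return; here a synthetic quadratic stands in for a real rollout.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.3):
    """Squared-exponential kernel between rows of a (n, d) and b (m, d)."""
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def gp_posterior(x_obs, y_obs, x_query, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean, unit-variance GP."""
    k_oo = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_oq = rbf_kernel(x_obs, x_query)
    # Solve for K^-1 y and K^-1 k_oq in one call.
    solved = np.linalg.solve(k_oo, np.column_stack([y_obs, k_oq]))
    alpha, v = solved[:, 0], solved[:, 1:]
    mean = k_oq.T @ alpha
    var = 1.0 - np.sum(k_oq * v, axis=0)
    return mean, np.sqrt(np.maximum(var, 1e-12))


def post_sysid(rollout_return, eta_dim=2, n_init=5, n_iter=20,
               n_candidates=500, kappa=2.0, seed=0):
    """Search for the eta that maximizes rollout_return (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Initial design: a few random latent values, each evaluated once.
    etas = rng.uniform(-1.0, 1.0, size=(n_init, eta_dim))
    returns = np.array([rollout_return(e) for e in etas])
    for _ in range(n_iter):
        # Standardize returns so the unit-variance GP prior is reasonable.
        y = (returns - returns.mean()) / (returns.std() + 1e-9)
        candidates = rng.uniform(-1.0, 1.0, size=(n_candidates, eta_dim))
        mean, std = gp_posterior(etas, y, candidates)
        # Upper confidence bound: favor high predicted return or high uncertainty.
        chosen = candidates[np.argmax(mean + kappa * std)]
        etas = np.vstack([etas, chosen])
        returns = np.append(returns, rollout_return(chosen))
    best = int(np.argmax(returns))
    return etas[best], returns[best]


def fake_hardware_return(eta):
    """Synthetic stand-in for one rollout of the PUP on the robot (assumption)."""
    target = np.array([0.4, -0.2])
    return -float(np.sum((np.asarray(eta) - target) ** 2))


best_eta, best_return = post_sysid(fake_hardware_return)
print("best eta:", best_eta, "return:", best_return)
```

On real hardware each call to the return function would be a physical trial, which is why a sample-efficient search over a low-dimensional {\eta} is attractive compared with searching the full parameter vector {\mu} directly.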
