One Policy to Run Them All: an End-to-end Learning Approach to Multi-Embodiment Locomotion

Abstract

Deep Reinforcement Learning techniques are achieving state-of-the-art results in robust legged locomotion. While there exists a wide variety of legged platforms such as quadrupeds, humanoids, and hexapods, the field is still missing a single learning framework that can control all these different embodiments easily and effectively and possibly transfer, zero- or few-shot, to unseen robot embodiments. We introduce URMA, the Unified Robot Morphology Architecture, to close this gap. Our framework brings the end-to-end Multi-Task Reinforcement Learning approach to the realm of legged robots, enabling the learned policy to control any type of robot morphology. The key idea of our method is to allow the network to learn an abstract locomotion controller that can be seamlessly shared between embodiments thanks to our morphology-agnostic encoders and decoders. This flexible architecture can be seen as a potential first step in building a foundation model for legged robot locomotion. Our experiments show that URMA can learn a locomotion policy on multiple embodiments that can be easily transferred to unseen robot platforms in simulation and the real world.
