CaT: Constraints as Terminations for Legged Locomotion Reinforcement Learning

Abstract

Deep Reinforcement Learning (RL) has demonstrated impressive results in solving complex robotic tasks such as quadruped locomotion. Yet, current solvers fail to produce efficient policies respecting hard constraints. In this work, we advocate for integrating constraints into robot learning and present Constraints as Terminations (CaT), a novel constrained RL algorithm. Departing from classical constrained RL formulations, we reformulate constraints through stochastic terminations during policy learning: any violation of a constraint triggers a probability of terminating potential future rewards the RL agent could attain. We propose an algorithmic approach to this formulation, by minimally modifying widely used off-the-shelf RL algorithms in robot learning (such as Proximal Policy Optimization). Our approach leads to excellent constraint adherence without introducing undue complexity and computational overhead, thus mitigating barriers to broader adoption. Through empirical evaluation on the real quadruped robot Solo crossing challenging obstacles, we demonstrate that CaT provides a compelling solution for incorporating constraints into RL frameworks. Videos and code are available at https://constraints-as-terminations.github.io.
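
The idea of turning constraint violations into stochastic terminations can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the linear clipped mapping from violation to termination probability, and the parameters `scale` and `p_max` are all assumptions made for illustration. The sketch computes the expected discounted return when every reward after step t survives only with probability 1 - delta_t.

```python
import numpy as np


def termination_probability(violations, scale=1.0, p_max=1.0):
    """Map constraint values to a termination probability in [0, p_max].

    Hypothetical mapping (an assumption, not taken from the paper):
    a constraint is violated when its value is positive, and the
    probability grows linearly with the largest violation until it
    saturates at p_max.
    """
    violations = np.asarray(violations, dtype=float)
    worst = np.clip(violations, 0.0, None).max()  # 0 if nothing is violated
    return float(p_max * np.clip(worst / scale, 0.0, 1.0))


def expected_return(rewards, term_probs, gamma=0.99):
    """Expected discounted return under stochastic terminations.

    Assumes that a violation at step t cuts off all later rewards with
    probability term_probs[t], so future returns are scaled by
    (1 - term_probs[t]) in addition to the usual discount gamma.
    """
    g = 0.0
    for r, delta in zip(reversed(rewards), reversed(term_probs)):
        g = r + gamma * (1.0 - delta) * g
    return g


# One constraint satisfied (-0.5), one violated (0.2).
delta = termination_probability([-0.5, 0.2], scale=0.4, p_max=0.5)

# A certain termination at the second step removes the third reward.
full = expected_return([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], gamma=1.0)
cut = expected_return([1.0, 1.0, 1.0], [0.0, 1.0, 0.0], gamma=1.0)
```

Under these assumptions, a policy that violates constraints sees a lower expected return even when the raw rewards are unchanged, which is why an off-the-shelf algorithm such as Proximal Policy Optimization would need only small modifications to discourage violations.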
