Abstract
Reinforcement learning combined with sim-to-real transfer offers a general framework for developing locomotion controllers for legged robots. To facilitate successful deployment in the real world, smoothing techniques, such as low-pass filters and smoothness rewards, are often employed to develop policies with smooth behaviors. However, because these techniques are non-differentiable and involve a large set of hyperparameters, they tend to require extensive manual tuning for each robotic platform. To address this challenge and establish a general technique for enforcing smooth behaviors, we propose a simple and effective method that imposes a Lipschitz constraint on a learned policy, which we refer to as Lipschitz-Constrained Policies (LCP). We show that the Lipschitz constraint can be implemented in the form of a gradient penalty, which provides a differentiable objective that can be easily incorporated into automatic differentiation frameworks. We demonstrate that LCP effectively replaces the need for smoothing rewards or low-pass filters and can be easily integrated into training frameworks for many distinct humanoid robots. We extensively evaluate LCP both in simulation and on real-world humanoid robots, producing smooth and robust locomotion controllers. All simulation and deployment code, along with complete checkpoints, is available on our project page: https://lipschitz-constrained-policy.github.io.
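To make the gradient-penalty idea concrete, the following is a minimal sketch rather than the released implementation. It assumes the penalty is the squared norm of the gradient of the action log-likelihood with respect to the observation, which the abstract does not specify. The toy tanh policy, the shared standard deviation `std`, the coefficient `gp_coef`, and all function names are hypothetical. To keep the example dependency-free, the gradient is estimated with central finite differences; in practice an automatic differentiation framework would supply it, so that the penalty itself can be differentiated with respect to the policy parameters.

```python
import math


def policy_mean(weights, obs):
    """Hypothetical toy policy: action mean = tanh(W @ obs)."""
    return [math.tanh(sum(w * o for w, o in zip(row, obs))) for row in weights]


def gaussian_log_prob(mean, action, std):
    """Log-density of a diagonal Gaussian with one shared std (assumption)."""
    return sum(
        -0.5 * ((a - m) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)
        for a, m in zip(action, mean)
    )


def grad_log_prob_wrt_obs(weights, obs, action, std, eps=1e-5):
    """Central finite-difference estimate of d log pi(a|s) / d s.

    This stands in for the gradient an autodiff framework would provide.
    """
    grad = []
    for i in range(len(obs)):
        obs_hi = list(obs)
        obs_lo = list(obs)
        obs_hi[i] += eps
        obs_lo[i] -= eps
        f_hi = gaussian_log_prob(policy_mean(weights, obs_hi), action, std)
        f_lo = gaussian_log_prob(policy_mean(weights, obs_lo), action, std)
        grad.append((f_hi - f_lo) / (2 * eps))
    return grad


def gradient_penalty(weights, batch, std):
    """Mean squared L2 norm of the observation-gradient over (obs, action) pairs."""
    total = 0.0
    for obs, action in batch:
        g = grad_log_prob_wrt_obs(weights, obs, action, std)
        total += sum(x * x for x in g)
    return total / len(batch)


def regularized_loss(rl_loss, weights, batch, std, gp_coef):
    """RL objective plus the weighted gradient penalty (gp_coef is hypothetical)."""
    return rl_loss + gp_coef * gradient_penalty(weights, batch, std)
```

A policy whose output changes sharply with small observation changes yields a large penalty, so minimizing the combined loss pushes the policy toward smoother behavior. Under these assumptions, a single coefficient takes the place of separately tuned smoothness rewards or filter parameters.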