Learning Emergent Gaits with Decentralized Phase Oscillators: on the role of Observations, Rewards, and Feedback

Abstract

We present a minimal phase oscillator model for learning quadrupedal locomotion. Each of the four oscillators is coupled only to itself and its corresponding leg through local feedback of the ground reaction force, which can be interpreted as an observer feedback gain. We interpret the oscillator itself as a latent contact state-estimator. Through a systematic ablation study, we show that the combination of phase observations, simple phase-based rewards, and the local feedback dynamics induces policies that exhibit emergent gait preferences, while using a reduced set of simple rewards, and without prescribing a specific gait. The code is open-source, and a video synopsis is available at https://youtu.be/1NKQ0rSV3jU.
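To make the idea of decentralized, locally coupled oscillators concrete, the following is a minimal sketch of one update step. It assumes a feedback form common in the decentralized-oscillator literature, d(phi_i)/dt = omega - sigma * N_i * cos(phi_i), where N_i is the ground reaction force on leg i. The abstract does not state the paper's exact equations, so the function name `step_phases`, the parameters `omega`, `sigma`, and `dt`, and the Euler integration are all illustrative assumptions rather than the authors' implementation.

```python
import math

TWO_PI = 2.0 * math.pi


def step_phases(phases, grf, omega=TWO_PI, sigma=0.5, dt=0.01):
    """Advance each leg's phase by one explicit Euler step.

    Hypothetical sketch: every oscillator sees only its own phase and the
    ground reaction force (GRF) of its own leg. There is no coupling term
    between legs, so any coordination must arise through the body.

    phases: current phase of each leg, in radians
    grf:    normal ground reaction force on each leg (assumed normalized)
    omega:  intrinsic angular frequency (assumed value)
    sigma:  local feedback gain (assumed value)
    """
    new_phases = []
    for phi, force in zip(phases, grf):
        # Assumed local feedback law: load on the leg modulates the phase rate.
        dphi = omega - sigma * force * math.cos(phi)
        # Wrap the phase back into [0, 2*pi).
        new_phases.append((phi + dt * dphi) % TWO_PI)
    return new_phases


# Usage: four legs, only the first one carries load.
unloaded = step_phases([0.0] * 4, [0.0] * 4)
loaded = step_phases([0.0] * 4, [1.0, 0.0, 0.0, 0.0])
```

In this sketch the loaded leg advances more slowly than the unloaded ones near phase zero, which is the kind of purely local effect from which inter-leg phase relationships can emerge without prescribing a gait.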
