Abstract
Model-free reinforcement learning (RL) has enabled adaptable and agile quadruped locomotion; however, policies often converge to a single gait, leading to suboptimal performance. Traditionally, Model Predictive Control (MPC) has been extensively used to obtain task-specific optimal policies but lacks the ability to adapt to varying environments. To address these limitations, we propose an optimization framework for real-time gait adaptation in a continuous gait space, combining the Model Predictive Path Integral (MPPI) algorithm with a Dreamer module to produce adaptive and optimal policies for quadruped locomotion. At each time step, MPPI jointly optimizes the actions and gait variables using a learned Dreamer reward that promotes velocity tracking, energy efficiency, stability, and smooth transitions, while penalizing abrupt gait changes. A learned value function is incorporated as a terminal reward, extending the formulation to an infinite-horizon planner. We evaluate our framework in simulation on the Unitree Go1, demonstrating an average reduction of up to 36.48\% in energy consumption across varying target speeds, while maintaining accurate tracking and adaptive, task-appropriate gaits.
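As a rough illustration of the planning step described above, the following is a minimal NumPy sketch of a generic MPPI update with a terminal value term. It is not the authors' implementation: the function `mppi_step`, the toy scalar dynamics, the hand-written reward and value functions, and all parameter values are hypothetical stand-ins for the learned Dreamer components. Each row of the plan holds one decision vector, here assumed to concatenate an action and a gait variable.

```python
import numpy as np

def mppi_step(state, nominal, dynamics, reward, value, rng,
              num_samples=256, sigma=0.3, temperature=1.0):
    """One generic MPPI update of a nominal plan of shape (horizon, dim)."""
    horizon, dim = nominal.shape
    # Sample Gaussian perturbations around the nominal plan.
    noise = rng.normal(0.0, sigma, size=(num_samples, horizon, dim))
    candidates = nominal[None] + noise
    returns = np.zeros(num_samples)
    for k in range(num_samples):
        s = state
        for t in range(horizon):
            returns[k] += reward(s, candidates[k, t])
            s = dynamics(s, candidates[k, t])
        # The terminal value stands in for reward beyond the horizon.
        returns[k] += value(s)
    # Softmax weights over returns (max subtracted for numerical stability).
    weights = np.exp((returns - returns.max()) / temperature)
    weights /= weights.sum()
    # The weighted average of the sampled plans becomes the new nominal plan.
    return np.tensordot(weights, candidates, axes=1), weights

# Hypothetical toy problem: a scalar "velocity" state, and a decision vector
# u = [action, gait variable]. None of this comes from the paper.
def toy_dynamics(s, u):
    return s + 0.1 * u[0]

def toy_reward(s, u, target=1.0):
    # Tracking error plus a small effort penalty on both decision variables.
    return -(s - target) ** 2 - 0.01 * float(u @ u)

def toy_value(s, target=1.0):
    return -5.0 * (s - target) ** 2

rng = np.random.default_rng(0)
plan = np.zeros((10, 2))
plan, weights = mppi_step(0.0, plan, toy_dynamics, toy_reward, toy_value, rng)
```

In a receding-horizon loop, only the first row of the returned plan would be applied before replanning from the next state. Adding the value estimate at the final rollout state is what lets a short sampled horizon approximate an infinite-horizon objective.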