Stage-Wise Reward Shaping for Acrobatic Robots: A Constrained Multi-Objective Reinforcement Learning Approach

Abstract

As the complexity of tasks addressed through reinforcement learning (RL) increases, the definition of reward functions has also become highly complicated. We introduce an RL method aimed at simplifying the reward-shaping process through intuitive strategies. First, instead of a single reward function composed of various terms, we define multiple reward and cost functions within a constrained multi-objective RL (CMORL) framework. Second, for tasks involving sequential complex movements, we segment the task into distinct stages and define multiple rewards and costs for each stage. Finally, we introduce a practical CMORL algorithm that maximizes objectives based on these rewards while satisfying constraints defined by the costs. The proposed method has been successfully demonstrated across a variety of acrobatic tasks in both simulation and real-world environments. Additionally, in comparisons with existing RL and constrained RL algorithms, it has been shown to perform the tasks successfully. Our code is available at https://github.com/rllab-snu/Stage-Wise-CMORL.
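
The idea of defining separate rewards and costs for each stage can be illustrated with a minimal sketch. This is not the authors' implementation: the stage names, state keys, the torque limit of 30.0, and the helper `evaluate` are all hypothetical, chosen only to show how each stage can carry its own list of reward and cost functions that are kept as separate terms rather than summed into one scalar.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A state is modelled as a plain dict of named scalars (an assumption for this sketch).
State = Dict[str, float]


@dataclass
class Stage:
    """One stage of a sequential task with its own rewards and costs."""
    name: str
    rewards: List[Callable[[State], float]]
    costs: List[Callable[[State], float]]


def evaluate(stages: List[Stage], stage_idx: int, state: State) -> Tuple[List[float], List[float]]:
    """Return the reward vector and cost vector of the active stage.

    The terms stay separate so that a constrained multi-objective learner
    can treat rewards as objectives and costs as constraints.
    """
    stage = stages[stage_idx]
    rewards = [r(state) for r in stage.rewards]
    costs = [c(state) for c in stage.costs]
    return rewards, costs


# Hypothetical two-stage jump task.
TORQUE_LIMIT = 30.0  # assumed limit, for illustration only

stages = [
    Stage(
        name="crouch",
        rewards=[lambda s: -abs(s["height"] - 0.2)],  # stay near an assumed crouch height
        costs=[lambda s: float(s["joint_torque"] > TORQUE_LIMIT)],
    ),
    Stage(
        name="jump",
        rewards=[lambda s: s["vertical_velocity"]],  # reward upward velocity
        costs=[lambda s: float(s["joint_torque"] > TORQUE_LIMIT)],
    ),
]

state = {"height": 0.3, "vertical_velocity": 1.5, "joint_torque": 35.0}
jump_rewards, jump_costs = evaluate(stages, 1, state)
crouch_rewards, crouch_costs = evaluate(stages, 0, state)
```

In this sketch the cost is an indicator of a limit violation, so its average over a trajectory can be read as a violation rate that a constraint threshold would bound. How the stages are detected and how the objectives are combined is left to the learning algorithm and is not modelled here.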
