Learning to Locomote: Understanding How Environment Design Matters for Deep Reinforcement Learning

Abstract

Learning to locomote is one of the most common tasks in physics-based animation and deep reinforcement learning (RL). A learned policy is the product of the problem to be solved, as embodied by the RL environment, and the RL algorithm. While enormous attention has been devoted to RL algorithms, much less is known about the impact of design choices for the RL environment. In this paper, we show that environment design matters in significant ways and document how it can contribute to the brittle nature of many RL results. Specifically, we examine choices related to state representations, initial state distributions, reward structure, control frequency, episode termination procedures, curriculum usage, the action space, and the torque limits. We aim to stimulate discussion around such choices, which in practice strongly impact the success of RL when applied to continuous-action control problems of interest to animation, such as learning to locomote.
