BaRC: Backward Reachability Curriculum for Robotic Reinforcement Learning

Abstract

Model-free Reinforcement Learning (RL) offers an attractive approach to learn control policies for high-dimensional systems, but its relatively poor sample complexity often forces training in simulated environments. Even in simulation, goal-directed tasks whose natural reward function is sparse remain intractable for state-of-the-art model-free algorithms for continuous control. The bottleneck in these tasks is the prohibitive amount of exploration required to obtain a learning signal from the initial state of the system. In this work, we leverage physical priors in the form of an approximate system dynamics model to design a curriculum scheme for a model-free policy optimization algorithm. Our Backward Reachability Curriculum (BaRC) begins policy training from states that require a small number of actions to accomplish the task, and expands the initial state distribution backwards in a dynamically-consistent manner once the policy optimization algorithm demonstrates sufficient performance. BaRC is general, in that it can accelerate training of any model-free RL algorithm on a broad class of goal-directed continuous control MDPs. Its curriculum strategy is physically intuitive, easy-to-tune, and allows incorporating physical priors to accelerate training without hindering the performance, flexibility, and applicability of the model-free RL algorithm. We evaluate our approach on two representative dynamic robotic learning problems and find substantial performance improvement relative to previous curriculum generation techniques and naive exploration strategies.
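
The curriculum loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the backward expansion is computed, so the sketch substitutes a one-step enumeration on a discretised 1-D toy system with assumed dynamics x' = x + a. The names `backward_step`, `barc_style_curriculum`, `train`, `success_rate`, and the `threshold` parameter are all hypothetical, and `train` and `success_rate` stand in for whichever model-free RL algorithm and evaluation routine are used.

```python
def backward_step(states, actions, lo, hi):
    """Return `states` plus every state that reaches one of them in one step.

    Assumes the approximate model x' = x + a on an integer grid, so the
    predecessor of s under action a is s - a. Bounds [lo, hi] clip the set.
    """
    prev = set(states)
    for s in states:
        for a in actions:
            x = s - a
            if lo <= x <= hi:
                prev.add(x)
    return prev


def barc_style_curriculum(goal, initial_state, train, success_rate,
                          actions=(-1, 0, 1), lo=-20, hi=20,
                          threshold=0.8, max_iters=200):
    """Grow the set of training start states backwards from the goal.

    `train(starts)` is a hypothetical callback that runs policy optimization
    with episodes starting in `starts`; `success_rate(starts)` is a
    hypothetical callback that returns the fraction of successful rollouts.
    """
    starts = {goal}              # begin with states that trivially solve the task
    history = [len(starts)]      # size of the start set after each expansion
    for _ in range(max_iters):
        train(starts)
        if success_rate(starts) < threshold:
            continue             # keep training on the current start set
        if initial_state in starts:
            return starts, history   # curriculum now covers the true start
        starts = backward_step(starts, actions, lo, hi)
        history.append(len(starts))
    return starts, history


# Usage with placeholder callbacks: a "policy" that always succeeds.
starts, history = barc_style_curriculum(
    goal=0, initial_state=5,
    train=lambda s: None,
    success_rate=lambda s: 1.0,
)
print(sorted(starts), history)
```

The design point the sketch tries to capture is that the dynamics model is used only to choose where episodes start; the policy optimizer itself remains model-free and is treated as a black box.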
