Abstract
Model-free Reinforcement Learning (RL) offers an attractive approach to learn control policies for high-dimensional systems, but its relatively poor sample complexity often forces training in simulated environments. Even in simulation, goal-directed tasks whose natural reward function is sparse remain intractable for state-of-the-art model-free algorithms for continuous control. The bottleneck in these tasks is the prohibitive amount of exploration required to obtain a learning signal from the initial state of the system. In this work, we leverage physical priors in the form of an approximate system dynamics model to design a curriculum scheme for a model-free policy optimization algorithm. Our Backward Reachability Curriculum (BaRC) begins policy training from states that require a small number of actions to accomplish the task, and expands the initial state distribution backwards in a dynamically-consistent manner once the policy optimization algorithm demonstrates sufficient performance. BaRC is general, in that it can accelerate training of any model-free RL algorithm on a broad class of goal-directed continuous control MDPs. Its curriculum strategy is physically intuitive, easy-to-tune, and allows incorporating physical priors to accelerate training without hindering the performance, flexibility, and applicability of the model-free RL algorithm. We evaluate our approach on two representative dynamic robotic learning problems and find substantial performance improvement relative to previous curriculum generation techniques and naive exploration strategies.