Trajectory First: A Curriculum for Discovering Diverse Policies

Abstract

Being able to solve a task in diverse ways makes agents more robust to task variations and less prone to local optima. In this context, constrained diversity optimization has emerged as a powerful reinforcement learning (RL) framework to train a diverse set of agents in parallel. However, existing constrained-diversity RL methods often under-explore in complex tasks such as robotic manipulation, leading to a lack of policy diversity. To improve diversity optimization in RL, we therefore propose a curriculum that first explores at the trajectory level before learning step-based policies. In our empirical evaluation, we provide novel insights into the shortcomings of skill-based diversity optimization and demonstrate that our curriculum improves the diversity of the learned skills.
