Generating Automatic Curricula via Self-Supervised Active Domain Randomization

Abstract

Goal-directed Reinforcement Learning (RL) traditionally considers an agent interacting with an environment, prescribing a real-valued reward to an agent proportional to the completion of some goal. Goal-directed RL has seen large gains in sample efficiency, due to the ease of reusing or generating new experience by proposing goals. In this work, we build on the framework of self-play, allowing an agent to interact with itself in order to make progress on some unknown task. We use Active Domain Randomization and self-play to create a novel, coupled environment-goal curriculum, where agents learn through progressively more difficult tasks and environment variations. Our method, Self-Supervised Active Domain Randomization (SS-ADR), generates a growing curriculum, encouraging the agent to try tasks that are just outside of its current capabilities, while building a domain-randomization curriculum that enables state-of-the-art results on various sim2real transfer tasks. Our results show that a curriculum of co-evolving the environment difficulty along with the difficulty of goals set in each environment provides practical benefits in the goal-directed tasks tested.
