Abstract
Reinforcement learning (RL) has made considerable progress in solving a single problem in a given environment, but learning policies that generalize to unseen variations of a problem remains challenging. To improve sample efficiency when learning on such instances of a problem domain, we present Self-Paced Context Evaluation (SPaCE). Based on self-paced learning, SPaCE automatically generates task curricula online with little computational overhead. To this end, SPaCE leverages information contained in state values during training to accelerate and improve training performance as well as generalization to new instances from the same problem domain. Furthermore, SPaCE is independent of the problem domain at hand and can be applied on top of any RL agent with state-value function approximation. We demonstrate SPaCE's ability to speed up learning of different value-based RL agents on two environments, showing better generalization and up to 10x faster learning compared to naive approaches such as round robin and to SPDRL, the closest state-of-the-art approach.