A Probabilistic Interpretation of Self-Paced Learning with Applications to Reinforcement Learning

Abstract

Across machine learning, the use of curricula has shown strong empirical potential to improve learning from data by avoiding local optima of training objectives. For reinforcement learning (RL), curricula are especially interesting, as the underlying optimization has a strong tendency to get stuck in local optima due to the exploration-exploitation trade-off. Recently, a number of approaches for the automatic generation of curricula for RL have been shown to increase performance while requiring less expert knowledge than manually designed curricula. However, these approaches are seldom investigated from a theoretical perspective, which prevents a deeper understanding of their mechanics. In this paper, we present an approach for automated curriculum generation in RL with a clear theoretical underpinning. More precisely, we formalize the well-known self-paced learning paradigm as inducing a distribution over training tasks, which trades off between task complexity and the objective to match a desired task distribution. Experiments show that training on this induced distribution helps to avoid poor local optima across RL algorithms in different tasks with uninformative rewards and challenging exploration requirements.
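To make the trade-off concrete, here is a minimal sketch on a finite set of tasks. It assumes that each task c has a known expected return J(c), which serves as an inverse proxy for task complexity (higher return means an easier task for the current agent). It also assumes that the trade-off is written as maximizing the expected return minus α times the KL divergence from the desired distribution μ. Under these assumptions, the maximizer has the closed form p(c) ∝ μ(c) exp(J(c)/α). The function name and setup are hypothetical illustrations of the objective, not the algorithm from the paper.

```python
import math
from typing import List, Sequence


def self_paced_distribution(returns: Sequence[float],
                            target: Sequence[float],
                            alpha: float) -> List[float]:
    """Hypothetical helper: maximize sum_c p(c) J(c) - alpha * KL(p || mu)
    over a finite task set (illustrative assumption, not the paper's method).

    returns: assumed per-task expected returns J(c); higher means easier.
    target:  desired task distribution mu(c); entries must be strictly positive.
    alpha:   trade-off weight; large alpha pulls p toward mu,
             small alpha concentrates p on high-return (easy) tasks.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if len(returns) != len(target):
        raise ValueError("returns and target must have the same length")
    # Unnormalized log-probabilities: log p(c) = log mu(c) + J(c) / alpha - log Z.
    logits = [math.log(m) + j / alpha for j, m in zip(returns, target)]
    # Subtract the maximum logit before exponentiating for numerical stability.
    top = max(logits)
    weights = [math.exp(v - top) for v in logits]
    z = sum(weights)
    # Normalize so that the result is a probability distribution over tasks.
    return [w / z for w in weights]
```

For example, with returns `[1.0, 0.5, 0.0]` (easy to hard) and target `[0.1, 0.1, 0.8]`, a small α such as `0.1` puts almost all probability mass on the first, easiest task. A very large α returns a distribution close to the target. Gradually increasing α therefore moves training from easy tasks toward the desired task distribution.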
