Abstract
Deep reinforcement learning (RL) is a powerful approach to complex decision making. However, one issue that limits its practical application is its brittleness: it sometimes fails to train in the presence of small changes in the environment. Motivated by the success of zero-shot transfer, where pre-trained models perform well on related tasks, we consider the problem of selecting a good set of training tasks to maximize generalization performance across a range of tasks. Given the high cost of training, it is critical to select training tasks strategically, but how to do so is not well understood. We hence introduce Model-Based Transfer Learning (MBTL), which layers on top of existing RL methods to effectively solve contextual RL problems. MBTL models the generalization performance in two parts: 1) the performance set point, modeled using Gaussian processes, and 2) performance loss (generalization gap), modeled as a linear function of contextual similarity. MBTL combines these two pieces of information within a Bayesian optimization (BO) framework to strategically select training tasks. We show theoretically that the method exhibits sublinear regret in the number of training tasks and discuss conditions to further tighten regret bounds. We experimentally validate our methods using urban traffic and standard continuous control benchmarks. The experimental results suggest that MBTL can achieve up to 50x improved sample efficiency compared with canonical independent training and multi-task training. Further experiments demonstrate the efficacy of BO and the insensitivity to the underlying RL algorithm and hyperparameters. This work lays the foundations for investigating explicit modeling of generalization, thereby enabling principled yet effective methods for contextual RL.
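To make the two-part model concrete, the following is a minimal sketch of one task-selection step. It is not the authors' implementation. The abstract does not specify the kernel, the acquisition rule, or how contexts are encoded, so the RBF kernel, the UCB-style optimism term `beta`, the scalar context grid, and all function names here are assumptions chosen for illustration only.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2, variance=1.0):
    # Squared-exponential kernel on scalar contexts (assumed, not from the paper).
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    # GP posterior mean and standard deviation of source-task performance.
    # The prior mean is set to the average observed performance.
    y_mean = y_train.mean()
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_query, x_train)
    mean = K_s @ np.linalg.solve(K, y_train - y_mean) + y_mean
    v = np.linalg.solve(K, K_s.T)
    var = rbf_kernel(x_query, x_query).diagonal() - np.sum(K_s * v.T, axis=1)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def select_next_task(contexts, trained_idx, trained_perf, slope, beta=1.0):
    # Hypothetical acquisition step: choose the untrained context whose
    # optimistic performance estimate, minus a linear generalization gap,
    # most improves the total estimated performance over all target contexts.
    trained_perf = np.asarray(trained_perf, dtype=float)
    x_tr = contexts[trained_idx]
    mean, std = gp_posterior(x_tr, trained_perf, contexts)

    # Best estimated performance on each target using already-trained tasks.
    gap = slope * np.abs(contexts[None, :] - x_tr[:, None])
    current = np.max(trained_perf[:, None] - gap, axis=0)

    trained = set(trained_idx)
    best_gain, best_idx = -np.inf, None
    for i in range(len(contexts)):
        if i in trained:
            continue
        optimistic = mean[i] + beta * std[i]
        transfer = optimistic - slope * np.abs(contexts - contexts[i])
        gain = np.sum(np.maximum(transfer - current, 0.0))
        if gain > best_gain:
            best_gain, best_idx = gain, i
    return best_idx

# Example: 11 evenly spaced contexts, one task already trained at context 0.
contexts = np.linspace(0.0, 1.0, 11)
next_idx = select_next_task(contexts, trained_idx=[0], trained_perf=[1.0], slope=1.0)
```

In this toy setting, each call returns the index of the next context to train on. In a full loop one would train an RL policy on that context, record its performance, append it to `trained_idx` and `trained_perf`, and repeat until the training budget is exhausted.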