How Should We Meta-Learn Reinforcement Learning Algorithms?

Abstract

The process of meta-learning algorithms from data, instead of relying on manual design, is growing in popularity as a paradigm for improving the performance of machine learning systems. Meta-learning shows particular promise for reinforcement learning (RL), where algorithms are often adapted from supervised or unsupervised learning despite their suboptimality for RL. However, until now there has been a severe lack of comparison between different meta-learning algorithms, such as using evolution to optimise over black-box functions or LLMs to propose code. In this paper, we carry out this empirical comparison of the different approaches when applied to a range of meta-learned algorithms which target different parts of the RL pipeline. In addition to meta-train and meta-test performance, we also investigate factors including the interpretability, sample cost and train time for each meta-learning algorithm. Based on these findings, we propose several guidelines for meta-learning new RL algorithms which will help ensure that future learned algorithms are as performant as possible.
