Abstract
Meta-learning empowers artificial intelligence to increase its efficiency by learning how to learn. Unlocking this potential involves overcoming a challenging meta-optimisation problem that often exhibits ill-conditioning and myopic meta-objectives. We propose an algorithm that tackles these issues by letting the meta-learner teach itself. The algorithm first bootstraps a target from the meta-learner, then optimises the meta-learner by minimising the distance to that target under a chosen (pseudo-)metric. Focusing on meta-learning with gradients, we establish conditions that guarantee performance improvements and show that the improvement is related to the target distance. Thus, by controlling curvature, the distance measure can be used to ease meta-optimisation, for instance by reducing ill-conditioning. Further, the bootstrapping mechanism can extend the effective meta-learning horizon without requiring backpropagation through all updates. The algorithm is versatile and easy to implement. We achieve a new state of the art for model-free agents on the Atari ALE benchmark, improve upon MAML in few-shot learning, and demonstrate how our approach opens up new possibilities by meta-learning efficient exploration in a Q-learning agent.
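The two-stage procedure described above (bootstrap a target, then minimise the distance to it) can be illustrated with a minimal toy sketch. Everything in it is an assumption made for illustration and is not taken from the experiments: the learner minimises the quadratic f(x) = 0.5 * a * x**2 by gradient descent, the meta-parameter is the learning rate `eta`, the target is obtained by running `L` further updates and holding the result constant, and the distance is the squared Euclidean distance. The names `inner_step`, `bootstrapped_meta_grad`, and `meta_train` are hypothetical.

```python
def inner_step(x, eta, a):
    """One gradient-descent step on f(x) = 0.5 * a * x**2 with learning rate eta."""
    return x - eta * a * x


def bootstrapped_meta_grad(x0, eta, a, K, L):
    """Return the target distance and its gradient with respect to eta.

    Only the first K updates are differentiated; the L updates that
    produce the target are treated as a constant (no gradient flows
    through them), so the horizon grows without a longer backward pass.
    """
    # Unroll K steps, tracking dx/d(eta) with forward-mode differentiation.
    x, dx = x0, 0.0
    for _ in range(K):
        dx = dx * (1.0 - eta * a) - a * x  # derivative of the next iterate
        x = inner_step(x, eta, a)

    # Bootstrap the target: L further steps from x, held fixed.
    target = x
    for _ in range(L):
        target = inner_step(target, eta, a)

    # Assumed matching function: squared Euclidean distance to the target.
    distance = 0.5 * (x - target) ** 2
    grad = (x - target) * dx
    return distance, grad


def meta_train(x0=1.0, eta=0.01, a=1.0, K=3, L=3, meta_lr=0.05, iters=200):
    """Plain gradient descent on eta using the bootstrapped meta-gradient."""
    for _ in range(iters):
        _, g = bootstrapped_meta_grad(x0, eta, a, K, L)
        eta -= meta_lr * g
    return eta
```

In this toy setting, calling `meta_train()` moves the learning rate upward from its small initial value, since a larger step brings the K-step iterate closer to where the learner would be after K + L steps. Swapping the squared distance for another (pseudo-)metric changes only the two lines that compute `distance` and `grad`.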