We propose a method for meta-learning reinforcement learning algorithms by searching over the space of computational graphs that compute the loss function optimized by a value-based, model-free RL agent. The learned algorithms are domain-agnostic and can generalize to new environments not seen during training. Our method can either learn from scratch or bootstrap from existing algorithms such as DQN, which enables interpretable modifications that improve performance. Learning from scratch on simple classical control and gridworld tasks, our method rediscovers the temporal-difference (TD) algorithm. When bootstrapping from DQN, we highlight two learned algorithms that generalize well to other classical control tasks, gridworld-type tasks, and Atari games. Analysis of the learned algorithms' behavior reveals a resemblance to recently proposed RL algorithms that address overestimation in value-based methods.
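To make "a computational graph that computes the loss" concrete, the sketch below encodes the standard DQN squared TD error, (Q(s_t, a_t) - (r_t + γ · max_a Q_targ(s_{t+1}, a)))², as a small list of operation nodes and evaluates it on scalar inputs. The node format, the operation set `OPS`, and the names `DQN_LOSS_GRAPH` and `evaluate` are hypothetical illustrations, not the representation used by this method; a search procedure would propose alternative graphs built from the same kinds of inputs and operations.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical primitive operations that a graph node may apply.
OPS: Dict[str, Callable[..., float]] = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "square": lambda a: a * a,
}

# A graph is a list of (op_name, input_names, output_name) in topological order.
Graph = List[Tuple[str, Tuple[str, ...], str]]

# DQN-style squared TD error:
#   (Q(s_t, a_t) - (r_t + gamma * max_a Q_targ(s_{t+1}, a)))^2
DQN_LOSS_GRAPH: Graph = [
    ("mul", ("gamma", "max_q_targ_next"), "discounted"),
    ("add", ("reward", "discounted"), "target"),
    ("sub", ("q_t", "target"), "delta"),
    ("square", ("delta",), "loss"),
]


def evaluate(graph: Graph, inputs: Dict[str, float], output: str = "loss") -> float:
    """Run each node in order, storing its result under its output name."""
    values = dict(inputs)
    for op_name, in_names, out_name in graph:
        values[out_name] = OPS[op_name](*(values[name] for name in in_names))
    return values[output]


if __name__ == "__main__":
    sample = {"q_t": 6.0, "reward": 1.0, "gamma": 0.5, "max_q_targ_next": 4.0}
    print(evaluate(DQN_LOSS_GRAPH, sample))  # target = 1 + 0.5 * 4 = 3, loss = (6 - 3)^2 = 9.0
```

Representing the loss as data rather than as a fixed function is what makes it searchable: swapping an input node (for example, replacing `max_q_targ_next` with a different estimate of the next-state value) or inserting an operation yields a new candidate loss without changing the agent code.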