Abstract
Deep reinforcement learning (RL) is notoriously impractical to deploy due to sample inefficiency. Meta-RL directly addresses this sample inefficiency by learning to perform few-shot learning when a distribution of related tasks is available for meta-training. While many specialized meta-RL methods have been proposed, recent work suggests that end-to-end learning in conjunction with an off-the-shelf sequential model, such as a recurrent network, is a surprisingly strong baseline. However, such claims have been controversial due to limited supporting evidence, particularly in the face of prior work establishing precisely the opposite. In this paper, we conduct an empirical investigation. While we likewise find that recurrent networks can achieve strong performance, we demonstrate that the use of hypernetworks is crucial to maximizing their potential. Surprisingly, when combined with hypernetworks, recurrent baselines that are far simpler than existing specialized methods actually achieve the strongest performance of all methods evaluated.