The ability to exploit prior experience to solve novel problems rapidly is a hallmark of biological learning systems and of great practical importance for artificial ones. In the meta-reinforcement learning literature, much recent work has focused on the problem of optimizing the learning process itself. In this paper we study a complementary approach which is conceptually simple, general, modular and built on top of recent improvements in off-policy learning. The framework is inspired by ideas from the probabilistic inference literature and combines robust off-policy learning with two components: a behavior prior, or default behavior, that constrains the space of solutions and serves as a bias for exploration, and a representation for the value function. Both are easily learned from a number of training tasks in a multi-task scenario. Our approach achieves competitive adaptation performance on hold-out tasks compared to meta-reinforcement learning baselines and can scale to complex sparse-reward scenarios.