Multi-task Deep Reinforcement Learning with PopArt

Abstract

The reinforcement learning community has made great strides in designing algorithms capable of exceeding human performance on specific tasks. These algorithms are mostly trained one task at a time, with each new task requiring a brand new agent instance to be trained. This means the learning algorithm is general, but each solution is not; each agent can only solve the one task it was trained on. In this work, we study the problem of learning to master not one but multiple sequential-decision tasks at once. A general issue in multi-task learning is that a balance must be found between the needs of multiple tasks competing for the limited resources of a single learning system. Many learning algorithms can get distracted by certain tasks in the set of tasks to solve. Such tasks appear more salient to the learning process, for instance because of the density or magnitude of the in-task rewards. This causes the algorithm to focus on those salient tasks at the expense of generality. We propose to automatically adapt the contribution of each task to the agent's updates, so that all tasks have a similar impact on the learning dynamics. This resulted in state-of-the-art performance on learning to play all games in a set of 57 diverse Atari games. Excitingly, our method learned a single trained policy, with a single set of weights, that exceeds median human performance. To our knowledge, this was the first time a single agent surpassed human-level performance on this multi-task domain. The same approach also demonstrated state-of-the-art performance on a set of 30 tasks in the 3D reinforcement learning platform DeepMind Lab.
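
The abstract does not spell out the mechanism. As one illustration of what equalizing each task's contribution could look like, the sketch below follows the PopArt idea named in the title: keep running statistics of the value targets for each task, normalize targets with them, and rescale the output layer whenever the statistics move so that unnormalized predictions stay the same. The class name PopArtHead, the linear value head, the step size beta, and all shapes are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


class PopArtHead:
    """Illustrative per-task linear value head with PopArt-style normalization.

    Hypothetical sketch: names, shapes, and the step size are assumptions.
    """

    def __init__(self, num_tasks, feature_dim, beta=3e-4):
        self.beta = beta                      # step size for the running statistics
        self.mu = np.zeros(num_tasks)         # running mean of targets, per task
        self.nu = np.ones(num_tasks)          # running second moment, per task
        self.w = np.zeros((num_tasks, feature_dim))
        self.b = np.zeros(num_tasks)

    def sigma(self):
        # Standard deviation per task, with a floor to avoid division by zero.
        return np.sqrt(np.clip(self.nu - self.mu ** 2, 1e-8, None))

    def normalized_value(self, features, task):
        # Prediction in normalized space; this is what the loss would be applied to.
        return self.w[task] @ features + self.b[task]

    def value(self, features, task):
        # Prediction mapped back to the task's original reward scale.
        return self.sigma()[task] * self.normalized_value(features, task) + self.mu[task]

    def normalize_target(self, task, target):
        return (target - self.mu[task]) / self.sigma()[task]

    def update_stats(self, task, target):
        old_mu, old_sigma = self.mu[task], self.sigma()[task]
        # Move the running moments toward the new target.
        self.mu[task] += self.beta * (target - self.mu[task])
        self.nu[task] += self.beta * (target ** 2 - self.nu[task])
        new_mu, new_sigma = self.mu[task], self.sigma()[task]
        # Rescale weights and bias so value() returns the same number as before.
        self.w[task] *= old_sigma / new_sigma
        self.b[task] = (old_sigma * self.b[task] + old_mu - new_mu) / new_sigma
```

In a training loop one would call update_stats with each new target for the relevant task and then regress normalized_value toward normalize_target. Because every task's targets are brought to a comparable scale, tasks with dense or large rewards no longer dominate the gradient purely through magnitude.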
