MERL: Multi-Head Reinforcement Learning

Abstract

A common challenge in reinforcement learning is how to convert the agent's interactions with an environment into fast and robust learning. Earlier work, for instance, uses domain knowledge to improve existing reinforcement learning algorithms on complex tasks. While promising, such previously acquired knowledge is often costly to obtain and challenging to scale up. Instead, we consider problem knowledge in the form of signals from quantities relevant to solving any task, e.g., self-performance assessment and accurate expectations. $\mathcal{V}^{ex}$ is such a quantity. It is the fraction of variance explained by the value function $V$ and measures the discrepancy between $V$ and the returns. Taking advantage of $\mathcal{V}^{ex}$, we propose MERL, a general framework for structuring reinforcement learning by injecting problem knowledge into policy gradient updates. As a result, the agent is not only optimized for a reward but also learns using problem-focused quantities provided by MERL, which is applicable out-of-the-box to any task. In this paper: (a) we introduce and define MERL, the multi-head reinforcement learning framework we use throughout this work; (b) we conduct experiments across a variety of standard benchmark environments, including 9 continuous control tasks, where results show improved performance; (c) we demonstrate that MERL also improves transfer learning on a set of challenging pixel-based tasks; (d) we discuss how MERL tackles the problem of reward sparsity and better conditions the feature space of reinforcement learning agents.
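The abstract defines $\mathcal{V}^{ex}$ only in words. A minimal sketch follows, assuming the standard explained-variance statistic $\mathcal{V}^{ex} = 1 - \mathrm{Var}[\hat{R} - V] / \mathrm{Var}[\hat{R}]$, where $\hat{R}$ (a symbol introduced here) denotes the empirical returns over a batch. The function name `explained_variance` and the NaN convention for constant returns are illustrative assumptions, not necessarily the exact estimator used in MERL.

```python
import numpy as np


def explained_variance(returns, values):
    """Fraction of the variance of the returns explained by the value predictions.

    Computes 1 - Var(returns - values) / Var(returns):
      1.0  -> V predicts the returns perfectly,
      0.0  -> V does no better than predicting the mean return,
      < 0  -> V does worse than predicting the mean return.
    """
    returns = np.asarray(returns, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    var_returns = np.var(returns)
    if var_returns == 0.0:
        # Assumed convention: undefined for constant returns, so return NaN
        # instead of dividing by zero.
        return float("nan")
    return float(1.0 - np.var(returns - values) / var_returns)


if __name__ == "__main__":
    returns = [1.0, 2.0, 3.0, 4.0]
    print(explained_variance(returns, returns))                # perfect value estimates
    print(explained_variance(returns, [2.5, 2.5, 2.5, 2.5]))   # predicting the mean return
    print(explained_variance(returns, [1.0, 2.0, 3.0, 5.0]))   # one estimate off by 1
```

Because the statistic is normalized by the variance of the returns, it is scale-free, which is one reason it can serve as a task-agnostic signal of how well $V$ tracks the returns.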
