Abstract
One of the remaining challenges in reinforcement learning is to develop agents that can generalise to novel scenarios they might encounter once deployed. This challenge is often framed in a multi-task setting where agents train on a fixed set of tasks and have to generalise to new tasks. Recent work has shown that in this setting, increased exploration during training can be leveraged to improve the generalisation performance of the agent. This makes sense when the states encountered during testing can actually be explored during training. In this paper, we provide intuition for why exploration can also benefit generalisation to states that cannot be explicitly encountered during training. Additionally, we propose a novel method, Explore-Go, that exploits this intuition by increasing the number of states on which the agent trains. Explore-Go effectively broadens the starting state distribution of the agent and, as a result, can be used in conjunction with most existing on-policy or off-policy reinforcement learning algorithms. We show empirically that our method can increase generalisation performance in an illustrative environment and on the Procgen benchmark.