RTFM: Generalising to Novel Environment Dynamics via Reading

Abstract

Obtaining policies that can generalise to new environments in reinforcement learning is challenging. In this work, we demonstrate that language understanding via a reading policy learner is a promising vehicle for generalisation to new environments. We propose a grounded policy learning problem, Read to Fight Monsters (RTFM), in which the agent must jointly reason over a language goal, relevant dynamics described in a document, and environment observations. We procedurally generate environment dynamics and corresponding language descriptions of the dynamics, such that agents must read to understand new environment dynamics instead of memorising any particular information. In addition, we propose txt2$\pi$, a model that captures three-way interactions between the goal, document, and observations. On RTFM, txt2$\pi$ generalises to new environments with dynamics not seen during training via reading. Furthermore, our model outperforms baselines such as FiLM and language-conditioned CNNs on RTFM. Through curriculum learning, txt2$\pi$ produces policies that excel on complex RTFM tasks requiring several reasoning and coreference steps.
