Abstract
Recent work has shown that deep reinforcement-learning agents can learn to follow language-like instructions from infrequent environment rewards. However, for many real-world natural language commands that involve a degree of underspecification or ambiguity, such as "tidy the room", it would be challenging or impossible to program an appropriate reward function. To overcome this, we present a method for learning to follow commands from a training set of instructions and corresponding example goal-states, rather than an explicit reward function. Importantly, the example goal-states are not seen at test time. The approach effectively separates the representation of what instructions require from how they can be executed. In a simple grid world, the method enables an agent to learn a range of commands requiring interaction with blocks and understanding of spatial relations and underspecified abstract arrangements. We further show the method allows our agent to adapt to changes in the environment without requiring new training examples.