In the Vision-and-Language Navigation (VLN) task an embodied agent navigates a 3D environment, following natural language instructions. A challenge in this task is how to handle 'off the path' scenarios where an agent veers from a reference path. Prior work supervises the agent with actions based on the shortest path from the agent's location to the goal, but such goal-oriented supervision is often not in alignment with the instruction. Furthermore, the evaluation metrics employed by prior work do not measure how much of a language instruction the agent is able to follow. In this work, we propose a simple and effective language-aligned supervision scheme, and a new metric that measures the number of sub-instructions the agent has completed during navigation.