Generative Language-Grounded Policy in Vision-and-Language Navigation with Bayes' Rule

  • 2020-10-08 17:16:49
  • Shuhei Kurita, Kyunghyun Cho

Abstract

Vision-and-language navigation (VLN) is a task in which an agent is embodied in a realistic 3D environment and follows an instruction to reach the goal node. While most previous studies have built and investigated a discriminative approach, we notice that there are in fact two possible approaches to building such a VLN agent: discriminative and generative. In this paper, we design and investigate a generative language-grounded policy which uses a language model to compute the distribution over all possible instructions, i.e., all possible sequences of vocabulary tokens, given the action and the transition history. In experiments, we show that the proposed generative approach outperforms the discriminative approach in the Room-2-Room (R2R) and Room-4-Room (R4R) datasets, especially in the unseen environments. We further show that the combination of the generative and discriminative policies achieves close to state-of-the-art results in the R2R dataset, demonstrating that the generative and discriminative policies capture different aspects of VLN.

 
