Abstract
We propose an approach for mapping natural language instructions and raw observations to continuous control of a quadcopter drone. Our model predicts interpretable position-visitation distributions indicating where the agent should go during execution and where it should stop, and uses the predicted distributions to select the actions to execute. This two-step model decomposition allows for simple and efficient training using a combination of supervised learning and imitation learning. We evaluate our approach with a realistic drone simulator, and demonstrate absolute task-completion accuracy improvements of 16.85% over two state-of-the-art instruction-following methods.
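The following minimal Python sketch illustrates the two-step decomposition under stated assumptions. It does not implement the first step. Instead, it takes that step's output as given: a position-visitation distribution and a stop distribution over a 2D grid of cells. The grid discretization, the function names `normalize`, `next_waypoint`, and `select_action`, and every parameter are hypothetical.

The second step is a simple hand-written rule. It is not the learned controller that the approach trains with imitation learning. The rule stops if the stop probability at the agent's cell reaches a threshold. Otherwise, it turns toward the most probable unvisited neighboring cell and emits a continuous (forward speed, yaw rate) command.

```python
"""Hypothetical sketch of a two-step policy: step 1 (distribution prediction)
is assumed given; step 2 is a hand-written rule, NOT the authors' learned
controller. Names, grid layout, and defaults are illustrative assumptions."""
import math
from typing import List, Optional, Set, Tuple

Grid = List[List[float]]  # grid[row][col]; probabilities summing to 1
Cell = Tuple[int, int]    # (row, col)


def normalize(scores: Grid) -> Grid:
    """Turn non-negative scores into a probability distribution over cells."""
    total = sum(sum(row) for row in scores)
    if total <= 0:
        raise ValueError("scores must have positive mass")
    return [[v / total for v in row] for row in scores]


def next_waypoint(visit: Grid, cell: Cell, visited: Set[Cell]) -> Optional[Cell]:
    """Most probable unvisited 8-connected neighbour of `cell` with mass > 0."""
    rows, cols = len(visit), len(visit[0])
    r, c = cell
    best, best_p = None, 0.0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            nr, nc = r + dr, c + dc
            if (dr, dc) == (0, 0) or not (0 <= nr < rows and 0 <= nc < cols):
                continue  # skip the agent's own cell and out-of-bounds cells
            if (nr, nc) not in visited and visit[nr][nc] > best_p:
                best, best_p = (nr, nc), visit[nr][nc]
    return best


def select_action(visit: Grid, stop: Grid, cell: Cell, heading: float,
                  visited: Set[Cell], stop_threshold: float = 0.5,
                  max_speed: float = 1.0, yaw_gain: float = 1.0):
    """Return (forward_speed, yaw_rate, stop_flag) for the agent at `cell`.

    `heading` is in radians in the grid frame: +col is angle 0 and
    +row is angle pi/2.
    """
    if stop[cell[0]][cell[1]] >= stop_threshold:
        return 0.0, 0.0, True
    target = next_waypoint(visit, cell, visited)
    if target is None:  # no remaining visitation mass nearby, so stop
        return 0.0, 0.0, True
    desired = math.atan2(target[0] - cell[0], target[1] - cell[1])
    # Wrap the heading error into [-pi, pi).
    error = (desired - heading + math.pi) % (2 * math.pi) - math.pi
    # Move forward at full speed when aligned, slow down when facing away.
    forward = max_speed * max(0.0, math.cos(error))
    return forward, yaw_gain * error, False


if __name__ == "__main__":
    visit = normalize([[0, 0, 0], [1, 2, 4], [0, 0, 0]])
    stop = normalize([[0, 0, 0], [0, 0, 1], [0, 0, 0]])
    print(select_action(visit, stop, (1, 0), 0.0, {(1, 0)}))  # (1.0, 0.0, False)
    print(select_action(visit, stop, (1, 2), 0.0, set()))     # (0.0, 0.0, True)
```

Because the distributions are an explicit intermediate output, they can be printed or plotted before any action is taken. This is the sense in which the decomposition is interpretable. In a learned version, the second step would be trained to map the same distributions to actions instead of using a fixed rule.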