Language2Pose: Natural Language Grounded Pose Forecasting

  • 2019-11-27 19:06:42
  • Chaitanya Ahuja, Louis-Philippe Morency

Abstract

Generating animations from natural language sentences finds applications in a number of domains, such as movie script visualization, virtual human animation, and robot motion planning. These sentences can describe different kinds of actions, the speeds and directions of these actions, and possibly a target destination. The core modeling challenge in this language-to-pose application is how to map linguistic concepts to motion animations. In this paper, we address this multimodal problem by introducing a neural architecture called Joint Language to Pose (JL2P), which learns a joint embedding of language and pose. This joint embedding space is learned end-to-end using a curriculum learning approach that emphasizes shorter and easier sequences first before moving to longer and harder ones. We evaluate our proposed model on a publicly available corpus of 3D pose data and human-annotated sentences. Both objective metrics and human judgment evaluation confirm that our proposed approach generates more accurate animations, which humans deem visually more representative than those of other data-driven approaches.
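
The curriculum idea mentioned in the abstract (shorter sequences first, longer ones later) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `curriculum_stages`, the parameters `start_len` and `growth`, and the choice to truncate sequences and multiply the length cap at each stage are all assumptions made for illustration only.

```python
from typing import Iterator, List, Sequence, Tuple


def curriculum_stages(
    sequences: Sequence[Sequence[float]],
    start_len: int = 2,
    growth: int = 2,
) -> Iterator[Tuple[int, List[Sequence[float]]]]:
    """Yield (max_len, truncated_sequences) for each curriculum stage.

    Hypothetical schedule: every stage truncates each sequence to at most
    `max_len` steps, and `max_len` is multiplied by `growth` until it
    covers the longest sequence in the data.
    """
    if start_len < 1 or growth < 2:
        raise ValueError("start_len must be >= 1 and growth must be >= 2")
    if not sequences:
        return

    longest = max(len(seq) for seq in sequences)
    max_len = start_len
    while True:
        # Easier stage first: only the first `max_len` steps are visible.
        yield max_len, [seq[:max_len] for seq in sequences]
        if max_len >= longest:
            break
        max_len = min(max_len * growth, longest)


if __name__ == "__main__":
    # Toy stand-ins for pose sequences (one float per time step).
    toy_data = [[0.0] * 3, [0.0] * 10]
    for cap, batch in curriculum_stages(toy_data):
        # A real training loop would fit the model on `batch` here.
        print(cap, [len(seq) for seq in batch])
```

In this sketch a model would be trained to convergence (or for a fixed budget) on each stage before moving to the next, so that early updates come only from short, easier targets.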

 
