The KIT Motion-Language Dataset

Abstract

Linking human motion and natural language is of great interest for the generation of semantic representations of human activities as well as for the generation of robot activities based on natural language input. However, while there have been years of research in this area, no standardized and openly available dataset exists to support the development and evaluation of such systems. We therefore propose the KIT Motion-Language Dataset, which is large, open, and extensible. We aggregate data from multiple motion capture databases and include them in our dataset using a unified representation that is independent of the capture system or marker set, making it easy to work with the data regardless of its origin. To obtain motion annotations in natural language, we apply a crowd-sourcing approach and a web-based tool that was specifically built for this purpose, the Motion Annotation Tool. We thoroughly document the annotation process itself and discuss gamification methods that we used to keep annotators motivated. We further propose a novel method, perplexity-based selection, which systematically selects motions for further annotation that are either under-represented in our dataset or that have erroneous annotations. We show that our method mitigates the two aforementioned problems and ensures a systematic annotation process. We provide an in-depth analysis of the structure and contents of our resulting dataset, which, as of October 10, 2016, contains 3911 motions with a total duration of 11.23 hours and 6278 annotations in natural language that contain 52,903 words. We believe this makes our dataset an excellent choice that enables more transparent and comparable research in this important area.
