Automatic dense annotation of large-vocabulary sign language videos

Abstract

Recently, sign language researchers have turned to sign language interpreted TV broadcasts, comprising (i) a video of continuous signing and (ii) subtitles corresponding to the audio content, as a readily available and large-scale source of training data. One key challenge in the usability of such data is the lack of sign annotations. Previous work exploiting such weakly-aligned data only found sparse correspondences between keywords in the subtitle and individual signs. In this work, we propose a simple, scalable framework to vastly increase the density of automatic annotations. Our contributions are the following: (1) we significantly improve previous annotation methods by making use of synonyms and subtitle-signing alignment; (2) we show the value of pseudo-labelling from a sign recognition model as a way of sign spotting; (3) we propose a novel approach for increasing our annotations of known and unknown classes based on in-domain exemplars; (4) on the BOBSL BSL sign language corpus, we increase the number of confident automatic annotations from 670K to 5M. We make these annotations publicly available to support the sign language research community.
