Abstract
Recently, sign language researchers have turned to sign language interpreted TV broadcasts, comprising (i) a video of continuous signing and (ii) subtitles corresponding to the audio content, as a readily available and large-scale source of training data. One key challenge in the usability of such data is the lack of sign annotations. Previous work exploiting such weakly-aligned data only found sparse correspondences between keywords in the subtitle and individual signs. In this work, we propose a simple, scalable framework to vastly increase the density of automatic annotations. Our contributions are the following: (1) we significantly improve previous annotation methods by making use of synonyms and subtitle-signing alignment; (2) we show the value of pseudo-labelling from a sign recognition model as a way of sign spotting; (3) we propose a novel approach for increasing our annotations of known and unknown classes based on in-domain exemplars; (4) on the BOBSL BSL sign language corpus, we increase the number of confident automatic annotations from 670K to 5M. We make these annotations publicly available to support the sign language research community.