BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues

  • 2020-07-23 16:59:01
  • Samuel Albanie, Gül Varol, Liliane Momeni, Triantafyllos Afouras, Joon Son Chung, Neil Fox, Andrew Zisserman
Recent progress in fine-grained gesture and action classification, and machine translation, points to the possibility of automated sign language recognition becoming a reality. A key stumbling block in making progress towards this goal is a lack of appropriate training data, stemming from the high complexity of sign annotation and a limited supply of qualified annotators. In this work, we introduce a new scalable approach to data collection for sign recognition in continuous videos. We make use of weakly-aligned subtitles for broadcast footage together with a keyword spotting method to automatically localise sign-instances for a vocabulary of 1,000 signs in 1,000 hours of video. We make the following contributions: (1) We show how to use mouthing cues from signers to obtain high-quality annotations from video data - the result is the BSL-1K dataset, a collection of British Sign Language (BSL) signs of unprecedented scale; (2) We show that we can use BSL-1K to train strong sign recognition models for co-articulated signs in BSL and that these models additionally form excellent pretraining for other sign languages and benchmarks - we exceed the state of the art on both the MSASL and WLASL benchmarks. Finally, (3) we propose new large-scale evaluation sets for the tasks of sign recognition and sign spotting and provide baselines which we hope will serve to stimulate research in this area.


