MS-ASL: A Large-Scale Data Set and Benchmark for Understanding American Sign Language

Abstract

Sign language recognition is a challenging and often underestimated problem comprising multi-modal articulators (handshape, orientation, movement, upper body and face) that integrate asynchronously on multiple streams. Learning powerful statistical models in such a scenario requires much data, particularly to apply recent advances of the field. However, labeled data is a scarce resource for sign language due to the enormous cost of transcribing these unwritten languages. We propose the first real-life large-scale sign language data set comprising over 25,000 annotated videos, which we thoroughly evaluate with state-of-the-art methods from sign and related action recognition. Unlike the current state-of-the-art, the data set makes it possible to investigate generalization to unseen individuals (signer-independent test) in a realistic setting with over 200 signers. Previous work mostly deals with limited vocabulary tasks, while here we cover a large class count of 1000 signs in challenging and unconstrained real-life recording conditions. We further propose I3D, known from video classification, as a powerful and suitable architecture for sign language recognition, outperforming the current state-of-the-art by a large margin. The data set is publicly available to the community.
