MS-ASL: A Large-Scale Data Set and Benchmark for Understanding American Sign Language

Abstract

Computer vision has improved significantly in the past few decades and has enabled machines to perform many human tasks. However, the real challenge is enabling machines to carry out tasks that an average human does not have the skills for. One such challenge, which we tackle in this paper, is providing accessibility for deaf individuals by giving them a means of communicating with others with the aid of computer vision. Unlike many other works, which focus on multiple cameras, depth cameras, electronic gloves, or visual gloves, we focus on the sole use of RGB video, which allows anyone to communicate with a deaf individual through their personal devices. This is not a new approach, but the lack of a realistic large-scale data set has kept recent computer vision trends in video classification from being applied to this field. In this paper, we propose the first large-scale ASL data set that covers over 200 signers, provides signer-independent sets, features challenging and unconstrained recording conditions, and has a large class count of 1000 signs. We evaluate baselines from action recognition techniques on the data set. We propose I3D, known from video classification, as a powerful and suitable architecture for sign language recognition. We also propose a new pre-trained model that is more appropriate for sign language recognition. Finally, we estimate the effect of the number of classes and the number of training samples on recognition accuracy.
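A signer-independent set means that no person who signs in the test data also appears in the training data, so accuracy reflects generalization to unseen signers rather than memorized signing styles. The sketch below shows one way such a split could be made. It is a minimal illustration, assuming each clip is a hypothetical `(signer_id, clip_id, label)` tuple; the function `signer_independent_split` and its parameters are illustrative and are not taken from the MS-ASL release.

```python
import random
from collections import defaultdict


def signer_independent_split(samples, test_fraction=0.2, seed=0):
    """Split (signer_id, clip_id, label) tuples so no signer is in both sets.

    Hypothetical helper for illustration; not the data set's official split code.
    """
    # Group clips by the person who performed them.
    by_signer = defaultdict(list)
    for sample in samples:
        by_signer[sample[0]].append(sample)

    # Shuffle signer IDs (not individual clips) so the split happens per signer.
    signers = sorted(by_signer)
    random.Random(seed).shuffle(signers)
    n_test = max(1, round(len(signers) * test_fraction))
    test_signers = set(signers[:n_test])

    # Every clip of a signer goes entirely to one side of the split.
    train, test = [], []
    for signer, clips in by_signer.items():
        (test if signer in test_signers else train).extend(clips)
    return train, test


# Usage: 10 hypothetical signers with 4 clips each.
samples = [(s, f"clip{s}_{i}", i % 3) for s in range(10) for i in range(4)]
train, test = signer_independent_split(samples, test_fraction=0.2)
print(len(train), len(test))
```

Splitting at the signer level rather than the clip level is the key design choice: a random clip-level split would place the same signer on both sides and tend to overestimate accuracy on new users.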
