Abstract
Even for better-studied sign languages like American Sign Language (ASL), data is the bottleneck for machine learning research. The situation is worse yet for the many other sign languages used by Deaf/Hard of Hearing communities around the world. In this paper, we present YouTube-SL-25, a large-scale, open-domain multilingual corpus of sign language videos with seemingly well-aligned captions drawn from YouTube. With >3000 hours of videos across >25 sign languages, YouTube-SL-25 is a) >3x the size of YouTube-ASL, b) the largest parallel sign language dataset to date, and c) the first or largest parallel dataset for many of its component languages. We provide baselines for sign-to-text tasks using a unified multilingual multitask model based on T5 and report scores on benchmarks across 4 sign languages. The results demonstrate that multilingual transfer benefits both higher- and lower-resource sign languages within YouTube-SL-25.