JWSign: A Highly Multilingual Corpus of Bible Translations for more Diversity in Sign Language Processing

Abstract

Advancements in sign language processing have been hindered by a lack of sufficient data, impeding progress in recognition, translation, and production tasks. The absence of comprehensive sign language datasets across the world's sign languages has widened the gap in this field, resulting in a few sign languages being studied more than others, making this research area extremely skewed mostly towards sign languages from high-income countries. In this work we introduce a new large and highly multilingual dataset for sign language translation: JWSign. The dataset consists of 2,530 hours of Bible translations in 98 sign languages, featuring more than 1,500 individual signers. On this dataset, we report neural machine translation experiments. Apart from bilingual baseline systems, we also train multilingual systems, including some that take into account the typological relatedness of signed or spoken languages. Our experiments highlight that multilingual systems are superior to bilingual baselines, and that in higher-resource scenarios, clustering language pairs that are related improves translation quality.
