Universal Gloss-level Representation for Gloss-free Sign Language Translation and Production

Abstract

Sign language, essential for the deaf and hard-of-hearing, presents unique challenges in translation and production due to its multimodal nature and the inherent ambiguity in mapping sign language motion to spoken language words. Previous methods often rely on gloss annotations, which require time-intensive labor and specialized expertise in sign language. Gloss-free methods have emerged to address these limitations, but they often depend on external sign language data or dictionaries, failing to completely eliminate the need for gloss annotations. There is a clear demand for a comprehensive approach that can supplant gloss annotations and be utilized for both Sign Language Translation (SLT) and Sign Language Production (SLP). We introduce Universal Gloss-level Representation (UniGloR), a unified and self-supervised solution for both SLT and SLP, trained on multiple datasets including PHOENIX14T, How2Sign, and NIASL2021. Our results demonstrate UniGloR's effectiveness in the translation and production tasks. We further report an encouraging result for Sign Language Recognition (SLR) on previously unseen data. Our study suggests that self-supervised learning can be performed in a unified manner, paving the way for innovative and practical applications in future research.
