Progressive Transformers for End-to-End Sign Language Production

Abstract

The goal of automatic Sign Language Production (SLP) is to translate spoken language to a continuous stream of sign language video at a level comparable to a human translator. If this were achievable, it would revolutionise Deaf-hearing communications. Previous work on predominantly isolated SLP has shown the need for architectures that are better suited to the continuous domain of full sign sequences. In this paper, we propose Progressive Transformers, a novel architecture that can translate from discrete spoken language sentences to continuous 3D skeleton pose outputs representing sign language. We present two model configurations: an end-to-end network that produces sign directly from text, and a stacked network that utilises a gloss intermediary. Our transformer network architecture introduces a counter that enables continuous sequence generation at training and inference. We also provide several data augmentation processes to overcome the problem of drift and improve the performance of SLP models. We propose a back translation evaluation mechanism for SLP, presenting benchmark quantitative results on the challenging RWTH-PHOENIX-Weather-2014T (PHOENIX14T) dataset and setting baselines for future research.
