Sign Language Translation with Transformers

  • 2020-04-01 17:20:04
  • Kayo Yin

Abstract

Sign Language Translation (SLT) first uses a Sign Language Recognition (SLR) system to extract sign language glosses from videos. Then, a translation system generates spoken language translations from the sign language glosses. Though SLT has gathered interest recently, little study has been performed on the translation system. This paper focuses on the translation system and improves performance by utilizing Transformer networks. We report a wide range of experimental results for various Transformer setups and introduce the use of Spatial-Temporal Multi-Cue (STMC) networks in an end-to-end SLT system with Transformer. We perform experiments on RWTH-PHOENIX-Weather 2014T, a challenging SLT benchmark dataset of German sign language, and ASLG-PC12, a dataset involving American Sign Language (ASL) recently used in gloss-to-text translation. On RWTH-PHOENIX-Weather 2014T, our methodology improves on the current state of the art in BLEU-4 score by over 5 points when translating ground truth glosses and by over 7 points when translating glosses predicted by an STMC network. On the ASLG-PC12 corpus, we report an improvement of over 16 points in BLEU-4. Our findings also demonstrate that end-to-end translation on predicted glosses provides even better performance than translation on ground truth glosses. This shows potential for further improvement in SLT by either jointly training the SLR and translation systems or by revising the gloss annotation system.
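
Since the results above are stated in BLEU-4, a minimal sketch of how such a score can be computed for one sentence may help. This is an illustration only, not the paper's evaluation code: it assumes whitespace tokenization, a single reference, and no smoothing, and the function names `ngrams` and `bleu4` are hypothetical. Published scores are normally corpus-level and scaled to 0-100, so values from this sketch are not directly comparable to the reported numbers.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(hypothesis, reference):
    """Sentence-level BLEU-4 in [0, 1]: one reference, no smoothing (assumed setup)."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, 5):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        total = sum(hyp_counts.values())
        # Clip each hypothesis n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives a zero score
        log_precision_sum += math.log(clipped / total)
    # Brevity penalty: penalize hypotheses shorter than the reference.
    if len(hyp) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(hyp))
    # Geometric mean of the four n-gram precisions, times the penalty.
    return brevity_penalty * math.exp(log_precision_sum / 4)
```

Calling `bleu4(translation, reference)` on a system output and its reference sentence returns a value between 0 and 1; multiplying by 100 gives the usual point scale.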

 
