Sign Language Translation with Hierarchical Spatio-Temporal Graph Neural Network

  • 2021-11-14 07:02:28
  • Jichao Kan, Kun Hu, Markus Hagenbuchner, Ah Chung Tsoi, Mohammed Bennamoun, Zhiyong Wang

Abstract

Sign language translation (SLT), which generates text in a spoken language from visual content in a sign language, is important to assist the hard-of-hearing community for their communications. Inspired by neural machine translation (NMT), most existing SLT studies adopted a general sequence to sequence learning strategy. However, SLT is significantly different from general NMT tasks since sign languages convey messages through multiple visual-manual aspects. Therefore, in this paper, these unique characteristics of sign languages are formulated as hierarchical spatio-temporal graph representations, including high-level and fine-level graphs of which a vertex characterizes a specified body part and an edge represents their interactions. Particularly, high-level graphs represent the patterns in the regions such as hands and face, and fine-level graphs consider the joints of hands and landmarks of facial regions. To learn these graph patterns, a novel deep learning architecture, namely hierarchical spatio-temporal graph neural network (HST-GNN), is proposed. Graph convolutions and graph self-attentions with neighborhood context are proposed to characterize both the local and the global graph properties. Experimental results on benchmark datasets demonstrated the effectiveness of the proposed method.
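
As a rough illustration of the graph formulation described above, the following minimal sketch builds a small high-level graph over body regions and applies one generic mean-aggregation graph convolution. It is not the HST-GNN architecture: the vertex names, the edge list, the feature values, and the helpers `build_adjacency` and `graph_conv` are all hypothetical, chosen only to show how vertices (body parts) and edges (their interactions) could feed a graph convolution.

```python
# Hypothetical high-level graph: vertex names and edges are illustrative
# assumptions, not the graph definition used in the paper.
VERTICES = ["left_hand", "right_hand", "face"]
EDGES = [("left_hand", "face"), ("right_hand", "face")]


def build_adjacency(vertices, edges):
    """Return a symmetric adjacency matrix (nested lists) with self-loops."""
    index = {name: i for i, name in enumerate(vertices)}
    n = len(vertices)
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        adj[index[u]][index[v]] = 1.0
        adj[index[v]][index[u]] = 1.0
    return adj


def graph_conv(features, adj, weight):
    """One generic graph convolution step: H' = ReLU(D^-1 A H W)."""
    n = len(adj)
    in_dim = len(weight)
    out_dim = len(weight[0])
    output = []
    for i in range(n):
        degree = sum(adj[i])
        # Average the features of vertex i and its neighbours.
        agg = [
            sum(adj[i][j] * features[j][k] for j in range(n)) / degree
            for k in range(in_dim)
        ]
        # Linear projection followed by ReLU.
        row = [
            max(0.0, sum(agg[k] * weight[k][m] for k in range(in_dim)))
            for m in range(out_dim)
        ]
        output.append(row)
    return output


adj = build_adjacency(VERTICES, EDGES)
# Toy 2-dimensional features per region for a single frame (made-up values).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity projection for readability
hidden = graph_conv(features, adj, weight)
```

In this sketch each region's output mixes its own features with those of its neighbours, which is the local aggregation that graph convolutions provide. A fine-level graph over hand joints or facial landmarks could be built the same way with a larger vertex set, and the temporal dimension would require applying such a step across frames.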
