Context Matters: Self-Attention for Sign Language Recognition

Abstract

This paper proposes an attentional network for the task of Continuous Sign Language Recognition. The proposed approach exploits co-independent streams of data to model the sign language modalities. These different channels of information can share a complex temporal structure between each other. For that reason, we apply attention to synchronize and help capture entangled dependencies between the different sign language components. Even though Sign Language is multi-channel, handshapes represent the central entities in sign interpretation. Seeing handshapes in their correct context defines the meaning of a sign. Taking that into account, we utilize the attention mechanism to efficiently aggregate the hand features with their appropriate spatio-temporal context for better sign recognition. We found that by doing so the model is able to identify the essential Sign Language components that revolve around the dominant hand and the face areas. We test our model on the benchmark dataset RWTH-PHOENIX-Weather 2014, yielding competitive results.
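
As a rough illustration of aggregating hand features with their context, the sketch below applies generic scaled dot-product attention in which per-frame hand features act as queries over per-frame context features. It is not the architecture from the paper: the function name `cross_attention`, the feature sizes, and the random inputs are all assumptions made for this example, and learned projections are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(hand, context):
    """Generic scaled dot-product attention (hypothetical helper).

    hand:    (T, d) per-frame hand features, used as queries.
    context: (T, d) per-frame context features, used as keys and values.
    Returns the context-aware features (T, d) and the weights (T, T).
    """
    d = hand.shape[-1]
    scores = hand @ context.T / np.sqrt(d)  # similarity of each hand frame to each context frame
    weights = softmax(scores, axis=-1)      # each row sums to 1 over context frames
    return weights @ context, weights

# Illustrative random inputs: 6 frames, 8-dimensional features (assumed sizes).
rng = np.random.default_rng(0)
T, d = 6, 8
hand = rng.normal(size=(T, d))
context = rng.normal(size=(T, d))

fused, weights = cross_attention(hand, context)
print(fused.shape, weights.shape)
```

Each row of `weights` shows how strongly one hand frame attends to every context frame, which is the kind of quantity one would inspect to see which parts of the sequence the model relies on.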
