SF-Net: Structured Feature Network for Continuous Sign Language Recognition

Abstract

Continuous sign language recognition (SLR) aims to translate a signing sequence into a sentence. It is very challenging because sign language has a rich vocabulary, and many signs share similar gestures and motions. Moreover, it is weakly supervised, as the alignment of signing glosses is not available. In this paper, we propose the Structured Feature Network (SF-Net) to address these challenges by effectively learning multiple levels of semantic information in the data. The proposed SF-Net extracts features in a structured manner and gradually encodes information at the frame level, the gloss level, and the sentence level into the feature representation. SF-Net can be trained end-to-end without the help of other models or pre-training. We tested SF-Net on two large-scale public SLR datasets collected from different continuous SLR scenarios. Results show that SF-Net clearly outperforms previous methods based on sequence-level supervision in terms of both accuracy and adaptability.
