Keypoint based Sign Language Translation without Glosses

Abstract

Sign Language Translation (SLT) has received relatively little attention compared with Sign Language Recognition (SLR). However, SLR recognizes sign language according to its own grammar, which differs from that of spoken language, so its output is not easy for non-disabled people to interpret. We therefore address the problem of translating sign language video directly into spoken language. To this end, we perform translation based on the signer's skeleton keypoints and propose a new keypoint normalization method that normalizes these points robustly for sign language translation. Applying a normalization customized to each body part contributed to improved performance. In addition, we propose a stochastic frame selection method that performs frame augmentation and sampling at the same time. Finally, the input is translated into spoken language by an attention-based translation model. Because it does not require glosses, our method can be applied to a variety of datasets, including those without gloss annotations. Quantitative experimental evaluation demonstrates the effectiveness of our method.
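
The abstract does not spell out either preprocessing step, so the following is only a hypothetical sketch of one plausible reading, not the authors' implementation. It assumes each frame is a list of (x, y) keypoint tuples, uses a made-up part layout (`PARTS`), stands in per-part mean/std standardization for the paper's customized normalization, and models stochastic frame selection as drawing one random frame from each of `target_len` equal-width temporal bins, which subsamples long videos differently on every call and repeats frames for short ones. All names here (`PARTS`, `normalize_by_part`, `stochastic_frame_indices`) are illustrative.

```python
import random
from statistics import fmean, pstdev
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Frame = Sequence[Point]

# HYPOTHETICAL keypoint layout (index ranges per body part); the paper's
# actual layout and part definitions may differ.
PARTS: Dict[str, range] = {
    "body": range(0, 8),
    "left_hand": range(8, 29),   # 21 points per hand, as in common hand models
    "right_hand": range(29, 50),
    "face": range(50, 60),
}


def normalize_by_part(frames: Sequence[Frame], parts: Dict[str, range] = PARTS,
                      eps: float = 1e-6) -> List[List[Point]]:
    """Standardize each body part using only that part's statistics.

    For every part, x and y are centred on the part's mean over all frames
    and divided by the part's population std, so each part ends up on a
    comparable scale. Points outside every part are copied unchanged.
    """
    out = [list(frame) for frame in frames]
    for idx in parts.values():
        xs = [frame[k][0] for frame in frames for k in idx]
        ys = [frame[k][1] for frame in frames for k in idx]
        mx, my = fmean(xs), fmean(ys)
        sx, sy = pstdev(xs, mx), pstdev(ys, my)
        for t, frame in enumerate(frames):
            for k in idx:
                x, y = frame[k]
                out[t][k] = ((x - mx) / (sx + eps), (y - my) / (sy + eps))
    return out


def stochastic_frame_indices(num_frames: int, target_len: int,
                             rng: random.Random) -> List[int]:
    """Choose target_len frame indices in temporal order.

    [0, num_frames) is split into target_len equal-width bins and one
    position is drawn uniformly inside each bin. Long videos are therefore
    subsampled with a fresh random draw on every call (augmentation), while
    videos shorter than target_len get frames repeated (upsampling).
    """
    width = num_frames / target_len
    indices = []
    for i in range(target_len):
        pos = (i + rng.random()) * width
        indices.append(min(int(pos), num_frames - 1))  # int() floors pos >= 0
    return indices


if __name__ == "__main__":
    rng = random.Random(0)
    # 100 synthetic frames of 60 random (x, y) keypoints in a 256x256 image.
    video = [[(rng.uniform(0, 256), rng.uniform(0, 256)) for _ in range(60)]
             for _ in range(100)]
    idx = stochastic_frame_indices(len(video), 32, rng)
    clip = normalize_by_part([video[i] for i in idx])
    print(len(clip), len(clip[0]))  # 32 60
```

Binning before sampling keeps the selected frames in temporal order, which a sequence-to-sequence translator would normally expect; calling `stochastic_frame_indices` again on the same video during training yields a different but still ordered subset.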
