Abstract
Labeling is the cornerstone of supervised machine learning, which has been applied in a wide range of domains, sign language recognition among them. However, such algorithms must be fed a large amount of consistently labeled data during training to produce a model that generalizes well. In addition, there is a great need for an automated solution that works with any national sign language. Although there are language-agnostic transcription systems, such as the Hamburg Sign Language Notation System (HamNoSys), that describe the signer's initial position and body movement instead of the glosses' meanings, there are still issues with providing accurate and reliable labels for every real-world use case. In this context, the industry relies heavily on manual attribution and labeling of the available video data. In this work, we tackle this issue and thoroughly analyze the HamNoSys labels provided by various maintainers of open sign language corpora in five sign languages, in order to examine the challenges encountered in labeling video data. We also investigate the consistency and objectivity of HamNoSys-based labels for the purpose of training machine learning models. Our findings provide valuable insights into the limitations of current labeling methods and pave the way for future research on developing more accurate and efficient solutions for sign language recognition.