Abstract
Inspired by recent developments in natural language processing, we propose a novel approach to sign language processing based on phonological properties validated by American Sign Language users. Taking advantage of datasets that combine phonological data with videos of people signing, we use a pretrained deep model based on mesh reconstruction to extract the 3D coordinates of the signers' keypoints. We then train standard statistical and deep machine learning models to assign phonological classes to each temporal sequence of coordinates. Our paper introduces the idea of exploiting the phonological properties manually assigned by sign language users to classify videos of people performing signs by regressing a 3D mesh. We establish a new baseline for this problem based on the statistical distribution of 725 different signs. Our best-performing models achieve a micro-averaged F1-score of 58% for the major location class and 70% for the sign type using statistical and deep learning algorithms, compared to their corresponding baselines of 35% and 39%.
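
To make the reported metric concrete, the following minimal sketch computes a micro-averaged F1-score for a baseline that always predicts the most frequent class. It is an illustration under stated assumptions, not the procedure used in the paper: the abstract says only that the baseline is based on the statistical distribution of the signs, so the majority-class predictor, the function names `micro_f1` and `majority_baseline`, and the toy labels are all hypothetical.

```python
from collections import Counter


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP and FN over all classes, then compute F1."""
    labels = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for label in labels:
        for truth, pred in zip(y_true, y_pred):
            if pred == label and truth == label:
                tp += 1
            elif pred == label and truth != label:
                fp += 1
            elif pred != label and truth == label:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def majority_baseline(train_labels, n_predictions):
    """Assumed baseline: always predict the most frequent training label."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    return [most_common_label] * n_predictions


# Hypothetical toy labels standing in for a phonological class such as major location.
train = ["head", "head", "hand", "body", "head", "neutral"]
test_true = ["head", "hand", "head", "neutral"]

baseline_pred = majority_baseline(train, len(test_true))
baseline_score = micro_f1(test_true, baseline_pred)
print(f"baseline micro-F1: {baseline_score:.2f}")
```

When every sample carries exactly one label, the pooled false positives and false negatives are equal, so the micro-averaged F1-score coincides with plain accuracy.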