Skeleton-based sign language recognition using a dual-stream spatio-temporal dynamic graph convolutional network

Abstract

Isolated Sign Language Recognition (ISLR) is challenged by gestures that are morphologically similar yet semantically distinct, a problem rooted in the complex interplay between hand shape and motion trajectory. Existing methods, often relying on a single reference frame, struggle to resolve this geometric ambiguity. This paper introduces Dual-SignLanguageNet (DSLNet), a dual-reference, dual-stream architecture that decouples and models gesture morphology and trajectory in separate, complementary coordinate systems. Our approach utilizes a wrist-centric frame for view-invariant shape analysis and a facial-centric frame for context-aware trajectory modeling. These streams are processed by specialized networks (a topology-aware graph convolution for shape and a Finsler geometry-based encoder for trajectory) and are integrated via a geometry-driven optimal transport fusion mechanism. DSLNet sets a new state-of-the-art, achieving 93.70%, 89.97% and 99.79% accuracy on the challenging WLASL-100, WLASL-300 and LSA64 datasets, respectively, with significantly fewer parameters than competing models.
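The dual-reference idea can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names, the 2D keypoint layout, the choice of joint 0 as the wrist, and the use of a single nose point as the facial anchor are all assumptions made for illustration. The sketch only subtracts a reference point per frame, so it removes translation; any further normalization the paper applies for view invariance is not shown.

```python
# Hypothetical sketch of the two reference frames described in the abstract.
# Assumptions (not from the paper): 2D keypoints, wrist at joint index 0,
# and one facial anchor point (e.g. the nose) per frame.

def wrist_centric(hand_frames, wrist_idx=0):
    """Express every hand joint relative to that frame's wrist joint."""
    out = []
    for joints in hand_frames:
        wx, wy = joints[wrist_idx]
        out.append([(x - wx, y - wy) for x, y in joints])
    return out


def face_centric_trajectory(hand_frames, face_anchors, wrist_idx=0):
    """Express the wrist position relative to the facial anchor, per frame."""
    traj = []
    for joints, (fx, fy) in zip(hand_frames, face_anchors):
        wx, wy = joints[wrist_idx]
        traj.append((wx - fx, wy - fy))
    return traj


# Toy input: two frames, three joints (wrist plus two fingertips).
# The hand keeps the same shape but moves between frames.
hand = [
    [(10.0, 20.0), (12.0, 23.0), (9.0, 24.0)],
    [(14.0, 18.0), (16.0, 21.0), (13.0, 22.0)],
]
nose = [(11.0, 5.0), (11.0, 5.0)]

shape_stream = wrist_centric(hand)                  # morphology input
traj_stream = face_centric_trajectory(hand, nose)   # trajectory input
```

In this toy input the wrist-centric stream is identical in both frames, because the hand shape does not change, while the face-centric stream records the movement of the wrist relative to the face. That separation is what lets two signs with the same hand shape but different paths be told apart.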
