Abstract
In skeleton-based action recognition, a key challenge is distinguishing between actions with similar trajectories of joints due to the lack of image-level details in skeletal representations. Recognizing that the differentiation of similar actions relies on subtle motion details in specific body parts, we direct our approach to focus on the fine-grained motion of local skeleton components. To this end, we introduce ProtoGCN, a Graph Convolutional Network (GCN)-based model that breaks down the dynamics of entire skeleton sequences into a combination of learnable prototypes representing core motion patterns of action units. By contrasting the reconstruction of prototypes, ProtoGCN can effectively identify and enhance the discriminative representation of similar actions. Without bells and whistles, ProtoGCN achieves state-of-the-art performance on multiple benchmark datasets, including NTU RGB+D, NTU RGB+D 120, Kinetics-Skeleton, and FineGYM, which demonstrates the effectiveness of the proposed method. The code is available at https://github.com/firework8/ProtoGCN.
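
As a rough illustration of what "a combination of learnable prototypes" could mean, the sketch below rebuilds a feature vector as a softmax-weighted sum of prototype vectors. This is a minimal sketch under stated assumptions, not the ProtoGCN implementation: the dot-product scoring, the softmax weighting, the function names `softmax` and `reconstruct`, and the toy values are all hypothetical, and the prototypes are fixed here rather than learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reconstruct(feature, prototypes):
    """Express `feature` as a weighted combination of `prototypes`.

    Assumption: similarity is a plain dot product and the mixing
    weights come from a softmax over those similarities.
    """
    # Similarity of the feature to each prototype.
    scores = [sum(f * p for f, p in zip(feature, proto)) for proto in prototypes]
    # Mixing weights: non-negative and summing to 1.
    weights = softmax(scores)
    # Weighted sum of prototypes, computed per dimension.
    recon = [
        sum(w * proto[d] for w, proto in zip(weights, prototypes))
        for d in range(len(feature))
    ]
    return weights, recon


# Toy prototype bank (hypothetical values; a real model would learn these).
prototypes = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
feature = [2.0, 0.1]
weights, recon = reconstruct(feature, prototypes)
```

In this toy setting, two features that look alike overall can still produce different mixing weights, which is one plausible way to read the idea of contrasting prototype reconstructions to separate similar actions.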