Abstract
In this paper, we study current and upcoming frontiers across the landscape of skeleton-based human action recognition. To study skeleton-action recognition in the wild, we introduce Skeletics-152, a curated and 3-D pose-annotated subset of RGB videos sourced from Kinetics-700, a large-scale action dataset. We extend our study to include out-of-context actions by introducing Skeleton-Mimetics, a dataset derived from the recently introduced Mimetics dataset. We also introduce Metaphorics, a dataset with caption-style annotated YouTube videos of the popular social game Dumb Charades and interpretative dance performances. We benchmark state-of-the-art models on the NTU-120 dataset and provide multi-layered assessment of the results. The results from benchmarking the top performers of NTU-120 on the newly introduced datasets reveal the challenges and domain gap induced by actions in the wild. Overall, our work characterizes the strengths and limitations of existing approaches and datasets. Via the introduced datasets, our work enables new frontiers for human action recognition.