Motion Fused Frames: Data Level Fusion Strategy for Hand Gesture Recognition

Abstract

Acquiring spatio-temporal states of an action is the most crucial step for action classification. In this paper, we propose a data level fusion strategy, Motion Fused Frames (MFFs), designed to fuse motion information into static images as better representatives of spatio-temporal states of an action. MFFs can be used as input to any deep learning architecture with very little modification to the network. We evaluate MFFs on hand gesture recognition tasks using three video datasets (Jester, ChaLearn LAP IsoGD, and NVIDIA Dynamic Hand Gesture), which require capturing long-term temporal relations of hand movements. Our approach obtains very competitive performance on the Jester and ChaLearn benchmarks, with classification accuracies of 96.28% and 57.4%, respectively, while achieving state-of-the-art performance with 84.7% accuracy on the NVIDIA benchmark.
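To make the idea of data level fusion concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not say how motion information is fused into the image, so the sketch assumes channel-wise concatenation of optical-flow frames with one RGB frame. The function name `fuse_motion_frames`, the array shapes, and the channel order are all hypothetical.

```python
import numpy as np


def fuse_motion_frames(rgb, flows):
    """Stack motion frames onto one static RGB frame along the channel axis.

    Assumption (not stated in the abstract): fusion is plain channel-wise
    concatenation, and each motion frame is a 2-channel optical-flow field.

    rgb:   array of shape (H, W, 3), the static image.
    flows: sequence of arrays of shape (H, W, 2), one per motion frame.
    Returns an array of shape (H, W, 3 + 2 * len(flows)).
    """
    rgb = np.asarray(rgb)
    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError("rgb must have shape (H, W, 3)")
    flows = [np.asarray(f) for f in flows]
    for flow in flows:
        if flow.ndim != 3 or flow.shape[:2] != rgb.shape[:2] or flow.shape[2] != 2:
            raise ValueError("each flow frame must have shape (H, W, 2)")
    # Hypothetical channel order: RGB first, then the flow frames in time order.
    return np.concatenate([rgb, *flows], axis=-1)


# Usage with placeholder data: one RGB frame fused with three flow frames.
rgb_frame = np.zeros((112, 112, 3), dtype=np.float32)
flow_frames = [np.zeros((112, 112, 2), dtype=np.float32) for _ in range(3)]
fused = fuse_motion_frames(rgb_frame, flow_frames)
```

Under this assumption, the only change a network would need is a first layer that accepts `fused.shape[-1]` input channels instead of 3, which is one way to read the "very little modification" claim.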
