Abstract
Human action recognition in 3D skeleton sequences has attracted considerable research attention. Recently, Long Short-Term Memory (LSTM) networks have shown promising performance on this task owing to their strength in modeling the dependencies and dynamics of sequential data. Because not all skeletal joints are informative for action recognition, and irrelevant joints often introduce noise that degrades performance, more attention should be paid to the informative ones. However, the original LSTM network has no explicit attention ability. In this paper, we propose a new class of LSTM network, the Global Context-Aware Attention LSTM (GCA-LSTM), for skeleton-based action recognition. This network can selectively focus on the informative joints in each frame of a skeleton sequence by using a global context memory cell. To further improve its attention capability, we also introduce a recurrent attention mechanism, with which the attention performance of the network is enhanced progressively. Moreover, we propose a stepwise training scheme to train the network effectively. Our approach achieves state-of-the-art performance on five challenging benchmark datasets for skeleton-based action recognition.
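To make the idea of attending to informative joints with a global context concrete, the following is a minimal illustrative sketch, not the GCA-LSTM itself. The abstract does not specify how informativeness is scored, so several choices here are assumptions made purely for illustration: dot-product scoring, a softmax over joints, initializing the context as the mean of the joint features, and refining the context as the attention-weighted sum. The LSTM layers are omitted entirely, and the names `attend` and `recurrent_attention` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(joint_features, context):
    """One attention pass (assumed form): score each joint against the
    global context, normalize the scores, and build a refined context."""
    scores = [dot(h, context) for h in joint_features]
    weights = softmax(scores)
    dim = len(context)
    refined = [
        sum(w * h[d] for w, h in zip(weights, joint_features))
        for d in range(dim)
    ]
    return weights, refined


def recurrent_attention(joint_features, num_iterations=2):
    """Assumed scheme: start from the mean of all joint features as the
    global context, then repeatedly re-attend with the refined context."""
    n = len(joint_features)
    dim = len(joint_features[0])
    context = [sum(h[d] for h in joint_features) / n for d in range(dim)]
    weights = [1.0 / n] * n
    for _ in range(num_iterations):
        weights, context = attend(joint_features, context)
    return weights, context


# Toy input: two joints that agree with each other and one weak outlier.
joints = [[1.0, 0.0], [1.0, 0.0], [0.0, 0.1]]
weights, context = recurrent_attention(joints, num_iterations=2)
```

In this toy input, each additional pass shifts weight away from the outlier joint, which loosely mirrors the progressive enhancement the abstract attributes to the recurrent attention mechanism.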