Mode Variational LSTM Robust to Unseen Modes of Variation: Application to Facial Expression Recognition

Abstract

Spatio-temporal feature encoding is essential for capturing the dynamics in video sequences. Recurrent neural networks, particularly long short-term memory (LSTM) units, have been popular as an efficient tool for encoding spatio-temporal features in sequences. In this work, we investigate the effect of mode variations on the spatio-temporal features encoded by LSTMs. We show that the LSTM retains information related to the mode variation in the sequence, which is irrelevant to the task at hand (e.g., classifying facial expressions). In particular, the LSTM forget mechanism is not robust enough to mode variations and preserves information that could negatively affect the encoded spatio-temporal features. We propose the mode variational LSTM to encode spatio-temporal features robust to unseen modes of variation. The mode variational LSTM modifies the original LSTM structure by adding a second cell state that focuses on encoding the mode variation in the input sequence. To efficiently regulate which features are stored in this additional cell state, additional gating functionality is also introduced. The effectiveness of the proposed mode variational LSTM is verified on the facial expression recognition task. Comparative experiments on publicly available datasets verified that the proposed mode variational LSTM outperforms existing methods. Moreover, a new dynamic facial expression dataset with different modes of variation, such as pose and illumination variations, was collected to comprehensively evaluate the proposed mode variational LSTM. Experimental results verified that the proposed mode variational LSTM encodes spatio-temporal features robust to unseen modes of variation.
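The abstract does not give the update equations, so the following is only a minimal sketch of the general idea, assuming one plausible design: a standard LSTM step extended with a second "mode" cell state m, a hypothetical mode gate v that routes part of the candidate update into m instead of the task cell c, and a hypothetical mode forget gate fm. The gate names, the routing scheme (c receives i * (1 - v) * g, m receives v * g), and all helper functions are illustrative assumptions, not the authors' formulation.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
GateParams = Tuple[Matrix, Matrix, Vector]  # (W, U, b)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(W: Matrix, U: Matrix, b: Vector, x: Vector, h: Vector) -> Vector:
    """Compute W @ x + U @ h + b using plain Python lists."""
    return [
        sum(w * xi for w, xi in zip(W[r], x))
        + sum(u * hi for u, hi in zip(U[r], h))
        + b[r]
        for r in range(len(b))
    ]


def init_params(n_in: int, n_hid: int, seed: int = 0) -> Dict[str, GateParams]:
    """Small random weights per gate; 'v' and 'fm' are hypothetical extra gates."""
    rng = random.Random(seed)
    params = {}
    for gate in ("i", "f", "o", "g", "v", "fm"):
        W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hid)]
        U = [[rng.uniform(-0.1, 0.1) for _ in range(n_hid)] for _ in range(n_hid)]
        params[gate] = (W, U, [0.0] * n_hid)
    return params


def mv_lstm_step(params, x: Vector, h: Vector, c: Vector, m: Vector):
    """One step of a hypothetical two-cell LSTM.

    c is the task cell state (it feeds the output h);
    m is the additional mode cell state (it does not feed h).
    """
    i = [sigmoid(z) for z in affine(*params["i"], x, h)]    # input gate
    f = [sigmoid(z) for z in affine(*params["f"], x, h)]    # task forget gate
    o = [sigmoid(z) for z in affine(*params["o"], x, h)]    # output gate
    g = [math.tanh(z) for z in affine(*params["g"], x, h)]  # candidate update
    v = [sigmoid(z) for z in affine(*params["v"], x, h)]    # assumed mode gate
    fm = [sigmoid(z) for z in affine(*params["fm"], x, h)]  # assumed mode forget gate
    # The mode gate splits the candidate between the task and mode cells.
    c_new = [fk * ck + ik * (1.0 - vk) * gk
             for fk, ck, ik, vk, gk in zip(f, c, i, v, g)]
    m_new = [fmk * mk + vk * gk for fmk, mk, vk, gk in zip(fm, m, v, g)]
    # Output is read only from the task cell.
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new, m_new


def encode(params, sequence: List[Vector], n_hid: int):
    """Run the cell over a sequence; return final hidden and mode states."""
    h, c, m = [0.0] * n_hid, [0.0] * n_hid, [0.0] * n_hid
    for x in sequence:
        h, c, m = mv_lstm_step(params, x, h, c, m)
    return h, m
```

For example, `encode(init_params(3, 4), [[0.1, 0.2, 0.3], [0.0, -0.1, 0.5]], 4)` returns a 4-dimensional hidden state and a 4-dimensional mode state. In this sketch, whatever the mode gate diverts into m never reaches h directly, which is one way to keep mode information out of the task features; the actual model may differ.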
