Reversible Graph Neural Network-based Reaction Distribution Learning for Multiple Appropriate Facial Reactions Generation

Abstract

Generating facial reactions in a human-human dyadic interaction is complex and highly dependent on the context, since more than one facial reaction can be appropriate for the speaker's behaviour. This has challenged existing machine learning (ML) methods, whose training strategies force models to reproduce a specific (not multiple) facial reaction from each input speaker behaviour. This paper proposes the first multiple appropriate facial reaction generation framework that re-formulates the one-to-many mapping facial reaction generation problem as a one-to-one mapping problem. This means that we approach this problem by generating a distribution of the listener's appropriate facial reactions instead of multiple different appropriate facial reactions, i.e., 'many' appropriate facial reaction labels are summarised as 'one' distribution label during training. Our model consists of a perceptual processor, a cognitive processor, and a motor processor. The motor processor is implemented with a novel Reversible Multi-dimensional Edge Graph Neural Network (REGNN). This allows us to obtain a distribution of appropriate real facial reactions during the training process, enabling the cognitive processor to be trained to predict the appropriate facial reaction distribution. At the inference stage, the REGNN decodes an appropriate facial reaction by using this distribution as input. Experimental results demonstrate that our approach outperforms existing models in generating more appropriate, realistic, and synchronized facial reactions. The improved performance is largely attributed to the proposed appropriate facial reaction distribution learning strategy and the use of a REGNN. The code is available at https://github.com/TongXu-05/REGNN-Multiple-Appropriate-Facial-Reaction-Generation.
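The core idea, summarising 'many' appropriate reactions as 'one' distribution label via a reversible mapping and then decoding a sample from that distribution at inference, can be illustrated with a small sketch. It is not the authors' REGNN. A RealNVP-style affine coupling layer is swapped in as a generic invertible encoder/decoder. The distribution label is assumed to be a diagonal Gaussian over the latent codes, and reactions are assumed to be short fixed-length feature vectors. The names `AffineCoupling`, `distribution_label` and `sample_reaction` are hypothetical.

```python
import math
import random
from statistics import fmean, pvariance


class AffineCoupling:
    """Hypothetical invertible layer (RealNVP-style affine coupling).

    Used only to illustrate a reversible encoder/decoder; it is NOT the REGNN.
    """

    def __init__(self, dim, seed=0):
        if dim % 2:
            raise ValueError("dim must be even")
        self.half = dim // 2
        rng = random.Random(seed)
        # Fixed random weights standing in for the learned scale (s) and shift (t) nets.
        self.ws = [[rng.uniform(-0.5, 0.5) for _ in range(self.half)] for _ in range(self.half)]
        self.wt = [[rng.uniform(-0.5, 0.5) for _ in range(self.half)] for _ in range(self.half)]

    def _st(self, x1):
        # s and t depend only on the untouched first half, which keeps the layer invertible.
        s = [math.tanh(sum(w * v for w, v in zip(row, x1))) for row in self.ws]
        t = [sum(w * v for w, v in zip(row, x1)) for row in self.wt]
        return s, t

    def forward(self, x):
        """Encode a reaction vector into a latent vector."""
        x1, x2 = x[:self.half], x[self.half:]
        s, t = self._st(x1)
        y2 = [xi * math.exp(si) + ti for xi, si, ti in zip(x2, s, t)]
        return list(x1) + y2

    def inverse(self, y):
        """Decode a latent vector back into a reaction vector (exact inverse of forward)."""
        y1, y2 = y[:self.half], y[self.half:]
        s, t = self._st(y1)
        x2 = [(yi - ti) * math.exp(-si) for yi, si, ti in zip(y2, s, t)]
        return list(y1) + x2


def distribution_label(layer, reactions):
    """Summarise several appropriate reactions as one diagonal Gaussian in latent space."""
    latents = [layer.forward(r) for r in reactions]
    dims = list(zip(*latents))  # regroup values per latent dimension
    mean = [fmean(d) for d in dims]
    var = [pvariance(d) for d in dims]
    return mean, var


def sample_reaction(layer, mean, var, rng):
    """Draw a latent from the distribution label and decode it with the inverse pass."""
    z = [rng.gauss(m, math.sqrt(v)) for m, v in zip(mean, var)]
    return layer.inverse(z)


if __name__ == "__main__":
    layer = AffineCoupling(dim=4)
    # Three hypothetical appropriate reactions to the same speaker behaviour.
    reactions = [[0.1, 0.4, 0.2, 0.0], [0.2, 0.3, 0.1, 0.1], [0.0, 0.5, 0.3, 0.0]]
    mean, var = distribution_label(layer, reactions)
    print(sample_reaction(layer, mean, var, random.Random(1)))
```

Because the layer is exactly invertible, the same weights serve both directions. During training the forward pass turns several ground-truth reactions into one distribution target, which the cognitive processor could learn to predict. At inference the inverse pass turns a sample from the predicted distribution back into a reaction, and different samples can yield different appropriate reactions.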
