Abstract
In this paper, we present our approach to addressing the challenges of the 7th ABAW competition. The competition comprises three sub-challenges: Valence-Arousal (VA) estimation, Expression (Expr) classification, and Action Unit (AU) detection. To tackle these challenges, we employ state-of-the-art models to extract powerful visual features. Subsequently, a Transformer Encoder is utilized to integrate these features for the VA, Expr, and AU sub-challenges. To mitigate the impact of varying feature dimensions, we introduce an affine module to align the features to a common dimension. Overall, our results significantly outperform the baselines.
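As a rough illustration of the affine alignment step, the following minimal sketch maps feature vectors of different sizes to one common dimension using a per-extractor affine map y = Wx + b. It is not the implementation used in this work: the extractor names, the feature dimensions, the common dimension of 8, and the random initialisation are all assumptions chosen for the example.

```python
import random


def make_affine(in_dim, out_dim, rng):
    """Create (W, b) for an affine map from in_dim to out_dim.

    Weights are randomly initialised here; in a real model they would be learned.
    """
    scale = 1.0 / in_dim ** 0.5
    W = [[rng.uniform(-scale, scale) for _ in range(in_dim)] for _ in range(out_dim)]
    b = [0.0] * out_dim
    return W, b


def apply_affine(params, x):
    """Compute y = W x + b for a single feature vector x."""
    W, b = params
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def align_features(features, affines):
    """Project every extractor's feature vector to the common dimension."""
    return [apply_affine(affines[name], vec) for name, vec in features.items()]


rng = random.Random(0)
COMMON_DIM = 8  # assumed common dimension, not taken from the paper

# Hypothetical extractors with different output sizes.
feature_dims = {"extractor_a": 5, "extractor_b": 12, "extractor_c": 3}

affines = {name: make_affine(d, COMMON_DIM, rng) for name, d in feature_dims.items()}
features = {name: [rng.random() for _ in range(d)] for name, d in feature_dims.items()}

# One aligned vector per extractor, all of length COMMON_DIM.
aligned = align_features(features, affines)
```

Once every vector has the same length, the aligned features can be passed together to a single encoder. How they are combined before the Transformer Encoder is not stated in the abstract, so the sketch stops at alignment.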