Teacher-Student Training and Triplet Loss for Facial Expression Recognition under Occlusion

  • 2021-02-25 18:54:30
  • Mariana-Iuliana Georgescu, Radu Tudor Ionescu

Abstract

In this paper, we study the task of facial expression recognition under strong occlusion. We are particularly interested in cases where 50% of the face is occluded, e.g. when the subject wears a Virtual Reality (VR) headset. While previous studies show that pre-training convolutional neural networks (CNNs) on fully-visible (non-occluded) faces improves the accuracy, we propose to employ knowledge distillation to achieve further improvements. First of all, we employ the classic teacher-student training strategy, in which the teacher is a CNN trained on fully-visible faces and the student is a CNN trained on occluded faces. Second of all, we propose a new approach for knowledge distillation based on triplet loss. During training, the goal is to reduce the distance between an anchor embedding, produced by a student CNN that takes occluded faces as input, and a positive embedding (from the same class as the anchor), produced by a teacher CNN trained on fully-visible faces, so that it becomes smaller than the distance between the anchor and a negative embedding (from a different class than the anchor), produced by the student CNN. Third of all, we propose to combine the distilled embeddings obtained through the classic teacher-student strategy and our novel teacher-student strategy based on triplet loss into a single embedding vector. We conduct experiments on two benchmarks, FER+ and AffectNet, with two CNN architectures, VGG-f and VGG-face, showing that knowledge distillation can bring significant improvements over the state-of-the-art methods designed for occluded faces in the VR setting.
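The triplet objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the squared Euclidean distance, the margin value, the use of concatenation to combine embeddings, and all function names are assumptions made for illustration, since the abstract does not specify them.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(u: Vector, v: Vector) -> float:
    # Squared Euclidean distance (assumed metric; the abstract does not name one).
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_distillation_loss(
    anchor: Vector,    # student embedding of an occluded face
    positive: Vector,  # teacher embedding of a fully-visible face, same class as anchor
    negative: Vector,  # student embedding of an occluded face, different class
    margin: float = 1.0,  # hypothetical margin value
) -> float:
    # Hinge-style triplet loss: zero once the anchor is closer to the
    # positive than to the negative by at least `margin`.
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)


def combine_embeddings(kd_embedding: Vector, triplet_embedding: Vector) -> List[float]:
    # Concatenation is an assumed way of forming "a single embedding vector".
    return list(kd_embedding) + list(triplet_embedding)
```

In this sketch, the loss is zero when the student's anchor already lies closer to the teacher's positive than to the negative by the margin, and positive otherwise, which is the ordering of distances the abstract describes.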

 
