Domain Generalization through Audio-Visual Relative Norm Alignment in First Person Action Recognition

  • 2021-10-19 16:52:39
  • Mirco Planamente, Chiara Plizzari, Emanuele Alberti, Barbara Caputo

Abstract

First person action recognition is becoming an increasingly researched area thanks to the rising popularity of wearable cameras. This is bringing to light cross-domain issues that are yet to be addressed in this context. Indeed, the information extracted from learned representations suffers from an intrinsic "environmental bias". This strongly affects the ability to generalize to unseen scenarios, limiting the application of current methods to real settings where labeled data are not available during training. In this work, we introduce the first domain generalization approach for egocentric activity recognition, by proposing a new audio-visual loss, called Relative Norm Alignment loss. It re-balances the contributions from the two modalities during training, over different domains, by aligning their feature norm representations. Our approach leads to strong results in domain generalization on both EPIC-Kitchens-55 and EPIC-Kitchens-100, as demonstrated by extensive experiments, and can be extended to work also on domain adaptation settings with competitive results.
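
The abstract does not state the formula for the Relative Norm Alignment loss, so the following is only an illustrative sketch of the idea of "aligning feature norm representations". It assumes the loss penalizes the squared deviation from 1 of the ratio between the mean L2 norms of the visual and audio features in a batch. The function name, the `eps` stabilizer, and the array shapes are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def relative_norm_alignment_loss(visual_feats, audio_feats, eps=1e-8):
    """Hypothetical sketch: penalize a mismatch between the mean feature
    norms of two modalities. Inputs have shape (batch, feature_dim)."""
    # Mean L2 norm of each modality's features over the batch.
    mean_visual_norm = np.linalg.norm(visual_feats, axis=1).mean()
    mean_audio_norm = np.linalg.norm(audio_feats, axis=1).mean()
    # Ratio of the two mean norms; eps (an assumption) avoids division by zero.
    ratio = mean_visual_norm / (mean_audio_norm + eps)
    # Zero when the mean norms match, growing as they diverge.
    return (ratio - 1.0) ** 2

# Toy usage: the visual features have a larger norm than the audio ones,
# so the loss is positive and would push the two norms toward each other.
rng = np.random.default_rng(0)
visual = 3.0 * rng.normal(size=(8, 16))
audio = rng.normal(size=(8, 16))
loss = relative_norm_alignment_loss(visual, audio)
```

In a training setup such a term would typically be added to the classification loss with a weighting factor, so that neither modality dominates the fused prediction; the weighting scheme is likewise not specified in the abstract.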

 
