Rotation-Constrained Cross-View Feature Fusion for Multi-View Appearance-based Gaze Estimation

Abstract

Appearance-based gaze estimation has been actively studied in recent years. However, its generalization performance for unseen head poses is still a significant limitation for existing methods. This work proposes a generalizable multi-view gaze estimation task and a cross-view feature fusion method to address this issue. In addition to paired images, our method takes the relative rotation matrix between two cameras as additional input. The proposed network learns to extract a rotatable feature representation by using the relative rotation as a constraint and adaptively fuses the rotatable features via stacked fusion modules. This simple yet efficient approach significantly improves generalization performance under unseen head poses without significantly increasing computational cost. The model can be trained with random combinations of cameras without fixing the positioning and can generalize to unseen camera pairs during inference. Through experiments using multiple datasets, we demonstrate the advantage of the proposed method over baseline methods, including state-of-the-art domain generalization approaches. The code will be available at \url{https://github.com/ut-vision/Rot-MVGaze}.
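
To make the idea of a "rotatable" feature concrete, the following is a minimal sketch and not the released implementation. It assumes, purely for illustration, that each view's feature is a 3 x D matrix whose columns can be rotated like 3D vectors, that the relative rotation maps camera A coordinates to camera B coordinates, and that fusion is a plain concatenation. The function names (`rotation_about_y`, `rotate_feature`, `rotation_consistency`) are hypothetical, and the stacked fusion modules described above are not modeled.

```python
import numpy as np


def rotation_about_y(angle_rad):
    """Build a 3x3 rotation matrix about the y-axis (a stand-in for the
    relative rotation between two cameras)."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])


def rotate_feature(feature, rotation):
    """Rotate a (3, D) feature: every column is treated as a 3D vector.
    The (3, D) layout is an assumption made for this sketch."""
    return rotation @ feature


def rotation_consistency(feat_a, feat_b, rot_a_to_b):
    """Mean squared difference between view A's feature rotated into
    view B's frame and view B's own feature. An illustrative form of a
    rotation constraint, not necessarily the loss used in the paper."""
    diff = rotate_feature(feat_a, rot_a_to_b) - feat_b
    return float(np.mean(diff ** 2))


# Toy data: view B's feature is exactly the rotated view A feature,
# so the consistency term should vanish.
rng = np.random.default_rng(0)
rot_a_to_b = rotation_about_y(np.deg2rad(30.0))
feat_a = rng.standard_normal((3, 8))
feat_b = rotate_feature(feat_a, rot_a_to_b)

loss = rotation_consistency(feat_a, feat_b, rot_a_to_b)

# Simplest possible fusion: bring A into B's frame, then concatenate
# along the feature dimension.
fused = np.concatenate([rotate_feature(feat_a, rot_a_to_b), feat_b], axis=1)
```

In this toy setup the consistency term is zero up to floating-point error, because the second feature is constructed as the rotated first feature. In a trained network the two features would come from separate image encoders, and a term of this kind would encourage them to agree once the known relative rotation is applied.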
