Abstract
Human affect recognition has been a significant topic in psychophysics and computer vision. However, currently published datasets have many limitations. For example, most datasets contain frames that carry information only about facial expressions. Because of these limitations, it is very hard either to understand the mechanisms of human affect recognition or to train computer vision models on those datasets that generalize well to common cases. In this work, we introduce a new large dataset, the Video-based Emotion and Affect Tracking in Context Dataset (VEATIC), that overcomes the limitations of previous datasets. VEATIC contains 124 video clips from Hollywood movies, documentaries, and home videos, with continuous valence and arousal ratings for each frame collected via real-time annotation. Along with the dataset, we propose a new computer vision task: inferring the affect of a selected character from both context and character information in each video frame. Additionally, we propose a simple model to benchmark this new task. We also compare the performance of a model pretrained on our dataset with that of models pretrained on other similar datasets. Experiments show that the model pretrained on VEATIC achieves competitive results, indicating the generalizability of VEATIC. Our dataset is available at https://veatic.github.io.