Abstract
Computer vision analysis of camera trap video footage is essential for wildlife conservation, as captured behaviours offer some of the earliest indicators of changes in population health. Recently, several high-impact animal behaviour datasets and methods have been introduced to encourage their use; however, the role of behaviour-correlated background information and its significant effect on out-of-distribution generalisation remain unexplored. In response, we present the PanAf-FGBG dataset, featuring 20 hours of wild chimpanzee behaviours, recorded at over 350 individual camera locations. Uniquely, it pairs every video containing a chimpanzee (referred to as a foreground video) with a corresponding background video (with no chimpanzee) from the same camera location. We present two views of the dataset: one with overlapping camera locations and one with disjoint locations. This setup enables, for the first time, direct evaluation under in-distribution and out-of-distribution conditions, and allows the impact of backgrounds on behaviour recognition models to be quantified. All clips come with rich behavioural annotations and metadata, including unique camera IDs and detailed textual scene descriptions. Additionally, we establish several baselines and present a highly effective latent-space normalisation technique that boosts out-of-distribution performance by +5.42% mAP for convolutional and +3.75% mAP for transformer-based models. Finally, we provide an in-depth analysis of the role of backgrounds in out-of-distribution behaviour recognition, including the so far unexplored impact of background durations (i.e., the count of background frames within foreground videos).