Abstract
Multi-animal tracking is crucial for understanding animal ecology and behavior. However, it remains a challenging task due to variations in habitat, motion patterns, and species appearance. Traditional approaches typically require extensive model fine-tuning and heuristic design for each application scenario. In this work, we explore the potential of recent vision foundation models for zero-shot multi-animal tracking. By combining a Grounding Dino object detector with the Segment Anything Model 2 (SAM 2) tracker and carefully designed heuristics, we develop a tracking framework that can be applied to new datasets without any retraining or hyperparameter adaptation. Evaluations on ChimpAct, Bird Flock Tracking, AnimalTrack, and a subset of GMOT-40 demonstrate strong and consistent performance across diverse species and environments. The code is available at https://github.com/ecker-lab/SAM2-Animal-Tracking.