Abstract
Perch is a performant pre-trained model for bioacoustics. It was trained in supervised fashion, providing both off-the-shelf classification scores for thousands of vocalizing species as well as strong embeddings for transfer learning. In this new release, Perch 2.0, we expand from training exclusively on avian species to a large multi-taxa dataset. The model is trained with self-distillation using a prototype-learning classifier as well as a new source-prediction training criterion. Perch 2.0 obtains state-of-the-art performance on the BirdSet and BEANS benchmarks. It also outperforms specialized marine models on marine transfer learning tasks, despite having almost no marine training data. We present hypotheses as to why fine-grained species classification is a particularly robust pre-training task for bioacoustics.