Abstract
Current music information retrieval systems struggle to manage linguistic diversity and to integrate multiple musical modalities. These limitations reduce their effectiveness in a global, multimodal music environment. To address these issues, we introduce CLaMP 2, a system compatible with 101 languages that supports both ABC notation (a text-based musical notation format) and MIDI (Musical Instrument Digital Interface) for music information retrieval. CLaMP 2, pre-trained on 1.5 million ABC-MIDI-text triplets, includes a multilingual text encoder and a multimodal music encoder aligned via contrastive learning. By leveraging large language models, we obtain refined and consistent multilingual descriptions at scale, significantly reducing textual noise and balancing language distribution. Our experiments show that CLaMP 2 achieves state-of-the-art results in both multilingual semantic search and music classification across modalities, thus establishing a new standard for inclusive and global music information retrieval.