CLaMP 2: Multimodal Music Information Retrieval Across 101 Languages Using Large Language Models

  • 2025-01-23 19:38:53
  • Shangda Wu, Yashan Wang, Ruibin Yuan, Zhancheng Guo, Xu Tan, Ge Zhang, Monan Zhou, Jing Chen, Xuefeng Mu, Yuejie Gao, Yuanliang Dong, Jiafeng Liu, Xiaobing Li, Feng Yu, Maosong Sun

Abstract

Current music information retrieval systems face challenges in managing linguistic diversity and integrating various musical modalities. These limitations reduce their effectiveness in a global, multimodal music environment. To address these issues, we introduce CLaMP 2, a system compatible with 101 languages that supports both ABC notation (a text-based musical notation format) and MIDI (Musical Instrument Digital Interface) for music information retrieval. CLaMP 2, pre-trained on 1.5 million ABC-MIDI-text triplets, includes a multilingual text encoder and a multimodal music encoder aligned via contrastive learning. By leveraging large language models, we obtain refined and consistent multilingual descriptions at scale, significantly reducing textual noise and balancing language distribution. Our experiments show that CLaMP 2 achieves state-of-the-art results in both multilingual semantic search and music classification across modalities, thus establishing a new standard for inclusive and global music information retrieval.
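
The abstract states only that the text and music encoders are aligned via contrastive learning; it does not give the loss. As a rough illustration of what such an alignment objective can look like, the sketch below implements a symmetric InfoNCE loss of the kind used in CLIP-style models. The function names, the temperature value of 0.07, and the toy embeddings are assumptions made for illustration and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of scores."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum for x in row]


def contrastive_loss(text_embs, music_embs, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    text_embs[i] and music_embs[i] are assumed to describe the same piece.
    The temperature is a hypothetical default, not a value from the paper.
    """
    text = [l2_normalize(v) for v in text_embs]
    music = [l2_normalize(v) for v in music_embs]
    n = len(text)

    # Cosine similarity between every text and every music embedding.
    logits = [
        [sum(a * b for a, b in zip(t, m)) / temperature for m in music]
        for t in text
    ]

    # Text-to-music direction: each row should peak on its diagonal entry.
    t2m = -sum(log_softmax(logits[i])[i] for i in range(n)) / n

    # Music-to-text direction: same computation over the columns.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    m2t = -sum(log_softmax(columns[j])[j] for j in range(n)) / n

    return (t2m + m2t) / 2


# Toy batch: two pieces with 2-dimensional embeddings.
text_batch = [[1.0, 0.0], [0.0, 1.0]]
matched_music = [[1.0, 0.0], [0.0, 1.0]]
swapped_music = [[0.0, 1.0], [1.0, 0.0]]

loss_matched = contrastive_loss(text_batch, matched_music)
loss_swapped = contrastive_loss(text_batch, swapped_music)
```

In this toy batch the loss is close to zero when each description is paired with its own piece and large when the pairs are swapped, which is the behavior that pulls matching text and music embeddings together during training.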

 
