Towards Deep Modeling of Music Semantics using EEG Regularizers

  • 2017-12-14 12:27:11
  • Francisco Raposo, David Martins de Matos, Ricardo Ribeiro, Suhua Tang, Yi Yu

Abstract

Modeling of music audio semantics has previously been tackled by learning mappings from audio data to high-level tags or to latent unsupervised spaces. The resulting semantic spaces are theoretically limited, either because the chosen high-level tags do not cover all of music semantics or because audio data alone is not enough to determine music semantics. In this paper, we propose a generic framework for semantics modeling that focuses on the perception of the listener, captured through EEG data, in addition to audio data. We implement this framework using a novel end-to-end 2-view Neural Network (NN) architecture and a Deep Canonical Correlation Analysis (DCCA) loss function that forces the semantic embedding spaces of both views to be maximally correlated. We also detail how the EEG dataset was collected and use it to train our proposed model. We evaluate the learned semantic space in a transfer learning context, by using it as an audio feature extractor on an independent dataset and proxy task: music audio-lyrics cross-modal retrieval. We show that our embedding model outperforms Spotify features and performs comparably to a state-of-the-art embedding model that was trained on 700 times more data. We further propose modifications to the model that are likely to improve its performance.
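The abstract names a DCCA loss but does not spell it out. The sketch below shows the objective as it is commonly formulated for DCCA (Andrew et al., 2013): the sum of canonical correlations between the two views' embeddings, negated so that minimizing the loss maximizes correlation. It is a hedged illustration, not the authors' implementation. NumPy, the function names, and the ridge term `reg` are assumptions. It also assumes the audio and EEG networks have already produced the embedding matrices. In actual training, the same computation would have to be written in an autodiff framework so that gradients flow back into both networks.

```python
import numpy as np


def inv_sqrt(mat: np.ndarray) -> np.ndarray:
    """Inverse square root of a symmetric positive-definite matrix via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    return eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T


def total_correlation(h1: np.ndarray, h2: np.ndarray, reg: float = 1e-4) -> float:
    """Sum of canonical correlations between two views.

    h1: (n, d1) embeddings from view 1 (e.g. audio); h2: (n, d2) from view 2 (e.g. EEG).
    reg: hypothetical ridge term added to the covariance diagonals for numerical stability.
    """
    n = h1.shape[0]
    # Center each view.
    h1c = h1 - h1.mean(axis=0)
    h2c = h2 - h2.mean(axis=0)
    # Regularized within-view covariances and the cross-covariance.
    s11 = h1c.T @ h1c / (n - 1) + reg * np.eye(h1.shape[1])
    s22 = h2c.T @ h2c / (n - 1) + reg * np.eye(h2.shape[1])
    s12 = h1c.T @ h2c / (n - 1)
    # The singular values of T are the canonical correlations; their sum is the trace norm of T.
    t = inv_sqrt(s11) @ s12 @ inv_sqrt(s22)
    return float(np.linalg.svd(t, compute_uv=False).sum())


def dcca_loss(h1: np.ndarray, h2: np.ndarray, reg: float = 1e-4) -> float:
    """Negated total correlation, so minimizing the loss maximizes cross-view correlation."""
    return -total_correlation(h1, h2, reg)
```

For two identical 3-dimensional views, the total correlation approaches 3, the maximum, because each of the three canonical correlations is close to 1. For independent random views, it stays close to 0. The ridge term keeps the covariance matrices invertible when the minibatch is small relative to the embedding dimension.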
