Efficient Learning of Harmonic Priors for Pitch Detection in Polyphonic Music

  • 2018-11-16 16:58:37
  • Pablo A. Alvarado, Dan Stowell

Abstract

Automatic music transcription (AMT) aims to infer a latent symbolic representation of a piece of music (piano-roll), given a corresponding observed audio recording. Transcribing polyphonic music (when multiple notes are played simultaneously) is a challenging problem, due to the highly structured overlap between harmonics. We study whether the introduction of physically inspired Gaussian process (GP) priors into audio content analysis models improves the extraction of patterns required for AMT. Audio signals are described as a linear combination of sources. Each source is decomposed into the product of an amplitude envelope and a quasi-periodic component process. We introduce the Matérn spectral mixture (MSM) kernel for describing the frequency content of single notes. We consider two different regression approaches. In the sigmoid model, every pitch activation is independently non-linearly transformed. In the softmax model, several activation GPs are jointly non-linearly transformed. This introduces cross-correlation between activations. We use variational Bayes for approximate inference. We empirically evaluate how these models work in practice when transcribing polyphonic music. We demonstrate that, rather than encouraging dependency between activations, what is relevant for improving pitch detection is to learn priors that fit the frequency content of the sound events to detect.
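
The difference between the two regression approaches can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the activations are random draws standing in for GP samples, the quasi-periodic components are plain sinusoids rather than MSM-kernel GP samples, and the function names and pitch values are hypothetical. It only shows how a sigmoid transforms each activation independently, while a softmax couples the activations at every time step before the sources are summed.

```python
import numpy as np

def sigmoid_envelopes(g):
    """Map each activation independently into (0, 1); g has shape (M, N)."""
    return 1.0 / (1.0 + np.exp(-g))

def softmax_envelopes(g):
    """Jointly map the M activations so they sum to 1 at every time step."""
    z = g - g.max(axis=0, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=0, keepdims=True)

def mix_sources(envelopes, components):
    """Signal as a sum over sources of envelope times component."""
    return (envelopes * components).sum(axis=0)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 0.05, 800)            # time axis in seconds (assumed)
f0 = np.array([220.0, 277.18, 329.63])     # hypothetical pitches in Hz
components = np.sin(2 * np.pi * f0[:, None] * t[None, :])  # stand-in for GP components
g = rng.normal(size=(f0.size, t.size))     # stand-in for activation GP draws

env_sigmoid = sigmoid_envelopes(g)
env_softmax = softmax_envelopes(g)
y_sigmoid = mix_sources(env_sigmoid, components)
y_softmax = mix_sources(env_softmax, components)
```

In the softmax case, raising one activation necessarily lowers the envelopes of the others, which is the cross-correlation between activations that the abstract refers to.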

 
