A Lightweight Instrument-Agnostic Model for Polyphonic Note Transcription and Multipitch Estimation

  • 2022-05-12 17:24:07
  • Rachel M. Bittner, Juan José Bosch, David Rubinstein, Gabriel Meseguer-Brocal, Sebastian Ewert


Automatic Music Transcription (AMT) has been recognized as a key enabling technology with a wide range of applications. Given the task's complexity, best results have typically been reported for systems focusing on specific settings, e.g. instrument-specific systems tend to yield improved results over instrument-agnostic methods. Similarly, higher accuracy can be obtained when only estimating frame-wise $f_0$ values and neglecting the harder note event detection. Despite their high accuracy, such specialized systems often cannot be deployed in the real world. Storage and network constraints prohibit the use of multiple specialized models, while memory and run-time constraints limit their complexity. In this paper, we propose a lightweight neural network for musical instrument transcription, which supports polyphonic outputs and generalizes to a wide variety of instruments (including vocals). Our model is trained to jointly predict frame-wise onsets, multipitch and note activations, and we experimentally show that this multi-output structure improves the resulting frame-level note accuracy. Despite its simplicity, benchmark results show our system's note estimation to be substantially better than a comparable baseline, and its frame-level accuracy to be only marginally below that of specialized state-of-the-art AMT systems. With this work, we hope to encourage the community to further investigate low-resource, instrument-agnostic AMT systems.


