The Cone of Silence: Speech Separation by Localization

  • 2020-10-12 20:19:23
  • Teerapat Jenrungrot, Vivek Jayaram, Steve Seitz, Ira Kemelmacher-Shlizerman
Given a multi-microphone recording of an unknown number of speakers talking concurrently, we simultaneously localize the sources and separate the individual speakers. At the core of our method is a deep network, in the waveform domain, which isolates sources within an angular region $\theta \pm w/2$, given an angle of interest $\theta$ and angular window size $w$. By exponentially decreasing $w$, we can perform a binary search to localize and separate all sources in logarithmic time. Our algorithm allows for an arbitrary number of potentially moving speakers at test time, including more speakers than seen during training. Experiments demonstrate state-of-the-art performance for both source separation and source localization, particularly in high levels of background noise.
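The binary search over angular windows can be illustrated with a small sketch. This is not the authors' implementation: the separation network is replaced by a hypothetical toy oracle, `make_toy_detector`, which simply reports whether any known source angle falls inside the window $\theta \pm w/2$. The function names, the half-open window convention, the starting window of 360 degrees, and the stopping width `min_width` are all assumptions made for illustration. In a real system the decision to keep a window would presumably depend on the network's output for that region rather than on ground-truth angles.

```python
from typing import Callable, List, Tuple


def make_toy_detector(source_angles: List[float]) -> Callable[[float, float], bool]:
    """Hypothetical stand-in for the network: True if a source lies in theta +/- w/2."""

    def has_source(theta: float, w: float) -> bool:
        for s in source_angles:
            # Signed angular offset of the source from theta, wrapped to [-180, 180).
            offset = (s - theta + 180.0) % 360.0 - 180.0
            # Half-open window so adjacent windows never both claim a boundary source.
            if -w / 2 <= offset < w / 2:
                return True
        return False

    return has_source


def localize(
    has_source: Callable[[float, float], bool], min_width: float = 2.0
) -> Tuple[List[float], float]:
    """Halve the window width each round, keeping only windows that contain a source."""
    w = 360.0
    centers = [180.0]  # one window covering the full circle
    while w > min_width:
        w /= 2
        next_centers = []
        for c in centers:
            # Split the parent window [c - w, c + w) into two children of width w.
            for child in (c - w / 2, c + w / 2):
                child %= 360.0
                if has_source(child, w):
                    next_centers.append(child)
        centers = next_centers
    return centers, w
```

For example, `localize(make_toy_detector([30.0, 200.0]))` returns one window centre per source, each within half the final window width of the true angle. Because empty windows are discarded at every level, the number of queries grows with the number of sources times the number of halvings, rather than with the number of fine-grained angular bins.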


