Speaker Diarization as a Fully Online Learning Problem in MiniVox

Abstract

We proposed a novel AI framework to conduct real-time multi-speaker diarization and recognition without prior registration or pretraining in a fully online learning setting. Our contributions are two-fold. First, we proposed a new benchmark to evaluate the rarely studied fully online speaker diarization problem. We built upon existing datasets of real-world utterances to automatically curate MiniVox, an experimental environment that generates infinite configurations of continuous multi-speaker speech streams. Second, we considered the practical problem of online learning with episodically revealed rewards and introduced a solution based on semi-supervised and self-supervised learning methods. In addition, we provided a workable web-based recognition system that interactively handles the cold-start problem of adding new users by transferring representations of old arms to new ones with an extendable contextual bandit. We demonstrated that our proposed method obtained robust performance in the online MiniVox framework.
