Few-Shot Keyword Spotting in Any Language

Abstract

We introduce a few-shot transfer learning method for keyword spotting in any language. Leveraging open speech corpora in nine languages, we automate the extraction of a large multilingual keyword bank and use it to train an embedding model. With just five training examples, we fine-tune the embedding model for keyword spotting and achieve an average F1 score of 0.75 on keyword classification for 180 new keywords unseen by the embedding model in these nine languages. This embedding model also generalizes to new languages. We achieve an average F1 score of 0.65 on 5-shot models for 260 keywords sampled across 13 new languages unseen by the embedding model. We investigate streaming accuracy for our 5-shot models in two contexts: keyword spotting and keyword search. Across 440 keywords in 22 languages, we achieve an average streaming keyword spotting accuracy of 85.2% with a false acceptance rate of 1.2%, and observe promising initial results on keyword search.
