A Hierarchical Subspace Model for Language-Attuned Acoustic Unit Discovery

Abstract

In this work, we propose a hierarchical subspace model for acoustic unitdiscovery. In this approach, we frame the task as one of learning embeddings ona low-dimensional phonetic subspace, and simultaneously specify the subspaceitself as an embedding on a hyper-subspace. We train the hyper-subspace on aset of transcribed languages and transfer it to the target language. In thetarget language, we infer both the language and unit embeddings in anunsupervised manner, and in so doing, we simultaneously learn a subspace ofunits specific to that language and the units that dwell on it. We conduct ourexperiments on TIMIT and two low-resource languages: Mboshi and Yoruba. Resultsshow that our model outperforms major acoustic unit discovery techniques, bothin terms of clustering quality and segmentation accuracy.

Quick Read (beta)

loading the full paper ...