Bayesian Models for Unit Discovery on a Very Low Resource Language

  • 2018-02-20 15:35:32
  • Lucas Ondel, Pierre Godard, Laurent Besacier, Elin Larsen, Mark Hasegawa-Johnson, Odette Scharenborg, Emmanuel Dupoux, Lukas Burget, François Yvon, Sanjeev Khudanpur

Abstract

Developing speech technologies for low-resource languages has become a very active research field over the last decade. Among others, Bayesian models have shown promising results on artificial examples but still lack in situ experiments. Our work applies state-of-the-art Bayesian models to unsupervised Acoustic Unit Discovery (AUD) in a real low-resource language scenario. We also show that Bayesian models can naturally integrate information from other resourceful languages by means of an informative prior, leading to more consistent discovered units. Finally, the discovered acoustic units are used, either as the 1-best sequence or as a lattice, to perform word segmentation. Word segmentation results show that this Bayesian approach clearly outperforms a Segmental-DTW baseline on the same corpus.

 
