Deciphering Speech: a Zero-Resource Approach to Cross-Lingual Transfer in ASR

  • 2021-11-22 20:48:53
  • Ondrej Klejch, Electra Wallington, Peter Bell


We present a method for cross-lingual training of an ASR system using absolutely no transcribed training data from the target language, and with no phonetic knowledge of the language in question. Our approach uses a novel application of a decipherment algorithm, which operates given only unpaired speech and text data from the target language. We apply this decipherment to phone sequences generated by a universal phone recogniser trained on out-of-language speech corpora, which we follow with flat-start semi-supervised training to obtain an acoustic model for the new language. To the best of our knowledge, this is the first practical approach to zero-resource cross-lingual ASR which does not rely on any hand-crafted phonetic information. We carry out experiments on read speech from the GlobalPhone corpus, and show that it is possible to learn a decipherment model on just 20 minutes of data from the target language. When the decipherment model is used to generate pseudo-labels for semi-supervised training, we obtain WERs that range from 25% to just 5% absolute worse than the equivalent fully supervised models trained on the same data.
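To make the "absolute worse" comparison concrete, the following is a minimal sketch of how word error rate (WER) is commonly computed and how an absolute gap differs from a relative one. It is not the evaluation code used in this work; the function names `word_error_rate` and `absolute_gap_points` and all sentences and WER values below are illustrative assumptions only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Computed as the word-level Levenshtein distance divided by the
    reference length. Hypothetical helper, not taken from the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def absolute_gap_points(wer_system: float, wer_baseline: float) -> float:
    """Absolute WER difference in percentage points (hypothetical helper)."""
    return 100.0 * (wer_system - wer_baseline)


# Made-up example: one substitution and one deletion against 5 reference words.
example_wer = word_error_rate("the cat sat on mat", "the cat sit on")

# Made-up WERs: a supervised model at 20% and a pseudo-labelled model at 25%
# differ by about 5 points absolute, which is 25% relative to the baseline.
gap = absolute_gap_points(0.25, 0.20)
relative = (0.25 - 0.20) / 0.20
```

Under this reading, "5% absolute worse" means the pseudo-labelled model's WER is five percentage points higher than that of the supervised model, regardless of how large the supervised WER is.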


