A Pilot Study of GSLM-based Simulation of Foreign Accentuation Only Using Native Speech Corpora

Abstract

We propose a method of simulating the human process of foreign accentuation using a Generative Spoken Language Model (GSLM) and only native speech corpora. When one listens to spoken words of a foreign language and repeats them, the repeated speech often carries the accent of the listener's L1. This is said to be because the spoken words are mentally represented as a sequence of phonological units of the L1, and those units are then used for oral reproduction. We simulate this process by inputting speech of language A into a GSLM of language B, which adds B's accent to the input speech. Running L1 ASR on foreign input speech and passing the ASR result to L1 TTS can be viewed as a naive implementation of this approach. The results of our experiments show that the synthesized accent of the output speech is highly natural compared to real samples of A produced by speakers whose L1 is B, and that the degree of accentuation is controllable.
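The core idea, representing speech of language A with the units of language B, can be illustrated with a minimal sketch. It is not the implementation used in this work. It assumes that the units are centroids of a codebook learned from language B only (for example by k-means over frame-level speech features) and that consecutive repeated units are collapsed before resynthesis. The names `codebook_b`, `frames_a`, `nearest_unit`, and `encode_with_l1_units`, as well as the 2-D toy features, are hypothetical.

```python
from itertools import groupby
from typing import List, Sequence

Frame = Sequence[float]


def nearest_unit(frame: Frame, codebook: Sequence[Frame]) -> int:
    """Return the index of the codebook entry closest to `frame`
    under squared Euclidean distance."""
    def sq_dist(a: Frame, b: Frame) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda k: sq_dist(frame, codebook[k]))


def encode_with_l1_units(frames: Sequence[Frame],
                         codebook_b: Sequence[Frame]) -> List[int]:
    """Map every frame of language-A speech to its nearest language-B unit,
    then collapse runs of identical units (an assumed preprocessing step)."""
    units = [nearest_unit(f, codebook_b) for f in frames]
    return [u for u, _ in groupby(units)]


# Toy example: three units "learned" from language B (hypothetical values).
codebook_b = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]

# Frames of language-A speech. The third frame lies between B's categories,
# so it is absorbed into the nearest B unit; this is where the accent arises.
frames_a = [(0.1, -0.1), (0.2, 0.1), (2.6, 0.3), (3.9, 0.2), (0.9, 1.2)]

units = encode_with_l1_units(frames_a, codebook_b)
print(units)  # [0, 2, 1]
```

In a full system, the resulting unit sequence would be passed to a unit-to-speech synthesizer trained on language B, so that sounds of A with no counterpart in B are rendered with B's closest categories.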
