Speech Corpus of Ainu Folklore and End-to-end Speech Recognition for Ainu Language

Abstract

Ainu is an unwritten language spoken by the Ainu people, one of the ethnic groups in Japan. It is recognized as critically endangered by UNESCO, and archiving and documenting its language heritage is of paramount importance. Although a considerable amount of voice recordings of Ainu folklore has been produced and accumulated to preserve the culture, only a very limited part of them has been transcribed so far. Thus, we started a project on automatic speech recognition (ASR) for the Ainu language in order to contribute to the development of annotated language archives. In this paper, we report the development of the speech corpus and the structure and performance of end-to-end ASR for Ainu. We investigated four modeling units (phone, syllable, word piece, and word) and found that the syllable-based model performed best in terms of both word and phone recognition accuracy, which were about 60% and over 85%, respectively, in the speaker-open condition. Furthermore, word and phone accuracies of 80% and 90% were achieved in the speaker-closed setting. We also found that multilingual ASR training with additional speech corpora of English and Japanese further improves the speaker-open test accuracy.
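For readers unfamiliar with how word and phone recognition accuracy are typically scored, the following is a minimal sketch. It assumes the common definition of accuracy as one minus the edit distance between reference and hypothesis token sequences, divided by the reference length; the abstract does not state the exact scoring formula, so this definition is an assumption. The function names `edit_distance` and `accuracy` and the placeholder tokens are illustrative and do not come from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences
    (substitutions, deletions, and insertions each cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def accuracy(ref, hyp):
    """Assumed definition: 1 - (S + D + I) / N, with N = len(ref).
    Can be negative when the hypothesis contains many insertions."""
    if not ref:
        raise ValueError("reference must contain at least one token")
    return 1.0 - edit_distance(ref, hyp) / len(ref)


# Placeholder tokens, not real Ainu data: one substitution out of four tokens.
reference = ["w1", "w2", "w3", "w4"]
hypothesis = ["w1", "wX", "w3", "w4"]
word_acc = accuracy(reference, hypothesis)
```

The same function can score phone accuracy by passing phone sequences instead of word sequences, which is why a single recognizer output can be evaluated at both granularities.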
