Speech Resources in the Tamasheq Language

  • 2022-01-14 09:26:49
  • Marcely Zanon Boito, Fethi Bougares, Florentin Barbier, Souhir Gahbiche, Loïc Barrault, Mickael Rouvier, Yannick Estève

Abstract

In this paper we present two datasets for Tamasheq, a developing language mainly spoken in Mali and Niger. These two datasets were made available for the IWSLT 2022 low-resource speech translation track, and they consist of collections of radio recordings from the Studio Kalangou (Niger) and Studio Tamani (Mali) daily broadcast news. We share (i) a massive amount of unlabeled audio data (671 hours) in five languages: French from Niger, Fulfulde, Hausa, Tamasheq and Zarma, and (ii) a smaller parallel corpus of audio recordings (17 hours) in Tamasheq, with utterance-level translations in the French language. All this data is shared under the Creative Commons BY-NC-ND 3.0 license. We hope these resources will inspire the speech community to develop and benchmark models using the Tamasheq language.

 
