AI4D -- African Language Program

  • 2021-04-06 13:51:16
  • Kathleen Siminyu, Godson Kalipe, Davor Orlic, Jade Abbott, Vukosi Marivate, Sackey Freshia, Prateek Sibal, Bhanu Neupane, David I. Adelani, Amelia Taylor, Jamiil Toure ALI, Kevin Degila, Momboladji Balogoun, Thierno Ibrahima DIOP, Davis David, Chayma Fourati, Hatem Haddad, Malek Naski
  • 18

Abstract

Advances in speech and language technologies enable tools such as voice search, text-to-speech, speech recognition and machine translation. These are, however, only available for high-resource languages such as English, French or Chinese. Without foundational digital resources for African languages, which are considered low-resource in the digital context, these advanced tools remain out of reach. This work details the AI4D - African Language Program, a 3-part project that 1) incentivised the crowd-sourcing, collection and curation of language datasets through an online quantitative and qualitative challenge, 2) supported research fellows for a period of 3-4 months to create datasets annotated for NLP tasks, and 3) hosted competitive Machine Learning challenges on the basis of these datasets. Key outcomes of the work so far include 1) the creation of 9+ open-source African language datasets annotated for a variety of ML tasks, and 2) the creation of baseline models for these datasets through the hosting of competitive ML challenges.

 
