Abstract
Advances in speech and language technologies enable tools such as voice search, text-to-speech, speech recognition and machine translation. These are, however, only available for high-resource languages like English, French or Chinese. Without foundational digital resources for African languages, which are considered low-resource in the digital context, these advanced tools remain out of reach. This work details the AI4D - African Language Program, a 3-part project that 1) incentivised the crowd-sourcing, collection and curation of language datasets through an online quantitative and qualitative challenge, 2) supported research fellows for a period of 3-4 months to create datasets annotated for NLP tasks, and 3) hosted competitive machine learning (ML) challenges on the basis of these datasets. Key outcomes of the work so far include 1) the creation of 9+ open-source African language datasets annotated for a variety of ML tasks, and 2) the creation of baseline models for these datasets through the hosting of competitive ML challenges.