Afro-MNIST: Synthetic generation of MNIST-style datasets for low-resource languages

  • 2020-09-28 17:57:40
  • Daniel J Wu, Andrew C Yang, Vinay U Prabhu

Abstract

We present Afro-MNIST, a set of synthetic MNIST-style datasets for four orthographies used in Afro-Asiatic and Niger-Congo languages: Ge'ez (Ethiopic), Vai, Osmanya, and N'Ko. These datasets serve as "drop-in" replacements for MNIST. We also describe and open-source a method for synthetic MNIST-style dataset generation from single examples of each digit. These datasets can be found at https://github.com/Daniel-Wu/AfroMNIST. We hope that MNIST-style datasets will be developed for other numeral systems, and that these datasets vitalize machine learning education in underrepresented nations in the research community.
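
To make the idea of generating an MNIST-style dataset from a single example of each digit concrete, the following is a minimal sketch, not the authors' released code. It assumes each exemplar is a 28x28 grayscale array and uses random affine perturbations (rotation, scale, shift) as a stand-in augmentation; the abstract does not specify the deformation used, so the actual method may differ. The function names `random_affine_variant` and `make_dataset`, and all parameter defaults, are hypothetical.

```python
import numpy as np


def random_affine_variant(exemplar, rng, max_rotate_deg=15.0,
                          max_scale=0.15, max_shift=2.0):
    """Return one randomly rotated, scaled and shifted copy of a 2-D image.

    Assumption: an affine warp is used here purely for illustration.
    """
    h, w = exemplar.shape
    theta = np.deg2rad(rng.uniform(-max_rotate_deg, max_rotate_deg))
    scale = 1.0 + rng.uniform(-max_scale, max_scale)
    shift = rng.uniform(-max_shift, max_shift, size=2)

    # Inverse mapping: for every output pixel, locate its source position.
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    inv = np.array([[cos_t, sin_t], [-sin_t, cos_t]]) / scale
    centre = np.array([(h - 1) / 2.0, (w - 1) / 2.0])
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([rows, cols], axis=-1).astype(np.float64)
    src = (coords - centre - shift) @ inv.T + centre

    # Bilinear interpolation; a one-pixel zero border handles out-of-range reads.
    r0 = np.floor(src[..., 0]).astype(int)
    c0 = np.floor(src[..., 1]).astype(int)
    fr = src[..., 0] - r0
    fc = src[..., 1] - c0
    padded = np.pad(exemplar.astype(np.float64), 1)

    def sample(r, c):
        r = np.clip(r + 1, 0, h + 1)
        c = np.clip(c + 1, 0, w + 1)
        return padded[r, c]

    out = ((1 - fr) * (1 - fc) * sample(r0, c0)
           + (1 - fr) * fc * sample(r0, c0 + 1)
           + fr * (1 - fc) * sample(r0 + 1, c0)
           + fr * fc * sample(r0 + 1, c0 + 1))
    return np.clip(out, 0, 255).astype(np.uint8)


def make_dataset(exemplars, n_per_class, seed=0):
    """Build shuffled (images, labels) arrays from one exemplar per digit."""
    rng = np.random.default_rng(seed)
    images, labels = [], []
    for digit, exemplar in enumerate(exemplars):
        for _ in range(n_per_class):
            images.append(random_affine_variant(exemplar, rng))
            labels.append(digit)
    images = np.stack(images)
    labels = np.array(labels, dtype=np.uint8)
    order = rng.permutation(len(labels))
    return images[order], labels[order]


if __name__ == "__main__":
    # Placeholder exemplars (plain rectangles), NOT real numeral glyphs.
    exemplars = np.zeros((10, 28, 28), dtype=np.uint8)
    for d in range(10):
        exemplars[d, 6:10 + d, 8:20] = 255
    x, y = make_dataset(exemplars, n_per_class=100)
    print(x.shape, y.shape)
```

The output arrays follow the usual MNIST layout (uint8 images of shape N x 28 x 28 with integer labels 0 to 9), which is what would let such a dataset act as a drop-in replacement in existing MNIST training code. In practice the placeholder rectangles would be replaced by one scanned or rendered glyph per digit of the target numeral system.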

 
