Abstract
This paper describes the corrections made to the FLORES evaluation dataset (dev and devtest) for four African languages: Hausa, Northern Sotho (Sepedi), Xitsonga, and isiZulu. The original dataset, though groundbreaking in its coverage of low-resource languages, exhibited various inconsistencies and inaccuracies in the reviewed languages that could compromise the integrity of evaluations of downstream natural language processing (NLP) tasks, especially machine translation. Through a meticulous review process by native speakers, several corrections were identified and implemented, improving the overall quality and reliability of the dataset. For each language, we provide a concise summary of the errors encountered and corrected, and present a statistical analysis measuring the differences between the existing and corrected datasets. We believe that our corrections improve the linguistic accuracy and reliability of the data and thereby contribute to a more effective evaluation of NLP tasks involving the four African languages. Finally, we recommend that future translation efforts, particularly in low-resource languages, actively involve native speakers at every stage of the process to ensure linguistic accuracy and cultural relevance.