Can LLM Generate Culturally Relevant Commonsense QA Data? Case Study in Indonesian and Sundanese

Abstract

Large Language Models (LLMs) are increasingly being used to generate synthetic data for training and evaluating models. However, it is unclear whether they can generate good-quality question answering (QA) datasets that incorporate the knowledge and cultural nuance embedded in a language, especially for low-resource languages. In this study, we investigate the effectiveness of using LLMs to generate culturally relevant commonsense QA datasets for the Indonesian and Sundanese languages. To do so, we create datasets for these languages using various methods involving both LLMs and human annotators, resulting in ~4.5K questions per language (~9K in total), making our dataset the largest of its kind. Our experiments show that automatic data adaptation from an existing English dataset is less effective for Sundanese. Interestingly, using the direct generation method on the target language, GPT-4 Turbo can generate questions with adequate general knowledge in both languages, albeit not as culturally 'deep' as humans. We also observe a higher occurrence of fluency errors in the Sundanese dataset, highlighting the discrepancy between medium- and lower-resource languages.
