Benchmarking Procedural Language Understanding for Low-Resource Languages: A Case Study on Turkish

Abstract

Understanding procedural natural language (e.g., step-by-step instructions) is a crucial step toward execution and planning. However, while ample corpora and downstream tasks are available in English, the field lacks such resources for most languages. To address this gap, we conduct a case study on Turkish procedural texts. We first expand the number of tutorials in Turkish wikiHow from 2,000 to 52,000 using automated translation tools, where the translation quality and fidelity to the original meaning are validated by a team of experts on a random sample. Then, we generate several downstream tasks on the corpus, such as linking actions, goal inference, and summarization. To tackle these tasks, we implement strong baseline models by fine-tuning large language-specific models such as TR-BART and BERTurk, as well as multilingual models such as mBART, mT5, and XLM. We find that language-specific models consistently outperform their multilingual counterparts by a significant margin across most procedural language understanding (PLU) tasks. We release our corpus, downstream tasks, and the baseline models at https://github.com/GGLAB-KU/turkish-plu.
