FarsInstruct: Empowering Large Language Models for Persian Instruction Understanding

  • 2024-07-17 21:03:55
  • Hojjat Mokhtarabadi, Ziba Zamani, Abbas Maazallahi, Hossein Manshaei

Abstract

Instruction-tuned large language models, such as T0, have demonstrated remarkable capabilities in following instructions across various domains. However, their proficiency remains notably deficient in many low-resource languages. To address this challenge, we introduce FarsInstruct: a comprehensive instruction dataset designed to enhance the instruction-following ability of large language models specifically for the Persian language, a significant yet underrepresented language globally. FarsInstruct encompasses a wide range of task types and datasets, each containing a mix of straightforward to complex manually written instructions, as well as translations from the Public Pool of Prompts, ensuring a rich linguistic and cultural representation. Furthermore, we introduce Co-CoLA, a framework designed to enhance the multi-task adaptability of LoRA-tuned models. Through extensive experimental analyses, our study showcases the effectiveness of the FarsInstruct dataset, coupled with training by the Co-CoLA framework, in improving the performance of large language models within the Persian context. As of the current writing, FarsInstruct comprises more than 200 templates across 21 distinct datasets, and we intend to update it consistently, thus augmenting its applicability.

 
