Abstract
Large language models (LLMs) are increasingly being adopted in educational settings. These applications extend beyond English, though current LLMs remain primarily English-centric. In this work, we ascertain whether their use in educational settings in non-English languages is warranted. We evaluated the performance of popular LLMs on four educational tasks (identifying student misconceptions, providing targeted feedback, interactive tutoring, and grading translations) in six languages (Hindi, Arabic, Farsi, Telugu, Ukrainian, Czech) in addition to English. We find that performance on these tasks somewhat corresponds to how well each language is represented in the training data, with lower-resource languages showing poorer task performance. Although the models perform reasonably well in most languages, the frequent performance drop from English is significant. Thus, we recommend that practitioners first verify that the LLM works well in the target language for their educational task before deployment.