Abstract
Grammatical Error Detection (GED) methods rely heavily on human-annotated error corpora. However, such annotations are unavailable for many low-resource languages. In this paper, we investigate GED in this low-resource setting. Leveraging the zero-shot cross-lingual transfer capabilities of multilingual pre-trained language models, we train a model on data from a diverse set of languages to generate synthetic errors in other languages. These synthetic error corpora are then used to train a GED model. Specifically, we propose a two-stage fine-tuning pipeline in which the GED model is first fine-tuned on multilingual synthetic data from the target languages and then fine-tuned on human-annotated GED corpora from the source languages. This approach outperforms current state-of-the-art annotation-free GED methods. We also analyse the errors produced by our method and by other strong baselines, finding that our approach produces errors that are more diverse and more similar to human errors.