Abstract
The widespread use of large language models (LLMs) underscores the crucial need for accurate and up-to-date knowledge embedded in their intrinsic parameters. Existing research on knowledge editing concentrates primarily on monolingual scenarios, neglecting the complexities introduced by multilingual contexts and multi-hop reasoning. To address these challenges, our study introduces MLaKE (Multilingual Language Knowledge Editing), a novel benchmark comprising 4072 multi-hop and 5360 single-hop questions designed to evaluate the adaptability of knowledge editing methods across five languages: English, Chinese, Japanese, French, and German. MLaKE aggregates fact chains from Wikipedia across languages and uses LLMs to generate questions in both free-form and multiple-choice formats. We evaluate the multilingual knowledge editing generalization capabilities of existing methods on MLaKE. Existing knowledge editing methods achieve higher success rates on English samples than on samples in other languages, and their generalization capabilities are limited in cross-lingual experiments. Notably, these methods often generalize relatively well to languages within the same language family but considerably less well to languages from different language families. These results underscore the pressing need for advances in multilingual knowledge editing, and we hope MLaKE can serve as a valuable resource for benchmarking and solution development.