ECLeKTic: a Novel Challenge Set for Evaluation of Cross-Lingual Knowledge Transfer

  • 2025-02-28 16:59:30
  • Omer Goldman, Uri Shaham, Dan Malkin, Sivan Eiger, Avinatan Hassidim, Yossi Matias, Joshua Maynez, Adi Mayrav Gilady, Jason Riesa, Shruti Rijhwani, Laura Rimell, Idan Szpektor, Reut Tsarfaty, Matan Eyal

Abstract

To achieve equitable performance across languages, multilingual large language models (LLMs) must be able to abstract knowledge beyond the language in which it was acquired. However, the current literature lacks reliable ways to measure LLMs' capability for cross-lingual knowledge transfer. To that end, we present ECLeKTic, a multilingual closed-book QA (CBQA) dataset that Evaluates Cross-Lingual Knowledge Transfer in a simple, black-box manner. We detected information with uneven coverage across languages by controlling for the presence and absence of Wikipedia articles in 12 languages. We generated knowledge-seeking questions in a source language, for which the answer appears in a relevant Wikipedia article, and translated them into all 11 other languages, whose respective Wikipedias lack equivalent articles. Assuming that Wikipedia reflects the prominent knowledge in the LLM's training data, solving ECLeKTic's CBQA task requires the model to transfer knowledge between languages. Experimenting with 8 LLMs, we show that SOTA models struggle to effectively share knowledge across languages, even when they can predict the answer well for queries in the same language in which the knowledge was acquired.
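The two ideas in the abstract, selecting articles that exist in only one language's Wikipedia and asking whether a model that answers correctly in the source language also answers correctly after translation, can be illustrated with a small sketch. This is not the authors' pipeline. The `availability` mapping, the language codes, and the functions `single_language_articles` and `transfer_rate` are hypothetical, and defining transfer as "correct in the target language, given correct in the source language" is an assumption suggested by the abstract's final sentence rather than a metric it states.

```python
from collections.abc import Iterable, Mapping


def single_language_articles(
    availability: Mapping[str, set[str]], languages: Iterable[str]
) -> dict[str, str]:
    """Hypothetical selection step: keep articles that exist in exactly one of
    the given languages' Wikipedias, mapped to that (source) language."""
    langs = set(languages)
    selected: dict[str, str] = {}
    for article, present_in in availability.items():
        covered = present_in & langs  # languages (within our set) having this article
        if len(covered) == 1:
            selected[article] = next(iter(covered))
    return selected


def transfer_rate(
    correct: Mapping[tuple[str, str], bool], source_lang: Mapping[str, str]
) -> float:
    """Assumed metric: over (question, target language) pairs whose question was
    answered correctly in its source language, the fraction also answered
    correctly in the target language. Returns 0.0 if no pair qualifies."""
    hits = total = 0
    for (qid, lang), ok in correct.items():
        src = source_lang[qid]
        # Skip the source-language query itself and questions the model
        # could not answer even in the language the knowledge came from.
        if lang == src or not correct.get((qid, src), False):
            continue
        total += 1
        hits += ok  # bool counts as 0 or 1
    return hits / total if total else 0.0


# Toy data (made up for illustration, not from the dataset).
availability = {
    "article_a": {"he"},
    "article_b": {"en", "he"},
    "article_c": {"ja"},
}
print(single_language_articles(availability, ["en", "he", "ja"]))
# {'article_a': 'he', 'article_c': 'ja'}

correct = {
    ("q1", "he"): True, ("q1", "en"): True, ("q1", "ja"): False,
    ("q2", "ja"): False, ("q2", "en"): True,
}
print(transfer_rate(correct, {"q1": "he", "q2": "ja"}))  # 0.5
```

Conditioning on source-language correctness separates a failure to transfer from a failure to know the fact at all: in the toy data, `q2` is excluded because the model misses it even in its source language, so its correct English answer does not count as transfer.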
