Abstract
In this paper, we evaluate the capacity of current language technologies to understand Basque and Spanish language varieties. We use Natural Language Inference (NLI) as a pivot task and introduce a novel, manually curated parallel dataset in Basque and Spanish, along with their respective variants. Our empirical analysis of crosslingual and in-context learning experiments using encoder-only and decoder-based Large Language Models (LLMs) shows a performance drop when handling linguistic variation, especially in Basque. Error analysis suggests that this decline is not due to lexical overlap, but rather to the linguistic variation itself. Further ablation experiments indicate that encoder-only models particularly struggle with Western Basque, which aligns with linguistic theory that identifies peripheral dialects (e.g., Western) as more distant from the standard. All data and code are publicly available.