How do languages influence each other? Studying cross-lingual data sharing during LLM fine-tuning

Abstract

Multilingual large language models (MLLMs) are jointly trained on data from many different languages such that representation of individual languages can benefit from other languages' data. Impressive performance on zero-shot cross-lingual transfer shows that these models are capable of exploiting data from other languages. Yet, it remains unclear to what extent, and under which conditions, languages rely on each other's data. In this study, we use TracIn (Pruthi et al., 2020), a training data attribution (TDA) method, to retrieve the most influential training samples seen during multilingual fine-tuning for a particular test language. This allows us to analyse cross-lingual sharing mechanisms of MLLMs from a new perspective. While previous work studied cross-lingual sharing at the level of model parameters, we present the first approach to study cross-lingual sharing at the data level. We find that MLLMs rely on data from multiple languages from the early stages of fine-tuning and that this reliance gradually increases as fine-tuning progresses. We further study how different fine-tuning languages influence model performance on a given test language and find that they can both reinforce and complement the knowledge acquired from data of the test language itself.
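
To make the attribution idea concrete, here is a minimal sketch of the checkpoint-based form of TracIn: the influence of a training example on a test example is approximated by summing, over saved checkpoints, the learning rate times the dot product of their loss gradients. The toy logistic-regression model and the names `grad_logistic`, `tracin_scores`, and `top_k_influential` are assumptions made for illustration only. They are not the setup used in the paper, which applies the method to a fine-tuned multilingual model.

```python
import numpy as np


def grad_logistic(w, X, y):
    """Per-example gradients of the logistic loss; X is (n, d), y is (n,) in {0, 1}."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return (p - y)[:, None] * X


def tracin_scores(checkpoints, lrs, train_X, train_y, test_x, test_y):
    """Checkpoint-based TracIn scores of every training example for one test example.

    checkpoints: list of weight vectors saved during training (hypothetical here)
    lrs: learning rate associated with each checkpoint
    """
    scores = np.zeros(len(train_X))
    for w, lr in zip(checkpoints, lrs):
        g_train = grad_logistic(w, train_X, train_y)              # (n, d)
        g_test = grad_logistic(w, test_x[None, :], np.array([test_y]))[0]  # (d,)
        scores += lr * (g_train @ g_test)                         # gradient dot products
    return scores


def top_k_influential(scores, k):
    """Indices of the k training examples with the highest (most positive) influence."""
    return np.argsort(-scores)[:k]


# Toy usage: three training examples, one test example, a single checkpoint at w = 0.
train_X = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
train_y = np.array([1.0, 1.0, 1.0])
test_x, test_y = np.array([1.0, 0.0]), 1.0
scores = tracin_scores([np.zeros(2)], [1.0], train_X, train_y, test_x, test_y)
top = top_k_influential(scores, k=1)
```

In a multilingual setting, one could tag each training example with its language and count how often each language appears among the top-ranked examples for a given test language. That bookkeeping is omitted here.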
