Abstract
Large language models (LLMs) have demonstrated impressive translation capabilities even without being explicitly trained on parallel data. This remarkable property has led some to believe that parallel data is no longer necessary for building multilingual language models. While some attribute this to the emergent abilities of LLMs due to scale, recent work suggests that it is actually caused by incidental bilingual signals present in the training data. Various methods have been proposed to maximize the utility of parallel data to enhance the multilingual capabilities of multilingual encoder-based and encoder-decoder language models. However, some decoder-based LLMs opt to ignore parallel data instead. In this work, we conduct a systematic study on the impact of adding parallel data on LLMs' multilingual capabilities, focusing specifically on translation and multilingual common-sense reasoning. Through controlled experiments, we demonstrate that parallel data can significantly improve LLMs' multilingual capabilities.