Continual Learning Under Language Shift

Abstract

The recent increase in data and model scale for language model pre-training has led to huge training costs. In scenarios where new data become available over time, updating a model instead of fully retraining it would therefore provide significant gains. We study the pros and cons of updating a language model when new data come from new languages, which is the case of continual learning under language shift. Starting from a monolingual English language model, we incrementally add data from Danish, Icelandic, and Norwegian to investigate how forward and backward transfer effects depend on the pre-training order and the characteristics of the languages, for three different model sizes. Our results show that, while forward transfer is largely positive and independent of language order, backward transfer can be positive or negative depending on the order and characteristics of the new languages. We explore a number of potentially explanatory factors and find that a combination of language contamination and syntactic similarity best fits our results.
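The abstract uses "forward transfer" and "backward transfer" without defining them. The sketch below shows one common way such effects can be quantified for language models, assuming both are measured as relative changes in perplexity (lower perplexity is better). This is an assumption for illustration, not necessarily the definition used in the paper; the function names and the perplexity values are hypothetical.

```python
# Hypothetical sketch: quantifying transfer as relative perplexity change.
# Assumption: this is one possible definition, not necessarily the paper's.

def relative_change(baseline: float, updated: float) -> float:
    """Relative perplexity reduction; positive means the update helped."""
    return (baseline - updated) / baseline


def backward_transfer(ppl_before: float, ppl_after: float) -> float:
    """Effect on an earlier language (e.g. English) of training on new ones.

    ppl_before: perplexity on the earlier language before adding new data.
    ppl_after:  perplexity on the same language after adding new data.
    Negative values indicate forgetting.
    """
    return relative_change(ppl_before, ppl_after)


def forward_transfer(ppl_from_scratch: float, ppl_continued: float) -> float:
    """Effect on a new language of starting from the pre-trained model.

    ppl_from_scratch: perplexity of a model trained only on the new language.
    ppl_continued:    perplexity of the model updated from the earlier one.
    Positive values mean prior pre-training helped the new language.
    """
    return relative_change(ppl_from_scratch, ppl_continued)


if __name__ == "__main__":
    # Made-up numbers, purely illustrative; not results from the paper.
    print(backward_transfer(20.0, 22.0))  # negative: perplexity got worse
    print(forward_transfer(40.0, 30.0))   # 0.25: perplexity improved by 25%
```

Normalizing by the baseline makes the values comparable across languages whose absolute perplexities differ widely, which matters when comparing different language orders or model sizes.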
