Exploring Continual Fine-Tuning for Enhancing Language Ability in Large Language Model

Abstract

A common challenge for the adaptability of Large Language Models (LLMs) is their ability to learn new languages over time without hampering the model's performance on languages in which the model is already proficient (usually English). Continual fine-tuning (CFT) is the process of sequentially fine-tuning an LLM to enable the model to adapt to downstream tasks with varying data distributions and time shifts. This paper focuses on the language adaptability of LLMs through CFT. We study a two-phase CFT process in which an English-only end-to-end fine-tuned LLM from Phase 1 (predominantly Task Ability) is sequentially fine-tuned on a multilingual dataset, comprising task data in new languages, in Phase 2 (predominantly Language Ability). We observe that the "similarity" of Phase 2 tasks with Phase 1 determines the LLM's adaptability. For similar phase-wise datasets, the LLM after Phase 2 does not show deterioration in task ability. In contrast, when the phase-wise datasets are not similar, the LLM's task ability deteriorates. We test our hypothesis on the open-source \mis\ and \llm\ models with multiple phase-wise dataset pairs. To address the deterioration, we analyze tailored variants of two CFT methods: layer freezing and generative replay. Our findings demonstrate their effectiveness in enhancing the language ability of LLMs while preserving task performance, in comparison to relevant baselines.
