Abstract
In this paper, we address the challenge of optimizing training setups for Large Language Models (LLMs) of low-resource languages with a limited target language corpus. Existing works adopt multi-epoch, multilingual, and two-stage training to use the limited target language corpus efficiently. However, the optimal hyperparameter setups for combining these three approaches to train LLMs are still poorly understood. We exhaustively explore training setups for low-resource language LLMs that combine these three approaches, and we find the following insights for efficiently reducing the cost of hyperparameter search: (1) As the amount of target language corpus decreases, the optimal training approach shifts from monolingual single-stage training to multilingual two-stage training at a threshold that depends on the compute budget. (2) The optimal model scale remains stable regardless of the amount of target language corpus, so the compute-optimal scale of monolingual training can be used. (3) The optimal number of epochs can be extrapolated from smaller-scale experiments to larger scales using our proposed model. We also provide evidence that, in single-stage training, the target language validation loss follows a power law with respect to the target language ratio, with an exponent independent of the amount of data, model scale, and language pair.
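To make the power-law claim concrete, the sketch below fits loss as a function of the target language ratio by least squares in log-log space. It is illustrative only and rests on assumptions not stated in the abstract: the functional form is taken to be the pure power law L(r) = A * r^(-alpha) with no irreducible-loss offset, r is read as the fraction of target language data in the training mixture, and the function name `fit_power_law` and all numbers are hypothetical synthetic values, not results from this work. Two synthetic "settings" share the same exponent but differ in prefactor, mirroring the claim that the exponent does not depend on data amount, model scale, or language pair.

```python
import math

def fit_power_law(ratios, losses):
    """Fit loss = A * ratio**(-alpha) by ordinary least squares on
    log(loss) = log(A) - alpha * log(ratio).

    Hypothetical helper: assumes a pure power law with no offset term,
    and that all ratios and losses are strictly positive.
    Returns (A, alpha).
    """
    xs = [math.log(r) for r in ratios]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The slope in log-log space is -alpha; the intercept is log(A).
    return math.exp(intercept), -slope

# Synthetic, illustrative data only (not measurements from this work):
# two hypothetical settings with different prefactors A but a shared exponent.
shared_alpha = 0.1
ratios = [0.05, 0.1, 0.2, 0.4, 0.8, 1.0]
settings = {"setting_a": 3.0, "setting_b": 2.5}

fits = {}
for name, prefactor in settings.items():
    losses = [prefactor * r ** (-shared_alpha) for r in ratios]
    fits[name] = fit_power_law(ratios, losses)
    A_hat, alpha_hat = fits[name]
    print(f"{name}: A = {A_hat:.3f}, alpha = {alpha_hat:.3f}")
```

Because the synthetic points lie exactly on each curve, the fit recovers A = 3.000 and 2.500 with alpha = 0.100 in both settings. With real validation losses one would expect noise, and a form with an additive offset may fit better; the log-log regression is simply the most direct way to read off an exponent under the stated assumption.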