Abstract
Multilingual NMT is a viable solution for translating low-resource languages (LRLs) when data from high-resource languages (HRLs) of the same language family is available. However, the training schedule, i.e. the order of presentation of languages, has an impact on the quality of such systems. Here, in a many-to-one translation setting, we propose to apply two algorithms that use reinforcement learning to optimize the training schedule of NMT: (1) Teacher-Student Curriculum Learning and (2) Deep Q Network. The former uses an exponentially smoothed estimate of the returns of each action, based on the loss on monolingual or multilingual development subsets. The latter estimates rewards using an additional neural network trained on the history of actions selected in different states of the system, together with the rewards received. On an 8-to-1 translation dataset with LRLs and HRLs, our second method improves BLEU and COMET scores with respect to both random selection of monolingual batches and shuffled multilingual batches, by adjusting the number of presentations of LRL vs. HRL batches.
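To make the first method more concrete, here is a minimal Python sketch of a Teacher-Student Curriculum Learning teacher that keeps an exponentially smoothed return estimate per action. It rests on assumptions that are not stated in the abstract. Each action picks a batch from one source language (the class name `TeacherBandit` and the helper `run_demo` are hypothetical). The reward is the decrease in development loss after training on that batch. The values of `alpha` and `epsilon` are illustrative. Following the original TSCL formulation, the teacher favors the action with the largest absolute smoothed return. This is a sketch, not the paper's configuration.

```python
import random
from typing import List


class TeacherBandit:
    """Hypothetical TSCL-style teacher: one action per language-specific batch source."""

    def __init__(self, num_actions: int, alpha: float = 0.1,
                 epsilon: float = 0.1, seed: int = 0) -> None:
        self.q = [0.0] * num_actions  # exponentially smoothed return per action
        self.alpha = alpha            # smoothing factor (assumed value)
        self.epsilon = epsilon        # exploration rate (assumed value)
        self.rng = random.Random(seed)

    def select(self) -> int:
        # Epsilon-greedy: explore uniformly at random, otherwise pick the action
        # whose smoothed return has the largest magnitude (as in original TSCL).
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q))
        return max(range(len(self.q)), key=lambda a: abs(self.q[a]))

    def update(self, action: int, reward: float) -> None:
        # Exponential smoothing: Q_a <- alpha * r + (1 - alpha) * Q_a
        self.q[action] = self.alpha * reward + (1 - self.alpha) * self.q[action]


def run_demo(steps: int = 200) -> List[int]:
    # Toy stand-in for NMT training: action i lowers the dev loss by a made-up
    # amount gains[i]. Real rewards would come from actual dev-set losses.
    gains = [0.05, 0.01, 0.02]
    teacher = TeacherBandit(len(gains), seed=42)
    loss = 10.0
    counts = [0] * len(gains)
    for _ in range(steps):
        a = teacher.select()
        new_loss = loss - gains[a]
        teacher.update(a, loss - new_loss)  # assumed reward: decrease in dev loss
        loss = new_loss
        counts[a] += 1
    return counts


if __name__ == "__main__":
    print(run_demo())  # how often each toy "language" was presented
```

The per-language gains in `run_demo` are invented for illustration only. Replacing them with real development-loss measurements would be the point where a sketch like this connects to an NMT training loop.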