On the Weaknesses of Reinforcement Learning for Neural Machine Translation

Abstract

Reinforcement learning (RL) is frequently used to improve performance in text generation tasks, including machine translation (MT), notably through the use of Minimum Risk Training (MRT) and Generative Adversarial Networks (GAN). However, little is known about what these methods learn in the context of MT, and how. We prove that one of the most common RL methods for MT does not optimize the expected reward, and show that other methods take an infeasibly long time to converge. In fact, our results suggest that RL practices in MT are likely to improve performance only where the pre-trained parameters are already close to yielding the correct translation. Our findings further suggest that observed gains may be due to effects unrelated to the training signal, concretely, changes in the shape of the distribution curve.