Revisiting the Weaknesses of Reinforcement Learning for Neural Machine Translation

Abstract

Policy gradient algorithms have found wide adoption in NLP, but have recently become subject to criticism, doubting their suitability for NMT. Choshen et al. (2020) identify multiple weaknesses and suspect that their success is determined by the shape of output distributions rather than the reward. In this paper, we revisit these claims and study them under a wider range of configurations. Our experiments on in-domain and cross-domain adaptation reveal the importance of exploration and reward scaling, and provide empirical counter-evidence to these claims.
