Abstract
With little to no parallel data available for programming languages, unsupervised methods are well-suited to source code translation. However, the majority of unsupervised machine translation approaches rely on back-translation, a method developed in the context of natural language translation and one that inherently involves training on noisy inputs. Unfortunately, source code is highly sensitive to small changes: a single incorrect token can cause compilation failures or erroneous programs, whereas in natural language small inaccuracies may not change the meaning of a sentence. To address this issue, we propose to leverage an automated unit-testing system to filter out invalid translations, thereby creating a fully tested parallel corpus. We found that fine-tuning an unsupervised model with this filtered data set significantly reduces the noise in the translations it generates, comfortably outperforming the state of the art for all language pairs studied. In particular, for Java $\to$ Python and Python $\to$ C++ we outperform the best previous methods by more than 16% and 24%, respectively, reducing the error rate by more than 35%.
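The filtering step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the method's actual implementation: it assumes each candidate translation can be executed as a Python callable, and that a unit test is an `(arguments, expected_output)` pair. The names `passes_all_tests` and `build_parallel_corpus` are hypothetical. A real pipeline would compile and run each translation in its own target language, in a sandbox with timeouts.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

# Hypothetical test format: (positional arguments, expected return value).
UnitTest = Tuple[Tuple[Any, ...], Any]


def passes_all_tests(candidate: Callable[..., Any], tests: Sequence[UnitTest]) -> bool:
    """Return True only if the candidate reproduces every expected output."""
    for args, expected in tests:
        try:
            if candidate(*args) != expected:
                return False
        except Exception:
            # A crash is treated as a failed test, standing in for the
            # compilation or runtime errors of a real target language.
            return False
    return True


def build_parallel_corpus(
    sources: Dict[str, str],
    candidates: Dict[str, List[Tuple[str, Callable[..., Any]]]],
    tests: Dict[str, Sequence[UnitTest]],
) -> List[Tuple[str, str]]:
    """Pair each source function with its first test-passing translation.

    Functions without unit tests are skipped, so every kept pair is tested.
    """
    corpus: List[Tuple[str, str]] = []
    for name, source_code in sources.items():
        if not tests.get(name):
            continue
        for target_code, runnable in candidates.get(name, []):
            if passes_all_tests(runnable, tests[name]):
                corpus.append((source_code, target_code))
                break  # keep one validated translation per source function
    return corpus


if __name__ == "__main__":
    java_sources = {"add": "static int add(int a, int b) { return a + b; }"}
    python_candidates = {
        "add": [
            ("def add(a, b):\n    return a - b", lambda a, b: a - b),  # wrong
            ("def add(a, b):\n    return a + b", lambda a, b: a + b),  # correct
        ]
    }
    add_tests = {"add": [((1, 2), 3), ((0, 0), 0), ((-4, 4), 0)]}
    corpus = build_parallel_corpus(java_sources, python_candidates, add_tests)
    print(len(corpus))
    print(corpus[0][1])
```

In this toy run, the incorrect candidate fails the `(1, 2) -> 3` test and is discarded, so only the correct translation enters the corpus. Keeping the first passing candidate per function is one possible design choice; keeping all passing candidates would also yield a tested corpus.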