LEPOR: An Augmented Machine Translation Evaluation Metric

Abstract

Machine translation (MT) has become one of the most active research topics in natural language processing (NLP). An important issue in MT is how to evaluate an MT system reasonably and determine whether a change to the system actually improves it. Traditional manual judgment methods are expensive, time-consuming, unrepeatable, and sometimes suffer from low agreement. Popular automatic MT evaluation methods, on the other hand, have weaknesses of their own. Firstly, they tend to perform well on language pairs with English as the target language but poorly when English is the source. Secondly, some methods rely on many additional linguistic features to achieve good performance, which makes the metrics hard to replicate and to apply to other language pairs. Thirdly, some popular metrics use an incomplete set of factors, which results in low performance on some practical tasks. In this thesis, to address these problems, we design novel MT evaluation methods and investigate their performance on different languages. Firstly, we design augmented factors to yield highly accurate evaluation. Secondly, we design a tunable evaluation model in which the weighting of factors can be optimized according to the characteristics of the languages. Thirdly, in the enhanced version of our methods, we design concise linguistic features based on part-of-speech (POS) information to show that our methods can yield even higher performance when using some external linguistic resources. Finally, we report the practical performance of our metrics in the ACL-WMT workshop shared tasks, which shows that the proposed methods are robust across different languages. In addition, we present some novel work on quality estimation of MT without reference translations, including the use of Naïve Bayes (NB) probability models, support vector machine (SVM) classification algorithms, and conditional random fields (CRFs).
