Identifying Computer-Translated Paragraphs using Coherence Features

  • 2018-12-28 05:35:31
  • Hoang-Quoc Nguyen-Son, Ngoc-Dung T. Tieu, Huy H. Nguyen, Junichi Yamagishi, Isao Echizen


We have developed a method for extracting coherence features from a paragraph by matching similar words in its sentences. We conducted an experiment with a parallel German corpus containing 2000 human-created and 2000 machine-translated paragraphs. Our method achieved the best performance (accuracy = 72.3%, equal error rate = 29.8%) compared with previous methods designed for various kinds of computer-generated text, including translation and paper generation (best accuracy = 67.9%, equal error rate = 32.0%). Experiments on Dutch, another resource-rich language, and on a low-resource language (Japanese) yielded similar performance. These results demonstrate the efficiency of the coherence features in distinguishing computer-translated from human-created paragraphs across diverse languages.
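
To make the idea of "matching similar words in its sentences" concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how word similarity is measured or which statistics form the feature vector. The sketch uses character-level similarity from the standard library (`difflib.SequenceMatcher`) as a stand-in for whatever similarity measure the paper uses, and the function names (`tokenize`, `word_similarity`, `best_match_scores`, `coherence_features`) as well as the mean/max/min summary are hypothetical choices.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations
from statistics import mean


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"\w+", sentence.lower())


def word_similarity(a, b):
    """Assumed stand-in similarity: character-level ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def best_match_scores(sent_a, sent_b):
    """For each word in sent_a, the similarity of its closest word in sent_b."""
    return [max(word_similarity(w, v) for v in sent_b) for w in sent_a]


def coherence_features(paragraph):
    """Summarize cross-sentence word matches as a small feature dict.

    The mean/max/min summary is an assumption made for illustration.
    """
    # Naive sentence split on ., ! or ? followed by whitespace.
    raw_sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    sentences = [tokens for tokens in map(tokenize, raw_sentences) if tokens]

    scores = []
    for sent_a, sent_b in combinations(sentences, 2):
        scores.extend(best_match_scores(sent_a, sent_b))

    if not scores:  # fewer than two sentences: nothing to match
        return {"mean": 0.0, "max": 0.0, "min": 0.0}
    return {"mean": mean(scores), "max": max(scores), "min": min(scores)}


features = coherence_features("The cat sat. The cat slept.")
```

A feature dictionary of this kind could then be passed to any standard binary classifier to separate human-created from machine-translated paragraphs. The choice of classifier is likewise not specified in the abstract.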


