TransProQA: an LLM-based literary Translation evaluation metric with Professional Question Answering

Abstract

The impact of Large Language Models (LLMs) has extended into literary domains. However, existing evaluation metrics prioritize mechanical accuracy over artistic expression and tend to overrate machine translation (MT) as being superior to experienced professional human translation. In the long run, this bias could result in a permanent decline in translation quality and cultural authenticity. In response to the urgent need for a specialized literary evaluation metric, we introduce TransProQA, a novel, reference-free, LLM-based question-answering (QA) framework designed specifically for literary translation evaluation. TransProQA uniquely integrates insights from professional literary translators and researchers, focusing on critical elements in literary quality assessment such as literary devices, cultural understanding, and authorial voice. Our extensive evaluation shows that while literary-finetuned XCOMET-XL yields marginal gains, TransProQA substantially outperforms current metrics, achieving up to 0.07 gain in correlation (ACC-EQ and Kendall's tau) and surpassing the best state-of-the-art (SOTA) metrics by over 15 points in adequacy assessments. Incorporating professional translator insights as weights further improves performance, highlighting the value of translator inputs. Notably, TransProQA approaches human-level evaluation performance comparable to trained linguistic annotators. It demonstrates broad applicability to open-source models such as LLaMA3.3-70b and Qwen2.5-32b, indicating its potential as an accessible and training-free literary evaluation metric and a valuable tool for evaluating texts that require local processing due to copyright or ethical considerations.
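
To make the two ideas named in the abstract more concrete (translator-derived weights over QA answers, and Kendall's tau as a correlation measure), the following is a minimal sketch and not the paper's implementation. The function names, the assumption that each answer is a score in [0, 1], and the example weights are all hypothetical. The tau shown is the plain tau-a form, which ignores ties; the paper may use a tie-aware variant.

```python
from itertools import combinations


def weighted_qa_score(answers, weights):
    """Weighted mean of per-question answer scores.

    Assumption (hypothetical): each answer is already a number in [0, 1],
    and each weight reflects how much a question matters to translators.
    """
    if len(answers) != len(weights):
        raise ValueError("answers and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(a * w for a, w in zip(answers, weights)) / total


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # Tied pairs (sign == 0) count as neither in tau-a.
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)


if __name__ == "__main__":
    # Three hypothetical questions; the first is weighted twice as heavily.
    score = weighted_qa_score([1.0, 0.0, 1.0], [2.0, 1.0, 1.0])
    print(score)

    # Agreement between hypothetical metric scores and human rankings.
    print(kendall_tau_a([0.9, 0.4, 0.7], [3, 1, 2]))
```

Under these assumptions, a higher weight makes a question's answer count more toward the final score, and tau measures how often the metric orders pairs of translations the same way human judges do.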
