Better Low-Resource Entity Recognition Through Translation and Annotation Fusion

Abstract

Pre-trained multilingual language models have enabled significant advancements in cross-lingual transfer. However, these models often exhibit a performance disparity when transferring from high-resource languages to low-resource languages, especially for languages that are underrepresented in or absent from the pre-training data. Motivated by the superior performance of these models on high-resource languages compared to low-resource languages, we introduce a Translation-and-fusion framework, which translates low-resource language text into a high-resource language for annotation using fully supervised models before fusing the annotations back into the low-resource language. Based on this framework, we present TransFusion, a model trained to fuse predictions from a high-resource language to make robust predictions on low-resource languages. We evaluate our methods on two low-resource named entity recognition (NER) datasets, MasakhaNER2.0 and LORELEI NER, covering 25 languages, and show consistent improvements of up to +16 F$_1$ over English fine-tuning systems, achieving state-of-the-art performance compared to Translate-train systems. Our analysis highlights the unique advantages of TransFusion, which is robust to translation errors and source language prediction errors, and complementary to adapted multilingual language models.
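
The three stages of the Translation-and-fusion framework (translate, annotate, fuse) can be sketched as a small pipeline. This is only an illustrative outline under stated assumptions, not the TransFusion model itself: the function names, the word-for-word toy lexicon, the gazetteer tagger, and the surface-match fusion rule are all hypothetical stand-ins for a real machine translation system, a fully supervised high-resource NER model, and a trained fusion model.

```python
from typing import Callable, Dict, List

Tokens = List[str]
Tags = List[str]


def translate_and_fuse(
    tokens: Tokens,
    translate: Callable[[Tokens], Tokens],
    annotate_high_resource: Callable[[Tokens], Tags],
    fuse: Callable[[Tokens, Tokens, Tags], Tags],
) -> Tags:
    """Translate low-resource text, annotate the translation, fuse labels back."""
    translated = translate(tokens)
    high_resource_tags = annotate_high_resource(translated)
    if len(high_resource_tags) != len(translated):
        raise ValueError("annotator must return one tag per translated token")
    return fuse(tokens, translated, high_resource_tags)


# --- Hypothetical toy components, for illustration only ---

TOY_LEXICON: Dict[str, str] = {"anaishi": "lives"}  # assumed toy dictionary
TOY_GAZETTEER: Dict[str, str] = {"Amina": "B-PER", "Nairobi": "B-LOC"}


def toy_translate(tokens: Tokens) -> Tokens:
    # Word-for-word lookup; unknown tokens are copied unchanged.
    return [TOY_LEXICON.get(tok, tok) for tok in tokens]


def toy_annotate(tokens: Tokens) -> Tags:
    # Stand-in for a supervised high-resource NER model.
    return [TOY_GAZETTEER.get(tok, "O") for tok in tokens]


def toy_fuse(tokens: Tokens, translated: Tokens, tags: Tags) -> Tags:
    # Heuristic stand-in for a learned fusion model: copy a label only when
    # the low-resource token appears verbatim in the annotated translation.
    label_by_surface = dict(zip(translated, tags))
    return [label_by_surface.get(tok, "O") for tok in tokens]


low_resource_sentence = ["Amina", "anaishi", "Nairobi"]
predicted_tags = translate_and_fuse(
    low_resource_sentence, toy_translate, toy_annotate, toy_fuse
)
print(list(zip(low_resource_sentence, predicted_tags)))
```

Passing the three stages in as callables keeps the sketch independent of any particular translation system or tagger. In the framework described above, the fusion stage is a trained model rather than a fixed rule, which is what allows it to tolerate translation and source-language prediction errors.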
