Cross-Lingual Fine-Grained Entity Typing

Abstract

The growth of cross-lingual pre-trained models has enabled NLP tools to rapidly generalize to new languages. While these models have been applied to tasks involving entities, their ability to explicitly predict typological features of these entities across languages has not been established. In this paper, we present a unified cross-lingual fine-grained entity typing model capable of handling over 100 languages and analyze this model's ability to generalize to languages and entities unseen during training. We train this model on cross-lingual training data collected from Wikipedia hyperlinks in multiple languages (training languages). During inference, our model takes an entity mention and context in a particular language (test language, possibly not in the training languages) and predicts fine-grained types for that entity. Generalizing to new languages and unseen entities are the fundamental challenges of this entity typing setup, so we focus our evaluation on these settings and compare against simple yet powerful string match baselines. Experimental results show that our approach outperforms the baselines on unseen languages such as Japanese, Tamil, Arabic, Serbian, and Persian. In addition, our approach substantially improves performance on unseen entities (even in unseen languages) over the baselines, and human evaluation shows a strong ability to predict relevant types in these settings.
