Multilingual Fact Linking

  • 2021-10-01 03:58:54
  • Keshav Kolluru, Martin Rezk, Pat Verga, William W. Cohen, Partha Talukdar


Knowledge-intensive NLP tasks can benefit from linking natural language text with facts from a Knowledge Graph (KG). Although facts themselves are language-agnostic, the fact labels (i.e., the language-specific representations of the facts) in the KG are often present in only a few languages. This makes it challenging to link KG facts to sentences in languages outside this limited set. To address this problem, we introduce the task of Multilingual Fact Linking (MFL), where the goal is to link a fact expressed in a sentence to the corresponding fact in the KG, even when the fact label in the KG is not available in the language of the sentence. To facilitate research in this area, we present a new evaluation dataset, IndicLink. This dataset contains 11,293 linked WikiData facts and 6,429 sentences spanning English and six Indian languages. We propose a Retrieval+Generation model, ReFCoG, that can scale to millions of KG facts by combining Dual Encoder based retrieval with a Seq2Seq based generation model that is constrained to output only valid KG facts. ReFCoG outperforms standard Retrieval+Re-ranking models by 10.7 pts. In spite of this gain, the model achieves an overall score of 52.1, showing ample scope for improvement in the task. ReFCoG code and IndicLink data are available at
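The retrieve-then-generate design described above can be illustrated with a minimal Python sketch. This is not the ReFCoG implementation. The toy facts, the two-dimensional embeddings, and `toy_scorer` (a hypothetical stand-in for a Seq2Seq model's next-token scores) are all assumptions for illustration, and each fact is treated as a sequence of whole-label tokens rather than subword tokens. Retrieval ranks facts by the dot product of a query embedding and fact embeddings, as a dual encoder would; generation then decodes greedily through a prefix trie built from the retrieved facts, so only a valid KG fact can be emitted.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# A KG fact represented as a tuple of label tokens: (subject, relation, object).
Fact = Tuple[str, ...]
END = "<eos>"  # marks the end of a complete fact in the trie


def retrieve_top_k(query_vec: Sequence[float],
                   fact_vecs: Dict[Fact, Sequence[float]], k: int) -> List[Fact]:
    """Dual-encoder-style retrieval: rank facts by dot product with the query embedding."""
    def dot(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(x * y for x, y in zip(a, b))
    ranked = sorted(fact_vecs, key=lambda f: dot(query_vec, fact_vecs[f]), reverse=True)
    return ranked[:k]


def build_trie(facts: List[Fact]) -> Dict:
    """Build a prefix trie whose root-to-END paths are exactly the given facts."""
    trie: Dict = {}
    for fact in facts:
        node = trie
        for tok in (*fact, END):
            node = node.setdefault(tok, {})
    return trie


def constrained_greedy_decode(trie: Dict,
                              score_next: Callable[[Fact, str], float]) -> Fact:
    """Greedy decoding that may only follow trie edges, so the output is always a valid fact."""
    prefix: Fact = ()
    node = trie
    while True:
        allowed = list(node.keys())  # only continuations of some candidate fact
        best = max(allowed, key=lambda tok: score_next(prefix, tok))
        if best == END:
            return prefix
        prefix = prefix + (best,)
        node = node[best]


# --- Hypothetical toy data (not taken from IndicLink) ---
FACT_VECS: Dict[Fact, Sequence[float]] = {
    ("India", "capital", "New Delhi"): [0.9, 0.1],
    ("India", "currency", "Indian rupee"): [0.7, 0.3],
    ("France", "capital", "Paris"): [0.1, 0.9],
}
PREFS = {"India": 1.0, "capital": 0.8, "currency": 0.5,
         "New Delhi": 0.9, "Indian rupee": 0.2, "Mumbai": 5.0, END: 0.0}


def toy_scorer(prefix: Fact, tok: str) -> float:
    """Hypothetical stand-in for a Seq2Seq model's score of `tok` given `prefix`."""
    return PREFS.get(tok, -math.inf)


if __name__ == "__main__":
    query_vec = [1.0, 0.0]  # hypothetical embedding of the input sentence
    candidates = retrieve_top_k(query_vec, FACT_VECS, k=2)
    print(candidates)  # [('India', 'capital', 'New Delhi'), ('India', 'currency', 'Indian rupee')]
    print(constrained_greedy_decode(build_trie(candidates), toy_scorer))  # ('India', 'capital', 'New Delhi')
```

Even though `toy_scorer` gives "Mumbai" the highest score, the trie never offers it as a continuation, which is the point of constraining generation to valid facts. In a real system, the trie would typically be built over tokenized fact labels, and the scores would come from the model's conditional log-probabilities given the input sentence.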


