LIDIOMS: A Multilingual Linked Idioms Data Set

  • 2018-02-22 16:38:40
  • Diego Moussallem, Mohamed Ahmed Sherif, Diego Esteves, Marcos Zampieri, Axel-Cyrille Ngonga Ngomo

Abstract

In this paper, we describe the LIDIOMS data set, a multilingual RDF representation of idioms currently containing five languages: English, German, Italian, Portuguese, and Russian. The data set is intended to support natural language processing applications by providing links between idioms across languages. The underlying data was crawled and integrated from various sources. To ensure the quality of the crawled data, all idioms were evaluated by at least two native speakers. Herein, we present the model devised for structuring the data. We also provide the details of linking LIDIOMS to well-known multilingual data sets such as BabelNet. The resulting data set complies with best practices according to the Linguistic Linked Open Data community.

 
