A Crosslingual Investigation of Conceptualization in 1335 Languages

  • 2023-05-26 19:29:24
  • Yihong Liu, Haotian Ye, Leonie Weissweiler, Philipp Wicke, Renhao Pei, Robert Zangenfeind, Hinrich Schütze

Abstract

Languages differ in how they divide up the world into concepts and words; e.g., in contrast to English, Swahili has a single concept for 'belly' and 'womb'. We investigate these differences in conceptualization across 1,335 languages by aligning concepts in a parallel corpus. To this end, we propose Conceptualizer, a method that creates a bipartite directed alignment graph between source language concepts and sets of target language strings. In a detailed linguistic analysis across all languages for one concept ('bird') and an evaluation on gold standard data for 32 Swadesh concepts, we show that Conceptualizer has good alignment accuracy. We demonstrate the potential of research on conceptualization in NLP with two experiments. (1) We define crosslingual stability of a concept as the degree to which it has 1-1 correspondences across languages, and show that concreteness predicts stability. (2) We represent each language by its conceptualization pattern for 83 concepts, and define a similarity measure on these representations. The resulting measure for the conceptual similarity of two languages is complementary to standard genealogical, typological, and surface similarity measures. For four out of six language families, we can assign languages to their correct family based on conceptual similarity with accuracy between 54% and 87%.
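
The notion of crosslingual stability in (1) can be illustrated with a minimal sketch. This is not the paper's implementation or its exact measure: it simply assumes stability is the fraction of languages in which a source concept aligns to exactly one target string that aligns back to that concept only. The function names (`invert`, `is_one_to_one`, `stability`), the language codes, and the target strings are all hypothetical toy values.

```python
from typing import Dict, Set

# Assumed representation: for each language, a mapping from a source
# concept to the set of target-language strings it is aligned with.
Alignment = Dict[str, Set[str]]


def invert(forward: Alignment) -> Alignment:
    """Map each target string back to the source concepts aligned with it."""
    backward: Alignment = {}
    for concept, targets in forward.items():
        for target in targets:
            backward.setdefault(target, set()).add(concept)
    return backward


def is_one_to_one(concept: str, forward: Alignment) -> bool:
    """True if the concept has one target string used by no other concept."""
    targets = forward.get(concept, set())
    if len(targets) != 1:
        return False
    (target,) = targets
    return invert(forward)[target] == {concept}


def stability(concept: str, languages: Dict[str, Alignment]) -> float:
    """Fraction of languages with a 1-1 correspondence (an assumed proxy)."""
    hits = sum(is_one_to_one(concept, fwd) for fwd in languages.values())
    return hits / len(languages)


# Hypothetical toy data: lang_b merges 'belly' and 'womb' into one string,
# mirroring the kind of difference the abstract describes.
toy = {
    "lang_a": {"bird": {"a1"}, "belly": {"a2"}, "womb": {"a3"}},
    "lang_b": {"bird": {"b1"}, "belly": {"b2"}, "womb": {"b2"}},
    "lang_c": {"bird": {"c1"}, "belly": {"c3"}, "womb": {"c4"}},
}

for name in ("bird", "belly", "womb"):
    print(name, round(stability(name, toy), 3))
```

In this toy setting, a concept that every language expresses with its own dedicated string scores highest, while a concept that some language merges with another scores lower.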

 
