Linguistic Universals: Language-independent semantic fingerprints

  • 2019-10-10 04:19:43
  • Weinan E, Yajun Zhou

Abstract

Semantic processing is central to our understanding of natural languages, ensuring accuracy in monolingual communications, and minimizing losses in cross-lingual translations. The mechanism of semantics is a less-charted territory, unlike phonology, morphology, syntax, among other aspects of human languages. Data-hungry algorithms in machine learning achieve impressive success in some tasks of document comprehension, through high-dimensional numerical representations of words and phrases. Such computationally taxing algorithms are far from the efficient mechanism by which we humans understand texts and acquire knowledge. Here we advance a cost-effective model that assigns language-independent semantic fingerprints to words in a particular document, without consulting external knowledge-base or thesaurus. Our universal semantic fingerprints quantify local meaning of words in 14 representative languages across 5 major language families. Instead of embedding words into very high dimensional spaces, our method represents each concept by a few dozen parameters, interpretable as algebraic invariants in succinct statistical operations. Concise and transparent, our semantic fingerprints numerically characterise connectivity and association of individual concepts, even with scant input of data. These semantic representations enable a robot reader to both understand short texts in a given language (automated question-answering) and match medium-length texts across different languages (automated word translation).

 
