A mathematical model for universal semantics

  • 2020-01-16 11:46:28
  • Weinan E, Yajun Zhou

Abstract

We present a mathematical model to characterize the meaning of words with language-independent numerical fingerprints. Approximating texts by Markov processes on a long-range time scale, we are able to extract topics, discover synonyms, and sketch semantic fields from a particular document of moderate length, without consulting external knowledge-base or thesaurus. Our Markov semantic model allows us to represent each topical concept by a low-dimensional vector, interpretable as algebraic invariants in succinct statistical operations on the document, targeting local environments of individual words. These language-independent semantic representations enable a robot reader to both understand short texts in a given language (automated question-answering) and match medium-length texts across different languages (automated word translation). Our semantic fingerprints quantify local meaning of words in 14 representative languages across 5 major language families, suggesting a universal and cost-effective mechanism by which human languages are processed at the semantic level.
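
As a rough illustration of what treating a text as a Markov process can mean, the sketch below estimates first-order word-to-word transition probabilities from a toy token sequence. This is not the authors' model, which works on a long-range time scale and is not specified in the abstract. The function name `transition_probabilities` and the sample sentence are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict


def transition_probabilities(tokens):
    """Estimate P(next word | current word) from adjacent token pairs.

    Hypothetical helper for illustration only; it is a plain first-order
    (bigram) estimate, not the long-range model described in the abstract.
    """
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1

    probabilities = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        # Normalize each row so the outgoing probabilities sum to 1.
        probabilities[current] = {
            word: count / total for word, count in followers.items()
        }
    return probabilities


# Toy document (an assumption, not data from the paper).
tokens = "the cat saw the dog and the cat ran".split()
P = transition_probabilities(tokens)

for word, row in P.items():
    print(word, row)
```

Each row of `P` is one row of an empirical transition matrix. A longer-range view of the same text could be obtained by composing such one-step transitions over many steps, though how the paper does this is not stated here.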

 
