Speakers Fill Lexical Semantic Gaps with Context

  • 2020-11-16 23:23:38
  • Tiago Pimentel, Rowan Hall Maudslay, Damián Blasi, Ryan Cotterell

Abstract

Lexical ambiguity is widespread in language, allowing for the reuse of economical word forms and therefore making language more efficient. If ambiguous words cannot be disambiguated from context, however, this gain in efficiency might make language less clear, resulting in frequent miscommunication. For a language to be clear and efficiently encoded, we posit that the lexical ambiguity of a word type should correlate with how much information context provides about it, on average. To investigate whether this is the case, we operationalise the lexical ambiguity of a word as the entropy of the meanings it can take, and provide two ways to estimate this: one which requires human annotation (using WordNet), and one which does not (using BERT), the latter being readily applicable to a large number of languages. We validate these measures by showing that, on six high-resource languages, there are significant Pearson correlations between our BERT-based estimate of ambiguity and the number of synonyms a word has in WordNet (e.g. $\rho = 0.40$ in English). We then test our main hypothesis, that a word's lexical ambiguity should negatively correlate with its contextual uncertainty, and find significant correlations on all 18 typologically diverse languages we analyse. This suggests that, in the presence of ambiguity, speakers compensate by making contexts more informative.
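The two quantities in the abstract can be illustrated with a minimal sketch. Lexical ambiguity is taken as the entropy of a word's meaning distribution, written here as $H(M \mid W = w) = -\sum_m p(m \mid w) \log_2 p(m \mid w)$, and the hypothesis is checked by correlating per-word ambiguity with per-word contextual uncertainty. This is not the authors' estimator: the WordNet- and BERT-based estimates of $p(m \mid w)$ are not reproduced, and the function names `meaning_entropy` and `pearson_r` as well as all input numbers are hypothetical toy values, not results from the paper.

```python
import math


def meaning_entropy(sense_probs):
    """Entropy in bits of a hypothetical meaning distribution p(m | w).

    sense_probs: probabilities over the senses a word can take (must sum to 1).
    """
    if not math.isclose(sum(sense_probs), 1.0, rel_tol=1e-9, abs_tol=1e-9):
        raise ValueError("sense probabilities must sum to 1")
    # p * log2(1/p) == -p * log2(p); zero-probability senses contribute 0.
    return sum(p * math.log2(1.0 / p) for p in sense_probs if p > 0)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy sense distributions (hypothetical, for illustration only).
print(meaning_entropy([1.0]))       # 0.0: a single sense, no ambiguity
print(meaning_entropy([0.5, 0.5]))  # 1.0: two equally likely senses
print(meaning_entropy([0.25] * 4))  # 2.0: four equally likely senses

# Toy per-word ambiguity vs. contextual uncertainty (made-up numbers).
ambiguity = [0.0, 1.0, 2.0, 3.0]
uncertainty = [9.0, 7.0, 6.0, 4.0]
# Negative for these toy values, the direction the hypothesis predicts.
print(pearson_r(ambiguity, uncertainty))
```

The coefficient alone says nothing about significance; `scipy.stats.pearsonr` returns both the coefficient and a p-value if a significance test is needed.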
