The optimality of word lengths. Theoretical foundations and an empirical study

  • 2022-09-09 16:51:50
  • Sonia Petrini, Antoni Casas-i-Muñoz, Jordi Cluet-i-Martinell, Mengxue Wang, Christian Bentz, Ramon Ferrer-i-Cancho


One of the most robust patterns found in human languages is Zipf's law of abbreviation, that is, the tendency of more frequent words to be shorter. Since Zipf's pioneering research, this law has been viewed as a manifestation of compression (i.e., the minimization of the length of forms), a universal principle of natural communication. Although the claim that languages are optimized has become trendy, attempts to measure the degree of optimization of languages have been rather scarce. Here we demonstrate that compression manifests itself in a wide sample of languages without exception and independently of the unit of measurement. It is detectable both in word lengths in characters in written language and in word durations in spoken language. Moreover, to measure the degree of optimization, we derive a simple formula for a random baseline and present two scores that are dually normalized, namely, they are normalized with respect to both the minimum and the random baseline. We analyze the theoretical and statistical pros and cons of these and other scores. Harnessing the best score, we quantify for the first time the degree of optimality of word lengths in languages. This indicates that languages are optimized to 62 or 67 percent on average (depending on the source) when word lengths are measured in characters, and to 65 percent on average when word lengths are measured in time. In general, spoken word durations are more optimized than written word lengths in characters. Beyond the analyses reported here, our work paves the way to measure the degree of optimality of the vocalizations or gestures of other species, and to compare them against written, spoken, or signed human languages.


