On the class of coding optimality of human languages and the origins of Zipf's law

  • 2025-05-26 15:05:45
  • Ramon Ferrer-i-Cancho

Abstract

Here we present a new class of optimality for coding systems. Members of that class are separated linearly from optimal coding and thus exhibit Zipf's law, namely a power-law distribution of frequency ranks. Within that class, Zipf's law, the size-rank law and the size-probability law form a group-like structure. We identify human languages that are members of the class. All languages showing sufficient agreement with Zipf's law are potential members of the class. In contrast, there are communication systems in other species that cannot be members of that class because they exhibit an exponential distribution instead, although those of dolphins and humpback whales might be. We provide a new insight into plots of frequency versus rank in double logarithmic scale. For any system, a straight line in that scale indicates that the lengths of optimal codes under non-singular coding and under uniquely decodable encoding are separated by a linear function whose slope is the exponent of Zipf's law. For systems under compression and constrained to be uniquely decodable, such a straight line may indicate that the system is coding close to optimality. Our findings provide support for the hypothesis that Zipf's law originates from compression.
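
The linear separation described in the abstract can be illustrated numerically. The sketch below is not taken from the paper and rests on simplifying assumptions: a binary alphabet, real-valued (unrounded) code lengths, -log2 p as the optimal length under uniquely decodable encoding, and log2 r as an approximation of the optimal length under non-singular coding for the item of rank r. The function names and the values of `alpha` and the vocabulary size are hypothetical choices made only for illustration.

```python
import math

def zipf_probabilities(n, alpha):
    """Normalized power-law probabilities p(r) proportional to r**(-alpha), r = 1..n."""
    weights = [r ** (-alpha) for r in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line of ys against xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

alpha = 1.2          # assumed Zipf exponent (illustrative value)
n_types = 1000       # assumed vocabulary size (illustrative value)
probs = zipf_probabilities(n_types, alpha)
ranks = list(range(1, n_types + 1))

# Idealized code lengths in bits; both are assumptions of this sketch.
len_ud = [-math.log2(p) for p in probs]   # uniquely decodable: about -log2 p
len_ns = [math.log2(r) for r in ranks]    # non-singular: about log2 r

# Slope of the frequency-rank plot in double logarithmic scale.
zipf_slope = least_squares_slope([math.log(r) for r in ranks],
                                 [math.log(p) for p in probs])

# Slope of the line relating the two families of code lengths.
length_slope = least_squares_slope(len_ns, len_ud)

print(f"log-log slope: {zipf_slope:.3f}, length slope: {length_slope:.3f}")
```

Under these idealized assumptions the frequency-rank plot has slope -alpha and the two code lengths are related by a line of slope alpha. With real data and integer code lengths the relation would hold only approximately.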

 
