Abstract
In Shannon's seminal paper, the entropy of printed English, treated as a stationary stochastic process, was estimated to be roughly 1 bit per character. However, considered as a means of communication, language differs considerably from its printed form: (i) the units of information are not characters or even words but clauses, i.e., the shortest meaningful parts of speech; and (ii) what is transmitted is principally the meaning of what is being said or written, while the precise phrasing used to communicate that meaning is typically ignored. In this study, we show that one can leverage recently developed large language models to quantify the information communicated in meaningful narratives in terms of bits of meaning per clause.