A mathematical model for universal semantics

  • 2020-03-15 01:46:54
  • Weinan E, Yajun Zhou

Abstract

We characterize the meaning of words with language-independent numerical fingerprints, through a mathematical analysis of recurring patterns in texts. Approximating texts by Markov processes on a long-range time scale, we are able to extract topics, discover synonyms, and sketch semantic fields from a particular document of moderate length, without consulting an external knowledge base or thesaurus. Our Markov semantic model allows us to represent each topical concept by a low-dimensional vector, interpretable as algebraic invariants in succinct statistical operations on the document, targeting local environments of individual words. These language-independent semantic representations enable a robot reader to both understand short texts in a given language (automated question-answering) and match medium-length texts across different languages (automated word translation). Our semantic fingerprints quantify local meaning of words in 14 representative languages across 5 major language families, suggesting a universal and cost-effective mechanism by which human languages are processed at the semantic level. Our protocols and source code are publicly available at https://github.com/yajun-zhou/linguae-naturalis-principia-mathematica

 


Weinan E is with the Department of Mathematics and the Program in Applied and Computational Mathematics, Princeton University, and the Beijing Institute of Big Data Research. E-mail: [email protected]
Yajun Zhou is with the Beijing Institute of Big Data Research. E-mail: [email protected]

Keywords: recurring patterns in texts, semantic model, recurrence time, hitting time, word translation, question answering

1 Introduction

A quantitative model for the meaning of words helps us understand how we transmit information and absorb knowledge. Ideally, a universal mechanism of semantics should be based on numerical characteristics of human languages, transcending concrete written and spoken forms of verbal messages. In this work, we demonstrate, in both theory and practice, that the time structure of recurring language patterns is a good candidate for such a universal semantic mechanism. Through statistical analysis of recurrence times and hitting times, we numerically characterize connectivity and association of individual concepts, thereby devising language-independent semantic fingerprints (LISF).

Akin to the physical world, there is a hierarchy of length scales in languages. On short scales such as syllables, words, and phrases, human languages do not exhibit a universal pattern related to semantics. Except for a few onomatopoeias, the sounds of words do not affect their meaning [1]. Neither do morphological parameters [2] (say, singular/plural, present/past) or syntactic rôles [3] (say, subject/object, active/passive). In short, there are no universal semantic mechanisms at the phonological, lexical or syntactical levels [4]. Grammatical “rules and principles” [2, 3], however typologically diverse, play no definitive rôle in determining the inherent meaning of a word.

Motivated by the observations above, we will build our quantitative semantic model on long-range and language-independent textual features. Specifically, we will measure the lengths of text fragments flanked by word patterns of interest (Fig. 1). Here, a word pattern is a collection of content words that are identical up to morphological parameters and syntactic rôles. A content word signifies definitive concepts (like apple, eat, red), instead of serving purely grammatical or logical functions (like but, of, the). Fragment length statistics will tell us how tightly or loosely one concept is connected to another. This, in turn, will provide us with quantitative criteria for including or excluding different concepts within the same (computationally constructed) semantic field. Such statistical semantic mining will then pave the way for machine comprehension and machine translation.

2 Methodology

We quantify the time structure of an individual word pattern $\mathsf{W}_i$ through the statistics of its recurrence times $\tau_{ii}$. We characterize the dynamic impact of a word pattern $\mathsf{W}_i$ on another word pattern $\mathsf{W}_j$ by the statistics of their hitting times $\tau_{ij}$. In what follows, we will describe the statistical analyses of $\tau_{ii}$ and $\tau_{ij}$, on which we build a language-independent Markov model for semantics.

2.1 Recurrence times and topicality

Assuming uniform reading speed (see footnote 1), we measure the recurrence times $\tau_{ii}$ for a word pattern $\mathsf{W}_i$ through $n_{ii}$ samples of the effective fragment lengths $L_{ii}$ (Figs. 1, 2a). Here, while counting as in Fig. 1, we ignore contacts between short-range neighbors, which may involve language-dependent redundancies (see footnote 2).

Footnote 1: On the scale of words (rather than phonemes), this assumption works fine in most languages that are written alphabetically. However, this working hypothesis does not extend to Japanese texts, which interlace Japanese syllabograms (lasting one mora per written unit) with Chinese ideograms (lasting one or more morae per written unit).

Footnote 2: For example, the German phrase "liebe Studentinnen und Studenten", with short-range recurrence, is the gender-inclusive equivalent of the English expression "dear students". Some Austronesian languages (such as Malay and Hawaiian) use reduplication for plurality or emphasis.
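Before recurrence times can be measured, the occurrences of a word pattern have to be located in the text. The following is a minimal sketch, assuming that a word pattern is written as a regular expression over a shared stem (the notation used in Fig. 1). The helper names `compile_word_pattern` and `occurrences`, the whole-word boundaries, and the case-insensitive matching are assumptions of this illustration, not details confirmed by the text.

```python
import re


def compile_word_pattern(stem_and_endings: str) -> re.Pattern:
    """Compile a word pattern such as 'happ(?:ier|ily|iness|y)'.

    Assumption of this sketch: matches must be whole words (\\b ... \\b)
    and letter case is ignored.
    """
    return re.compile(rf"\b(?:{stem_and_endings})\b", re.IGNORECASE)


def occurrences(pattern: re.Pattern, text: str) -> list[tuple[int, int]]:
    """Return (start, end) character offsets of every match, in reading order."""
    return [m.span() for m in pattern.finditer(text)]


# The two word patterns used as examples in Fig. 1.
W_i = compile_word_pattern(r"happ(?:ier|ily|iness|y)")
W_j = compile_word_pattern(r"marr(?:iage|ied|y)")

sample = "Happy people marry happily; an unhappy marriage brings no happiness."
words_i = [sample[s:e].lower() for s, e in occurrences(W_i, sample)]
words_j = [sample[s:e].lower() for s, e in occurrences(W_j, sample)]
print(words_i)  # ['happy', 'happily', 'happiness']
print(words_j)  # ['marry', 'marriage']
```

With whole-word matching, "unhappy" is not counted as an occurrence of the first pattern. Whether that matches the intended treatment of prefixed forms is a design choice of this sketch.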

$\mathsf{W}_i$ := happ(ier|ily|iness|y) = {happier, happily, happiness, happy}, $\mathsf{W}_j$ := marr(iage|ied|y) = {marriage, married, marry}

[Fig. 1 illustration: sample placeholder (lorem ipsum) texts containing occurrences of the two word patterns, with the text fragments counted as $L_{ii}$ and $L_{ij}$ underlined.]
Fig. 1: Counting long-range transitions between word patterns. A transition from $\mathsf{W}_i$ to $\mathsf{W}_j$ counts towards long-range statistics if the underlined text fragment in between contains no occurrences of $\mathsf{W}_i$ and lasts strictly longer than the longest word in $\mathsf{W}_i \cup \mathsf{W}_j$. For each long-range transition, the effective fragment length $L_{ij}$ discounts the length of the longest word in $\mathsf{W}_i \cup \mathsf{W}_j$.
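The counting rule in the caption of Fig. 1 can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: lengths are measured in characters, a fragment runs from the end of one matched word to the start of the next, the "longest word" is taken over the words actually matched in the text, and a fragment leading to the second pattern is closed at its first occurrence. The function names `spans` and `effective_lengths` are hypothetical.

```python
import re


def spans(regex: str, text: str) -> list[tuple[int, int]]:
    """(start, end) character offsets of whole-word, case-insensitive matches."""
    return [m.span() for m in re.finditer(rf"\b(?:{regex})\b", text, re.IGNORECASE)]


def effective_lengths(text: str, regex_i: str, regex_j: str) -> list[int]:
    """Effective fragment lengths for long-range transitions W_i -> W_j.

    Pass the same regex twice to obtain recurrence samples (L_ii).
    """
    occ_i = spans(regex_i, text)
    occ_j = spans(regex_j, text)
    if not occ_i or not occ_j:
        return []
    # Assumption: longest word among those matched in this text.
    longest = max(end - start for start, end in occ_i + occ_j)

    # Merge all occurrences in reading order, tagged by their pattern.
    same = regex_i == regex_j
    events = [(s, e, "i") for s, e in occ_i]
    if not same:
        events += [(s, e, "j") for s, e in occ_j]
    events.sort()

    target = "i" if same else "j"
    lengths = []
    for (_, e0, tag0), (s1, _, tag1) in zip(events, events[1:]):
        # Adjacent events guarantee that no W_i lies inside the fragment.
        if tag0 == "i" and tag1 == target:
            gap = s1 - e0  # characters strictly between the two words
            if gap > longest:  # discard short-range contacts
                lengths.append(gap - longest)
    return lengths


HAPP = r"happ(?:ier|ily|iness|y)"
MARR = r"marr(?:iage|ied|y)"
text = ("happy" + " lorem ipsum dolor sit amet " + "happiness" + " and "
        + "happy" + " consectetur adipiscing elit " + "married")

print(effective_lengths(text, HAPP, HAPP))  # [19]
print(effective_lengths(text, HAPP, MARR))  # [20]
```

In the recurrence case, the 5-character fragment between "happiness" and the second "happy" is shorter than the longest word (9 characters), so it is dropped as a short-range contact. Only the 28-character fragment survives, with effective length 28 - 9 = 19.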
[Fig. 2: plot data omitted. Panel (a): positions of $\mathsf{W}_i$ = Jane(|’s) and $\mathsf{W}_j$ = than along the text. Panel (b): number of blocks versus number of occurrences of "than" in a block, with an overlaid curve. Panels (c), (d): counts (logarithmic scale) versus $L_{ii}$ and $L_{jj}$, and versus $\log L_{ii}$ and $\log L_{jj}$, for the same two word patterns, with overlaid curves. Panel (e): word patterns plotted against $\log L_{ii}$, with a region marked "area forbidden by Jensen's inequality". The labeled word patterns are Eliza(|beth|beth’s), Darcy(|’s), Bennet(|’s|s), Bingley(|’s|s|s’), Jane(|’s), Wickham(|’s), Collins(|’s), happ(ily|iness|y|ier|iest), Lydia(|’s), Catherine(|’s), lov(e|e’|ed|ely|es|ing|eliness|e-making|er|ers), Gardiner(|’s|s), Lizzy(|’s), Charlotte(|’s), Lucas(|’s|es|es’), danc(e|ed|es|ing), Kitty(|’s), Chapter, Rosings, William(|’s), handsome(|ly|r|st), beaut(iful|ies|y), Forster(|’s|s), Mary(|’s), Bourgh(|’s), Fitzwilliam(|’s), Hurst(|’s|s).]
Fig. 2: Statistical analysis of recurrence times and topicality. (a) Barcode representations (adapted from [5, Fig. 2]) for the coverage of $\mathsf W_i=$ Jane(|’s) (291 occurrences) and $\mathsf W_j=$ than (282 occurrences) in the whole text of Pride and Prejudice. The horizontal axis scales linearly with text length, measured in the number of constituent letters, spaces and punctuation marks. (b) Counts of the word than within a consecutive block of 1217 words (spanning about 1% of the entire text), drawn from 1000 randomly chosen blocks, fitted to a Poisson distribution with mean 2.776 (blue curve). (c) Histogram of the reduced fragment length $L_{ii}$ (see Fig. 1 for its definition) for the topical pattern $\mathsf W_i=$ Jane(|’s), fitted to an exponential distribution (blue line in the semi-log plot) and to a weighted mixture of two exponential distributions $c_1k_1e^{-k_1t}+c_2k_2e^{-k_2t}$ (red curve, with $c_1:c_2\approx1:3$, $k_1:k_2\approx1:7$). (d) Histogram of $L_{jj}$ for the function word $\mathsf W_j=$ than, fitted to an exponential distribution (blue line in the semi-log plot). All the parameter estimators in panels b–d are based on maximum likelihood. (c′)–(d′) Reinterpretations of panels c–d, with logarithmic binning on the horizontal axes, to give fuller coverage of the dynamic ranges of the statistics. (e) Recurrence statistics for word patterns in Jane Austen’s Pride and Prejudice, where $\langle\cdots\rangle$ denotes averages over $n_{ii}$ samples of long-range transitions. Data points in gray, green and red have radii $\frac{1}{4\sqrt{n_{ii}}}$. Labels for proper names and some literary motifs are attached next to the corresponding colored dots. Jensen’s bound (green dashed line) has unit slope and zero intercept. Exponentially distributed recurrence statistics reside on the line of Poissonian banality (blue line), with unit slope and negative intercept. Red (resp. green) dots mark significant downward (resp. upward) departures from the blue line.

2.1.1 Recurrence of non-topical patterns

In a memoryless (hence banal) Poisson process (Fig. 2b), recurrence times are exponentially distributed (Fig. 2d,d′). The same is true for word recurrence in a randomly reshuffled text [5]. If we have $n_{ii}$ independent samples of an exponentially distributed random variable $L_{ii}$, then the statistic $\delta_i:=\log\langle L_{ii}\rangle-\langle\log L_{ii}\rangle-\gamma_0+\frac{1}{2n_{ii}}$ satisfies the inequality

$$|\delta_i|<\frac{2}{\sqrt{n_{ii}}}\sqrt{\frac{\pi^2}{6}-1-\frac{1}{2n_{ii}}}\tag{1}$$

with probability 95% (see Theorem 1 in Appendix A for a two-sigma rule). Here, $\gamma_0:=\lim_{n\to\infty}\left(-\log n+\sum_{m=1}^{n}\frac{1}{m}\right)$ is the Euler–Mascheroni constant.

As a working definition, we consider a word pattern $\mathsf W_i$ non-topical if its $n_{ii}$ counts of effective fragment lengths $L_{ii}$ are exponentially distributed, $\mathbb P(L_{ii}>t)\sim e^{-kt}$, within 95% margins of error [that is, satisfying (1) above].
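The working definition above can be sketched numerically. The following Python snippet is only an illustration under stated assumptions: the function names `delta_statistic` and `is_non_topical` are hypothetical, and the sample size (280) and mean fragment length (1200) are made-up values, with simulated exponential variables standing in for real fragment lengths. It evaluates $\delta_i$ and the two-sigma bound in (1), then estimates how often exponentially distributed samples pass the test.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler–Mascheroni constant gamma_0


def delta_statistic(lengths):
    """Return (delta_i, bound) for a list of positive fragment lengths L_ii."""
    n = len(lengths)
    log_of_mean = math.log(sum(lengths) / n)               # log <L_ii>
    mean_of_log = sum(math.log(x) for x in lengths) / n    # <log L_ii>
    delta = log_of_mean - mean_of_log - EULER_GAMMA + 1.0 / (2 * n)
    # Right-hand side of inequality (1).
    bound = 2.0 / math.sqrt(n) * math.sqrt(math.pi ** 2 / 6 - 1 - 1.0 / (2 * n))
    return delta, bound


def is_non_topical(lengths):
    """True when |delta_i| stays within the two-sigma bound of (1)."""
    delta, bound = delta_statistic(lengths)
    return abs(delta) < bound


# Assumed, illustrative parameters: 280 recurrences with mean length 1200.
rng = random.Random(0)
trials = 2000
hits = sum(
    is_non_topical([rng.expovariate(1 / 1200.0) for _ in range(280)])
    for _ in range(trials)
)
coverage = hits / trials  # fraction of exponential samples satisfying (1)
print(f"fraction satisfying (1): {coverage:.3f}")
```

If the two-sigma rule holds, the printed fraction should be close to 0.95. A perfectly periodic pattern, such as `[5.0] * 100`, fails the test because it sits on Jensen’s bound rather than on the Poissonian line.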

2.1.2 Recurrence of topical patterns

In contrast, we consider a word pattern $\mathsf W_i$ topical if its diagonal statistics $n_{ii}$, $L_{ii}$ depart significantly from the Poissonian line $\langle\log L_{ii}\rangle-\log\langle L_{ii}\rangle+\gamma_0=0$ (Fig. 2e, blue line), violating the bound in (1).

Notably, most data points for topics (colored dots in Fig. 2e) in Jane Austen’s Pride and Prejudice mark systematic downward departures from the Poissonian line. This suggests that the topical recurrence times $\tau=L_{ii}$ follow weighted mixtures of exponential distributions (Fig. 2c,c′):

$$\mathbb P(\tau>t)\sim\sum_m c_m e^{-k_m t},\tag{2}$$

where $c_m,k_m>0$ and $\sum_m c_m=1$, which impose an inequality constraint on the recurrence time $\tau=L_{ii}$:

$$\langle\log L_{ii}\rangle-\log\langle L_{ii}\rangle+\gamma_0=\sum_m c_m\log\frac{1}{k_m}-\log\sum_m\frac{c_m}{k_m}\le0.\tag{3}$$
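The sign in (3) follows from Jensen’s inequality for the concave logarithm, given that an exponential variable with rate $k$ has mean $1/k$ and mean logarithm $-\log k-\gamma_0$. As a rough numerical check, the sketch below evaluates the right-hand side of (3) and compares it with a simulated sample. The helper names `mixture_gap` and `sample_mixture` are hypothetical, and the weights and rates simply borrow the ratios $c_1:c_2=1:3$, $k_1:k_2=1:7$ quoted in the caption of Fig. 2, with an arbitrary overall time scale.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler–Mascheroni constant gamma_0


def mixture_gap(weights, rates):
    """Right-hand side of (3): sum_m c_m log(1/k_m) - log(sum_m c_m / k_m)."""
    mean_log = sum(c * math.log(1.0 / k) for c, k in zip(weights, rates))
    log_mean = math.log(sum(c / k for c, k in zip(weights, rates)))
    return mean_log - log_mean


def sample_mixture(weights, rates, size, rng):
    """Draw times whose survival function is sum_m c_m exp(-k_m t)."""
    chosen = rng.choices(rates, weights=weights, k=size)  # pick component m
    return [rng.expovariate(k) for k in chosen]


# Assumed parameters: only the ratios come from the Fig. 2 caption.
weights, rates = [0.25, 0.75], [1.0, 7.0]
gap = mixture_gap(weights, rates)

rng = random.Random(1)
tau = sample_mixture(weights, rates, 200_000, rng)
# Sample version of <log L> - log <L> + gamma_0, the left-hand side of (3).
empirical = (sum(math.log(t) for t in tau) / len(tau)
             - math.log(sum(tau) / len(tau)) + EULER_GAMMA)
print(f"analytic gap: {gap:.4f}, simulated gap: {empirical:.4f}")
```

Both numbers should be negative and close to each other, while a single exponential (one weight, one rate) gives a gap of zero and so stays on the Poissonian line.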
Word counts: felt 100, feelings 86, feel 39, feeling 38, feels 4, feelingly 1
⟹ (word count censorship) felt 100, feelings 86, feel 39, feeling 38
⟹ (alphabetic sorting) feel 39, feeling 38, feelings 86, felt 100
⟹ (sequence alignment) [diagram not recovered; counts 100, 86, 38, 39]
⟹ (vertical merger) [diagram not recovered; counts 100, 86, 38, 39]

(a)

𝖶ien