A Mathematical Model for Linguistic Universals

  • 2019-07-31 02:21:44
  • Weinan E, Yajun Zhou

Abstract

Inspired by chemical kinetics and neurobiology, we propose a mathematical theory for pattern recurrence in text documents, applicable to a wide variety of languages. We present a Markov model at the discourse level for Steven Pinker’s “mentalese”, or chains of mental states that transcend the spoken/written forms. Such (potentially) universal temporal structures of textual patterns lead us to a language-independent semantic representation, or a translationally-invariant word embedding, thereby forming the common ground for both comprehensibility within a given language and translatability between different languages. Applying our model to documents of moderate lengths, without relying on external knowledge bases, we reconcile Noam Chomsky’s “poverty of stimulus” paradox with statistical learning of natural languages.

 


Weinan E (1, 2), Yajun Zhou (2)

(1) Department of Mathematics & Program in Applied and Computational Mathematics,
Princeton University, Princeton, NJ 08544, USA
(2) Beijing Institute of Big Data Research, Beijing 100871, P. R. China

Corresponding authors. E-mail: [email protected] (W.E.), [email protected] (Y.Z.)


We human beings distinguish ourselves from other animals (?, ?, ?) in that our brain development (?, ?, ?) enables us to convey sophisticated ideas and to share individual experiences via languages (?, ?, ?). Texts written in natural languages constitute a major medium that perpetuates our civilizations (?) as a cumulative body of knowledge. The quantitative mechanism underlying the mental faculties of language had long been a difficult problem for anthropologists, linguists, neurobiologists and psychologists (?, ?, ?, ?, ?) before it attracted the attention of computer and data scientists (?, ?, ?, ?, ?, ?) in the recent wave of artificial intelligence. Rather than marvel at the partial success of data-hungry approaches (?, ?, ?, ?) to machine learning, we still crave a cost-effective, interpretable and universal algorithm for understanding natural languages: one that mimics language acquisition and knowledge accumulation during early childhood, based on limited resources, as in Chomsky’s “poverty of stimulus” scenario (?, ?). Unless this gap in data requirements is closed, one cannot satisfactorily answer the nativists’ criticism (?) of the empiricists’ statistical models for natural languages.

Rising to the challenges outlined above, we perform a detailed mathematical analysis of computable “linguistic universals”: statistical patterns common to a wide range of human languages. On the theoretical side, we will present a stochastic “mentalese” model that depicts the timecourse of Markov states behind individual concepts. On the practical side, we will demonstrate (through automated word translation and question answering) that a word’s meaning can be numerically characterized by moderate-sized Markov neural networks, even when input data are relatively scant.

Our Markov model explains, up to acceptably small error margins, how our innate language faculties (nature) may help us understand the world by connecting the dots of our past experiences (nurture), irrespective of our mother tongue. Bridging nature and nurture, our stochastic algorithm for Markov neural semantics reconciles the views of nativists and empiricists.

Heuristic background

[Figure 1 (image): a sample text in which occurrences of $\mathsf{W}_i=\{$happier, happily, happiness, happy$\}$ and $\mathsf{W}_j=\{$marriage, married, marry$\}$ are marked, with the fragments between them labeled by their reduced lengths $L_{ii}$ and $L_{ij}$.]
Figure 1: Counting effective transitions between textual patterns. A transition from $\mathsf{W}_i$ to $\mathsf{W}_j$ is considered effective if the underlined text fragment in between contains no occurrences of $\mathsf{W}_i$ and lasts longer than the longest word in $\mathsf{W}_i\cup\mathsf{W}_j$. The reduced fragment length $L_{ij}$ (measured in the number of letters, punctuation marks and white spaces) discounts the length of the longest word in $\mathsf{W}_i\cup\mathsf{W}_j$. We count waiting times in $L_{ij}$, so as to ignore kinetic features (?) on the short time scales of the Friederici hierarchy, which may vary from language to language.
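The counting rule of Fig. 1 can be sketched in Python. This is a minimal sketch under assumptions the text does not spell out: words are matched case-insensitively as whole tokens (so "unhappy" does not count as "happy"), each occurrence of $\mathsf{W}_j$ is paired with the nearest preceding occurrence of $\mathsf{W}_i$ (so the fragment in between contains no word of $\mathsf{W}_i$), and a fragment's length is the number of characters between the end of one word and the start of the next. The function names `word_spans`, `effective_transitions` and `transition_stats` are hypothetical.

```python
import math
import re


def word_spans(text):
    """Return (lowercased word, start, end) for every word token in text."""
    return [(m.group().lower(), m.start(), m.end())
            for m in re.finditer(r"\w+(?:['’]\w+)*", text)]


def effective_transitions(text, W_i, W_j):
    """Reduced fragment lengths L_ij of effective transitions from W_i to W_j.

    Assumed reading of Fig. 1: pair each W_j occurrence with the nearest
    preceding W_i occurrence; keep it if the characters in between outnumber
    the longest word in W_i | W_j, and subtract that word length.
    """
    W_i = {w.lower() for w in W_i}
    W_j = {w.lower() for w in W_j}
    longest = max(len(w) for w in W_i | W_j)
    last_i_end = None  # end offset of the most recent W_i occurrence
    lengths = []
    for word, start, end in word_spans(text):
        if word in W_j and last_i_end is not None:
            fragment = start - last_i_end  # letters, punctuation and spaces
            if fragment > longest:         # must outlast the longest word
                lengths.append(fragment - longest)
        if word in W_i:                    # update after checking W_j (i = j case)
            last_i_end = end
    return lengths


def transition_stats(text, W_i, W_j):
    """Return (n_ij, <log L_ij>) for one ordered pair of textual patterns."""
    lengths = effective_transitions(text, W_i, W_j)
    if not lengths:
        return 0, float("nan")
    return len(lengths), sum(math.log(L) for L in lengths) / len(lengths)
```

For example, with $\mathsf{W}_i=\{$happy$\}$ and $\mathsf{W}_j=\{$marry$\}$, the text "happy and then we marry" has a 13-character fragment between the two words, hence one effective transition with $L_{ij}=13-5=8$.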

Languages differ in their phonemic repertoires (“elementary particles” in Jakobson’s (?) terms), word morphologies (“atoms”) and syntactic structures (“molecules”), corresponding to the three short time scales (phonological processing level, lexical level, and sentence level) in the Friederici hierarchy (?), which are mapped to different brain regions in functional magnetic resonance imaging (fMRI). These three Friederici scales exhibit no universal linguistic patterns and bear no semantic significance. Ferdinand de Saussure’s foundational work (?) rules out semantic dependence on phonological representation (except for a limited set of onomatopoeias), while the inherent meaning of a word is affected by neither its morphological parameters (say, singular vs. plural, present vs. past) nor its syntactic rôles (say, subject vs. object, active vs. passive).

Based on the foregoing arguments, one might speculate that universal semantic content, or Pinker’s “mentalese” (?), may exist only at the discourse level (“bulk materials”, if we extrapolate Jakobson’s (?) metaphor), namely, on the longest time scale in Friederici’s neurobiological hierarchy (?). In this work, we turn this qualitative speculation into a quantitative model (?). Concretely, we observe the following statistical features of textual patterns (clusters of morphologically related words; see Fig. 1 and Fig. 3B for examples) shared by many languages:

  1. The recurrence behavior of most textual patterns is consistent with time series generated by a certain Markov process on the longest, as opposed to the shortest (?), neuro-linguistic time scale;

  2. The recurrence kinetics of a given concept remain nearly independent of the language in which it is expressed;

  3. Kinetic data quantify the semantic distance between different textual patterns, thus allowing us to construct semantic fields by statistical computations.

These long-range temporal features of documents written in various languages, in our opinion, point to a universal kinetic mechanism that defines the semantic rôles of individual nodes in a web of words, mathematically and linguistically.

[Figure 2 (plots). (A) Vector components versus $n$ ($n=1,\dots,100$) for $\check{\boldsymbol{\pi}}$ and $\boldsymbol{\pi}$. (B) Cumulative counts of $\log|\lambda(\check{\mathbf{P}})|$ and of $\frac{1}{\pi}\arg\lambda(\check{\mathbf{P}})$, with curves for English, French, Russian and Finnish.]
Figure 2: (A) Dominant eigenvector $\check{\boldsymbol{\pi}}$ ($N=100$), computed from Jane Austen’s Pride and Prejudice, in comparison with $\boldsymbol{\pi}$, the list of normalized frequencies for the top 100 textual patterns. (B) Distributions of eigenvalues $\lambda$ in the spectra $\sigma(\check{\mathbf{P}})$ ($N=100$), as estimated from four parallel versions of Pride and Prejudice. (See fig. S3 for more versions.) The prominence of nearly real eigenvalues [satisfying $\arg\lambda(\check{\mathbf{P}})\approx 0$] is not found in a Markov matrix with random entries, whose spectrum is uniformly distributed over a circular disk centered at the origin (?, ?).
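The two quantities plotted in Fig. 2 can be computed with NumPy once a row-stochastic matrix $\check{\mathbf{P}}$ is available, for instance from Eq. (1). The sketch below assumes $\check{\mathbf{P}}$ is a NumPy array whose rows sum to 1; it takes $\check{\boldsymbol{\pi}}$ as the left eigenvector for the eigenvalue closest to 1, and returns each eigenvalue's coordinates $\log|\lambda|$ and $\frac{1}{\pi}\arg\lambda$ as in panel B. The function names are hypothetical; this is not the authors' code.

```python
import numpy as np


def stationary_distribution(P):
    """Dominant left eigenvector of a row-stochastic matrix P, normalized to sum to 1."""
    eigvals, eigvecs = np.linalg.eig(np.asarray(P, dtype=float).T)  # left eigenvectors of P
    k = np.argmin(np.abs(eigvals - 1.0))  # Perron eigenvalue (closest to 1)
    pi = np.real(eigvecs[:, k])
    return pi / pi.sum()  # also fixes the arbitrary sign of the eigenvector


def spectrum_coordinates(P):
    """Return (log|lambda|, arg(lambda)/pi) for every eigenvalue lambda of P."""
    lam = np.linalg.eigvals(np.asarray(P, dtype=float))
    # A zero eigenvalue would give log|lambda| = -inf (with a NumPy warning).
    return np.log(np.abs(lam)), np.angle(lam) / np.pi
```

Comparing `stationary_distribution(P)` with the normalized pattern frequencies reproduces the kind of check shown in panel A; a histogram of the second return value of `spectrum_coordinates` shows whether eigenvalues cluster near the real axis.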

Transition probability, spectral invariance, pattern recurrence

To begin, we show how to numerically construct a Markov matrix from a realistic text document, and how this Markov model enables us to interpret long-range temporal structures that are common to a wide variety of languages.

In our kinetic language model, we assume that (the gists of) texts are generated by a discrete Markov process on a semantic web with $N$ nodes, each of which represents a textual pattern $\mathsf{W}_k$, namely a set of morphologically related content words (?) indexed by an integer $k\in\{1,2,\dots,N\}$, occurring in a given document (?). Stochastic hops between the nodes are governed by an ergodic Markov transition matrix $\mathbf{P}=(p_{ij})_{1\le i,j\le N}$, which (putatively) caricatures the dynamics of mental activities (?) underlying a text on the time scale of discourse. We emphasize that our Markov model for the long-range behavior of human languages is independent of Chomsky’s transformational generative grammar (?), which characterizes short-range syntactic features as hierarchical trees without Markovian structures.
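To make the generative assumption concrete, the toy sampler below draws a sequence of pattern indices $k$ from a Markov chain with transition matrix $\mathbf{P}$. It is only an illustration: the function name, the fixed start state and the example matrix are hypothetical, and the model above does not say how a sequence is initialized. For an ergodic $\mathbf{P}$, the long-run visit frequencies approach the stationary distribution.

```python
import numpy as np


def sample_pattern_sequence(P, length, start=0, seed=None):
    """Sample `length` pattern indices from a Markov chain with row-stochastic P."""
    P = np.asarray(P, dtype=float)
    if not np.allclose(P.sum(axis=1), 1.0):
        raise ValueError("each row of P must sum to 1")
    rng = np.random.default_rng(seed)
    states = [start]
    for _ in range(length - 1):
        # Hop from the current pattern to the next with probabilities P[current].
        states.append(int(rng.choice(len(P), p=P[states[-1]])))
    return states
```

With `P = [[0.9, 0.1], [0.5, 0.5]]`, whose stationary distribution is $(5/6, 1/6)$, a long sample spends roughly five sixths of its steps in state 0.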

One can estimate transition probabilities between textual patterns on short time scales by simply counting unigrams and bigrams in a large corpus (?). To estimate long-range transition probabilities from documents of moderate length (e.g. a literary piece or a Wikipedia page), that is, to learn despite a “poverty of the stimulus”, we need some makeshift strategies.
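For contrast, the short-range baseline mentioned above amounts to a maximum-likelihood bigram estimate $p(b\mid a)=\mathrm{count}(a,b)/\mathrm{count}(a)$. The sketch below is a generic textbook version, not the method of the cited work; it counts $a$ only where it starts a bigram, so that each row of estimated probabilities sums to 1. The function name is hypothetical.

```python
from collections import Counter


def bigram_transition_probs(tokens):
    """Maximum-likelihood estimate p(b | a) = count(a, b) / count(a as a bigram prefix)."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    prefix_counts = Counter(tokens[:-1])
    return {(a, b): c / prefix_counts[a] for (a, b), c in pair_counts.items()}
```

Such estimates only see adjacent tokens and need large corpora to be reliable, which is why the long-range estimate below works with waiting times instead.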

Given a timecourse of molecular states in a biochemical reaction, we can partially reconstruct kinetic information (?) from the probability distribution of the waiting time between consecutive encounters of the same molecular state. Carrying this waiting time analysis a little further, we put a crude estimate of the transition probability $p_{ij}$ as

$$\check{p}_{ij}:=\frac{n_{ij}\,e^{-\langle\log L_{ij}\rangle}}{\sum_{j'=1}^{N}n_{ij'}\,e^{-\langle\log L_{ij'}\rangle}},\tag{1}$$

where $n_{ij}$ counts the number of effective transitions from $\mathsf{W}_i$ to $\mathsf{W}_j$, and $L_{ij}$ is the reduced fragment length of such a transition (Fig. 1), so that $\langle\log L_{ij}\rangle$ is its average over all $n_{ij}$ effective transitions. On the diagonal, the Gibbs weights $n_{ii}e^{-\langle\log L_{ii}\rangle}$ hearken back to the TF-IDF measure of word importance (?, ?). Off the diagonal, the ensemble average $\langle\log L_{ij}\rangle$ measures the cost, analogous to a biochemical activation energy, of jumping from $\mathsf{W}_i$ to $\mathsf{W}_j$, so that the memorability factor $e^{-\langle\log L_{ij}\rangle}$ can be viewed as a naïve estimate of the rate of associative learning per copy number $n_{ij}$ in Hebb’s fire-and-wire process (?). It is worth noting that our estimate $\check{p}_{ij}$ is based on statistical analysis of the text in situ, without digesting a document (or small parts of it) as a scrambled bag of words, a procedure implemented in conventional algorithms (?, ?, ?).
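Eq. (1) reduces to a row normalization of the weights $n_{ij}e^{-\langle\log L_{ij}\rangle}$; note that $e^{-\langle\log L_{ij}\rangle}$ is the reciprocal of the geometric mean of the observed $L_{ij}$. The NumPy sketch below assumes the counts $n_{ij}$ and averages $\langle\log L_{ij}\rangle$ have already been tabulated as $N\times N$ arrays (entries with $n_{ij}=0$ may hold NaN and receive zero weight). The function name is hypothetical.

```python
import numpy as np


def empirical_markov_matrix(n, mean_log_L):
    """Eq. (1): p_ij proportional to n_ij * exp(-<log L_ij>), normalized over j."""
    n = np.asarray(n, dtype=float)
    mean_log_L = np.asarray(mean_log_L, dtype=float)
    # Pairs without effective transitions get zero weight; their <log L> is undefined.
    weights = np.where(n > 0, n * np.exp(-np.nan_to_num(mean_log_L)), 0.0)
    row_sums = weights.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("every pattern needs at least one effective transition")
    return weights / row_sums
```

For instance, counts `[[2, 1], [1, 0]]` with averages `[[log 2, log 4], [0, nan]]` give weights `[[1, 0.25], [1, 0]]` and hence $\check{\mathbf{P}}=\begin{pmatrix}0.8&0.2\\1&0\end{pmatrix}$.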

The empirical Markov matrix $\check{\mathbf{P}}=(\check{p}_{ij})_{1\le i,j\le N}$ has some desirable properties.

[Figure 3 (plots). (A) Semilogarithmic plot of $r_n$ versus $n$ ($n=1,\dots,5$) for the English, French, Russian and Finnish versions (the legend also lists gibberish). (B) Word stack of textual patterns: Eliza(|beth|beth’s), Darcy(|’s), Bennet(|’s|s), Bingley(|’s|s|s’), Jane(|’s), Wickham(|’s), Collins(|’s), happ(ily|iness|y|ier|iest), Lydia(|’s), Catherine(|’s), lov(e|e’|ed|ely|es|ing|eliness|e-making|er|ers), Gardiner(|’s|s), Lizzy(|’s), Charlotte(|’s), Lucas(|’s|es|es’), danc(e|ed|es|ing), Kitty(|’s), Chapter, Rosings, William(|’s), handsome(|ly|r|st), beaut(iful|ies|y), Forster(|’s|s), Mary(|’s), Bourgh(|’s), Fitzwilliam(|’s), Hurst(|’s|s). (C) Recurrence statistics with axis label $\log L_{ii}$ and a shaded region marked “area forbidden by Jensen’s inequality”. (D) Ružička similarity matrix, color scale from 0 to 1.]
Figure 3: (A) Precipitous decays of $r_n:=\frac{1}{2}\sum_{1\le i,j\le 100}\bigl|\check{\pi}_i\check{p}^{(n)}_{ij}-\check{\pi}_j\check{p}^{(n)}_{ji}\bigr|$ from the initial value $r_1\approx 0.07$, for matrix powers $\check{\mathbf{P}}^n=(\check{p}^{(n)}_{ij})_{1\le i,j\le 100}$ constructed from four versions of Pride and Prejudice. (In contrast, one has $r_1\approx 0.33$ for a random $100\times 100$ Markov matrix.) Such quick relaxations support our working hypothesis of detailed balance, $\pi_i p_{ij}=\pi_j p_{ji}$. (B) Some textual patterns $\mathsf{W}_i$ sorted by descending $n_{ii}$, with font size proportional to the square root of the memorability factor $e^{-\langle\log L_{ii}\rangle}$ (see scheme S2 for word stacking methods, and figs. S7–S20 for further examples). Topical (i.e. significantly non-Poissonian) patterns, painted in red (resp. green), reside below (resp. above) the critical line of Poissonian banality (blue line in C), where the deviations exceed the error margin prescribed in (5). (C) Recurrence statistics for textual patterns (gray, red and green dots, with radii increasing with $n_{ii}$). Labels for proper names and some literary motifs are attached next to the corresponding colored dots. By Jensen’s inequality, all the data points must sit below the green dashed line with unit slope and zero intercept. In fact, almost all data points lie beneath [up to the error margin prescribed in (5)] the blue line with unit slope and intercept $-\gamma_0$, a phenomenon predicted by detailed balance [cf. (4)]. (D) Ružička similarities $s_R(\mathbf{b}_i^{\mathrm{en}},\mathbf{b}_j^{\mathrm{fr}})$ between selected topics (with $n_{ii}\ge 20$, sorted by descending $n_{ii}$) in the English and French versions of Pride and Prejudice (see tables S1 and S2 for stylistic variations in translations). Rows and columns whose maximal $s_R(\mathbf{b}_i^{\mathrm{en}},\mathbf{b}_j^{\mathrm{fr}})$ is less than 0.7 are not shown. Correct matchings are indicated by green cross-hairs.
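The two statistics in Fig. 3 can be sketched with NumPy. `detailed_balance_residual` evaluates $r_n$ exactly as defined in the caption, given a row-stochastic $\check{\mathbf{P}}$ and its stationary vector $\check{\boldsymbol{\pi}}$; it vanishes at every $n$ for a chain in detailed balance. `ruzicka_similarity` uses the standard Ružička (weighted Jaccard) definition $\sum_k\min(a_k,b_k)/\sum_k\max(a_k,b_k)$ on generic nonnegative vectors, since the vectors $\mathbf{b}_i$ are defined elsewhere in the paper. Both function names are hypothetical.

```python
import numpy as np


def detailed_balance_residual(P, pi, n):
    """r_n = 1/2 * sum_{i,j} |pi_i P^n_ij - pi_j P^n_ji| for the n-th matrix power of P."""
    Pn = np.linalg.matrix_power(np.asarray(P, dtype=float), n)
    flux = np.asarray(pi, dtype=float)[:, None] * Pn  # flux[i, j] = pi_i * P^n_ij
    return 0.5 * np.abs(flux - flux.T).sum()


def ruzicka_similarity(a, b):
    """Ruzicka similarity sum(min(a, b)) / sum(max(a, b)) of nonnegative vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.minimum(a, b).sum() / np.maximum(a, b).sum()
```

A two-state chain always satisfies detailed balance, so its residual is zero; a deterministic three-cycle is maximally irreversible at $n=1$ ($r_1=1$) but returns to the identity at $n=3$ ($r_3=0$).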