The optimality of syntactic dependency distances

  • 2020-07-30 09:40:41
  • Ramon Ferrer-i-Cancho, Carlos Gómez-Rodríguez, Juan Luis Esteban, Lluís Alemany-Puig
It is often stated that human languages, like other biological systems, are shaped by cost-cutting pressures, but to what extent? Attempts to quantify the degree of optimality of languages by means of an optimality score have been scarce and have focused mostly on English. Here we recast the problem of the optimality of the word order of a sentence as an optimization problem on a spatial network where the vertices are words, arcs indicate syntactic dependencies, and the space is defined by the linear order of the words in the sentence. We introduce a new score to quantify the cognitive pressure to reduce the distance between linked words in a sentence. The analysis of sentences from 93 languages representing 19 linguistic families reveals that half of the languages are optimized to 70% or more. The score indicates that distances are not significantly reduced in a few languages and confirms two theoretical predictions: longer sentences are more optimized, and distances are more likely to be longer than expected by chance in short sentences. We present a new hierarchical ranking of languages by their degree of optimization. The statistical advantages of the new score call for a reevaluation of the evolution of dependency distance over time in languages, as well as of the relationship between dependency distance and linguistic competence. Finally, the principles behind the design of the score can be extended to develop more powerful normalizations of topological distances or of physical distances in more dimensions.
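The spatial-network framing can be made concrete with a small sketch. Words are vertices numbered by their position in the sentence, dependencies are edges, and the total cost D is the sum of |position(u) − position(v)| over all edges. The abstract does not give the formula of the new score, so the code below *assumes* a normalization Ω = (D_rla − D) / (D_rla − D_min), where D_rla = (n² − 1)/3 is the expected value of D for a tree of n words under a uniformly random ordering and D_min is the smallest D achievable by reordering the same tree. Under this assumption Ω is 1 for an optimal order, 0 at chance level, and negative when distances are longer than chance. The function names and the example parses are hypothetical, and D_min is found by brute force over all n! orderings, which only works for very short sentences.

```python
from itertools import permutations


def sum_distances(edges, positions):
    """Total dependency distance D: sum of |pos(u) - pos(v)| over all arcs (u, v)."""
    return sum(abs(positions[u] - positions[v]) for u, v in edges)


def random_baseline(n):
    """Expected D for a tree of n words under a uniformly random ordering:
    (n - 1) edges, each with expected length (n + 1) / 3."""
    return (n * n - 1) / 3


def minimum_sum(edges, n):
    """Brute-force minimum of D over all n! orderings (feasible only for small n)."""
    # Each permutation assigns word i to position perm[i].
    return min(sum_distances(edges, perm) for perm in permutations(range(n)))


def omega(edges, n):
    """Hypothetical normalized score (D_rla - D) / (D_rla - D_min).

    Assumes `edges` form a tree over words 0..n-1 listed in their observed order.
    """
    if n < 3:
        raise ValueError("score undefined for n < 3 (D_rla equals D_min)")
    d = sum_distances(edges, list(range(n)))  # observed word order
    d_rla = random_baseline(n)
    d_min = minimum_sum(edges, n)
    return (d_rla - d) / (d_rla - d_min)


# Hypothetical parse of "The dog barked loudly": dog->The, barked->dog, barked->loudly.
print(omega([(1, 0), (2, 1), (2, 3)], 4))  # every arc has length 1: optimal
print(omega([(0, 3), (3, 1), (1, 2)], 4))  # made-up tree whose D exceeds chance
```

The first call prints 1.0, since every dependency links adjacent words. The second prints -0.5, because its total distance (6) is longer than the random baseline (5), which is the kind of case the abstract reports for short sentences. A practical implementation would replace the brute-force search with a polynomial-time minimum linear arrangement algorithm for trees.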


