A Language and Its Dimensions: Intrinsic Dimensions of Language Fractal Structures

  • 2023-11-16 22:15:15
  • Vasilii A. Gromov, Nikita S. Borodin, Asel S. Yerbolova
The present paper introduces a novel object of study: a language fractal structure. We hypothesize that the set of embeddings of all $n$-grams of a natural language constitutes a representative sample of this fractal set. (We use the term Hailonakea to refer to the sum total of all language fractal structures, over all $n$.) The paper estimates the intrinsic (genuine) dimensions of language fractal structures for the Russian and English languages. To this end, we employ methods based on (1) topological data analysis and (2) a minimum spanning tree of the data graph built on the point cloud under consideration (Steele theorem). For both languages and for all $n$, the intrinsic dimensions appear to be non-integer values (typical of fractal sets), close to 9 for both Russian and English.
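To make the minimum-spanning-tree approach concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard asymptotic associated with Steele's theorem: for $n$ points sampled from a $d$-dimensional set, the total Euclidean MST length grows like $n^{(d-1)/d}$, so a log-log fit of MST length against sample size has slope $m = (d-1)/d$ and $d = 1/(1-m)$. The function names `mst_length` and `mst_dimension`, the subsample sizes, and the number of repeats are illustrative choices; the uniform square stands in for a real cloud of $n$-gram embeddings.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform


def mst_length(points):
    """Total Euclidean edge length of the MST of a point cloud.

    Note: exact duplicate points give zero distances, which SciPy treats
    as missing edges; deduplicate the cloud first if that can occur.
    """
    dist = squareform(pdist(points))  # dense pairwise distance matrix
    return minimum_spanning_tree(dist).sum()


def mst_dimension(points, sizes, repeats=5, seed=0):
    """Estimate intrinsic dimension from the growth of MST length.

    Assumes log L(n) ~ ((d - 1) / d) * log n + const, fits the slope m
    by least squares, and returns d = 1 / (1 - m).
    """
    rng = np.random.default_rng(seed)
    log_n, log_len = [], []
    for n in sizes:
        lengths = []
        for _ in range(repeats):
            # Random subsample of size n, drawn without replacement
            idx = rng.choice(len(points), size=n, replace=False)
            lengths.append(mst_length(points[idx]))
        log_n.append(np.log(n))
        log_len.append(np.log(np.mean(lengths)))
    slope, _intercept = np.polyfit(log_n, log_len, 1)
    return 1.0 / (1.0 - slope)  # undefined if the fitted slope reaches 1


if __name__ == "__main__":
    # Sanity check on a set of known dimension: a uniform sample of the unit square.
    cloud = np.random.default_rng(1).random((1600, 2))
    print(mst_dimension(cloud, sizes=[200, 400, 800, 1600]))
```

On a cloud of known integer dimension the estimate should land near that dimension, which gives a quick check before applying the same procedure to embedding data. The dense distance matrix limits this sketch to a few thousand points; a larger corpus would need a sparse neighbor graph or further subsampling.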


