Global Syntactic Variation in Seven Languages: Towards a Computational Dialectology

  • 2021-04-03 03:40:21
  • Jonathan Dunn


The goal of this paper is to provide a complete representation of regional linguistic variation on a global scale. To this end, the paper focuses on removing three constraints that have previously limited work within dialectology/dialectometry. First, rather than assuming a fixed and incomplete set of variants, we use Computational Construction Grammar to provide a replicable and falsifiable set of syntactic features. Second, rather than assuming a specific area of interest, we use global language mapping based on web-crawled and social media datasets to determine the selection of national varieties. Third, rather than looking at a single language in isolation, we model seven major languages together using the same methods: Arabic, English, French, German, Portuguese, Russian, and Spanish. Results show that models for each language are able to robustly predict the region-of-origin of held-out samples better using Construction Grammars than using simpler syntactic features. These global-scale experiments are used to argue that new methods in computational sociolinguistics are able to provide more generalized models of regional variation that are essential for understanding language variation and change at scale.

