This paper measures similarity both within and between 84 language varieties across nine languages. These corpora are drawn from digital sources (the web and tweets), allowing us to evaluate whether such geo-referenced corpora are reliable for modelling linguistic variation. The basic idea is that, if each source adequately represents a single underlying language variety, then the similarity between these sources should be stable across all languages and countries. The paper shows that there is consistent agreement between these sources using frequency-based corpus similarity measures. This provides further evidence that digital geo-referenced corpora consistently represent local language varieties.
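
As an illustration of what a frequency-based corpus similarity measure can look like, the sketch below compares two tokenized corpora by the Spearman rank correlation of their word frequencies over the most frequent words in the combined data. This is an assumption made for illustration: the abstract does not state which measure or feature set is used, and the names `corpus_similarity`, `average_ranks`, and `top_n` are hypothetical.

```python
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks for values; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def corpus_similarity(tokens_a, tokens_b, top_n=1000):
    """Spearman correlation of word frequencies over the top_n combined words.

    Assumed measure for illustration; not necessarily the paper's own.
    """
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    vocab = [word for word, _ in (freq_a + freq_b).most_common(top_n)]
    ranks_a = average_ranks([freq_a[w] for w in vocab])
    ranks_b = average_ranks([freq_b[w] for w in vocab])
    # Spearman's rho is the Pearson correlation of the rank vectors.
    return pearson(ranks_a, ranks_b)
```

Under this sketch, two samples of the same variety (for example, web and tweet data from one country) would be expected to score higher than samples from different varieties. In practice the two corpora should be of comparable size, since raw frequency ranks are sensitive to corpus length.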