Geolocation differences of language use in urban areas

Abstract

The explosion in the availability of natural language data in the era of social media has given rise to a host of applications such as sentiment analysis and opinion mining. Simultaneously, the growing availability of precise geolocation information is enabling visualization of global phenomena such as environmental changes and disease propagation. Opportunities for tracking spatial variations in language use, however, have largely been overlooked, especially on small spatial scales. Here we explore the use of Twitter data with precise geolocation information to resolve spatial variations in language use on an urban scale down to single city blocks. We identify several categories of language tokens likely to show distinctive patterns of use and develop quantitative methods to visualize the spatial distributions associated with these patterns. Our analysis concentrates on comparison of contrasting pairs of Tweet distributions from the same category, each defined by a set of tokens. Our work shows that analysis of small-scale variations can provide unique information on correlations between language use and social context which are highly valuable to a wide range of fields from linguistic science and commercial advertising to social services.
