Abstract
Working from a dataset of 118 billion messages running from the start of 2009 to the end of 2019, we identify and explore the relative daily use of over 150 languages on Twitter. We find that eight languages account for 80% of all tweets, with English, Japanese, Spanish, and Portuguese the most dominant. To quantify the degree to which each language acts as a Twitter 'echo chamber' over time, we compute the 'contagion ratio': the balance of retweets to organic messages. We find that for the most common languages on Twitter there is a growing tendency, though not a universal one, to retweet rather than share new content. By the end of 2019, the contagion ratios for half of the top 30 languages, including English and Spanish, had risen above 1, the naive contagion threshold. In 2019, the top 5 languages with the highest average daily ratios were, in order, Thai (7.3), Hindi, Tamil, Urdu, and Catalan, while the bottom 5 were Russian, Swedish, Esperanto, Cebuano, and Finnish (0.26). Further, we show that over time, the contagion ratios for most common languages are growing more strongly than those of rare languages.
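
As a minimal sketch of the contagion ratio, assuming it is taken as the number of retweets divided by the number of organic messages for a given language on a given day, the comparison against the naive threshold of 1 might look as follows. The function name and the counts are hypothetical illustrations, not values from the dataset.

```python
def contagion_ratio(retweets: int, organic: int) -> float:
    """Ratio of retweeted to organic messages for one language on one day.

    Assumption: the 'balance of retweets to organic messages' is read
    here as a simple quotient of the two daily counts.
    """
    if organic <= 0:
        raise ValueError("organic message count must be positive")
    return retweets / organic


# Hypothetical daily counts, for illustration only.
ratio = contagion_ratio(retweets=1500, organic=1000)

# A ratio above 1 means retweets outnumber organic messages,
# i.e. the naive contagion threshold has been crossed.
exceeds_threshold = ratio > 1
print(ratio, exceeds_threshold)
```

Under this reading, a ratio of 7.3 would correspond to roughly seven retweets for every organic message, and a ratio of 0.26 to roughly one retweet for every four organic messages.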