Hahahahaha, Duuuuude, Yeeessss!: A two-parameter characterization of stretchable words and the dynamics of mistypings and misspellings

Abstract

Stretched words like `heellllp' or `heyyyyy' are a regular feature of spoken language, often used to emphasize or exaggerate the underlying meaning of the root word. While stretched words are rarely found in formal written language and dictionaries, they are prevalent within social media. In this paper, we examine the frequency distributions of `stretchable words' found in roughly 100 billion tweets authored over an 8-year period. We introduce two central parameters, `balance' and `stretch', that capture their main characteristics, and explore their dynamics by creating visual tools we call `balance plots' and `spelling trees'. We discuss how the tools and methods we develop here could be used to study the statistical patterns of mistypings and misspellings, along with the potential applications in augmenting dictionaries, improving language processing, and in any area where sequence construction matters, such as genetics.
