SMHD: A Large-Scale Resource for Exploring Online Language Usage for Multiple Mental Health Conditions

  • 2018-06-13 20:29:25
  • Arman Cohan, Bart Desmet, Andrew Yates, Luca Soldaini, Sean MacAvaney, Nazli Goharian
  • 6

Abstract

Mental health is a significant and growing public health concern. As language usage can be leveraged to obtain crucial insights into mental health conditions, there is a need for large-scale, labeled, mental health-related datasets of users who have been diagnosed with one or more of such conditions. In this paper, we investigate the creation of high-precision patterns to identify self-reported diagnoses of nine different mental health conditions, and obtain high-quality labeled data without the need for manual labelling. We introduce the SMHD (Self-reported Mental Health Diagnoses) dataset and make it available. SMHD is a novel large dataset of social media posts from users with one or multiple mental health conditions along with matched control users. We examine distinctions in users' language, as measured by linguistic and psychological variables. We further explore text classification methods to identify individuals with mental conditions through their language.
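
To make the idea of pattern-based labelling concrete, the following is a minimal sketch of how a self-reported diagnosis might be detected with regular expressions. It is not the paper's pattern set: the condition terms, the sentence template, and the names `CONDITIONS`, `build_patterns`, and `self_reported_conditions` are all hypothetical and chosen only for illustration. A genuinely high-precision system would also need to handle negation, hypotheticals, and diagnoses reported about other people.

```python
import re

# Hypothetical condition terms (illustrative only; not the paper's lists).
CONDITIONS = {
    "depression": ["depression", "major depressive disorder"],
    "adhd": ["adhd", "attention deficit hyperactivity disorder"],
    "ptsd": ["ptsd", "post-traumatic stress disorder"],
}


def build_patterns(conditions):
    """Compile one first-person 'diagnosed with <condition>' pattern per label."""
    patterns = {}
    for label, terms in conditions.items():
        alternatives = "|".join(re.escape(term) for term in terms)
        patterns[label] = re.compile(
            # First-person subject, optional adverb, then the diagnosis verb.
            r"\bi (?:was|am|have been|got) (?:\w+ )?diagnosed with "
            # Allow up to three intervening words (e.g. "severe", "and").
            r"(?:\w+ ){0,3}?(?:" + alternatives + r")\b",
            re.IGNORECASE,
        )
    return patterns


PATTERNS = build_patterns(CONDITIONS)


def self_reported_conditions(post):
    """Return the sorted labels whose diagnosis pattern matches the post."""
    return sorted(label for label, pattern in PATTERNS.items() if pattern.search(post))


if __name__ == "__main__":
    print(self_reported_conditions("I was diagnosed with ADHD last year."))
    print(self_reported_conditions("My sister was diagnosed with depression."))
```

Requiring a first-person subject directly before the diagnosis verb is one simple way to trade recall for precision: posts that merely mention a condition, or that describe someone else's diagnosis, do not match.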

 
