Abstract
Social media platforms provide an opportunity to gain valuable insights into user behaviour. Users express their internal feelings and emotions in a disinhibited fashion using natural language. Techniques in Natural Language Processing (NLP) have helped researchers decipher standard documents and cull inferences from massive amounts of data. A representative corpus is a prerequisite for NLP, and one of the challenges we face today is the non-standard and noisy language that exists on the internet. Our work focuses on building a corpus from social media that is aimed at detecting mental illness. We use depression as a case study and demonstrate the effectiveness of using such a corpus to help practitioners detect such cases. Our results show a high correlation between our Social Media Corpus and the standard corpus for depression.