Abstract
With more than 300 million people depressed worldwide, depression is a global problem. Due to access barriers such as social stigma, cost, and treatment availability, 60% of mentally-ill adults do not receive any mental health services. Effective and efficient diagnosis relies on detecting clinical symptoms of depression. Automatic detection of depressive symptoms would potentially improve diagnostic accuracy and availability, leading to faster intervention. In this work, we present a machine learning method for measuring the severity of depressive symptoms. Our multi-modal method uses 3D facial expressions and spoken language, commonly available from modern cell phones. It demonstrates an average error of 3.67 points (15.3% relative) on the clinically-validated Patient Health Questionnaire (PHQ) scale. For detecting major depressive disorder, our model demonstrates 83.3% sensitivity and 82.6% specificity. Overall, this paper shows how speech recognition, computer vision, and natural language processing can be combined to assist mental health patients and practitioners. This technology could be deployed to cell phones worldwide and facilitate low-cost universal access to mental health care.