Detecting Foodborne Illness Complaints in Multiple Languages Using English Annotations Only

Abstract

Health departments have been deploying text classification systems for the early detection of foodborne illness complaints in social media documents such as Yelp restaurant reviews. Current systems have been successfully applied for documents in English and, as a result, a promising direction is to increase coverage and recall by considering documents in additional languages, such as Spanish or Chinese. Training previous systems for more languages, however, would be expensive, as it would require the manual annotation of many documents for each new target language. To address this challenge, we consider cross-lingual learning and train multilingual classifiers using only the annotations for English-language reviews. Recent zero-shot approaches based on pre-trained multi-lingual BERT (mBERT) have been shown to effectively align languages for aspects such as sentiment. Interestingly, we show that those approaches are less effective for capturing the nuances of foodborne illness, our public health application of interest. To improve performance without extra annotations, we create artificial training documents in the target language through machine translation and train mBERT jointly for the source (English) and target language. Furthermore, we show that translating labeled documents to multiple languages leads to additional performance improvements for some target languages. We demonstrate the benefits of our approach through extensive experiments with Yelp restaurant reviews in seven languages. Our classifiers identify foodborne illness complaints in multilingual reviews from the Yelp Challenge dataset, which highlights the potential of our general approach for deployment in health departments.
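As a rough illustration of the data-construction step described in the abstract, the sketch below builds a joint training set by keeping each labeled English review and adding one machine-translated copy per target language, reusing the English label. It is a minimal sketch, not the authors' implementation: the names `Example`, `build_joint_training_set`, and `toy_translate` are hypothetical, and `toy_translate` is only a stand-in for a real machine translation system. Fine-tuning mBERT on the resulting set is not shown.

```python
from typing import Callable, Iterable, List, NamedTuple


class Example(NamedTuple):
    text: str
    label: int  # assumed encoding: 1 = foodborne illness complaint, 0 = other
    lang: str


def build_joint_training_set(
    english_examples: Iterable[Example],
    translate: Callable[[str, str], str],
    target_langs: List[str],
) -> List[Example]:
    """Keep each English example and add one translated copy per target language.

    Labels are copied unchanged, so no target-language annotation is needed.
    """
    joint: List[Example] = []
    for ex in english_examples:
        joint.append(ex)
        for lang in target_langs:
            joint.append(Example(translate(ex.text, lang), ex.label, lang))
    return joint


def toy_translate(text: str, lang: str) -> str:
    # Hypothetical placeholder: a real pipeline would call an MT system here.
    return f"[{lang}] {text}"


english = [
    Example("I got sick after eating here.", 1, "en"),
    Example("Great tacos and friendly staff.", 0, "en"),
]
joint = build_joint_training_set(english, toy_translate, ["es", "zh"])
```

Passing a single target language corresponds to the source-plus-target setting, while passing several corresponds to the multi-language translation variant mentioned in the abstract.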
