A Qualitative Evaluation of Language Models on Automatic Question-Answering for COVID-19

Abstract

COVID-19 has resulted in an ongoing pandemic and as of 12 June 2020, has caused more than 7.4 million cases and over 418,000 deaths. The highly dynamic and rapidly evolving situation with COVID-19 has made it difficult to access accurate, on-demand information regarding the disease. Online communities, forums, and social media provide potential venues to search for relevant questions and answers, or post questions and seek answers from other members. However, due to the nature of such sites, there are always a limited number of relevant questions and responses to search from, and posted questions are rarely answered immediately. With the advancements in the field of natural language processing, particularly in the domain of language models, it has become possible to design chatbots that can automatically answer consumer questions. However, such models are rarely applied and evaluated in the healthcare domain, to meet the information needs with accurate and up-to-date healthcare data. In this paper, we propose to apply a language model for automatically answering questions related to COVID-19 and qualitatively evaluate the generated responses. We utilized the GPT-2 language model and applied transfer learning to retrain it on the COVID-19 Open Research Dataset (CORD-19) corpus. In order to improve the quality of the generated responses, we applied 4 different approaches, namely tf-idf, BERT, BioBERT, and USE to filter and retain relevant sentences in the responses. In the performance evaluation step, we asked two medical experts to rate the responses. We found that BERT and BioBERT, on average, outperform both tf-idf and USE in relevance-based sentence filtering tasks. Additionally, based on the chatbot, we created a user-friendly interactive web application to be hosted online.
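
The relevance-based sentence filtering mentioned in the abstract can be illustrated with a minimal tf-idf sketch. The abstract does not say how relevance is scored or what cutoff is used, so the cosine-similarity scoring, the smoothed idf formula, the `threshold` value, and all function names below are assumptions made for illustration, not the authors' implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens, preserving hyphenated terms."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def tfidf_vectors(texts):
    """Build one sparse tf-idf vector (a dict) per text, using a smoothed idf."""
    token_lists = [tokenize(t) for t in texts]
    n_docs = len(token_lists)
    # Document frequency: number of texts in which each term appears.
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def filter_sentences(question, sentences, threshold=0.1):
    """Keep generated sentences whose similarity to the question meets the
    threshold. The threshold value is an illustrative assumption."""
    vectors = tfidf_vectors([question] + list(sentences))
    question_vec, sentence_vecs = vectors[0], vectors[1:]
    return [s for s, vec in zip(sentences, sentence_vecs)
            if cosine(question_vec, vec) >= threshold]
```

In this sketch, a generated response would be split into sentences, and only those that share enough weighted vocabulary with the question are retained. The embedding-based variants named in the abstract (BERT, BioBERT, USE) would replace `tfidf_vectors` with a sentence encoder while keeping the same similarity-and-threshold structure, again as an assumption about how such filters are typically built.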
