Automated Lay Language Summarization of Biomedical Scientific Reviews

Abstract

Health literacy has emerged as a crucial factor in making appropriate health decisions and ensuring good treatment outcomes. However, medical jargon and the complex structure of professional language in this domain make health information especially hard to interpret. Thus, there is an urgent unmet need for automated methods to enhance the accessibility of the biomedical literature to the general population. This problem can be framed as a type of translation problem between the language of healthcare professionals and that of the general public. In this paper, we introduce the novel task of automated generation of lay language summaries of biomedical scientific reviews, and construct a dataset to support the development and evaluation of automated methods for enhancing the accessibility of the biomedical literature. We analyze the various challenges in solving this task, which include not only summarizing the key points but also explaining background knowledge and simplifying professional language. We experiment with state-of-the-art summarization models as well as several data augmentation techniques, and evaluate their performance using both automated metrics and human assessment. Results indicate that summaries generated automatically by contemporary neural architectures can achieve promising quality and readability compared with reference summaries written for the lay public by experts (best ROUGE-L of 50.24 and Flesch-Kincaid readability score of 13.30). We also discuss the limitations of the current attempt, providing insights and directions for future work.
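The two headline numbers above come from different kinds of metric. ROUGE-L measures overlap with a reference summary through the longest common subsequence (LCS) of tokens, and the Flesch-Kincaid grade level estimates reading difficulty from sentence length and syllables per word (lower means easier to read). The following is a minimal illustrative sketch of both, not the authors' evaluation code. It assumes plain whitespace tokenization with no stemming, uses a crude vowel-group heuristic for syllables, and the function names (`lcs_length`, `rouge_l_f1`, `count_syllables`, `flesch_kincaid_grade`) are hypothetical. Its values will not reproduce the reported scores. In particular, the ROUGE-L of 50.24 is on a 0 to 100 scale, whereas this function returns a value between 0 and 1.

```python
import re


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)          # extend the diagonal match
            else:
                curr.append(max(prev[j], curr[j - 1]))  # best of skipping either token
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """Sentence-level ROUGE-L F1 on lowercased whitespace tokens (assumption: no stemming)."""
    c = candidate.lower().split()
    r = reference.lower().split()
    if not c or not r:
        return 0.0
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision = lcs / len(c)
    recall = lcs / len(r)
    return 2 * precision * recall / (precision + recall)


def count_syllables(word):
    """Heuristic syllable count: vowel groups, minus a silent final 'e', at least one."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59
```

For example, `flesch_kincaid_grade("Hypertension necessitates pharmacological intervention.")` scores far higher than `flesch_kincaid_grade("High blood pressure needs drugs.")`. This illustrates why simplifying professional vocabulary, and not only shortening text, matters for lay summaries.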
