AI- and HPC-enabled Lead Generation for SARS-CoV-2: Models and Processes to Extract Druglike Molecules Contained in Natural Language Text

  • 2021-01-12 17:15:43
  • Zhi Hong, J. Gregory Pauloski, Logan Ward, Kyle Chard, Ben Blaiszik, Ian Foster

Abstract

Researchers worldwide are seeking to repurpose existing drugs or discover new drugs to counter the disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). A promising source of candidates for such studies is molecules that have been reported in the scientific literature to be drug-like in the context of coronavirus research. We report here on a project that leverages both human and artificial intelligence to detect references to drug-like molecules in free text. We engage non-expert humans to create a corpus of labeled text, use this labeled corpus to train a named entity recognition model, and employ the trained model to extract 10912 drug-like molecules from the COVID-19 Open Research Dataset Challenge (CORD-19) corpus of 198875 papers. Performance analyses show that our automated extraction model can achieve performance on par with that of non-expert humans.

 
