Abstract
Researchers worldwide are seeking to repurpose existing drugs or discover new drugs to counter the disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). A promising source of candidates for such studies is molecules that have been reported in the scientific literature to be drug-like in the context of coronavirus research. We report here on a project that leverages both human and artificial intelligence to detect references to drug-like molecules in free text. We engage non-expert humans to create a corpus of labeled text, use this labeled corpus to train a named entity recognition model, and employ the trained model to extract 10912 drug-like molecules from the COVID-19 Open Research Dataset Challenge (CORD-19) corpus of 198875 papers. Performance analyses show that our automated extraction model can achieve performance on par with that of non-expert humans.