Natural language understanding for task oriented dialog in the biomedical domain in a low resources context

Abstract

In the biomedical domain, the lack of sharable datasets often limits the possibility of developing natural language processing systems, especially dialogue applications and natural language understanding models. To overcome this issue, we explore data generation using templates and terminologies and data augmentation approaches. Namely, we report our experiments using paraphrasing and word representations learned on a large EHR corpus with Fasttext and ELMo, to learn an NLU model without any available dataset. We evaluate on an NLU task of natural language queries in EHRs divided into slot-filling and intent classification sub-tasks. On the slot-filling task, we obtain an F-score of 0.76 with the ELMo representation; and on the classification task, a mean F-score of 0.71. Our results show that this method could be used to develop a baseline system.
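
To make the idea of generating data from templates and terminologies concrete, the following is a minimal sketch and not the authors' implementation. The templates, intent names, slot names, terminology entries, and the BIO tagging scheme are all hypothetical choices made for illustration; the abstract does not specify any of them.

```python
import itertools

# Hypothetical templates: an intent label plus a pattern with {slot} placeholders.
TEMPLATES = [
    ("find_lab_result", "show me the {lab_test} results"),
    ("find_prescription", "was the patient given {drug}"),
]

# Hypothetical terminology: candidate terms for each slot type.
TERMINOLOGY = {
    "lab_test": ["hemoglobin", "blood glucose"],
    "drug": ["aspirin", "metformin"],
}


def is_placeholder(word):
    """Return True if the word is a {slot} placeholder."""
    return word.startswith("{") and word.endswith("}")


def fill_template(pattern, values):
    """Fill one pattern and return parallel token and BIO tag lists."""
    tokens, tags = [], []
    for word in pattern.split():
        if is_placeholder(word):
            slot = word[1:-1]
            term_tokens = values[slot].split()
            tokens.extend(term_tokens)
            # First token of the term gets B-, the remaining tokens get I-.
            tags.extend([f"B-{slot}"] + [f"I-{slot}"] * (len(term_tokens) - 1))
        else:
            tokens.append(word)
            tags.append("O")
    return tokens, tags


def generate(templates, terminology):
    """Expand every template with every combination of terminology entries."""
    examples = []
    for intent, pattern in templates:
        slots = [w[1:-1] for w in pattern.split() if is_placeholder(w)]
        for combo in itertools.product(*(terminology[s] for s in slots)):
            tokens, tags = fill_template(pattern, dict(zip(slots, combo)))
            examples.append({"intent": intent, "tokens": tokens, "tags": tags})
    return examples


examples = generate(TEMPLATES, TERMINOLOGY)
for example in examples:
    print(example["intent"], list(zip(example["tokens"], example["tags"])))
```

Each generated example carries both an intent label and token-level slot tags, so one synthetic set could in principle serve both the intent classification and slot-filling sub-tasks. Paraphrasing and pretrained word representations, as mentioned in the abstract, would be applied on top of such data and are not shown here.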
