An innovative solution for breast cancer textual big data analysis

  • 2017-12-06 16:18:31
  • Nicolas Thiebaut, Antoine Simoulin, Karl Neuberger, Issam Ibnoushein, Nicolas Bousquet, Nathalie Reix, Sébastien Molière, Carole Mathelin


The digitalization of information stored in hospitals now makes it possible to exploit medical data in text format, such as electronic health records (EHRs), that were initially gathered for purposes other than epidemiology. Manual search and analysis operations on such data are tedious. In recent years, natural language processing (NLP) tools have been used to automate the extraction of information contained in EHRs, structure it, and perform statistical analysis on the structured result. The main difficulty with existing approaches is their reliance on synonym or ontology dictionaries, which are mostly available in English only and do not include local or custom notations. In this work, a team composed of oncologists as domain experts and data scientists develops a custom NLP-based system to process and structure textual clinical reports of patients suffering from breast cancer. The tool relies on the combination of standard text mining techniques and an advanced synonym detection method. It allows for a global analysis by retrieving indicators such as medical history, tumor characteristics, therapeutic responses, recurrences and prognosis. The versatility of the method makes it easy to obtain new indicators, opening the way for retrospective studies with a substantial reduction in manual work. With no need for biomedical annotators or pre-defined ontologies, this language-agnostic method reached a good extraction accuracy for several concepts of interest, according to a comparison with a manually structured file, without requiring any existing corpus with local or new notations.


