Multilingual Question Answering from Formatted Text applied to Conversational Agents

Abstract

Recent advances in NLP with language models such as BERT, GPT-2, XLNet or XLM have made it possible to surpass human performance on Reading Comprehension tasks on large-scale datasets (e.g. SQuAD), and this opens up many perspectives for Conversational AI. However, task-specific datasets are mostly in English, which makes it difficult to assess progress in other languages. Fortunately, state-of-the-art models are now being pre-trained on multiple languages (e.g. BERT was released in a multilingual version covering a hundred languages) and exhibit an ability for zero-shot transfer from English to other languages on XNLI. In this paper, we run experiments showing that multilingual BERT, trained to solve the complex Question Answering task defined in the English SQuAD dataset, is able to perform the same task in Japanese and French. It even outperforms the best published results of a baseline that explicitly combines an English model for Reading Comprehension and a Machine Translation model for transfer. We run further tests on crafted cross-lingual QA datasets (context in one language and question in another) to provide intuition on the mechanisms that allow BERT to transfer the task from one language to another. Finally, we introduce our application Kate. Kate is a conversational agent dedicated to HR support for employees that exploits multilingual models to accurately answer questions, in several languages, directly from information web pages.
