MULTI3NLU++: A Multilingual, Multi-Intent, Multi-Domain Dataset for Natural Language Understanding in Task-Oriented Dialogue

  • 2022-12-20 17:34:25
  • Nikita Moghe, Evgeniia Razumovskaia, Liane Guillou, Ivan Vulić, Anna Korhonen, Alexandra Birch

Abstract

Task-oriented dialogue (TOD) systems have been applied in a range of domains to support human users to achieve specific goals. Systems are typically constructed for a single domain or language and do not generalise well beyond this. Their extension to other languages in particular is restricted by the lack of available training data for many of the world's languages. To support work on Natural Language Understanding (NLU) in TOD across multiple languages and domains simultaneously, we constructed MULTI3NLU++, a multilingual, multi-intent, multi-domain dataset. MULTI3NLU++ extends the English-only NLU++ dataset to include manual translations into a range of high-, medium- and low-resource languages (Spanish, Marathi, Turkish and Amharic), in two domains (banking and hotels). MULTI3NLU++ inherits the multi-intent property of NLU++, where an utterance may be labelled with multiple intents, providing a more realistic representation of a user's goals and aligning with the more complex tasks that commercial systems aim to model. We use MULTI3NLU++ to benchmark state-of-the-art multilingual language models as well as Machine Translation and Question Answering systems for the NLU task of intent detection for TOD systems in the multilingual setting. The results demonstrate the challenging nature of the dataset, particularly in the low-resource language setting.

 
