MINION: a Large-Scale and Diverse Dataset for Multilingual Event Detection

  • 2022-11-11 02:09:51
  • Amir Pouran Ben Veyseh, Minh Van Nguyen, Franck Dernoncourt, Thien Huu Nguyen


Event Detection (ED) is the task of identifying and classifying trigger words of event mentions in text. Despite considerable research efforts in recent years for English text, the task of ED in other languages has been significantly less explored. Switching to non-English languages, important research questions for ED include how well existing ED models perform on different languages, how challenging ED is in other languages, and how well ED knowledge and annotation can be transferred across languages. To answer those questions, it is crucial to obtain multilingual ED datasets that provide consistent event annotation for multiple languages. There exist some multilingual ED datasets; however, they tend to cover a handful of languages and mainly focus on popular ones. Many languages are not covered in existing multilingual ED datasets. In addition, the current datasets are often small and not accessible to the public. To overcome those shortcomings, we introduce a new large-scale multilingual dataset for ED (called MINION) that consistently annotates events for 8 different languages; 5 of them have not been supported by existing multilingual datasets. We also perform extensive experiments and analysis to demonstrate the challenges and transferability of ED across languages in MINION that in all call for more research effort in this area.


