SemEval-2023 Task 2: Fine-grained Multilingual Named Entity Recognition (MultiCoNER 2)

Abstract

We present the findings of SemEval-2023 Task 2 on Fine-grained Multilingual Named Entity Recognition (MultiCoNER 2). Divided into 13 tracks, the task focused on methods to identify complex fine-grained named entities (like WRITTENWORK, VEHICLE, MUSICALGRP) across 12 languages, in both monolingual and multilingual scenarios, as well as noisy settings. The task used the MultiCoNER V2 dataset, composed of 2.2 million instances in Bangla, Chinese, English, Farsi, French, German, Hindi, Italian, Portuguese, Spanish, Swedish, and Ukrainian. MultiCoNER 2 was one of the most popular tasks of SemEval-2023. It attracted 842 submissions from 47 teams, and 34 teams submitted system papers. Results showed that complex entity types such as media titles and product names were the most challenging. Methods fusing external knowledge into transformer models achieved the best performance, and the largest gains were on the Creative Work and Group classes, which are still challenging even with external knowledge. Some fine-grained classes proved to be more challenging than others, such as SCIENTIST, ARTWORK, and PRIVATECORP. We also observed that noisy data has a significant impact on model performance, with an average drop of 10% on the noisy subset. The task highlights the need for future research on improving NER robustness on noisy data containing complex entities.
