Overview for the Second Shared Task on Language Identification in Code-Switched Data

Abstract

We present an overview of the second shared task on language identification in code-switched data. For the shared task, we had code-switched data from two different language pairs: Modern Standard Arabic-Dialectal Arabic (MSA-DA) and Spanish-English (SPA-ENG). We had a total of nine participating teams, with all teams submitting a system for SPA-ENG and four submitting for MSA-DA. Through evaluation, we found that once again language identification is more difficult for the language pair that is more closely related. We also found that this year's systems performed better overall than the systems from the previous shared task, indicating overall progress in the state of the art for this task.
