A Short Survey on Sense-Annotated Corpora for Diverse Languages and Resources

Abstract

With the advancement of research in word sense disambiguation and deep learning, large sense-annotated datasets are increasingly important for training supervised systems. However, gathering high-quality sense-annotated data for as many instances as possible is an arduous task. This has led to the proliferation of automatic and semi-automatic methods for overcoming the so-called knowledge-acquisition bottleneck. In this paper we present an overview of currently available sense-annotated corpora, both manually and automatically constructed, for various languages and resources (i.e. WordNet, Wikipedia, BabelNet). General statistics and specific features of each sense-annotated dataset are also provided.
