Abstract
Complex Word Identification (CWI) is the task of identifying which words or phrases in a sentence are difficult to understand by a target audience. The latest CWI Shared Task released data for two settings: monolingual (i.e. train and test in the same language) and cross-lingual (i.e. test in a language not seen during training). The best monolingual models relied on language-dependent features, which do not generalise in the cross-lingual setting, while the best cross-lingual model used neural networks with multi-task learning. In this paper, we present monolingual and cross-lingual CWI models that perform as well as (or better than) most models submitted to the latest CWI Shared Task. We show that carefully selected features and simple learning models can achieve state-of-the-art performance, and result in strong baselines for future development in this area. Finally, we discuss how inconsistencies in the annotation of the data can explain some of the results obtained.