Abstract
We tackle the problem of identifying metaphors in text, treated as a sequence tagging task. The pre-trained word embeddings GloVe, ELMo and BERT have individually shown good performance on sequential metaphor identification. These embeddings are generated by different models, training targets and corpora, and thus encode different semantic and syntactic information. We show that leveraging GloVe, ELMo and feature-based BERT in a model built on a multi-channel CNN and a bidirectional LSTM can significantly outperform any single word embedding method and the combination of the two embeddings. Incorporating linguistic features into our model further improves its performance, yielding state-of-the-art results on three public metaphor datasets. We also provide an in-depth analysis of the effectiveness of leveraging multiple word embeddings, including an analysis of the spatial distribution of the different embedding methods for metaphors and literals, and of how well the embeddings complement each other across genres and parts of speech.