Updating Pre-trained Word Vectors and Text Classifiers using Monolingual Alignment

Abstract

In this paper, we focus on the problem of adapting word vector-based models to new textual data. Given a model pre-trained on large reference data, how can we adapt it to a smaller dataset with a slightly different language distribution? We frame the adaptation problem as a monolingual word vector alignment problem, and simply average models after alignment. We align vectors using the RCSLS criterion. Our formulation results in a simple and efficient algorithm that allows adapting general-purpose models to changing word distributions. In our evaluation, we consider applications to word embedding and text classification models. We show that the proposed approach yields good performance in all setups and outperforms a baseline consisting of fine-tuning the model on new data.
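The "align, then average" recipe above can be illustrated with a minimal sketch. It rests on several assumptions: embeddings are stored as Python dicts mapping words to NumPy vectors, words present in both vocabularies serve as alignment anchors, and the new model is mapped into the reference space (the direction is a choice made here, not taken from the paper). Most importantly, the sketch substitutes orthogonal Procrustes, which has a closed-form SVD solution, for the RCSLS criterion that the paper actually uses. The function names `procrustes` and `align_and_average` are hypothetical.

```python
import numpy as np


def procrustes(X, Y):
    """Orthogonal W minimizing sum_i ||W x_i - y_i||^2.

    Rows of X (source) and Y (target) are paired word vectors.
    NOTE: stand-in for RCSLS; this is plain orthogonal Procrustes.
    """
    U, _, Vt = np.linalg.svd(Y.T @ X)
    return U @ Vt


def align_and_average(ref, new):
    """Map `new` into the space of `ref`, then average shared words.

    ref, new: dict word -> 1-D np.ndarray, all of the same dimension.
    Words found in only one model keep their (mapped) vector.
    """
    shared = sorted(set(ref) & set(new))
    X = np.stack([new[w] for w in shared])  # new-model vectors (source)
    Y = np.stack([ref[w] for w in shared])  # reference vectors (target)
    W = procrustes(X, Y)

    merged = {}
    for w in set(ref) | set(new):
        if w in ref and w in new:
            # Shared word: average the reference vector with the aligned new one.
            merged[w] = 0.5 * (ref[w] + W @ new[w])
        elif w in ref:
            merged[w] = ref[w]
        else:
            # Word unseen by the reference model: keep its aligned vector.
            merged[w] = W @ new[w]
    return merged


# Usage on synthetic data: the "new" model is a rotated copy of part of the
# reference vocabulary, plus one word the reference model never saw.
rng = np.random.default_rng(0)
d = 5
words = [f"w{i}" for i in range(20)]
ref = {w: rng.standard_normal(d) for w in words}
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal rotation
new = {w: Q @ ref[w] for w in words[:15]}
new["fresh"] = rng.standard_normal(d)
merged = align_and_average(ref, new)
```

Because the synthetic "new" model differs from the reference only by a rotation, alignment undoes that rotation and the averaged vectors for shared words coincide with the reference ones; with real corpora the two models differ by more than a rotation, which is where a criterion like RCSLS and the averaging step matter.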
