Cross-Lingual Task-Specific Representation Learning for Text Classification in Resource Poor Languages

Abstract

Neural network models have shown promising results for text classification. However, these solutions are limited by their dependence on the availability of annotated data. The prospect of leveraging resource-rich languages to enhance the text classification of resource-poor languages is fascinating. The performance on resource-poor languages can significantly improve if the resource availability constraints can be offset. To this end, we present a twin Bidirectional Long Short Term Memory (Bi-LSTM) network with shared parameters consolidated by a contrastive loss function (based on a similarity metric). The model learns the representation of resource-poor and resource-rich sentences in a common space by using the similarity between their assigned annotation tags. Hence, the model projects sentences with similar tags closer to each other and those with different tags farther apart. We evaluated our model on the classification tasks of sentiment analysis and emoji prediction for the resource-poor languages Hindi and Telugu and the resource-rich languages English and Spanish. Our model significantly outperforms the state-of-the-art approaches in both tasks across all metrics.
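
To make the contrastive objective concrete, here is a minimal sketch in plain Python. The abstract does not give the exact form of the loss or the similarity metric, so this assumes the common margin-based formulation over Euclidean distance. The function names, the `margin` parameter, and the example vectors are hypothetical, and the inputs stand in for the sentence vectors that the shared Bi-LSTM encoder would produce.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length sentence vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(u, v, same_tag, margin=1.0):
    """Margin-based contrastive loss for one pair of sentence vectors.

    Assumption: this is the standard formulation, not necessarily the
    paper's exact loss. `u` and `v` stand in for the outputs of the two
    weight-sharing Bi-LSTM branches (for example, one resource-poor and
    one resource-rich sentence).
    """
    d = euclidean_distance(u, v)
    if same_tag:
        # Pairs with the same annotation tag are pulled together.
        return 0.5 * d ** 2
    # Pairs with different tags are pushed apart until they are at
    # least `margin` away from each other; beyond that the loss is zero.
    return 0.5 * max(0.0, margin - d) ** 2


# Hypothetical 2-dimensional sentence vectors.
hindi_vec = [0.0, 0.0]
english_vec = [0.0, 0.5]
print(contrastive_loss(hindi_vec, english_vec, same_tag=True))
print(contrastive_loss(hindi_vec, english_vec, same_tag=False))
```

Under this formulation, minimizing the loss over many cross-lingual pairs moves same-tag sentences toward each other in the common space while keeping differently tagged sentences at least a margin apart, which matches the behavior described above.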
