UniBERT: Adversarial Training for Language-Universal Representations

Abstract

This paper presents UniBERT, a compact multilingual language model that leverages an innovative training framework integrating three components: masked language modeling, adversarial training, and knowledge distillation. Pre-trained on a meticulously curated Wikipedia corpus spanning 107 languages, UniBERT is designed to reduce the computational demands of large-scale models while maintaining competitive performance across various natural language processing tasks. Comprehensive evaluations on four tasks -- named entity recognition, natural language inference, question answering, and semantic textual similarity -- demonstrate that our multilingual training strategy enhanced by an adversarial objective significantly improves cross-lingual generalization. Specifically, UniBERT models show an average relative improvement of 7.72% over traditional baselines, which achieved an average relative improvement of only 1.17%, with statistical analysis confirming the significance of these gains (p-value = 0.0181). This work highlights the benefits of combining adversarial training and knowledge distillation to build scalable and robust language models, thereby advancing the field of multilingual and cross-lingual natural language processing.
