Align, Mask and Select: A Simple Method for Incorporating Commonsense Knowledge into Language Representation Models

Abstract

Neural language representation models such as Bidirectional Encoder Representations from Transformers (BERT) pre-trained on large-scale corpora can well capture rich semantics from plain text, and can be fine-tuned to consistently improve the performance on various natural language processing (NLP) tasks. However, the existing pre-trained language representation models rarely consider explicitly incorporating commonsense knowledge or other knowledge. In this paper, we develop a pre-training approach for incorporating commonsense knowledge into language representation models. We construct a commonsense-related multi-choice question answering dataset for pre-training a neural language representation model. The dataset is created automatically by our proposed "align, mask, and select" (AMS) method. We also investigate different pre-training tasks. Experimental results demonstrate that pre-training models using the proposed approach followed by fine-tuning achieves significant improvements on various commonsense-related tasks, such as CommonsenseQA and Winograd Schema Challenge, while maintaining comparable performance on other NLP tasks, such as sentence classification and natural language inference (NLI) tasks, compared to the original BERT models.
