Abstract
Neural language representation models such as Bidirectional Encoder Representations from Transformers (BERT), pre-trained on large-scale corpora, capture rich semantics from plain text and can be fine-tuned to consistently improve performance on various natural language processing (NLP) tasks. However, existing pre-trained language representation models rarely consider explicitly incorporating commonsense knowledge or other knowledge. In this paper, we develop a pre-training approach for incorporating commonsense knowledge into language representation models. We construct a commonsense-related multi-choice question answering dataset for pre-training a neural language representation model. The dataset is created automatically by our proposed "align, mask, and select" (AMS) method. We also investigate different pre-training tasks. Experimental results demonstrate that pre-training models using the proposed approach, followed by fine-tuning, achieves significant improvements on various commonsense-related tasks, such as CommonsenseQA and the Winograd Schema Challenge, while maintaining performance comparable to the original BERT models on other NLP tasks, such as sentence classification and natural language inference (NLI).