StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding

Abstract

Recently, the pre-trained language model BERT (and its robustly optimized version RoBERTa) has attracted a lot of attention in natural language understanding (NLU) and achieved state-of-the-art accuracy in various NLU tasks, such as sentiment classification, natural language inference, semantic textual similarity and question answering. Inspired by the linearization exploration work of Elman [8], we extend BERT to a new model, StructBERT, by incorporating language structures into pre-training. Specifically, we pre-train StructBERT with two auxiliary tasks to make the most of the sequential order of words and sentences, which leverage language structures at the word and sentence levels, respectively. As a result, the new model is adapted to different levels of language understanding required by downstream tasks. StructBERT with structural pre-training gives surprisingly good empirical results on a variety of downstream tasks, including pushing the state-of-the-art on the GLUE benchmark to 89.0 (outperforming all published models), the F1 score on SQuAD v1.1 question answering to 93.0, and the accuracy on SNLI to 91.7.
