Structural Guidance for Transformer Language Models

Abstract

Transformer-based language models pre-trained on large amounts of text data have proven remarkably successful in learning generic transferable linguistic representations. Here we study whether structural guidance leads to more human-like systematic linguistic generalization in Transformer language models without resorting to pre-training on very large amounts of data. We explore two general ideas. The "Generative Parsing" idea jointly models the incremental parse and word sequence as part of the same sequence modeling task. The "Structural Scaffold" idea guides the language model's representation via an additional structure loss that separately predicts the incremental constituency parse. We train the proposed models along with a vanilla Transformer language model baseline on a 14 million-token and a 46 million-token subset of the BLLIP dataset, and evaluate the models' syntactic generalization performance on SG Test Suites and sized BLiMP. Experimental results across the two benchmarks offer converging evidence that generative structural supervision can induce more robust and human-like linguistic generalization in Transformer language models without the need for data-intensive pre-training.
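To make the contrast between the two ideas concrete, the following is a minimal sketch and not the authors' implementation. It assumes a top-down linearization with hypothetical action tokens `NT(X)` and `REDUCE()`, a toy tuple encoding of a constituency tree, and a hypothetical weight `alpha` for combining the two losses; none of these details are specified in the abstract. Under those assumptions, Generative Parsing would train on the single joint sequence, while a Structural Scaffold would keep the word sequence as the language modeling target and predict the parse actions separately.

```python
from typing import List, Tuple, Union

# Toy tree encoding (an assumption): a leaf is a word string,
# an internal node is a (label, children) tuple.
Tree = Union[str, Tuple[str, list]]


def linearize(tree: Tree) -> List[str]:
    """Flatten a tree into one joint sequence of parse actions and words.

    NT(X) opens a constituent labeled X and REDUCE() closes the most
    recently opened one. These action names are illustrative only.
    """
    if isinstance(tree, str):
        return [tree]
    label, children = tree
    sequence = [f"NT({label})"]
    for child in children:
        sequence.extend(linearize(child))
    sequence.append("REDUCE()")
    return sequence


def is_action(token: str) -> bool:
    """Return True for the hypothetical parse-action tokens."""
    return token.startswith("NT(") or token == "REDUCE()"


def split_for_scaffold(joint: List[str]):
    """Separate the joint sequence into words and per-word parse actions.

    Each word is paired with the actions that immediately precede it.
    This pairing is one possible choice, assumed here for illustration.
    Actions after the final word are returned separately.
    """
    words, actions, pending = [], [], []
    for token in joint:
        if is_action(token):
            pending.append(token)
        else:
            words.append(token)
            actions.append(pending)
            pending = []
    return words, actions, pending


def combined_loss(lm_loss: float, structure_loss: float, alpha: float = 0.5) -> float:
    """Word-prediction loss plus a weighted structure loss (alpha is hypothetical)."""
    return lm_loss + alpha * structure_loss


tree = ("S", [("NP", ["The", "birds"]), ("VP", ["sang"])])
joint = linearize(tree)                      # target sequence for the joint setup
words, actions, trailing = split_for_scaffold(joint)
print(joint)
print(words, actions, trailing)
```

In the joint setup the model must generate structure and words with the same output layer, whereas in the scaffold setup the structure predictions act only as an auxiliary training signal on the shared representation.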
