Abstract
We consider retrofitting a structure-aware Transformer-based language model to facilitate end tasks, proposing to exploit syntactic distance to encode both phrasal constituency and dependency connections into the language model. A middle-layer structural learning strategy is leveraged for structure integration, carried out alongside the main semantic task training under a multi-task learning scheme. Experimental results show that the retrofitted structure-aware Transformer language model achieves improved perplexity while inducing accurate syntactic phrases. By performing structure-aware fine-tuning, our model achieves significant improvements on both semantic- and syntactic-dependent tasks.