Deepening Hidden Representations from Pre-trained Language Models for Natural Language Understanding

Abstract

Transformer-based pre-trained language models have proven effective for learning contextualized language representations. However, current approaches take advantage of only the output of the encoder's final layer when fine-tuning on downstream tasks. We argue that taking only a single layer's output restricts the power of the pre-trained representation. We therefore deepen the representation learned by the model by fusing the hidden representations through an explicit HIdden Representation Extractor (HIRE), which automatically absorbs the representation that is complementary to the output of the final layer. With RoBERTa as the backbone encoder, our proposed improvement over the pre-trained models is shown to be effective on multiple natural language understanding tasks and helps our model rival the state-of-the-art models on the GLUE benchmark.
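
To make the idea of using more than the final layer concrete, the following is a minimal sketch of one generic way to combine per-layer hidden vectors for a single token: a softmax-weighted sum over all layers, concatenated with the final layer's vector. This is an illustrative assumption and not the HIRE architecture itself, whose details the abstract does not specify. The names `mix_layers`, `fuse_with_final`, and `layer_scores` are hypothetical, and in practice the scores would be learned parameters rather than fixed values.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_layers(layer_outputs: List[Vector], layer_scores: List[float]) -> Vector:
    """Softmax-weighted sum of one token's hidden vectors across layers.

    Assumption: layer_scores stands in for learned per-layer weights.
    """
    if len(layer_outputs) != len(layer_scores):
        raise ValueError("need exactly one score per layer")
    weights = softmax(layer_scores)
    dim = len(layer_outputs[0])
    return [
        sum(w * layer[i] for w, layer in zip(weights, layer_outputs))
        for i in range(dim)
    ]


def fuse_with_final(layer_outputs: List[Vector], layer_scores: List[float]) -> Vector:
    """Concatenate the final layer's vector with the mixed all-layer summary."""
    return layer_outputs[-1] + mix_layers(layer_outputs, layer_scores)


# Toy example: three layers, hidden size 2, equal scores (uniform weights).
layers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = fuse_with_final(layers, [0.0, 0.0, 0.0])
```

With equal scores every layer contributes equally, so the second half of `fused` is simply the mean of the three layer vectors, while the first half is the unchanged final-layer output.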
