Mix-review: Alleviate Forgetting in the Pretrain-Finetune Framework for Neural Language Generation Models

  • 2019-10-29 19:43:05
  • Tianxing He, Jun Liu, Kyunghyun Cho, Myle Ott, Bing Liu, James Glass, Fuchun Peng

Abstract

In this work, we study how the large-scale pretrain-finetune framework changes the behavior of a neural language generator. We focus on the transformer encoder-decoder model for the open-domain dialogue response generation task. We find that after standard fine-tuning, the model forgets important language generation skills acquired during large-scale pre-training. We demonstrate the forgetting phenomenon through a detailed behavior analysis from the perspectives of context sensitivity and knowledge transfer. Adopting the concept of data mixing, we propose an intuitive fine-tuning strategy named "mix-review". We find that mix-review effectively regularizes the fine-tuning process, and the forgetting problem is largely alleviated. Finally, we discuss interesting behavior of the resulting dialogue model and its implications.

 
