Abstract
Language model pre-training has proven useful in many language understanding tasks. In this paper, we investigate whether it is still helpful to add a task-specific loss during the pre-training step. In industry NLP applications, users produce large amounts of unlabeled data. We use the fine-tuned model to assign pseudo-labels to this user-generated data, and then pre-train with both the task-specific loss, computed against the pseudo-labels, and the masked language model loss. Experiments show that pre-training on the fine-tuned model's pseudo-labeled predictions offers further gains on the downstream task. The improvement from our method is consistent and substantial.
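The pre-training objective described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's implementation: the abstract does not say how the two losses are combined, so the weighted sum and the `task_weight` parameter are assumptions, as is the use of hard (argmax) pseudo-labels rather than soft ones. All function names are hypothetical, and plain lists of logits stand in for real model outputs.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-likelihood of the target class index."""
    return -math.log(softmax(logits)[target])


def pseudo_label(teacher_logits):
    """Hard pseudo-label: the fine-tuned model's most likely class (assumption)."""
    return max(range(len(teacher_logits)), key=lambda i: teacher_logits[i])


def joint_pretraining_loss(mlm_logits, masked_token_ids, task_logits,
                           teacher_logits, task_weight=1.0):
    """Masked language model loss plus task loss on a pseudo-label.

    mlm_logits:       one list of vocabulary logits per masked position
    masked_token_ids: the original token id at each masked position
    task_logits:      class logits from the model being pre-trained
    teacher_logits:   class logits from the fine-tuned model
    task_weight:      hypothetical weighting; not specified in the abstract
    """
    # Average the MLM loss over the masked positions.
    mlm_loss = sum(
        cross_entropy(logits, token_id)
        for logits, token_id in zip(mlm_logits, masked_token_ids)
    ) / len(masked_token_ids)
    # Score the task head against the fine-tuned model's pseudo-label.
    task_loss = cross_entropy(task_logits, pseudo_label(teacher_logits))
    return mlm_loss + task_weight * task_loss
```

In a real setup the logits would come from a shared encoder with a masked language model head and a task head, and the pseudo-labels would be computed once over the unlabeled user data before pre-training begins.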