Abstract
We demonstrate that explicitly aligning the pretraining objectives with the finetuning objectives in language model training significantly improves finetuning task performance and reduces the minimum number of finetuning examples required. The performance margin gained from objective alignment allows us to build smaller language models for tasks with less available training data. We provide empirical evidence for these claims by applying objective alignment to concept-of-interest tagging and acronym detection tasks. We find that, with objective alignment, our 768 by 3 and 512 by 3 transformer language models reach accuracies of 83.9%/82.5% for concept-of-interest tagging and 73.8%/70.2% for acronym detection using only 200 finetuning examples per task, outperforming the 768 by 3 model pretrained without objective alignment by +4.8%/+3.4% and +9.9%/+6.3%, respectively. We call the finetuning of small language models with hundreds of training examples or fewer "Few Example Learning". In practice, Few Example Learning enabled by objective alignment not only saves human labeling costs but also makes it possible to leverage language models in more real-time applications.