Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners

  • 2021-10-06 02:46:48
  • Ningyu Zhang, Luoqiu Li, Xiang Chen, Shumin Deng, Zhen Bi, Chuanqi Tan, Fei Huang, Huajun Chen

Abstract

Large-scale pre-trained language models have contributed significantly to natural language processing by demonstrating remarkable abilities as few-shot learners. However, their effectiveness depends mainly on scaling the model parameters and prompt design, hindering their implementation in most real-world applications. This study proposes a novel pluggable, extensible, and efficient approach named DifferentiAble pRompT (DART), which can convert small language models into better few-shot learners without any prompt engineering. The main principle behind this approach involves reformulating potential natural language processing tasks into the task of a pre-trained language model and differentially optimizing the prompt template as well as the target label with backpropagation. Furthermore, the proposed approach can be: (i) Plugged to any pre-trained language models; (ii) Extended to widespread classification tasks. A comprehensive evaluation of standard NLP tasks demonstrates that the proposed approach achieves a better few-shot performance.
