Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners

Abstract

Large-scale pre-trained language models have contributed significantly to natural language processing by demonstrating remarkable abilities as few-shot learners. However, their effectiveness depends mainly on scaling the model parameters and on prompt design, which hinders their use in most real-world applications. This study proposes a novel pluggable, extensible, and efficient approach named DifferentiAble pRompT (DART), which can convert small language models into better few-shot learners without any prompt engineering. The main principle behind this approach involves reformulating potential natural language processing tasks into the task of a pre-trained language model and differentially optimizing the prompt template as well as the target label with backpropagation. Furthermore, the proposed approach can be: (i) plugged into any pre-trained language model; (ii) extended to a wide range of classification tasks. A comprehensive evaluation on standard NLP tasks demonstrates that the proposed approach achieves better few-shot performance.
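The idea of optimizing the prompt template and the target labels with backpropagation can be illustrated with a toy sketch. This is not the DART implementation: the "model" here is a hypothetical stand-in that simply averages the input and prompt embeddings, and the names `forward`, `train`, `prompt`, and `labels` are assumptions made for illustration. It only shows that a continuous prompt vector and continuous label vectors can both be updated by gradient descent on a few labeled examples.

```python
import math

def softmax(zs):
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def forward(x, prompt, labels):
    # Toy stand-in for a language model (assumption): the hidden state is the
    # average of the input embedding and the trainable prompt embedding.
    h = [(xi + pi) / 2.0 for xi, pi in zip(x, prompt)]
    # Class scores are dot products with the trainable label embeddings.
    probs = softmax([dot(h, lab) for lab in labels])
    return h, probs

def train(data, prompt, labels, lr=0.5, steps=200):
    losses = []
    n = len(data)
    for _ in range(steps):
        total = 0.0
        g_prompt = [0.0] * len(prompt)
        g_labels = [[0.0] * len(prompt) for _ in labels]
        for x, y in data:
            h, probs = forward(x, prompt, labels)
            total += -math.log(probs[y])  # cross-entropy loss
            for c, lab in enumerate(labels):
                err = probs[c] - (1.0 if c == y else 0.0)  # dLoss/dlogit_c
                for j in range(len(prompt)):
                    g_labels[c][j] += err * h[j]          # gradient w.r.t. label c
                    g_prompt[j] += err * lab[j] * 0.5     # gradient w.r.t. prompt
        # Plain gradient descent on both the prompt and the label embeddings.
        prompt = [p - lr * g / n for p, g in zip(prompt, g_prompt)]
        labels = [[l - lr * g / n for l, g in zip(lab, gl)]
                  for lab, gl in zip(labels, g_labels)]
        losses.append(total / n)
    return prompt, labels, losses

def predict(x, prompt, labels):
    _, probs = forward(x, prompt, labels)
    return max(range(len(probs)), key=lambda c: probs[c])

# A tiny two-class few-shot set with made-up 2-D "input embeddings".
data = [([1.0, 0.0], 0), ([0.9, 0.2], 0), ([0.0, 1.0], 1), ([0.1, 0.8], 1)]
prompt, labels, losses = train(data, [0.1, -0.1], [[0.01, 0.0], [0.0, 0.01]])
print("first loss:", losses[0], "last loss:", losses[-1])
```

In a real setting the averaging step would be replaced by an actual pre-trained language model, and the gradients would come from an automatic differentiation library rather than being written by hand.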
