Small Models are Valuable Plug-ins for Large Language Models

Abstract

Large language models (LLMs) such as GPT-3 and GPT-4 are powerful, but their weights are often not publicly available and their immense size makes them difficult to tune on common hardware. As a result, effectively tuning these models with large-scale supervised data can be challenging. As an alternative, In-Context Learning (ICL) can use only a small number of supervised examples due to context length limits. In this paper, we propose Super In-Context Learning (SuperICL), which allows black-box LLMs to work with locally fine-tuned smaller models, resulting in superior performance on supervised tasks. Our experiments demonstrate that SuperICL can improve performance beyond state-of-the-art fine-tuned models while addressing the instability problem of in-context learning. Furthermore, SuperICL can enhance the capabilities of smaller models, such as multilinguality and interpretability.
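
The abstract does not spell out how the black-box LLM and the smaller model interact. One plausible reading is that the locally fine-tuned model's prediction and confidence are written into the LLM's prompt next to each in-context example and next to the test input. The sketch below illustrates that reading only. The prompt layout, `build_supericl_prompt`, `PluginModel`, and `toy_plugin` are hypothetical names chosen for illustration, not the authors' implementation.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: a plug-in model maps text to (label, confidence in [0, 1]).
PluginModel = Callable[[str], Tuple[str, float]]


def build_supericl_prompt(
    demos: List[Tuple[str, str]],
    test_input: str,
    plugin: PluginModel,
) -> str:
    """Format demonstrations and a test input, each annotated with the plug-in's guess."""
    blocks = []
    for text, gold_label in demos:
        pred, conf = plugin(text)
        blocks.append(
            f"Input: {text}\n"
            f"Small model prediction: {pred} (confidence: {conf:.2f})\n"
            f"Label: {gold_label}"
        )
    # The test input gets the plug-in's guess but no gold label;
    # the black-box LLM is expected to complete the final "Label:" line.
    pred, conf = plugin(test_input)
    blocks.append(
        f"Input: {test_input}\n"
        f"Small model prediction: {pred} (confidence: {conf:.2f})\n"
        f"Label:"
    )
    return "\n\n".join(blocks)


def toy_plugin(text: str) -> Tuple[str, float]:
    """Stand-in for a fine-tuned classifier; a keyword rule for illustration only."""
    return ("positive", 0.90) if "good" in text.lower() else ("negative", 0.60)
```

Calling `build_supericl_prompt([("A good film.", "positive")], "Dull plot.", toy_plugin)` yields a string that would be sent to the LLM through whatever text-completion interface is available. In this sketch the LLM stays free to agree with or override the small model's prediction.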
