Soft Injection of Task Embeddings Outperforms Prompt-Based In-Context Learning

Abstract

In-Context Learning (ICL) enables Large Language Models (LLMs) to perform tasks by conditioning on input-output examples in the prompt, without requiring any update to model parameters. While widely adopted, it remains unclear whether prompting with multiple examples is the most effective and efficient way to convey task information. In this work, we propose Soft Injection of task embeddings. The task embeddings are constructed only once using few-shot ICL prompts and are reused throughout inference. Soft injection is performed by softly mixing task embeddings with attention head activations using pre-optimized mixing parameters, referred to as soft head-selection parameters. This method not only allows a desired task to be performed without in-prompt demonstrations but also significantly outperforms existing ICL approaches while reducing memory usage and compute cost at inference time. An extensive evaluation is performed across 57 tasks and 12 LLMs, spanning four model families and sizes from 4B to 70B parameters. Averaged across the 57 tasks, our method outperforms 10-shot ICL by 10.2% to 14.3% across the 12 LLMs. Additional analyses show that our method also serves as an insightful tool for analyzing the task-relevant roles of attention heads, revealing that the task-relevant head positions selected by our method transfer across similar tasks but not across dissimilar ones, which underscores the task-specific nature of head functionality. Our soft injection method opens a new paradigm for reducing prompt length and improving task performance by shifting task conditioning from the prompt space to the activation space.
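To make the mixing step concrete, here is a minimal, illustrative sketch in plain Python. It is not the authors' implementation, and the abstract does not give the exact formula. Two assumptions are made here. First, each head's task embedding is taken to be the average of that head's activations over a few few-shot ICL prompts. Second, injection is a per-head convex combination (1 - a_h) * x_h + a_h * t_h, where the weight a_h is a sigmoid of an unconstrained logit standing in for a soft head-selection parameter. The function names `build_task_embedding` and `soft_inject` are hypothetical. How the selection parameters are pre-optimized is not shown.

```python
import math
from typing import List

Vector = List[float]


def build_task_embedding(per_prompt_acts: List[List[Vector]]) -> List[Vector]:
    """Average per-head activations over few-shot ICL prompts (assumed scheme).

    per_prompt_acts[p][h] is head h's activation vector for prompt p.
    The result is computed once and reused for every later query.
    """
    n = len(per_prompt_acts)
    num_heads = len(per_prompt_acts[0])
    dim = len(per_prompt_acts[0][0])
    return [
        [sum(per_prompt_acts[p][h][i] for p in range(n)) / n for i in range(dim)]
        for h in range(num_heads)
    ]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function, mapping a logit to (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def soft_inject(
    head_acts: List[Vector],
    task_emb: List[Vector],
    selection_logits: List[float],
) -> List[Vector]:
    """Softly mix each head's activation with its task embedding.

    Hypothetical form: x_h' = (1 - a_h) * x_h + a_h * t_h, a_h = sigmoid(z_h).
    A large positive logit z_h replaces the head output with the task
    embedding; a large negative logit leaves the head untouched.
    """
    mixed = []
    for x_h, t_h, z_h in zip(head_acts, task_emb, selection_logits):
        a = sigmoid(z_h)
        mixed.append([(1.0 - a) * x + a * t for x, t in zip(x_h, t_h)])
    return mixed


if __name__ == "__main__":
    # Toy setting: two few-shot prompts, two heads, head dimension 2.
    acts_per_prompt = [
        [[1.0, 0.0], [0.0, 2.0]],
        [[3.0, 0.0], [0.0, 4.0]],
    ]
    task_emb = build_task_embedding(acts_per_prompt)  # [[2.0, 0.0], [0.0, 3.0]]
    # Zero-shot query activations: head 0 is half-mixed (logit 0),
    # head 1 is effectively left as-is (very negative logit).
    query_acts = [[0.0, 0.0], [1.0, 1.0]]
    print(soft_inject(query_acts, task_emb, [0.0, -50.0]))
```

Because the embedding is built once and the query prompt carries no demonstrations, per-query cost in this sketch is one elementwise mix per selected head. This matches the abstract's point about moving task conditioning out of the prompt.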
