Prompt Compression for Large Language Models: A Survey

Abstract

Leveraging large language models (LLMs) for complex natural language tasks typically requires long-form prompts to convey detailed requirements and information, which results in increased memory usage and inference costs. To mitigate these challenges, multiple efficient methods have been proposed, with prompt compression gaining significant research interest. This survey provides an overview of prompt compression techniques, categorized into hard prompt methods and soft prompt methods. First, the technical approaches of these methods are compared, followed by an exploration of various ways to understand their mechanisms, including the perspectives of attention optimization, Parameter-Efficient Fine-Tuning (PEFT), modality integration, and new synthetic language. We also examine the downstream adaptations of various prompt compression techniques. Finally, the limitations of current prompt compression methods are analyzed, and several future directions are outlined, such as optimizing the compression encoder, combining hard and soft prompt methods, and leveraging insights from multimodality.
