Simple Context Compression: Mean-Pooling and Multi-Ratio Training

Abstract

A common strategy to reduce the computational costs of using long contexts in retrieval-augmented generation (RAG) with large language models (LLMs) is soft context compression, where the input sequence is transformed into a shorter continuous representation. We develop a lightweight and simple mean-pooling approach that consistently outperforms the widely used compression-tokens architecture, and we study training the same compressor to output multiple compression ratios. We conduct extensive experiments across in-domain and out-of-domain QA datasets, as well as across model families, scales, and compression ratios. Overall, our simple mean-pooling approach achieves the strongest performance, with a relatively small drop when training for multiple compression ratios. More broadly, though, across architectures and training regimes the trade-offs are more nuanced, illustrating the complex landscape of compression methods.
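To make the idea of mean-pooling compression concrete, here is a minimal sketch of the pooling step alone. It assumes that the compressor averages non-overlapping windows of `ratio` consecutive hidden vectors, so a sequence of n vectors becomes ceil(n / ratio) vectors, and that a shorter trailing window is averaged on its own. The function name `mean_pool_compress` and the trailing-window rule are hypothetical illustrations, not the paper's implementation. The encoder that would produce these vectors and the LLM that would consume them are omitted.

```python
from typing import List, Sequence

Vector = List[float]


def mean_pool_compress(hidden: Sequence[Vector], ratio: int) -> List[Vector]:
    """Hypothetical sketch: compress a sequence by averaging fixed-size windows.

    Each window of `ratio` consecutive vectors becomes one output vector.
    A shorter trailing window (assumption) is averaged over its own length.
    """
    if ratio < 1:
        raise ValueError("ratio must be a positive integer")
    compressed: List[Vector] = []
    for start in range(0, len(hidden), ratio):
        window = hidden[start:start + ratio]
        dim = len(window[0])
        # Element-wise mean over the vectors in this window.
        compressed.append(
            [sum(vec[d] for vec in window) / len(window) for d in range(dim)]
        )
    return compressed


if __name__ == "__main__":
    # Six toy 2-dimensional "hidden states".
    states = [[float(i), float(2 * i)] for i in range(6)]
    # The same pooling function serves several compression ratios,
    # loosely mirroring the multi-ratio setting described above.
    for r in (1, 2, 4):
        out = mean_pool_compress(states, r)
        print(r, len(out), out)
```

Because the pooling itself has no parameters, one function can serve any ratio. In this sketch, supporting multiple compression ratios comes down to choosing a different `ratio` at call time, with whatever learned components surround it shared across ratios.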
