Abstract
Large language models (LLMs) typically utilize the top-k contexts from a retriever in retrieval-augmented generation (RAG). In this work, we propose a novel instruction fine-tuning framework RankRAG, which instruction-tunes a single LLM for the dual purpose of context ranking and answer generation in RAG. In particular, the instruction-tuned LLMs work surprisingly well by adding a small fraction of ranking data into the training blend, and outperform existing expert ranking models, including the same LLM exclusively fine-tuned on a large amount of ranking data. For generation, we compare our model with many strong baselines, including GPT-4-0613, GPT-4-turbo-2024-0409, and ChatQA-1.5, an open-sourced model with the state-of-the-art performance on RAG benchmarks. Specifically, our Llama3-RankRAG significantly outperforms Llama3-ChatQA-1.5 and GPT-4 models on nine knowledge-intensive benchmarks. In addition, it also performs comparably to GPT-4 on five RAG benchmarks in the biomedical domain without instruction fine-tuning on biomedical data, demonstrating its superb capability for generalization to new domains.