Abstract
Traditional Retrieval-Augmented Generation (RAG) struggles with complex queries that lack strong signals to retrieve the most relevant context, forcing a trade-off between choosing a small context that misses key information and a large context that confuses the LLM. To address this, we propose Forward-Backward RAG (FB-RAG), a new training-free framework based on a simple yet powerful forward-looking strategy. FB-RAG employs a lightweight LLM to peek into potential future generations, using evidence from multiple sampled outputs to precisely identify the most relevant context for a final, more powerful generator. This improves performance without the complex fine-tuning or reinforcement learning common in prior work. Across 9 datasets, FB-RAG consistently delivers strong results. Further, the performance gains can be achieved with reduced latency due to a shorter, more focused prompt for the powerful generator. On the EN.QA dataset, FB-RAG matches the leading baseline with over 48% latency reduction or achieves an 8% performance improvement with a 10% latency reduction. Our analysis finds cases where, even when the forward-looking LLM fails to generate correct answers, its attempts are sufficient to guide the final model to an accurate response, demonstrating how smaller LLMs can systematically improve the performance and efficiency of larger ones.
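The forward-looking idea can be illustrated with a minimal sketch. The abstract does not state how evidence from the sampled outputs is turned into a context selection, so the token-overlap scoring below is an assumption made purely for illustration, not the method used in FB-RAG. The names `sample_fn`, `overlap_score`, and `forward_looking_select` are hypothetical; `sample_fn` stands in for a call to the lightweight LLM.

```python
import re
from typing import Callable, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(chunk: str, evidence: str) -> float:
    """Fraction of evidence tokens that also occur in the chunk.

    Assumed scoring rule for illustration only.
    """
    evidence_tokens = tokenize(evidence)
    if not evidence_tokens:
        return 0.0
    chunk_tokens = set(tokenize(chunk))
    hits = sum(1 for tok in evidence_tokens if tok in chunk_tokens)
    return hits / len(evidence_tokens)


def forward_looking_select(
    query: str,
    chunks: List[str],
    sample_fn: Callable[[str, List[str]], str],
    num_samples: int = 3,
    top_k: int = 2,
) -> List[str]:
    """Keep the top_k chunks best supported by sampled outputs.

    sample_fn is a hypothetical stand-in for the lightweight LLM: it takes
    the query and candidate chunks and returns one sampled generation.
    """
    samples = [sample_fn(query, chunks) for _ in range(num_samples)]
    # Score each chunk by its best match against any sampled output.
    scores = [
        max((overlap_score(chunk, s) for s in samples), default=0.0)
        for chunk in chunks
    ]
    ranked = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
    # Restore document order so the final prompt reads naturally.
    keep = sorted(ranked[:top_k])
    return [chunks[i] for i in keep]
```

The selected chunks would then form the shorter prompt passed to the more powerful generator. Taking the maximum over samples means a chunk is kept if any single sampled output relies on it, which is one possible way to benefit from attempts that are only partially correct.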