Abstract
Retrieval-Augmented Generation (RAG) effectively improves the accuracy of Large Language Models (LLMs). However, retrieval noise significantly degrades the quality of LLM generation, necessitating the development of denoising mechanisms. Previous methods extract evidence directly, without explicit reasoning, which risks filtering out key clues and generalizes poorly. To this end, we propose LEAR, which learns to extract rational evidence by (1) first explicitly reasoning to identify potential cues within the retrieved content, and then (2) consciously extracting evidence so that no key cue helpful for answering the question is omitted. Specifically, we frame evidence reasoning and evidence extraction as one unified response for end-to-end training; apply knowledge token masks for disentanglement to derive reasoning-based and extraction-based answers; and devise three types of verifiable reward functions (answer, length, and format) to update the model via policy optimization. Extensive experiments on three benchmark datasets demonstrate the effectiveness of LEAR: it provides compact, high-quality evidence, improves the accuracy of downstream tasks, and promotes effective application in online RAG systems.