Abstract
Extracting scientific evidence from biomedical studies for clinical research questions (e.g., Does stem cell transplantation improve quality of life in patients with medically refractory Crohn's disease compared to placebo?) is a crucial step in synthesising biomedical evidence. In this paper, we focus on the task of document-level scientific evidence extraction for clinical questions with conflicting evidence. To support this task, we create a dataset called CochraneForest, leveraging forest plots from Cochrane systematic reviews. It comprises 202 annotated forest plots, associated clinical research questions, full texts of studies, and study-specific conclusions. Building on CochraneForest, we propose URCA (Uniform Retrieval Clustered Augmentation), a retrieval-augmented generation framework designed to tackle the unique challenges of evidence extraction. Our experiments show that URCA outperforms the best existing methods by up to 10.3% in F1 score on this task. However, the results also underscore the complexity of CochraneForest, establishing it as a challenging testbed for advancing automated evidence synthesis systems.