SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models

Abstract

We introduce SelfCite, a novel self-supervised approach that aligns LLMs to generate high-quality, fine-grained, sentence-level citations for the statements in their generated responses. Instead of only relying on costly and labor-intensive annotations, SelfCite leverages a reward signal provided by the LLM itself through context ablation: If a citation is necessary, removing the cited text from the context should prevent the same response; if sufficient, retaining the cited text alone should preserve the same response. This reward can guide the inference-time best-of-N sampling strategy to improve citation quality significantly, as well as be used in preference optimization to directly fine-tune the models for generating better citations. The effectiveness of SelfCite is demonstrated by increasing citation F1 up to 5.3 points on the LongBench-Cite benchmark across five long-form question answering tasks.
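The context-ablation reward and its use in best-of-N selection can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the necessity and sufficiency signals are combined, so summing them is an assumption here, and `toy_log_prob` is a hypothetical word-overlap stand-in for the LLM's log-probability of the response given a context. The names `ablation_reward` and `best_of_n` are likewise illustrative.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical scorer type: (context sentences, response) -> log-probability.
LogProbFn = Callable[[Sequence[str], str], float]


def toy_log_prob(context: Sequence[str], response: str) -> float:
    """Toy stand-in for an LLM: scores a response by word overlap with the context."""
    words = response.lower().split()
    seen = set(" ".join(context).lower().split())
    covered = sum(w in seen for w in words)
    return math.log((covered + 1) / (len(words) + 1))


def ablation_reward(context: Sequence[str], cited: Sequence[int],
                    response: str, log_prob: LogProbFn) -> float:
    """Reward a citation set by ablating the context (combination rule is assumed)."""
    cited_set = set(cited)
    full = log_prob(context, response)
    without_cited = log_prob(
        [s for i, s in enumerate(context) if i not in cited_set], response)
    only_cited = log_prob(
        [s for i, s in enumerate(context) if i in cited_set], response)
    necessity = full - without_cited   # large if removing the cited text hurts
    sufficiency = only_cited - full    # near zero if the cited text alone suffices
    return necessity + sufficiency     # assumption: simple sum of both signals


def best_of_n(context: Sequence[str], candidates: Sequence[List[int]],
              response: str, log_prob: LogProbFn) -> List[int]:
    """Pick the candidate citation set with the highest ablation reward."""
    return max(candidates,
               key=lambda c: ablation_reward(context, c, response, log_prob))


context = [
    "The Eiffel Tower is in Paris.",
    "Bananas are yellow.",
    "It was completed in 1889.",
]
response = "The tower was completed in 1889."
candidates = [[0, 2], [1], [2]]  # sampled citation sets (sentence indices)

best = best_of_n(context, candidates, response, toy_log_prob)
print(best)
```

In this toy setup the irrelevant citation `[1]` receives a negative reward, because removing it changes nothing and keeping it alone loses the response. The same scores could also rank candidate pairs for preference optimization, though that step is not shown here.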
