Checklists Are Better Than Reward Models For Aligning Language Models

Abstract

Language models must be adapted to understand and follow user instructions. Reinforcement learning is widely used to facilitate this, typically using fixed criteria such as "helpfulness" and "harmfulness". In our work, we instead propose using flexible, instruction-specific criteria as a means of broadening the impact that reinforcement learning can have in eliciting instruction following. We propose "Reinforcement Learning from Checklist Feedback" (RLCF). From instructions, we extract checklists and evaluate how well responses satisfy each item, using both AI judges and specialized verifier programs, then combine these scores to compute rewards for RL. We compare RLCF with other alignment methods applied to a strong instruction-following model (Qwen2.5-7B-Instruct) on five widely-studied benchmarks: RLCF is the only method to improve performance on every benchmark, including a 4-point boost in hard satisfaction rate on FollowBench, a 6-point increase on InFoBench, and a 3-point rise in win rate on Arena-Hard. These results establish checklist feedback as a key tool for improving language models' support of queries that express a multitude of needs.
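
The scoring step described in the abstract (score each checklist item, then combine the scores into one reward) can be illustrated with a minimal sketch. The abstract does not specify how scores are combined, so everything here is an assumption made for illustration: the per-item `weight`, averaging the judge score with the verifier result, and the weighted mean are hypothetical choices, and `toy_judge` is a stand-in for a real AI judge.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class ChecklistItem:
    requirement: str
    weight: float = 1.0  # hypothetical importance weight (assumption)
    # Optional programmatic check; returns True if the response passes.
    verifier: Optional[Callable[[str], bool]] = None


def item_score(item: ChecklistItem, response: str,
               judge: Callable[[str, str], float]) -> float:
    """Score one checklist item in [0, 1]."""
    # Clamp the judge output so a misbehaving judge cannot leave [0, 1].
    judge_score = min(1.0, max(0.0, judge(item.requirement, response)))
    if item.verifier is None:
        return judge_score
    verifier_score = 1.0 if item.verifier(response) else 0.0
    # Assumption: judge and verifier are averaged with equal weight.
    return 0.5 * (judge_score + verifier_score)


def checklist_reward(items: Sequence[ChecklistItem], response: str,
                     judge: Callable[[str, str], float]) -> float:
    """Combine per-item scores into a scalar reward via a weighted mean."""
    total_weight = sum(item.weight for item in items)
    if total_weight <= 0:
        raise ValueError("checklist needs at least one positively weighted item")
    weighted = sum(item.weight * item_score(item, response, judge)
                   for item in items)
    return weighted / total_weight


def toy_judge(requirement: str, response: str) -> float:
    """Stand-in for an AI judge: 1.0 if the response mentions 'Paris'."""
    return 1.0 if "Paris" in response else 0.0


items = [
    ChecklistItem("Names the capital of France", weight=2.0),
    ChecklistItem("Uses at most 10 words", weight=1.0,
                  verifier=lambda r: len(r.split()) <= 10),
]

good = checklist_reward(items, "Paris is the capital of France.", toy_judge)
bad = checklist_reward(items, "I do not know.", toy_judge)
print(good, bad)
```

In this sketch the response that satisfies every item receives the maximum reward, while the short but unhelpful response earns only partial credit from the length verifier. In an RL loop, a scalar of this kind would serve as the reward signal for each sampled response.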
