Self-Taught Evaluators - Paper Detail

Abstract

Model-based evaluation is at the heart of successful model development, both as a reward model for training and as a replacement for human evaluation. To train such evaluators, the standard approach is to collect a large amount of human preference judgments over model responses, which is costly, and the data becomes stale as models improve. In this work, we present an approach that aims to improve evaluators without human annotations, using synthetic training data only. Starting from unlabeled instructions, our iterative self-improvement scheme generates contrasting model outputs and trains an LLM-as-a-Judge to produce reasoning traces and final judgments, repeating this training at each new iteration using the improved predictions. Without any labeled preference data, our Self-Taught Evaluator can improve a strong LLM (Llama3-70B-Instruct) from 75.4 to 88.3 (88.7 with majority vote) on RewardBench. This outperforms commonly used LLM judges such as GPT-4 and matches the performance of the top-performing reward models trained with labeled examples.
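
The iterative scheme described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (`self_taught_loop`, `generate_pair`, `judge`, `train`, `samples_per_pair`, `majority_vote`) is a hypothetical placeholder, and the models are passed in as plain callables. The step that keeps only judgments agreeing with the synthetic label is an assumption of this sketch, since the abstract does not spell out how training examples are selected. Response order is fixed for simplicity.

```python
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical types for this sketch (not taken from the paper's code).
# A judge maps (instruction, response_a, response_b) to (reasoning trace, verdict).
Judge = Callable[[str, str, str], Tuple[str, str]]
Example = Tuple[str, str, str, str, str]


def self_taught_loop(
    instructions: List[str],
    generate_pair: Callable[[str], Tuple[str, str]],
    judge: Judge,
    train: Callable[[Judge, List[Example]], Judge],
    iterations: int,
    samples_per_pair: int = 4,
) -> Judge:
    """Repeatedly build synthetic judgments and retrain the judge on them."""
    for _ in range(iterations):
        examples: List[Example] = []
        for instruction in instructions:
            # Contrasting outputs: the first is intended to be better by construction.
            better, worse = generate_pair(instruction)
            for _ in range(samples_per_pair):
                reasoning, verdict = judge(instruction, better, worse)
                # Assumption: keep only traces whose verdict matches the synthetic label.
                if verdict == "A":
                    examples.append((instruction, better, worse, reasoning, verdict))
                    break
        # The next iteration uses the judge trained on its own filtered predictions.
        judge = train(judge, examples)
    return judge


def majority_vote(verdicts: List[str]) -> str:
    """Return the most frequent verdict among several sampled judgments."""
    return Counter(verdicts).most_common(1)[0][0]
```

In practice `generate_pair`, `judge`, and `train` would wrap calls to an LLM and a fine-tuning job; `majority_vote` corresponds to aggregating several sampled judgments at inference time, as in the "88.7 with majority vote" figure.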
