TRACT: Regression-Aware Fine-tuning Meets Chain-of-Thought Reasoning for LLM-as-a-Judge

Abstract

The LLM-as-a-judge paradigm uses large language models (LLMs) for automated text evaluation, where an LLM assigns a numerical assessment to the input text following scoring rubrics. Existing methods for LLM-as-a-judge use cross-entropy (CE) loss for fine-tuning, which neglects the numeric nature of score prediction. Recent work addresses the numerical prediction limitations of LLM fine-tuning through regression-aware fine-tuning, which, however, does not consider chain-of-thought (CoT) reasoning for score prediction. In this paper, we introduce TRACT (Two-stage Regression-Aware fine-tuning with CoT), a method combining CoT reasoning with regression-aware training. TRACT consists of two stages: in the first stage, a seed LLM is fine-tuned to generate CoTs, which then serve as supervision for the second-stage fine-tuning. The training objective of TRACT combines the CE loss for learning the CoT reasoning capabilities and the regression-aware loss for the score prediction. Experiments across four LLM-as-a-judge datasets and two LLMs show that TRACT significantly outperforms existing methods. Extensive ablation studies validate the importance of each component in TRACT.
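
To make the combined objective concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not give the exact form of the regression-aware loss, so this sketch assumes a squared error between the target score and the expected score under the model's distribution over candidate score tokens. All function names, the `ce_weight` parameter, and the toy inputs are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cot_cross_entropy(token_logits, target_ids):
    """Mean negative log-likelihood of the CoT tokens (standard CE loss)."""
    nll = 0.0
    for logits, target in zip(token_logits, target_ids):
        nll -= math.log(softmax(logits)[target])
    return nll / len(target_ids)

def regression_aware_loss(score_logits, score_values, target_score):
    """Assumed form: squared error between the expected score and the target.

    score_logits holds the model's logits for each candidate score token,
    and score_values holds the numeric value of each candidate.
    """
    probs = softmax(score_logits)
    expected = sum(p * v for p, v in zip(probs, score_values))
    return (expected - target_score) ** 2

def combined_loss(token_logits, target_ids, score_logits, score_values,
                  target_score, ce_weight=1.0):
    """CE on the CoT tokens plus the regression-aware loss on the score.

    ce_weight is a hypothetical weighting coefficient, not taken from the paper.
    """
    ce = cot_cross_entropy(token_logits, target_ids)
    reg = regression_aware_loss(score_logits, score_values, target_score)
    return ce_weight * ce + reg
```

In this sketch, a model that spreads its probability evenly over the scores 1 to 5 has an expected score of 3, so it incurs no regression penalty for a target of 3 and a penalty of 4 for a target of 5. A plain CE loss on the score token would not distinguish a near miss from a distant one in this way.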
