HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages

Abstract

Preference datasets are essential for training general-domain, instruction-following language models with Reinforcement Learning from Human Feedback (RLHF). Each subsequent data release raises expectations for future data collection, meaning there is a constant need to advance the quality and diversity of openly available preference data. To address this need, we introduce HelpSteer3-Preference, a permissively licensed (CC-BY-4.0), high-quality, human-annotated preference dataset comprising over 40,000 samples. These samples span diverse real-world applications of large language models (LLMs), including tasks relating to STEM, coding and multilingual scenarios. Using HelpSteer3-Preference, we train Reward Models (RMs) that achieve top performance on RM-Bench (82.4%) and JudgeBench (73.7%). This represents a substantial improvement (~10% absolute) over the previously best-reported results from existing RMs. We also demonstrate that HelpSteer3-Preference can be applied to train Generative RMs, and show how policy models can be aligned with RLHF using our RMs. Dataset (CC-BY-4.0): https://huggingface.co/datasets/nvidia/HelpSteer3#preference
