ZEBRA: Leveraging Model-Behavioral Knowledge for Zero-Annotation Preference Dataset Construction

Abstract

Recent efforts in LLM alignment have focused on constructing large-scale preference datasets via human or Artificial Intelligence (AI) annotators. However, such approaches rely on instance-wise supervision, which incurs substantial annotation cost and offers limited interpretability. In this paper, we propose ZEBRA, a model behavior-wise zero-annotation framework that constructs preference data by leveraging model behavior knowledge derived from benchmark performance. ZEBRA binarizes response pairs by evaluating the quality and similarity of their origin models, entirely bypassing instance-level annotation. This allows scalable, controllable, and cost-effective alignment data generation. Empirical results show that ZEBRA achieves alignment performance comparable to instance-supervised methods, despite requiring no manual or model-based labeling.
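The abstract describes labeling a response pair from properties of the models that produced each response, rather than by judging the responses themselves. The following is a minimal sketch of that idea under loudly stated assumptions. Each origin model is summarized by a hypothetical vector of benchmark scores in [0, 1]. "Quality" is taken to be the mean score, and "similarity" is taken to be one minus the mean absolute score difference. `min_gap` and `max_sim` are illustrative thresholds. None of these definitions or names come from the paper; they only show how a pair could be binarized without any instance-level annotation.

```python
from dataclasses import dataclass


@dataclass
class OriginModel:
    # Hypothetical container: one score per benchmark, each assumed to lie in [0, 1].
    name: str
    benchmark_scores: list[float]


def quality(m: OriginModel) -> float:
    # Assumption (not from the paper): a model's quality is its mean benchmark score.
    return sum(m.benchmark_scores) / len(m.benchmark_scores)


def similarity(a: OriginModel, b: OriginModel) -> float:
    # Assumption (not from the paper): behavioral similarity is
    # 1 minus the mean absolute difference between benchmark scores.
    diffs = [abs(x - y) for x, y in zip(a.benchmark_scores, b.benchmark_scores)]
    return 1.0 - sum(diffs) / len(diffs)


def binarize(resp_a: str, model_a: OriginModel,
             resp_b: str, model_b: OriginModel,
             min_gap: float = 0.05, max_sim: float = 0.95):
    """Label a pair as (chosen, rejected) using origin-model knowledge only.

    The response texts are never inspected. Returns None when the origin models
    are too close in quality or too similar in behavior to trust the label.
    """
    gap = quality(model_a) - quality(model_b)
    if abs(gap) < min_gap or similarity(model_a, model_b) > max_sim:
        return None
    return (resp_a, resp_b) if gap > 0 else (resp_b, resp_a)


strong = OriginModel("model-A", [0.8, 0.7, 0.9])
weak = OriginModel("model-B", [0.5, 0.4, 0.6])
print(binarize("answer from B", weak, "answer from A", strong))
# → ('answer from A', 'answer from B')
```

In this sketch, all supervision lives in per-model statistics that are computed once. Labeling a new pair therefore costs a few arithmetic operations instead of a human or model annotation call. The thresholds illustrate one way the construction could be made controllable, but how ZEBRA actually combines quality and similarity is not specified in this abstract.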
