CC-LEARN: Cohort-based Consistency Learning

  • 2025-06-18 18:41:28
  • Xiao Ye, Shaswat Shrivastava, Zhaonan Li, Jacob Dineen, Shijie Lu, Avneet Ahuja, Ming Shen, Zhikun Xu, Ben Zhou

Abstract

Large language models excel at many tasks but still struggle with consistent, robust reasoning. We introduce Cohort-based Consistency Learning (CC-Learn), a reinforcement learning framework that improves the reliability of LLM reasoning by training on cohorts of similar questions derived from shared programmatic abstractions. To enforce cohort-level consistency, we define a composite objective combining cohort accuracy, a retrieval bonus for effective problem decomposition, and a rejection penalty for trivial or invalid lookups that reinforcement learning can directly optimize, unlike supervised fine-tuning. Optimizing this reward guides the model to adopt uniform reasoning patterns across all cohort members. Experiments on challenging reasoning benchmarks (including ARC-Challenge and StrategyQA) show that CC-Learn boosts both accuracy and reasoning stability over pretrained and SFT baselines. These results demonstrate that cohort-level RL effectively enhances reasoning consistency in LLMs.
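
The composite objective described in the abstract can be pictured with a minimal sketch. The abstract names only the three components, so everything else here is an assumption made for illustration: the linear combination, the weight values, and all names (`CohortRollout`, `cohort_reward`, `w_acc`, `w_ret`, `w_rej`). The paper's actual reward may differ.

```python
from dataclasses import dataclass


@dataclass
class CohortRollout:
    """Hypothetical record of one reasoning strategy applied to a whole cohort."""
    correct: list[bool]        # per-question correctness across the cohort
    useful_retrievals: int     # lookups judged to aid problem decomposition
    rejected_retrievals: int   # lookups judged trivial or invalid


def cohort_reward(
    rollout: CohortRollout,
    w_acc: float = 1.0,   # assumed weight on cohort accuracy
    w_ret: float = 0.1,   # assumed weight on the retrieval bonus
    w_rej: float = 0.1,   # assumed weight on the rejection penalty
) -> float:
    """Scalar reward: cohort accuracy plus retrieval bonus minus rejection penalty.

    The linear form is an illustrative assumption, not the paper's formula.
    """
    if not rollout.correct:
        raise ValueError("cohort must contain at least one question")
    accuracy = sum(rollout.correct) / len(rollout.correct)
    bonus = w_ret * rollout.useful_retrievals
    penalty = w_rej * rollout.rejected_retrievals
    return w_acc * accuracy + bonus - penalty


if __name__ == "__main__":
    rollout = CohortRollout(
        correct=[True, True, False, True],
        useful_retrievals=2,
        rejected_retrievals=1,
    )
    print(cohort_reward(rollout))
```

Because accuracy is averaged over the cohort rather than scored per question, a strategy that solves only some cohort members earns less than one that solves them all, which is the sense in which such a reward would favor uniform reasoning patterns.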

 
