Temporal Consistency for LLM Reasoning Process Error Identification

Abstract

Verification is crucial for effective mathematical reasoning. We present a new temporal consistency method where verifiers iteratively refine their judgments based on the previous assessment. Unlike one-round verification or multi-model debate approaches, our method leverages consistency in a sequence of self-reflection actions to improve verification accuracy. Empirical evaluations across diverse mathematical process error identification benchmarks (Mathcheck, ProcessBench, and PRM800K) show consistent performance improvements over baseline methods. When applied to the recent DeepSeek R1 distilled models, our method demonstrates strong performance, enabling 7B/8B distilled models to outperform all 70B/72B models and GPT-4o on ProcessBench. Notably, the distilled 14B model with our method achieves performance comparable to DeepSeek-R1. Our code is available at https://github.com/jcguo123/Temporal-Consistency
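
The iterative refinement described above can be illustrated with a minimal sketch. It is not the released implementation: the verifier signature, the stopping rule (the judgment must stay unchanged for `patience` consecutive rounds), the `max_rounds` cap, and the convention that -1 means "no error found" are all assumptions made for illustration. A real verifier would be an LLM call; here a scripted stand-in replays fixed judgments.

```python
from typing import Callable, List, Optional

# Hypothetical verifier signature (an assumption, not the paper's API):
# takes the solution steps and the previous judgment (None on the first
# round) and returns the index of the first erroneous step, or -1 if it
# finds no error.
Verifier = Callable[[List[str], Optional[int]], int]


def temporally_consistent_verdict(
    steps: List[str],
    verifier: Verifier,
    patience: int = 2,     # assumed: rounds the judgment must stay unchanged
    max_rounds: int = 10,  # assumed: hard cap on self-reflection rounds
) -> int:
    """Re-check a judgment until it is stable across consecutive rounds."""
    judgment = verifier(steps, None)  # initial one-round verification
    stable = 0
    for _ in range(max_rounds):
        revised = verifier(steps, judgment)  # reflect on the previous assessment
        stable = stable + 1 if revised == judgment else 0
        judgment = revised
        if stable >= patience:
            break
    return judgment


def make_toy_verifier(script: List[int]) -> Verifier:
    """Stand-in for an LLM call: replays a fixed list of judgments,
    then keeps repeating the last one."""
    answers = iter(script)

    def verifier(steps: List[str], previous: Optional[int]) -> int:
        return next(answers, script[-1])

    return verifier


steps = ["x + 2 = 5", "x = 5 - 2", "x = 4"]
verdict = temporally_consistent_verdict(steps, make_toy_verifier([-1, 2, 2, 2]))
```

In this toy run the verifier first reports no error, then revises its judgment to step index 2 and repeats it, so the loop stops once that judgment has held for two consecutive rounds. The point of the design is that a one-round answer is not trusted until it survives repeated self-reflection.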
