Internal Consistency and Self-Feedback in Large Language Models: A Survey

Abstract

Large language models (LLMs) are expected to respond accurately but often exhibit deficient reasoning or generate hallucinatory content. To address these issues, studies prefixed with ``Self-'', such as Self-Consistency, Self-Improve, and Self-Refine, have been initiated. They share a commonality: they involve LLMs evaluating and updating themselves to mitigate the issues. Nonetheless, these efforts lack a unified perspective, as existing surveys predominantly focus on categorization without examining the motivations behind these works. In this paper, we summarize a theoretical framework, termed Internal Consistency, which offers unified explanations for phenomena such as the lack of reasoning and the presence of hallucinations. Internal Consistency assesses the coherence among LLMs' latent layer, decoding layer, and response layer based on sampling methodologies. Expanding upon the Internal Consistency framework, we introduce a streamlined yet effective theoretical framework capable of mining Internal Consistency, named Self-Feedback. The Self-Feedback framework consists of two modules: Self-Evaluation and Self-Update. This framework has been employed in numerous studies. We systematically classify these studies by tasks and lines of work; summarize relevant evaluation methods and benchmarks; and delve into the concern, ``Does Self-Feedback Really Work?'' We propose several critical viewpoints, including the ``Hourglass Evolution of Internal Consistency'', the ``Consistency Is (Almost) Correctness'' hypothesis, and ``The Paradox of Latent and Explicit Reasoning''. Furthermore, we outline promising directions for future research. We have open-sourced the experimental code, reference list, and statistical data, available at \url{https://github.com/IAAR-Shanghai/ICSFSurvey}.
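
The two-module structure described above (Self-Evaluation followed by Self-Update) can be illustrated with a minimal Python sketch. It is not the survey's implementation. It assumes that consistency is measured only at the response layer, as the share of sampled answers that agree, and that the update step simply re-prompts with the disagreeing answers. The names `self_evaluate`, `self_feedback`, `sample`, `n_samples`, `threshold`, and `max_rounds` are hypothetical and chosen for illustration.

```python
from collections import Counter
from typing import Callable, List, Tuple


def self_evaluate(responses: List[str]) -> Tuple[str, float]:
    """Self-Evaluation (assumed form): majority answer and its agreement rate."""
    counts = Counter(responses)
    answer, freq = counts.most_common(1)[0]
    return answer, freq / len(responses)


def self_feedback(
    sample: Callable[[str], str],  # hypothetical stand-in for one LLM sampling call
    prompt: str,
    n_samples: int = 5,
    threshold: float = 0.6,
    max_rounds: int = 3,
) -> Tuple[str, float]:
    """Alternate Self-Evaluation and Self-Update until the samples agree."""
    if n_samples < 1 or max_rounds < 1:
        raise ValueError("n_samples and max_rounds must be at least 1")
    for _ in range(max_rounds):
        responses = [sample(prompt) for _ in range(n_samples)]
        answer, consistency = self_evaluate(responses)
        if consistency >= threshold:
            break
        # Self-Update (assumed form): feed the disagreement back into the prompt.
        candidates = ", ".join(sorted(set(responses)))
        prompt = f"{prompt}\nEarlier answers disagreed ({candidates}). Reconsider."
    return answer, consistency
```

Calling `self_feedback(my_sampler, "What is 2 + 2?")` with any callable that maps a prompt to a string returns the majority answer together with its agreement rate. Agreement among samples is used here only as a proxy signal and does not by itself guarantee correctness.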
