Larger Language Models Don't Care How You Think: Why Chain-of-Thought Prompting Fails in Subjective Tasks

Abstract

In-Context Learning (ICL) in Large Language Models (LLMs) has emerged as the dominant technique for performing natural language tasks, as it does not require updating the model parameters with gradient-based methods. ICL promises to "adapt" the LLM to perform the present task at a competitive or state-of-the-art level at a fraction of the computational cost. ICL can be augmented by explicitly incorporating in the prompt the reasoning process used to arrive at the final label, a technique called Chain-of-Thought (CoT) prompting. However, recent work has found that ICL relies mostly on the retrieval of task priors and less so on "learning" to perform tasks, especially for complex subjective domains like emotion and morality, where priors ossify posterior predictions. In this work, we examine whether "enabling" reasoning also creates the same behavior in LLMs, wherein the format of CoT retrieves reasoning priors that remain relatively unchanged despite the evidence in the prompt. We find that, surprisingly, CoT indeed suffers from the same posterior collapse as ICL for larger language models. Code is available at https://github.com/gchochla/cot-priors.
