Abstract
Large Language Models (LLMs) have started to demonstrate the ability to persuade humans, yet our understanding of how this dynamic transpires is limited. Recent work has used linear probes, lightweight tools for analyzing model representations, to study various LLM skills such as the ability to model user sentiment and political perspective. Motivated by this, we apply probes to study persuasion dynamics in natural, multi-turn conversations. We leverage insights from cognitive science to train probes on distinct aspects of persuasion: persuasion success, persuadee personality, and persuasion strategy. Despite their simplicity, we show that they capture various aspects of persuasion at both the sample and dataset levels. For instance, probes can identify the point in a conversation where the persuadee was persuaded or where persuasive success generally occurs across the entire dataset. We also show that in addition to being faster than expensive prompting-based approaches, probes can do just as well and even outperform prompting in some settings, such as when uncovering persuasion strategy. This suggests probes as a plausible avenue for studying other complex behaviours such as deception and manipulation, especially in multi-turn settings and large-scale dataset analysis where prompting-based methods would be computationally inefficient.