Measuring and Benchmarking Large Language Models' Capabilities to Generate Persuasive Language

Abstract

We are exposed to much information trying to influence us, such as teaser messages, debates, politically framed news, and propaganda - all of which use persuasive language. With the recent interest in Large Language Models (LLMs), we study the ability of LLMs to produce persuasive text. As opposed to prior work which focuses on particular domains or types of persuasion, we conduct a general study across various domains to measure and benchmark to what degree LLMs produce persuasive language - both when explicitly instructed to rewrite text to be more or less persuasive and when only instructed to paraphrase. We construct the new dataset Persuasive-Pairs of pairs of a short text and its rewrite by an LLM to amplify or diminish persuasive language. We multi-annotate the pairs on a relative scale for persuasive language: a valuable resource in itself, and for training a regression model to score and benchmark persuasive language, including for new LLMs across domains. In our analysis, we find that different 'personas' in LLaMA3's system prompt change persuasive language substantially, even when only instructed to paraphrase.
