LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models

Abstract

Recent advancements in Large Language Models (LLMs) have enabled them to approach human-level persuasion capabilities. However, such potential also raises concerns about the safety risks of LLM-driven persuasion, particularly their potential for unethical influence through manipulation, deception, exploitation of vulnerabilities, and many other harmful tactics. In this work, we present a systematic investigation of LLM persuasion safety through two critical aspects: (1) whether LLMs appropriately reject unethical persuasion tasks and avoid unethical strategies during execution, including cases where the initial persuasion goal appears ethically neutral, and (2) how influencing factors like personality traits and external pressures affect their behavior. To this end, we introduce PersuSafety, the first comprehensive framework for the assessment of persuasion safety, which consists of three stages, i.e., persuasion scene creation, persuasive conversation simulation, and persuasion safety assessment. PersuSafety covers 6 diverse unethical persuasion topics and 15 common unethical strategies. Through extensive experiments across 8 widely used LLMs, we observe significant safety concerns in most LLMs, including failing to identify harmful persuasion tasks and leveraging various unethical persuasion strategies. Our study calls for more attention to improve safety alignment in progressive and goal-driven conversations such as persuasion.
