Mind Your Tone: Investigating How Prompt Politeness Affects LLM Accuracy (short paper)

Abstract

The wording of natural language prompts has been shown to influence the performance of large language models (LLMs), yet the role of politeness and tone remains underexplored. In this study, we investigate how varying levels of prompt politeness affect model accuracy on multiple-choice questions. We created a dataset of 50 base questions spanning mathematics, science, and history, each rewritten into five tone variants: Very Polite, Polite, Neutral, Rude, and Very Rude, yielding 250 unique prompts. Using ChatGPT 4o, we evaluated responses across these conditions and applied paired-sample t-tests to assess statistical significance. Contrary to expectations, impolite prompts consistently outperformed polite ones, with accuracy ranging from 80.8% for Very Polite prompts to 84.8% for Very Rude prompts. These findings differ from earlier studies that associated rudeness with poorer outcomes, suggesting that newer LLMs may respond differently to tonal variation. Our results highlight the importance of studying pragmatic aspects of prompting and raise broader questions about the social dimensions of human-AI interaction.
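For readers unfamiliar with the test named in the abstract, the following is a minimal sketch of a paired-sample t statistic in plain Python. It is not the authors' code. The abstract does not say what is paired, so the sketch assumes each tone condition contributes one score per matched unit (for example, one accuracy value per repeated run). The function name `paired_t_statistic` and the toy inputs are hypothetical and are not results from the paper.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for two paired samples.

    a[i] and b[i] must be measurements of the same unit under two
    conditions (assumed here: the same run or question under two tones).
    """
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    if len(a) < 2:
        raise ValueError("need at least two pairs")

    # The test operates on the per-pair differences.
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)

    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")

    # t = mean difference divided by its standard error.
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Toy placeholder values, chosen only to exercise the function.
t, df = paired_t_statistic([1, 2, 3, 4], [0, 0, 0, 0])
print(t, df)
```

A p-value would then come from the t distribution with `df` degrees of freedom. That lookup is omitted here to keep the sketch free of third-party dependencies.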
