When "A Helpful Assistant" Is Not Really Helpful: Personas in System Prompts Do Not Improve Performances of Large Language Models

Abstract

Prompting serves as the major way humans interact with Large Language Models (LLM). Commercial AI systems commonly define the role of the LLM in system prompts. For example, ChatGPT uses "You are a helpful assistant" as part of its default system prompt. Despite current practices of adding personas to system prompts, it remains unclear how different personas affect a model's performance on objective tasks. In this study, we present a systematic evaluation of personas in system prompts. We curate a list of 162 roles covering 6 types of interpersonal relationships and 8 domains of expertise. Through extensive analysis of 4 popular families of LLMs and 2,410 factual questions, we demonstrate that adding personas in system prompts does not improve model performance across a range of questions compared to the control setting where no persona is added. Nevertheless, further analysis suggests that the gender, type, and domain of the persona can all influence the resulting prediction accuracies. We further experimented with a list of persona search strategies and found that, while aggregating results from the best persona for each question significantly improves prediction accuracy, automatically identifying the best persona is challenging, with predictions often performing no better than random selection. Overall, our findings suggest that while adding a persona may lead to performance gains in certain settings, the effect of each persona can be largely random. Code and data are available at https://github.com/Jiaxin-Pei/Prompting-with-Social-Roles.
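
The comparison described above (per-persona accuracy against a no-persona control, a best-persona-per-question upper bound, and random persona selection) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the persona labels, and the correctness flags in `toy_results` are all hypothetical placeholders chosen for illustration, and no real model outputs are involved.

```python
import random

def accuracy(correct_flags):
    """Fraction of questions answered correctly."""
    return sum(correct_flags) / len(correct_flags)

def oracle_accuracy(results):
    """Upper bound: a question counts as correct if ANY persona got it right,
    i.e. the best persona is picked per question with hindsight."""
    per_question = zip(*results.values())
    return accuracy([any(flags) for flags in per_question])

def random_persona_accuracy(results, seed=0):
    """Baseline: pick one persona uniformly at random for each question."""
    rng = random.Random(seed)
    personas = sorted(results)
    n_questions = len(results[personas[0]])
    picks = [results[rng.choice(personas)][i] for i in range(n_questions)]
    return accuracy(picks)

# Hypothetical toy data: persona -> per-question correctness (made up).
toy_results = {
    "control": [True, False, True, False],   # no persona in the system prompt
    "lawyer":  [True, True, False, False],
    "mother":  [False, False, True, False],
}

per_persona = {p: accuracy(flags) for p, flags in toy_results.items()}
oracle = oracle_accuracy(toy_results)
baseline = random_persona_accuracy(toy_results)

print(per_persona)
print("oracle:", oracle)
print("random:", baseline)
```

In this toy setup no single persona beats the control, yet the hindsight oracle scores higher than any of them. That gap is the quantity a persona search strategy would need to recover without access to the answers.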
