Large Language Models as Misleading Assistants in Conversation

Abstract

Large Language Models (LLMs) can provide assistance on a wide range of information-seeking tasks. However, model outputs may be misleading, whether unintentionally or in cases of intentional deception. We investigate the ability of LLMs to be deceptive when providing assistance on a reading comprehension task, using LLMs as proxies for human users. We compare outcomes when the assistant model is prompted (1) to provide truthful assistance, (2) to be subtly misleading, and (3) to argue for an incorrect answer. Our experiments show that GPT-4 can effectively mislead both GPT-3.5-Turbo and GPT-4: deceptive assistants cause up to a 23% drop in task accuracy compared to a truthful assistant. We also find that giving the user model additional context from the passage partially mitigates the influence of the deceptive model. This work highlights the ability of LLMs to produce misleading information and the effects this may have in real-world situations.
