Perspectives on the Social Impacts of Reinforcement Learning with Human Feedback

Abstract

Is it possible for machines to think like humans? And if it is, how should we go about teaching them to do so? As early as 1950, Alan Turing stated that we ought to teach machines in the way of teaching a child. Reinforcement learning with human feedback (RLHF) has emerged as a strong candidate toward allowing agents to learn from human feedback in a naturalistic manner. RLHF is distinct from traditional reinforcement learning as it provides feedback from a human teacher in addition to a reward signal. It has been catapulted into public view by multiple high-profile AI applications, including OpenAI's ChatGPT, DeepMind's Sparrow, and Anthropic's Claude. These highly capable chatbots are already overturning our understanding of how AI interacts with humanity. The wide applicability and burgeoning success of RLHF strongly motivate the need to evaluate its social impacts. In light of recent developments, this paper considers an important question: can RLHF be developed and used without negatively affecting human societies? Our objectives are threefold: to provide a systematic study of the social effects of RLHF; to identify key social and ethical issues of RLHF; and to discuss social impacts for stakeholders. Although text-based applications of RLHF have received much attention, it is crucial, when evaluating its social implications, to consider the diverse range of areas in which it may be deployed. We describe seven primary ways in which RLHF-based technologies will affect society by positively transforming human experiences with AI. This paper ultimately proposes that RLHF has the potential to have a net positive impact on the areas of misinformation, AI value-alignment, bias, AI access, cross-cultural dialogue, industry, and workforce. As RLHF raises concerns that echo those of existing AI technologies, it will be important for all to be aware and intentional in the adoption of RLHF.
