Abstract
Many recent advances in natural language generation have been fueled by training large language models on internet-scale data. However, this paradigm can lead to models that generate toxic, inaccurate, and unhelpful content, and automatic evaluation metrics often fail to identify these behaviors. As models become more capable, human feedback is an invaluable signal for evaluating and improving models. This survey aims to provide an overview of the recent research that has leveraged human feedback to improve natural language generation. First, we introduce an encompassing formalization of feedback, and identify and organize existing research into a taxonomy following this formalization. Next, we discuss how feedback can be described by its format and objective, and cover the two approaches proposed to use feedback (either for training or decoding): directly using the feedback or training feedback models. We also discuss existing datasets for human-feedback data collection, and concerns surrounding feedback collection. Finally, we provide an overview of the nascent field of AI feedback, which exploits large language models to make judgments based on a set of principles and minimize the need for human intervention.