Reinforcement Learning for Generative AI: A Survey

Abstract

Deep generative AI has long been an essential topic in the machine learning community, with impact on a number of application areas such as text generation and computer vision. The major paradigm for training a generative model is maximum likelihood estimation, which pushes the learner to capture and approximate the target data distribution by decreasing the divergence between the model distribution and the target distribution. This formulation successfully establishes the objective of generative tasks, but it cannot satisfy all the requirements that a user might expect from a generative model. Reinforcement learning, a competitive option for injecting new training signals by creating new objectives, has demonstrated its power and flexibility in incorporating human inductive bias from multiple angles, such as adversarial learning, hand-designed rules, and learned reward models, to build a performant model. As a result, reinforcement learning has become a trending research field and has stretched the limits of generative AI in both model design and application. It is therefore reasonable to summarize the advances of recent years in a comprehensive review. Although surveys of individual application areas have appeared recently, this survey aims to provide a high-level review that spans a range of application areas. We provide a rigorous taxonomy of this area and cover a wide variety of models and applications. Notably, we also survey the fast-developing area of large language models. We conclude by presenting potential directions that might address the limitations of current models and expand the frontiers of generative AI.
