Abstract
Training a deep neural network to maximize a target objective has become the standard recipe for successful machine learning over the last decade. These networks can be optimized with supervised learning if the target objective is differentiable. For many interesting problems, however, this is not the case. Common objectives like intersection over union (IoU), bilingual evaluation understudy (BLEU) score, or rewards cannot be optimized with supervised learning. A common workaround is to define differentiable surrogate losses, which leads to suboptimal solutions with respect to the actual objective. In recent years, reinforcement learning (RL) has emerged as a promising alternative for optimizing deep neural networks to maximize non-differentiable objectives. Examples include aligning large language models via human feedback, code generation, object detection, and control problems. This makes RL techniques relevant to the larger machine learning audience. The subject is, however, time-intensive to approach due to the large range of methods as well as the often very theoretical presentation. In this introduction, we take an alternative approach, different from classic reinforcement learning textbooks. Rather than focusing on tabular problems, we introduce reinforcement learning as a generalization of supervised learning, which we first apply to non-differentiable objectives and later to temporal problems. Assuming only basic knowledge of supervised learning, the reader will be able to understand state-of-the-art deep RL algorithms like proximal policy optimization (PPO) after reading this tutorial.