Abstract
We review basic concepts of convex duality, focusing on the very general and supremely useful Fenchel-Rockafellar duality. We summarize how this duality may be applied to a variety of reinforcement learning (RL) settings, including policy evaluation or optimization, online or offline learning, and discounted or undiscounted rewards. The derivations yield a number of intriguing results, including the ability to perform policy evaluation and on-policy policy gradient with behavior-agnostic offline data and methods to learn a policy via max-likelihood optimization. Although many of these results have appeared previously in various forms, we provide a unified treatment and perspective on these results, which we hope will enable researchers to better use and apply the tools of convex duality to make further progress in RL.