Reinforcement Learning via Fenchel-Rockafellar Duality

  • 2020-01-07 02:59:59
  • Ofir Nachum, Bo Dai


We review basic concepts of convex duality, focusing on the very general and supremely useful Fenchel-Rockafellar duality. We summarize how this duality may be applied to a variety of reinforcement learning (RL) settings, including policy evaluation or optimization, online or offline learning, and discounted or undiscounted rewards. The derivations yield a number of intriguing results, including the ability to perform policy evaluation and on-policy policy gradient with behavior-agnostic offline data and methods to learn a policy via max-likelihood optimization. Although many of these results have appeared previously in various forms, we provide a unified treatment and perspective on these results, which we hope will enable researchers to better use and apply the tools of convex duality to make further progress in RL.
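
As a rough illustration of the duality the abstract refers to, the sketch below checks numerically that a primal problem min_x f(x) + g(a*x) and its Fenchel-Rockafellar dual max_y -f*(-a*y) - g*(y) reach the same value in a scalar case. The choices f(x) = x^2/2 and g(z) = (z - b)^2/2, the function names, and the values of a and b are assumptions made for this example only; they are not taken from the paper.

```python
# Scalar sanity check of Fenchel-Rockafellar duality (illustrative only):
#   min_x f(x) + g(a*x)  ==  max_y -f*(-a*y) - g*(y)
# Assumed example functions: f(x) = x^2/2 and g(z) = (z - b)^2/2.


def primal(x, a, b):
    """Primal objective f(x) + g(a*x)."""
    return 0.5 * x**2 + 0.5 * (a * x - b) ** 2


def dual(y, a, b):
    """Dual objective -f*(-a*y) - g*(y).

    For the assumed functions, the convex conjugates are
    f*(u) = u^2/2 and g*(y) = y^2/2 + b*y.
    """
    f_conj = 0.5 * (-a * y) ** 2
    g_conj = 0.5 * y**2 + b * y
    return -f_conj - g_conj


def solve(a, b):
    """Closed-form optimizers, obtained by setting each derivative to zero."""
    x_star = a * b / (1.0 + a**2)
    y_star = -b / (1.0 + a**2)
    return x_star, y_star


a, b = 2.0, 3.0
x_star, y_star = solve(a, b)
primal_value = primal(x_star, a, b)
dual_value = dual(y_star, a, b)
print(primal_value, dual_value)
```

In this example the two optimal values coincide, and the dual optimizer equals the derivative of g at a*x_star. Any other y gives a dual value no larger than any primal value (weak duality).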

