Behavior Constraining in Weight Space for Offline Reinforcement Learning

  • 2021-07-12 14:50:50
  • Phillip Swazinna, Steffen Udluft, Daniel Hein, Thomas Runkler


In offline reinforcement learning, a policy needs to be learned from a single pre-collected dataset. Typically, policies are thus regularized during training to behave similarly to the data-generating policy, by adding a penalty based on a divergence between the action distributions of the generating and trained policies. We propose a new algorithm, which constrains the policy directly in its weight space instead, and demonstrate its effectiveness in experiments.
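
To make the contrast between the two kinds of regularization concrete, here is a minimal sketch. The abstract does not state the form of the weight-space constraint, so the squared L2 distance, the linear deterministic policies, the function names, and the coefficient `lam` are all illustrative assumptions rather than the authors' method.

```python
import numpy as np


def weight_space_penalty(theta, theta_behavior):
    """Squared L2 distance between two flat parameter vectors.

    Assumption: the abstract does not specify the distance used;
    squared L2 is chosen here only for illustration.
    """
    diff = np.asarray(theta, dtype=float) - np.asarray(theta_behavior, dtype=float)
    return float(diff @ diff)


def action_space_penalty(theta, theta_behavior, states):
    """Mean squared difference between the actions of two policies.

    Assumption: both policies are linear and deterministic
    (action = state @ theta), standing in for the divergence between
    action distributions mentioned in the abstract.
    """
    states = np.asarray(states, dtype=float)
    actions_new = states @ np.asarray(theta, dtype=float)
    actions_old = states @ np.asarray(theta_behavior, dtype=float)
    return float(np.mean((actions_new - actions_old) ** 2))


def regularized_objective(estimated_return, theta, theta_behavior, lam=0.1):
    """Estimated return minus a weighted weight-space penalty (hypothetical form)."""
    return estimated_return - lam * weight_space_penalty(theta, theta_behavior)
```

The action-space penalty needs states from the dataset to compare the two policies, whereas the weight-space penalty depends only on the two parameter vectors.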

