Overcoming Overfitting in Reinforcement Learning via Gaussian Process Diffusion Policy

Abstract

One of the key challenges that Reinforcement Learning (RL) faces is its limited capability to adapt to a change of data distribution caused by uncertainties. This challenge arises especially in RL systems using deep neural networks as decision makers or policies, which are prone to overfitting after prolonged training on fixed environments. To address this challenge, this paper proposes Gaussian Process Diffusion Policy (GPDP), a new algorithm that integrates diffusion models and Gaussian Process Regression (GPR) to represent the policy. GPR guides diffusion models to generate actions that maximize the learned Q-function, resembling the policy improvement in RL. Furthermore, the kernel-based nature of GPR enhances the policy's exploration efficiency under distribution shifts at test time, increasing the chance of discovering new behaviors and mitigating overfitting. Simulation results on the Walker2d benchmark show that our approach outperforms state-of-the-art algorithms under distribution shift conditions by achieving around 67.74% to 123.18% improvement in the RL objective function while maintaining comparable performance under normal conditions.
