Leveraging Unlabeled Data Sharing through Kernel Function Approximation in Offline Reinforcement Learning

Abstract

Offline reinforcement learning (RL) learns policies from a fixed dataset, but often requires large amounts of data. The challenge arises when labeled datasets are expensive, especially when rewards have to be provided by human labelers for large datasets. In contrast, unlabeled data tends to be less expensive. This situation highlights the importance of finding effective ways to use unlabeled data in offline RL, especially when labeled data is limited or expensive to obtain. In this paper, we present an algorithm that utilizes unlabeled data in offline RL with kernel function approximation, and we give theoretical guarantees for it. We present various eigenvalue decay conditions of $\mathcal{H}_k$ which determine the complexity of the algorithm. In summary, our work provides a promising approach for exploiting the advantages offered by unlabeled data in offline RL, whilst maintaining theoretical assurances.
