Enhancing Offline Reinforcement Learning with Curriculum Learning-Based Trajectory Valuation

Abstract

The success of deep reinforcement learning (DRL) relies on the availability and quality of training data, often requiring extensive interactions with specific environments. In many real-world scenarios, where data collection is costly and risky, offline reinforcement learning (RL) offers a solution by utilizing data collected by domain experts and searching for a batch-constrained optimal policy. This approach is further augmented by incorporating external data sources, expanding the range and diversity of data collection possibilities. However, existing offline RL methods often struggle with challenges posed by non-matching data from these external sources. In this work, we specifically address the problem of source-target domain mismatch in scenarios involving mixed datasets, characterized by a predominance of source data generated from random or suboptimal policies and a limited amount of target data generated from higher-quality policies. To tackle this problem, we introduce Transition Scoring (TS), a novel method that assigns scores to transitions based on their similarity to the target domain, and propose Curriculum Learning-Based Trajectory Valuation (CLTV), which effectively leverages these transition scores to identify and prioritize high-quality trajectories through a curriculum learning approach. Our extensive experiments across various offline RL methods and MuJoCo environments, complemented by rigorous theoretical analysis, demonstrate that CLTV enhances the overall performance and transferability of policies learned by offline RL algorithms.
