Abstract
Offline reinforcement learning (RL) has progressed with return-conditioned supervised learning (RCSL), but its lack of stitching ability remains a limitation. We introduce $Q$-Aided Conditional Supervised Learning (QCS), which effectively combines the stability of RCSL with the stitching capability of $Q$-functions. By analyzing $Q$-function over-generalization, which impairs stable stitching, QCS adaptively integrates $Q$-aid into RCSL's loss function based on trajectory return. Empirical results show that QCS significantly outperforms RCSL and value-based methods, consistently achieving or exceeding the maximum trajectory returns across diverse offline RL benchmarks.
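
To make the idea of return-dependent $Q$-aid concrete, the following is a minimal sketch and not the exact QCS objective. The abstract does not state the weighting rule, so the linear form of the weight, the squared-error supervised term, and the names `q_aid_weight`, `qcs_style_loss`, `max_return`, and `scale` are all assumptions made for illustration. The sketch only shows one plausible way a $Q$ term could be added to a supervised loss with a weight that shrinks as the trajectory return grows.

```python
def q_aid_weight(traj_return, max_return, scale=1.0):
    """Hypothetical weight (an assumption, not taken from the abstract).

    It is zero for a trajectory that reaches max_return and grows linearly
    as the trajectory return falls below it.
    """
    return scale * max(max_return - traj_return, 0.0)


def qcs_style_loss(pred_action, data_action, q_value,
                   traj_return, max_return, scale=1.0):
    """Supervised squared error minus a return-weighted Q term.

    pred_action, data_action: sequences of floats (policy output, dataset action).
    q_value: a critic's estimate Q(s, pred_action), supplied by the caller.
    Minimizing this loss imitates the data and, for low-return trajectories,
    also pushes the predicted action toward higher Q-values.
    """
    bc_loss = sum((p - a) ** 2 for p, a in zip(pred_action, data_action))
    return bc_loss - q_aid_weight(traj_return, max_return, scale) * q_value
```

Under these assumptions, a trajectory whose return equals `max_return` is trained with the supervised term alone, while lower-return trajectories receive progressively more $Q$-aid.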