Abstract
We present Gradient Boosting Reinforcement Learning (GBRL), a framework that adapts the strengths of gradient boosting trees (GBT) to reinforcement learning (RL) tasks. While neural networks (NNs) have become the de facto choice for RL, they face significant challenges with structured and categorical features and tend to generalize poorly to out-of-distribution samples. These are areas in which GBTs have traditionally excelled in supervised learning. However, the application of GBTs in RL has been limited. Traditional GBT libraries are optimized for static datasets with fixed labels, making them incompatible with RL's dynamic nature, where both state distributions and reward signals evolve during training. GBRL overcomes this limitation by continuously interleaving tree construction with environment interaction. Through extensive experiments, we demonstrate that GBRL outperforms NNs in domains with structured observations and categorical features while maintaining competitive performance on standard continuous control benchmarks. Like its supervised learning counterpart, GBRL demonstrates superior robustness to out-of-distribution samples and better handles irregular state-action relationships.