Abstract
Model-Free Reinforcement Learning (RL) algorithms either learn how to map states to expected rewards or search for policies that can maximize a certain performance function. Model-Based algorithms, instead, aim to learn an approximation of the underlying model of the RL environment and then use it in combination with planning algorithms. Upside-Down Reinforcement Learning (UDRL) is a novel learning paradigm that aims to learn how to predict actions from states and desired commands. This task is formulated as a Supervised Learning problem and has successfully been tackled by Neural Networks (NNs). In this paper, we investigate whether function approximation algorithms other than NNs can also be used within a UDRL framework. Our experiments, performed over several popular optimal control benchmarks, show that tree-based methods like Random Forests and Extremely Randomized Trees can perform just as well as NNs, with the significant benefit of resulting in policies that are inherently more interpretable than those produced by NNs, therefore paving the way for more transparent, safe, and robust RL.
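
To make the supervised formulation concrete, the following is a minimal sketch of how one recorded episode could be turned into (input, target) pairs for a UDRL learner. It assumes that a command consists of a desired return and a desired horizon, as is common in UDRL work but not stated in this abstract. The helper name `build_udrl_dataset`, the type aliases, and the toy episode are hypothetical, and using the return to the end of the episode (rather than sampled segments) is a simplification.

```python
from typing import List, Tuple

# Hypothetical types for illustration only:
# a state is a tuple of floats and an action is a discrete integer.
State = Tuple[float, ...]
Transition = Tuple[State, int, float]  # (state, action, reward)


def build_udrl_dataset(episode: List[Transition]):
    """Turn one episode into supervised (input, target) pairs.

    For every time step t, the input is the state concatenated with an
    assumed command (return observed from t to the end of the episode,
    number of steps remaining), and the target is the action taken at t.
    """
    inputs, targets = [], []
    horizon = len(episode)
    for t, (state, action, _) in enumerate(episode):
        # Return actually obtained from step t to the end of the episode.
        return_to_go = sum(reward for _, _, reward in episode[t:])
        steps_left = horizon - t
        inputs.append(tuple(state) + (return_to_go, float(steps_left)))
        targets.append(action)
    return inputs, targets


# Toy episode with made-up values, purely for illustration.
episode = [((0.0, 1.0), 1, 1.0), ((0.5, 0.5), 0, 2.0), ((1.0, 0.0), 1, 3.0)]
X, y = build_udrl_dataset(episode)
```

Under these assumptions, `X` and `y` form an ordinary classification dataset, so any supervised learner that maps feature vectors to discrete labels, including a tree ensemble, could in principle be fitted to it in place of a neural network.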