Symbolic Regression Methods for Reinforcement Learning

Abstract

Reinforcement learning algorithms can solve dynamic decision-making and optimal control problems. With continuous-valued state and input variables, reinforcement learning algorithms must rely on function approximators to represent the value function and policy mappings. Commonly used numerical approximators, such as neural networks or basis function expansions, have two main drawbacks: they are black-box models offering little insight into the mappings learned, and they require extensive trial-and-error tuning of their hyper-parameters. In this paper, we propose a new approach to constructing smooth value functions in the form of analytic expressions by using symbolic regression. We introduce three off-line methods for finding value functions based on a state-transition model: symbolic value iteration, symbolic policy iteration, and a direct solution of the Bellman equation. The methods are illustrated on four nonlinear control problems: velocity control under friction, one-link and two-link pendulum swing-up, and magnetic manipulation. The results show that the value functions yield well-performing policies and are compact, mathematically tractable, and easy to plug into other algorithms. This makes them potentially suitable for further analysis of the closed-loop system. A comparison with an alternative approach using neural networks shows that our method outperforms the neural network-based one.
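
For orientation, the Bellman equation mentioned in the abstract can be written in its standard form for a deterministic state-transition model. This is a sketch in assumed notation, which is not necessarily the paper's own: x is the state, u the input drawn from an input set U, f the state-transition model, ρ the reward function, γ the discount factor, V the value function, and π the greedy policy derived from V.

```latex
% Bellman optimality equation for a deterministic model x' = f(x, u)
% (all symbols are assumed notation, not taken from the paper)
V(x) = \max_{u \in \mathcal{U}}
       \Bigl[ \rho\bigl(x, u, f(x, u)\bigr) + \gamma \, V\bigl(f(x, u)\bigr) \Bigr],
\qquad 0 \le \gamma < 1

% Greedy policy obtained from a value function V
\pi(x) \in \arg\max_{u \in \mathcal{U}}
       \Bigl[ \rho\bigl(x, u, f(x, u)\bigr) + \gamma \, V\bigl(f(x, u)\bigr) \Bigr]
```

Read this way, value iteration would repeatedly apply the right-hand side to a current estimate of V, while a direct solution would look for a V that satisfies the first equation as closely as possible on a set of sampled states.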
