Unsupervised Basis Function Adaptation for Reinforcement Learning

Abstract

When using reinforcement learning (RL) algorithms it is common, given a large state space, to introduce some form of approximation architecture for the value function (VF). The exact form of this architecture can have a significant effect on an agent's performance, however, and determining a suitable approximation architecture can often be a highly complex task. Consequently there is currently interest among researchers in the potential for allowing RL algorithms to adaptively generate (i.e. to learn) approximation architectures. One relatively unexplored method of adapting approximation architectures involves using feedback regarding the frequency with which an agent has visited certain states to guide which areas of the state space to approximate with greater detail. In this article we will: (a) informally discuss the potential advantages offered by such methods; (b) introduce a new algorithm based on such methods which adapts a state aggregation approximation architecture on-line and is designed for use in conjunction with SARSA; (c) provide theoretical results, in a policy evaluation setting, regarding this particular algorithm's complexity, convergence properties and potential to reduce VF error; and finally (d) test experimentally the extent to which this algorithm can improve performance given a number of different test problems. Taken together, our results suggest that our algorithm (and potentially such methods more generally) can provide a versatile and computationally lightweight means of significantly boosting RL performance given suitable conditions which are commonly encountered in practice.
