Feature-Based Aggregation and Deep Reinforcement Learning: A Survey and Some New Implementations

Abstract

In this paper we discuss policy iteration methods for approximate solution of a finite-state discounted Markov decision problem, with a focus on feature-based aggregation methods and their connection with deep reinforcement learning schemes. We introduce features of the states of the original problem, and we formulate a smaller "aggregate" Markov decision problem, whose states relate to the features. The optimal cost function of the aggregate problem, a nonlinear function of the features, serves as an architecture for approximation in value space of the optimal cost function or the cost functions of policies of the original problem. We discuss properties and possible implementations of this type of aggregation, including a new approach to approximate policy iteration. In this approach the policy improvement operation combines feature-based aggregation with reinforcement learning based on deep neural networks, which is used to obtain the needed features. We argue that the cost function of a policy may be approximated much more accurately by the nonlinear function of the features provided by aggregation, than by the linear function of the features provided by deep reinforcement learning, thereby potentially leading to more effective policy improvement.
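To make the idea of an "aggregate" problem concrete, the following is a minimal sketch of one simple special case, hard aggregation, and is not taken from the paper itself. It assumes that each original state is assigned to exactly one aggregate state by an integer-valued feature map, that every aggregate state contains at least one original state, and that disaggregation is uniform over the states sharing a feature value. The function name `solve_hard_aggregation` and the array layout (`P`, `g`, `feature`) are hypothetical choices made for illustration.

```python
import numpy as np


def solve_hard_aggregation(P, g, feature, alpha=0.9, iters=2000, tol=1e-10):
    """Approximate the optimal cost of a discounted MDP by hard aggregation.

    P       : array (U, n, n), P[u, i, j] = transition probability i -> j under control u
    g       : array (U, n), expected stage cost at state i under control u
    feature : int array (n,), aggregate state index of each original state
              (assumption: values cover 0..m-1 with no empty aggregate state)
    Returns (r, J_tilde): aggregate costs and the piecewise-constant
    approximation of the original cost function.
    """
    n = P.shape[1]
    m = int(feature.max()) + 1

    # Aggregation matrix Phi (n x m): one-hot membership of each state.
    Phi = np.zeros((n, m))
    Phi[np.arange(n), feature] = 1.0

    # Disaggregation matrix D (m x n): uniform over the members of each aggregate state.
    D = Phi.T / Phi.sum(axis=0)[:, None]

    # Fixed-point iteration r <- D T(Phi r), where T is the Bellman operator
    # of the original problem; this map is a sup-norm contraction of modulus alpha.
    r = np.zeros(m)
    for _ in range(iters):
        J = Phi @ r                               # lift aggregate costs to original states
        TJ = (g + alpha * (P @ J)).min(axis=0)    # Bellman update, minimized over controls
        r_new = D @ TJ                            # average back onto aggregate states
        converged = np.max(np.abs(r_new - r)) < tol
        r = r_new
        if converged:
            break
    return r, Phi @ r
```

When every state is its own aggregate state, the sketch reduces to ordinary value iteration on the original problem. With coarser features it returns an approximation that is constant over all states sharing a feature value, which is the sense in which the aggregate cost is a nonlinear function of the features.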
