On the Expressivity of Neural Networks for Deep Reinforcement Learning

Abstract

We compare model-free reinforcement learning with model-based approaches through the lens of the expressive power of neural networks for policies, $Q$-functions, and dynamics. We show, theoretically and empirically, that even for a one-dimensional continuous state space, there are many MDPs whose optimal $Q$-functions and policies are much more complex than the dynamics. We hypothesize that many real-world MDPs have a similar property. For these MDPs, model-based planning is favorable: the resulting policies can approximate the optimal policy significantly better than a neural network parameterization can, whereas both model-free and model-based policy optimization rely on such a parameterization. Motivated by the theory, we apply a simple multi-step model-based bootstrapping planner (BOOTS) to bootstrap a weak $Q$-function into a stronger policy. Empirical results show that applying BOOTS at test time on top of model-based or model-free policy optimization algorithms improves performance on MuJoCo benchmark tasks.
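The idea of bootstrapping a weak $Q$-function with a learned model can be illustrated by a $k$-step lookahead: for each candidate action sequence, roll the dynamics forward, sum the discounted rewards, add $\gamma^k \max_{a'} Q(s_k, a')$ at the leaf, and execute the first action of the best sequence. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a deterministic dynamics function, a known reward function, and a small finite action set searched exhaustively, whereas a practical planner over continuous MuJoCo actions would use a sampling-based optimizer. The names `boots_action`, `dynamics`, `reward`, and `q` are hypothetical.

```python
import itertools
from typing import Callable, Optional, Sequence

State = float
Action = float


def boots_action(
    s: State,
    actions: Sequence[Action],
    dynamics: Callable[[State, Action], State],  # assumed deterministic model f(s, a)
    reward: Callable[[State, Action], float],    # assumed known reward r(s, a)
    q: Callable[[State, Action], float],         # the (possibly weak) learned Q-function
    k: int,
    gamma: float,
) -> Action:
    """Hypothetical k-step lookahead that bootstraps Q at the leaves.

    Returns the first action of the sequence (a_0, ..., a_{k-1}) maximizing
    sum_{h<k} gamma^h r(s_h, a_h) + gamma^k max_a' Q(s_k, a').
    """
    if k < 1:
        raise ValueError("lookahead depth k must be at least 1")
    best_value = float("-inf")
    best_first: Optional[Action] = None
    # Exhaustive search over all |A|^k action sequences (toy setting only).
    for seq in itertools.product(actions, repeat=k):
        state, ret, discount = s, 0.0, 1.0
        for a in seq:
            ret += discount * reward(state, a)
            state = dynamics(state, a)
            discount *= gamma
        # Leaf bootstrap: discount now equals gamma^k.
        ret += discount * max(q(state, a) for a in actions)
        if ret > best_value:
            best_value, best_first = ret, seq[0]
    return best_first


# Toy 1-D example: move toward the origin with a Q-function that knows nothing.
step = lambda s, a: s + a
rew = lambda s, a: -abs(s + a)
zero_q = lambda s, a: 0.0
print(boots_action(2.0, [-1.0, 0.0, 1.0], step, rew, zero_q, k=2, gamma=0.9))
```

With $k = 1$ the planner reduces to a one-step greedy improvement over $Q$; larger $k$ relies more on the model and less on the bootstrapped $Q$, which is the trade-off the theory motivates when the dynamics are simpler than the optimal $Q$-function.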
