Biologically inspired architectures for sample-efficient deep reinforcement learning

Abstract

Deep reinforcement learning pays a heavy price in sample efficiency and in the overparameterization of the neural networks used for function approximation. In this work, we use tensor factorization to learn more compact representations for reinforcement learning policies. We show empirically that, in the low-data regime, it is possible to learn online policies with 2 to 10 times fewer total coefficients, with little to no loss of performance. We also leverage progress in second-order optimization, and use the theory of wavelet scattering to further reduce the number of learned coefficients by forgoing learning the topmost convolutional layer filters altogether. We evaluate our results on the Atari suite against recent baseline algorithms that represent the state of the art in data efficiency, and obtain comparable results with an order-of-magnitude gain in weight parsimony.
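
To give a sense of scale for the coefficient savings mentioned above, the following is a minimal sketch that counts the coefficients of a dense layer before and after a plain rank-r matrix factorization W ≈ A B. This is an illustrative assumption only: the abstract does not say which tensor factorization is used, and the layer size (512 x 256), the ranks, and all function names here are hypothetical.

```python
def dense_coefficients(n_in, n_out):
    """Number of weights in an unfactorized dense layer (bias ignored)."""
    return n_in * n_out


def low_rank_coefficients(n_in, n_out, rank):
    """Number of weights when W (n_in x n_out) is replaced by A (n_in x rank) @ B (rank x n_out)."""
    return rank * (n_in + n_out)


def compression_ratio(n_in, n_out, rank):
    """How many times fewer coefficients the factorized layer stores."""
    return dense_coefficients(n_in, n_out) / low_rank_coefficients(n_in, n_out, rank)


if __name__ == "__main__":
    # Hypothetical layer size and ranks, chosen only for illustration.
    for rank in (16, 32, 64):
        ratio = compression_ratio(512, 256, rank)
        print(f"rank={rank}: {ratio:.2f}x fewer coefficients")
```

Under these assumed sizes, a smaller rank gives a larger reduction in stored coefficients, at the cost of a coarser approximation of the original weight matrix.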
