Deep reinforcement learning on Atari games maps pixels directly to actions; internally, the deep neural network bears the responsibility of both extracting useful information and making decisions based on it. Aiming to devote entire deep networks to decision making alone, we propose a new method for learning policies and compact state representations separately but simultaneously for policy approximation in reinforcement learning. State representations are generated by a novel algorithm based on Vector Quantization and Sparse Coding, trained online along with the network, and capable of growing its dictionary size over time. We also introduce new techniques allowing both the neural network and the evolution strategy to cope with varying dimensions. This enables networks of only 6 to 18 neurons to learn to play a selection of Atari games with performance comparable, and occasionally superior, to state-of-the-art techniques using evolution strategies on deep networks two orders of magnitude larger.
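
To make the idea of a dictionary that grows over time, and of a policy that copes with the resulting change in input dimension, more concrete, the following is a minimal toy sketch. It is not the algorithm proposed here: it uses plain nearest-centroid vector quantization without sparse coding, and it omits the evolution strategy entirely. The class names `GrowingDictionary` and `GrowingLinearPolicy`, the `threshold` parameter, and the zero-padding rule are all assumptions introduced for illustration.

```python
import math


class GrowingDictionary:
    """Toy online vector quantizer (hypothetical, for illustration only)."""

    def __init__(self, threshold):
        # Assumed novelty threshold: an observation farther than this from
        # every centroid is added to the dictionary as a new centroid.
        self.threshold = threshold
        self.centroids = []

    def observe(self, obs):
        """Grow the dictionary if obs is novel, then return its code."""
        dists = [math.dist(obs, c) for c in self.centroids]
        if not dists or min(dists) > self.threshold:
            self.centroids.append(list(obs))
            dists.append(0.0)
        # One-hot code with one entry per centroid, marking the nearest one.
        code = [0] * len(self.centroids)
        code[dists.index(min(dists))] = 1
        return code


class GrowingLinearPolicy:
    """Toy linear policy whose input size follows the code length."""

    def __init__(self, n_actions):
        self.weights = [[] for _ in range(n_actions)]  # one row per action

    def act(self, code):
        # Pad each row with zeros so new code entries start with no influence
        # on the action scores (an assumed, simple growth rule).
        for row in self.weights:
            row.extend([0.0] * (len(code) - len(row)))
        scores = [sum(w * x for w, x in zip(row, code)) for row in self.weights]
        return scores.index(max(scores))


dictionary = GrowingDictionary(threshold=1.0)
policy = GrowingLinearPolicy(n_actions=3)
for obs in [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]:
    code = dictionary.observe(obs)
    action = policy.act(code)
```

In this sketch the second observation is close to the first and reuses its centroid, while the third is far away and adds a new one, so the code length and the policy's weight rows both grow from one entry to two. In a full system the policy weights would then be optimized, for instance by an evolution strategy that must itself handle the growing parameter vector.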