We propose a novel reinforcement learning algorithm, QD-RL, that incorporates the strengths of off-policy RL algorithms into Quality-Diversity (QD) approaches. Quality-Diversity methods contribute structural biases by decoupling the search for diversity from the search for high return, resulting in efficient management of the exploration-exploitation trade-off. However, these approaches generally suffer from sample inefficiency as they call upon evolutionary techniques. QD-RL removes this limitation by relying on off-policy RL algorithms. More precisely, we train a population of off-policy deep RL agents to simultaneously maximize diversity inside the population and the return of the agents. QD-RL selects agents from the diversity-return Pareto front, resulting in stable and efficient population updates. Our experiments on the Ant-Maze environment show that QD-RL can solve challenging exploration and control problems with deceptive rewards while being more than 15 times more sample efficient than its evolutionary counterparts.
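
To make the selection step concrete, the following is a minimal sketch of extracting a diversity-return Pareto front from a population. It is an illustration under stated assumptions, not the QD-RL implementation: the names `dominates` and `pareto_front` are hypothetical, each agent is assumed to be summarized by a single `(diversity, return)` pair, and both objectives are assumed to be maximized. How those two scores are computed is not specified here.

```python
from typing import List, Sequence, Tuple

Score = Tuple[float, float]  # (diversity, return); both assumed to be maximized


def dominates(a: Score, b: Score) -> bool:
    """Return True if `a` is at least as good as `b` on both objectives
    and strictly better on at least one of them."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def pareto_front(scores: Sequence[Score]) -> List[int]:
    """Return the indices of agents that no other agent dominates.

    This is a simple O(n^2) scan, which is adequate for small populations.
    """
    front = []
    for i, candidate in enumerate(scores):
        dominated = any(
            dominates(other, candidate)
            for j, other in enumerate(scores)
            if j != i
        )
        if not dominated:
            front.append(i)
    return front


# Hypothetical population of five agents, scored as (diversity, return).
population_scores = [
    (0.9, 10.0),  # very diverse, low return
    (0.5, 50.0),  # balanced
    (0.4, 40.0),  # dominated by the balanced agent
    (0.1, 80.0),  # high return, little diversity
    (0.9, 5.0),   # dominated by the first agent
]
selected = pareto_front(population_scores)
print(selected)  # [0, 1, 3]
```

In this toy population, agents 0, 1 and 3 are kept because each is the best available trade-off for some weighting of diversity against return, while agents 2 and 4 are discarded because another agent is at least as good on both criteria.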