Abstract
In the past few years, a considerable amount of research has been dedicated to the exploitation of previous learning experiences and the design of Few-shot and Meta Learning approaches, in problem domains ranging from Computer Vision to Reinforcement Learning based control. A notable exception, where to the best of our knowledge little to no effort has been made in this direction, is Quality-Diversity (QD) optimisation. QD methods have been shown to be effective tools in dealing with deceptive minima and sparse rewards in Reinforcement Learning. However, they remain costly due to their reliance on inherently sample-inefficient evolutionary processes. We show that, given examples from a task distribution, information about the paths taken by optimisation in parameter space can be leveraged to build a prior population which, when used to initialise QD methods in unseen environments, allows for few-shot adaptation. Our proposed method does not require backpropagation. It is simple to implement and scale, and furthermore, it is agnostic to the underlying models that are being trained. Experiments carried out in both sparse and dense reward settings using robotic manipulation and navigation benchmarks show that it considerably reduces the number of generations that are required for QD optimisation in these environments.