Abstract
Cultural accumulation drives the open-ended and diverse progress in capabilities spanning human history. It builds an expanding body of knowledge and skills by combining individual exploration with inter-generational information transmission. Despite its widespread success among humans, the capacity for artificial learning agents to accumulate culture remains under-explored. In particular, approaches to reinforcement learning typically strive for improvements over only a single lifetime. Generational algorithms that do exist fail to capture the open-ended, emergent nature of cultural accumulation, which allows individuals to trade off innovation and imitation. Building on the previously demonstrated ability of reinforcement learning agents to perform social learning, we find that training setups which balance this with independent learning give rise to cultural accumulation. These accumulating agents outperform those trained for a single lifetime with the same cumulative experience. We explore this accumulation by constructing two models under two distinct notions of a generation: episodic generations, in which accumulation occurs via in-context learning, and train-time generations, in which accumulation occurs via in-weights learning. In-context and in-weights cultural accumulation can be interpreted as analogous to knowledge and skill accumulation, respectively. To the best of our knowledge, this work is the first to present general models that achieve emergent cultural accumulation in reinforcement learning, opening up new avenues towards more open-ended learning systems, as well as presenting new opportunities for modelling human culture.